Exploratory Data Analysis – Laboratory
December 31st, 2024
Belo Matos – Up202309078

INDEX
Abstract
Introduction
Pass Network
    Croatia
    France
Pass Analysis
    Pass analysis by player
    Progressions into final third
Team Pressure
Player Pressure Zone
    France
    Croatia
Player Events on the ball
Shot Analysis
    Shot map
    France
    Croatia
Conclusions
APPENDIX 1 – Teams lineup
APPENDIX 2 – Pass Analysis
APPENDIX 3 – Individual Pressure Zones

ABSTRACT
This assignment is part of an
academic exercise within the Laboratory class of the Master's program in Data Analytics at Faculdade de Economia do Porto (FEP). The assignment focuses on Exploratory Data Analysis (EDA), summarizing the main characteristics of the data through statistical measures and visualizations. This report explores and visualizes StatsBomb data from the 2018 World Cup final between France and Croatia. The aim of the study is to analyse the in-depth match statistics in order to identify and visualize the key events and trends that contributed to the match outcome. Using advanced data visualization techniques, the report investigates player performance, team tactics, and critical match events such as goals, shots, passes, and team and individual player pressure. By leveraging this detailed match data, the analysis aims to provide a deeper understanding of how specific moments and strategies shaped the result, and of the performance dynamics of both teams during the match.

Keywords: Exploratory data analysis, Python, data analysis, data visualization, football, StatsBomb, mplsoccer

INTRODUCTION
This report aims to demonstrate an understanding of Exploratory Data Analysis (EDA) by applying the concept to the specific topic of football data analysis. For this reason, the process will diverge slightly from traditional EDA methods, as the dataset, which captures football match events, lends itself to a more tailored approach. Despite this divergence, the analysis follows core EDA principles to uncover patterns, trends, and insights. The dataset used in this analysis is provided by Hudl StatsBomb, a company founded in 2017 that specializes in collecting detailed data from football matches, covering a wide range of events. Over the past few years, data analysis has gained significant importance across various industries, and football is no exception.
StatsBomb, in its effort to shape the future of football data analytics, provides free datasets and analyses to foster a new generation of football analysts and researchers. The expected key findings of this assignment center on the interpretation and analysis of the data, as well as on an understanding of its potential impact on the future of football. Analyzing the dataset can uncover insights that contribute to a deeper understanding of the game. These insights may help to identify patterns, strategies, and key factors influencing match outcomes, ultimately providing new opportunities for improving player performance, team tactics, and decision-making processes in football. Additionally, analyzing such data can speed up video analysis, as it provides a general understanding of the opposition's strategies and tactics: teams can quickly identify the opposition's patterns and key moments, allowing for more efficient preparation and decision-making.

PASS NETWORK
The pass network diagram provides a quick but deep understanding of the team formation, beyond the basic lineup. By presenting each player's average position when receiving and passing the ball, and by highlighting the most frequent player connections and combinations, it gives an understanding of the tactical approach, ball circulation patterns, and key players. We will analyze and compare both teams' diagrams to understand the game, the tactical approaches, and the adaptations. Both teams' lineups are presented in Appendix 1 – Teams lineup.

Croatia
Figure 1 Croatia pass network
The 4-3-3 formation is very clear, with inverted wingers (playing inside). The defensive line's average position also indicates a high-pressing and/or possession-based approach.
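The two ingredients of a pass network described above, average player positions and connection counts, can be sketched in a few lines. This is a minimal illustration, not the code used for the figures: the dictionary keys (`passer`, `recipient`, `x`, `y`, `end_x`, `end_y`) and the toy players A, B and C are hypothetical, and the input is assumed to contain completed passes only.

```python
from collections import Counter, defaultdict


def pass_network(passes):
    """Return (average position per player, pass count per player pair).

    `passes` is an iterable of dicts with hypothetical keys:
    'passer', 'recipient', 'x', 'y', 'end_x', 'end_y'.
    """
    locations = defaultdict(list)
    links = Counter()
    for p in passes:
        # A player's position is sampled both when passing and when receiving.
        locations[p["passer"]].append((p["x"], p["y"]))
        locations[p["recipient"]].append((p["end_x"], p["end_y"]))
        # Count the connection regardless of direction.
        links[tuple(sorted((p["passer"], p["recipient"])))] += 1
    avg_pos = {
        player: (
            sum(x for x, _ in pts) / len(pts),
            sum(y for _, y in pts) / len(pts),
        )
        for player, pts in locations.items()
    }
    return avg_pos, links


# Toy data, not taken from the match.
passes = [
    {"passer": "A", "recipient": "B", "x": 30, "y": 40, "end_x": 50, "end_y": 20},
    {"passer": "B", "recipient": "A", "x": 50, "y": 20, "end_x": 40, "end_y": 40},
    {"passer": "A", "recipient": "C", "x": 50, "y": 40, "end_x": 90, "end_y": 60},
]
avg_pos, links = pass_network(passes)
print(avg_pos["A"], links[("A", "B")])
```

In a plot, the average positions would become the node coordinates and the pair counts the line widths; a minimum pass-count threshold is a common way to keep only the strongest connections.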
• Observing the goalkeeper, we can assume he did not take many goal kicks, which can imply game control, as the other team had difficulty reaching Croatia's box.
• The defensive line is advanced, with the right center back receiving and passing the ball close to the midfield line.
• The right and left backs are very projected, especially the left back, creating width.

The CDM controlled the game tempo, connecting with the other midfielders, who were both very active during the build-up play. Play went mainly through the right side of the Croatian team, with the right back also being very active. Even though Croatia appear to have dominated and controlled the game, they had difficulty breaking the French defensive line and connecting with the attackers, especially the striker, who had to drop deeper to receive and pass the ball. The relatively sparse connections to the front three suggest France effectively prevented vertical progression.

France
Figure 2 France pass network
We can observe the 4-2-2-2 formation, with the right midfielder playing very wide, while the opposite happens on the left side, where the left midfielder plays much further inside. The team is much more compact, playing with a lower defensive line and a more direct approach. The goalkeeper participated much more than the Croatian goalkeeper: he was able to find the forwards, or even play the ball directly to the right wing, taking advantage of the Croatian left back being projected up the flank. Based on this, we could assume France played on the counterattack, playing the ball directly to the left center forward to attract the center defender, so that the right center forward could look for the second ball and find empty space. The French game passed largely through the RDM, who served as the primary distributor and was able to connect directly with all the other players.
There is one connection that is very interesting and that we will examine in more depth later: the connection between Pogba and Mbappé. As mentioned before, with the Croatian left back very projected, there is a lot of space for the right winger to exploit. Given Pogba's vision and passing quality, combined with Mbappé's pace, we can expect many progressive or line-breaking passes between these two players. As mentioned, the Croatian left back was very projected but did not participate much in the attack; this could be the reason for that tactical adjustment. The LDM played much closer to the left side and to the center back, which could indicate a higher-pressure zone from the opposition.

PASS ANALYSIS
The passing statistics provide clear evidence of the contrasting tactical approaches and confirm our suspicions about the game: a possession-based approach on one side versus a counter-attacking strategy on the other. Croatia, unsurprisingly, completed more than twice as many passes as France, with a very high success rate, highlighting a clear commitment to building play through controlled possession. The high accuracy and high pass volume correlate with the pass network, where we observed their advanced positioning and emphasis on ball circulation. In contrast, France's direct approach is reflected in their 202 successful passes, with a 71% success rate. This could indicate a more direct approach, as mentioned before, and/or difficulty in building up and controlling the game due to Croatia's high pressure and tactical superiority. It is worth highlighting that the difference in incomplete passes is only 18, even though Croatia attempted 266 more passes.
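The totals and success percentages reported in Table 1 can be reproduced from the raw outcome counts. The sketch below hard-codes the counts from that table; the dictionary layout and the `summarise` helper are illustrative choices, not the author's code.

```python
# Pass outcome counts per team, copied from Table 1.
pass_outcomes = {
    "Croatia": {"Incomplete": 91, "Out": 12, "Offside": 0, "Successful": 448},
    "France": {"Incomplete": 73, "Out": 9, "Offside": 1, "Successful": 202},
}


def summarise(outcomes):
    """Return (total passes, successful passes as a percentage of the total)."""
    total = sum(outcomes.values())
    return total, round(100 * outcomes["Successful"] / total, 2)


summary = {team: summarise(o) for team, o in pass_outcomes.items()}

# Gaps highlighted in the text: incomplete passes and total attempts.
incomplete_gap = (
    pass_outcomes["Croatia"]["Incomplete"] - pass_outcomes["France"]["Incomplete"]
)
attempts_gap = summary["Croatia"][0] - summary["France"][0]

print(summary)  # {'Croatia': (551, 81.31), 'France': (285, 70.88)}
print(incomplete_gap, attempts_gap)  # 18 266
```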
Table 1 Pass outcome by team

Pass Outcome          Croatia   France
Incomplete                 91       73
Out                        12        9
Offside                     0        1
Successful                448      202
Total Passes              551      285
Successful passes %    81.31%   70.88%

Pass analysis by player
Table 4 and Table 5, located in Appendix 2 – Pass Analysis, provide more insight into each player's role and both teams' tactical approaches. As mentioned previously, Paul Pogba was the primary distributor, ending the match as the French player with the most pass attempts (34) and successful passes (29), an 85% completion rate. At the other end, Kanté, with only 11 passes attempted and a 64% success rate, suggests a more defensive role, and can indicate Croatian pressure in that zone of the pitch. It is also worth highlighting Nzonzi, who entered the pitch in the second half and finished with a 93% success rate, conserving ball possession and missing only 1 pass out of 15. On the Croatian side, Brozović finished the match with 99 pass attempts, completing 87 (the most passes completed and attempted in the match), an 88% success rate, which illustrates his excellent work in controlling the game tempo and as a safe option to receive and pass the ball. Luka Modrić and Rakitić were also very active in the game, finishing with 66 (87% success rate) and 53 (82% success rate) successful passes, respectively.

Progressions into final third
For the progressions into the final third, we consider passes and carries that start before the final third (x<80) and finish inside it (x>80). A progressive pass or carry takes a team closer to the opponent's goal; we do not define a minimum pass distance. The progressive passes and carries are presented in Appendix 2 – Pass Analysis. Paul Pogba was the French player with the most progressive passes (6), illustrating his passing accuracy and vision.
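The progression rule stated above (start at x<80, end at x>80, no minimum distance) can be expressed as a simple filter. This is a sketch under assumptions: the event keys `player`, `type`, `x` and `end_x` and the players P1 and P2 are hypothetical, and x is taken to run along a 120-unit pitch length, as in StatsBomb coordinates.

```python
from collections import Counter

FINAL_THIRD_X = 80  # final third starts at x = 80 on a 120-unit-long pitch


def is_progression(start_x, end_x, threshold=FINAL_THIRD_X):
    """True when an action starts before the final third and ends inside it."""
    return start_x < threshold and end_x > threshold


def count_progressions(events):
    """Count progressions into the final third per player."""
    return Counter(
        e["player"] for e in events if is_progression(e["x"], e["end_x"])
    )


# Toy events, not taken from the match.
events = [
    {"player": "P1", "type": "Pass", "x": 62.0, "end_x": 95.0},   # counts
    {"player": "P1", "type": "Pass", "x": 85.0, "end_x": 100.0},  # starts inside
    {"player": "P2", "type": "Carry", "x": 75.0, "end_x": 82.0},  # counts
    {"player": "P2", "type": "Pass", "x": 70.0, "end_x": 78.0},   # never enters
]
progressions = count_progressions(events)
print(progressions)
```

Counting by recipient instead of by passer would give the "progressive passes received" figures in the same way.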
We mentioned previously a potential link-up between Pogba and Mbappé, who received 6 progressive passes (the most for France, alongside Griezmann), 4 of them from Pogba. Mbappé also finished the match with the most successful carries into the final third (3) among the French players.

Figure 3 Paul Pogba progressive passes into final third & Mbappé progressive passes received in final third

Ivan Rakitić finished the match with the most progressive carries of the match (5) and the second most progressive passes (12), one behind Luka Modrić (13), further illustrating the team and midfield dominance of the Croatian side. Vrsaljko was the most frequently targeted player of the match for progressive passes, with 11 passes received.

Figure 4 Rakitić progressive passes into final third & Vrsaljko progressive passes received in final third

TEAM PRESSURE
Pressure is an event type defined by StatsBomb as "Applying pressure to an opposing player who's receiving, carrying or releasing the ball." This allows us to uncover and analyze team pressure, getting a deeper understanding of the tactical approach for a given match or team. First, we present and analyze both teams' pressure heatmaps for a pitch divided into a six-by-three grid with a central strip as wide as the six-yard box; we then divide the pitch further, providing a more detailed understanding of the pressure applied by both teams.

Figure 5 Pressure Heatmap (6 by 3 grid)

Croatia shows an aggressive, high-pressing approach, which aligns with their possession-based, high-line tactical setup. The highest-pressure zone for the Croatian team corresponds to the right attacking zone, where we previously found a very involved right back, Vrsaljko, who received the most progressive passes of the match. The Croatian team presses strongly on both flanks in the final third.
Defensively, Croatia had more trouble containing the right side of the French game, where Kylian Mbappé was very involved, with the most progressive passes received and the most successful carries into the final third for the French team. On the other side, France's pressure pattern reveals a more defensive-minded approach that supports a counterattacking strategy, pressing deeper. The most intense pressure (14%) is on the left flank, which confirms our initial suspicions: this is where we previously saw Kanté positioned lower, receiving the ball close to the center back. There is also significant pressure in the left midfield area (13%), while in the attacking third the pressure was well distributed. Even though the Croatian heatmap shows pressure applied on both flanks offensively, France's pressure on the right flank was lower, indicating a strategy of pressing harder on one flank but also reflecting player characteristics (Kanté, Hernández and Matuidi press more intensely than Pogba, Mbappé and Pavard).

Figure 6 Pressure Heatmap (6 by 4 grid)

Croatia confirms the high-pressing tactical approach, with a notable concentration in advanced areas of the field: 31% of the team's pressure is in the final third of the pitch. This could reflect the inverted wingers' approach, not letting France connect through the middle. There is also high pressing in the right midfield zone, where we noticed a center back playing close to the midfield line; this may also be the pressure zone of the CDM. For France, the finer grid rules out a pressure zone along the right wing but highlights a pressure zone in central midfield and on the left. Analyzing individual player pressure will help us understand exactly where the first phase of France's pressure took place, providing a clear image of Croatia's match dominance.
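The zone percentages quoted in this section are shares of a team's pressure events falling in each grid cell. A minimal standard-library sketch of that binning is shown below. It assumes StatsBomb pitch coordinates (120 by 80, with the six-yard box spanning y = 30 to y = 50) and six equal-width columns; the exact column edges used for the figures are not stated in the report, so `X_EDGES` is an assumption, and the sample locations are toy data.

```python
from collections import Counter

# Assumed grid: six equal columns along the pitch length and three rows,
# with the central row as wide as the six-yard box (y from 30 to 50).
X_EDGES = [0, 20, 40, 60, 80, 100, 120]
Y_EDGES = [0, 30, 50, 80]


def bin_index(value, edges):
    """Index of the bin containing `value`; the far boundary joins the last bin."""
    for i in range(len(edges) - 1):
        if value < edges[i + 1]:
            return i
    return len(edges) - 2


def pressure_share(locations):
    """Percentage of pressure events per (column, row) cell."""
    counts = Counter(
        (bin_index(x, X_EDGES), bin_index(y, Y_EDGES)) for x, y in locations
    )
    total = sum(counts.values())
    return {cell: 100 * n / total for cell, n in counts.items()}


# Toy pressure locations, not taken from the match.
locations = [(10, 10), (90, 70), (95, 75), (110, 40)]
shares = pressure_share(locations)
print(shares)
```

Replacing `X_EDGES` and `Y_EDGES` with a finer set of edges gives the more detailed view used in the second heatmap.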
PLAYER PRESSURE ZONE
Having established a comprehensive understanding of both teams' tactical approaches through pass networks, passing statistics, and team pressure patterns, we can now examine the individual player pressure zones. This analysis will provide deeper insights into how each player's pressing responsibilities contributed to their team's overall strategic framework. The individual pressure zones can be found in Appendix 3 – Individual Pressure Zones.

France

Forwards
• Giroud, left striker: moderate pressing across the middle-left third, with his most intense pressure zone in defensive midfield.
• Griezmann, right striker: intensive pressing in central attacking areas, pressing on the halfway line but also close to his own box, demonstrating high-pressing capability across multiple zones.

Midfielders
• Matuidi, left midfielder: intense pressing along the left flank, particularly in the defensive and middle thirds.
• Kanté, left defensive midfielder: strong pressing around midfield, with emphasis on the left flank, where we saw most of the pressure was located.
• Pogba, right defensive midfielder: pressed higher than Kanté, and more in the right midfield.
• Mbappé, right midfielder: pressed mostly on the right flank, in the attacking and middle thirds. He also pressed on his own defensive touchline.

Defenders
• Hernández, left back: strong pressing along the left defensive flank, also showing his high-pressing capability by pressing up to the midfield line.
• Umtiti, left center back: pressed on the defensive line, but also managed to press in midfield. He also had to press on the left touchline.
• Varane, right center back: limited pressing, maintaining his defensive position and playing more patiently, allowing Umtiti to press higher.
• Pavard, right back: concentrated pressing in the right defensive zone, pressing lower than Hernández.

Key patterns:
• Clear asymmetric pressing structure favoring the left side (Matuidi-Hernández-Kanté); we can consider this the zone where Croatia pressed higher. It is probably a tactical response to Croatia's preference for building up on their right side.
• Midfield pressing zones show a clear tactical division of responsibilities, with the attackers pressing across the pitch and also pressing low (demonstrating Croatia's game control).
• Mbappé was the player pressing on the right flank and pressing high, conditioning the first phase of build-up on the right side.
• Umtiti played as a more aggressive center back than Varane, who maintained a deeper position to compensate for Umtiti's aggressiveness.

Croatia

Forwards
• Perišić, left winger: pressing concentrated along the whole left flank, more intense around the midfield line.
• Mandžukić, striker: active pressing across the front line with emphasis on central areas, and additional emphasis on the right midfield zone, possibly during a phase of the game in which France dominated, pressing the initial build-up.
• Rebić, right winger: high pressing intensity on the right side of the attack, but also pressing in the middle and on the left side. High-pressing capability.

Midfielders
• Rakitić, left central midfielder: strong pressing in the middle third, particularly in central and right zones. Pressed very high, conditioning the first phase of build-up.
• Brozović, center defensive midfielder: focused pressing in the left-central areas of midfield, also controlling and pressing on the right side, where the right back was very projected.
• Modrić, right central midfielder: intensive pressing across the right half-space and central areas and across the final third, showing his box-to-box role.

Defenders
• Strinić, left back: strong pressing along the left defensive flank, also showing his high-pressing capability by pressing up to the midfield line and into the final third, demonstrating his very projected role.
• Vida, left center back: limited pressing, maintaining defensive positioning.
• Lovren, right center back: pressing in central defensive areas, and also pressing very high, up to the midfield line.
• Vrsaljko, right back: intense pressing on the right flank, in the defensive and middle thirds.

Key patterns:
• High-intensity pressing across all areas, reflecting their possession-dominant approach, with the midfielders pressing high in the final third.
• The midfield trio showed comprehensive pressing coverage.
• Strong right-side pressing through the Vrsaljko-Modrić-Rebić combination.
• Lovren played as the more aggressive center back, also being very involved in the first phase of build-up.

PLAYER EVENTS ON THE BALL
Now we will compare players' events on the ball (pass, ball receipt, carry, clearance, foul won, block, ball recovery, duel, dribble, interception, miscontrol and shot). This will further demonstrate game control and the tactical plan for both teams and players.

Figure 7 Player events comparison Kanté and Pogba

France's midfielders Kanté and Pogba had different game plans and approaches, with Kanté acting as a defensive midfielder and Pogba playing more advanced. Kanté was on the pressured side of the French team, trapped on the left flank, where 66% of his actions happened. This is far from his typical box-to-box role, in which he can appear pressing or receiving the ball all across the pitch. Paul Pogba played a freer role, appearing all across the pitch, but still played most of the game in his own half, where 64% of his actions took place. Given his vision and passing, this suits him as a launcher of counterattacks with his long-ball accuracy, in this case serving Mbappé on the right wing.

Figure 8 Player events comparison Rakitić and Modrić

Rakitić spent most of his time in the central zones near the halfway line, with 18% of his touches in both the left central defensive and attacking midfield zones.
He was also very present on the left flank (44% of his touches), 14% of those near the attacking midfield zone. Luka Modrić also dominated the right central zones, with the highest percentage of his touches in the attacking central midfield area (18%). He appeared in the left midfield zone (12%), playing much closer to the middle of the pitch, despite also appearing on the wings. The two players contrast with the French approach, where Kanté and Pogba had very different missions, playing in different parts of the pitch. The Croatian duo was very similar in terms of events on the ball, with slight differences: Modrić played across almost the whole midfield zone, reaching the final third more often than Rakitić, who played a little closer to the wing and in the build-up play.

SHOT ANALYSIS
The match between France and Croatia ended with a 4-2 victory for France. This section analyzes the goals scored and the shot efficiency, visualizing the number of shots for both teams with the respective expected goals for each team.

France

Player name             Total Shots   Goals
Antoine Griezmann                 2       1
Kylian Mbappé Lottin              2       1
Nabil Fekir                       1       0
Olivier Giroud                    1       0
Paul Pogba                        2       1

Table 2 Shots and goals by French player

This is consistent with Croatia dominating the game (even though the result says the opposite): France created fewer chances but attacked effectively, with a high shot-to-goal conversion ratio. France had only 8 shots, 6 of them from attacking players, which produced 2 goals (Griezmann and Mbappé). The other goals came from Pogba and from a Mandžukić own goal.

Croatia

Player name             Total Shots   Goals
Ante Rebić                        3       0
Dejan Lovren                      2       0
Domagoj Vida                      2       0
Ivan Perišić                      2       1
Ivan Rakitić                      3       0
Mario Mandžukić                   1       1
Šime Vrsaljko                     2       0

Table 3 Shots and goals by Croatian player

Croatia dominated the game with 15 shots but scored only 2 goals, through Perišić and Mandžukić. Rakitić and Rebić were the players with the most shots in the match, with 3 each.
This demonstrates Croatia's game dominance but low shot-to-goal conversion ratio.

Shot map
The shot maps below provide a detailed visual analysis of each team's attacking performance, highlighting the location and quality of their attempts on goal during the match. Each marker represents a shot, with the size of the marker representing the expected goals (xG) value of the shot, and with a visual distinction between goal and no goal. The accompanying xG value quantifies the quality of these chances, offering insight into how effective the attack was in converting opportunities into goals.

France
Figure 9 France shot map

France's shot distribution reveals a reliance on long-distance attempts, with 5 of their 8 shots coming from outside the box, highlighting the difficulty of breaking down Croatia's defense to create high-quality chances closer to goal. Despite this, France demonstrated confidence in long-range shooting, scoring 2 goals from their 5 long-distance shots, showcasing either exceptional precision or a breakdown in Croatia's defensive pressure. The expected goals (xG) value of 1.1 reflects the modest quality of the chances France created during the match, suggesting they were expected to score approximately one goal. However, they exceeded expectations by scoring four goals, including one via a Mandžukić own goal. This suggests a combination of clinical finishing, fortunate circumstances, and potential gaps in Croatia's defensive setup. Including metrics such as expected saves by the goalkeeper could provide deeper insight into the goalkeeper's performance and whether the goals conceded were preventable. A critical moment in the match was the penalty, which carried an xG value of 78.35%, representing a high-probability scoring opportunity. This penalty accounted for a significant portion of France's xG and contributed to their eventual victory.
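The conversion ratios and the gap between goals and xG discussed in this section follow directly from the figures in the report. The sketch below uses the shot and goal counts from Table 2 and Table 3 and the team xG totals quoted in the shot map discussion (1.1 for France, 1.48 for Croatia); the own goal is left out because it did not come from a French shot. The dictionary layout and the `shot_summary` helper are illustrative, not the author's code.

```python
# Shots and goals from Tables 2 and 3; team xG totals as quoted in the text.
# The Mandžukić own goal is excluded: it was not the result of a French shot.
shots = {
    "France": {"shots": 8, "goals_from_shots": 3, "xg": 1.1},
    "Croatia": {"shots": 15, "goals_from_shots": 2, "xg": 1.48},
}


def shot_summary(team):
    """Return (conversion rate in %, goals scored minus expected goals)."""
    conversion = round(100 * team["goals_from_shots"] / team["shots"], 1)
    over_performance = round(team["goals_from_shots"] - team["xg"], 2)
    return conversion, over_performance


summary = {name: shot_summary(team) for name, team in shots.items()}
print(summary)  # {'France': (37.5, 1.9), 'Croatia': (13.3, 0.52)}
```

On these numbers France converted 37.5% of their shots against 13.3% for Croatia, and scored about 1.9 goals more than their xG from shots alone.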
Croatia
Figure 10 Croatia shot map

The shot map for Croatia demonstrates a more focused attacking approach, with many attempts coming from inside the box (10 out of 15). A significant portion of Croatia's shots were taken from high-probability areas inside the box, with the best chance (56% xG) ending in a goal. The total xG of 1.48 reflects the higher quality of chances created by Croatia compared to France (1.1 xG). This indicates that Croatia's attack was more structured and focused on high-quality opportunities. Despite the higher xG value, Croatia scored only two goals.

Chance Creation: Croatia created better chances, with a higher xG (1.48 vs. 1.1), but their conversion rate was lower.
Shooting Range: Unlike France, which relied more on long-distance efforts, Croatia focused on close-range opportunities.
Outcome vs. xG: France overperformed their xG significantly, a key factor in the match outcome.

CONCLUSIONS
The analysis revealed stark contrasts in tactical approaches between France and Croatia. Croatia's possession-based, high-pressing 4-3-3 system (81.31% pass completion, 551 total passes) dominated territorial metrics but proved less efficient than France's pragmatic 4-2-2-2 counter-attacking strategy. France's clinical finishing, with 4 goals (one of them an own goal) from 8 shots despite a lower xG (1.1 vs 1.48), demonstrated how tactical efficiency can overcome statistical dominance. This football analytics project provided valuable insights into exploratory data analysis principles:
1. Multi-dimensional Analysis: Examining pass networks, pressure patterns, and shot data demonstrated how different data dimensions can reveal tactical narratives.
2. Sports-Specific Data Challenges: Football data analysis required unique approaches compared to traditional datasets, particularly in spatial analysis and event sequencing.
3.
Role-Based Analysis: The pressure analysis revealed how individual player roles contribute to team tactics, showing the importance of granular data examination.

Several areas need deeper investigation that could enhance tactical knowledge and player performance analysis. This project served as an introduction to football analytics, an emerging field where data science principles transform traditional match analysis. The project proved valuable beyond football analysis, enhancing broader data analysis capabilities through:
• Practical application of EDA techniques
• Development of Python data analysis skills
• Understanding of complex data visualization approaches
• Experience with multi-dimensional data interpretation

APPENDIX 1 – TEAMS LINEUP
Figure 11 Croatia Lineup
Figure 12 France Lineup

APPENDIX 2 – PASS ANALYSIS

Player               Incomplete   Out   Successful   Total Passes   Successful %
Andrej Kramarić               0     0           10             10         100.00
Marcelo Brozović             11     1           87             99          87.88
Domagoj Vida                  4     2           42             48          87.50
Luka Modrić                   9     1           66             76          86.84
Dejan Lovren                  8     1           55             64          85.94
Ivan Rakitić                 11     1           53             65          81.54
Danijel Subašić               2     0            8             10          80.00
Šime Vrsaljko                14     1           58             73          79.45
Mario Mandžukić               6     1           21             28          75.00
Ivan Strinić                  8     0           20             28          71.43
Marko Pjaca                   0     1            2              3          66.67
Ante Rebić                    4     2            8             14          57.14
Ivan Perišić                 14     1           18             33          54.55

Table 4 Pass outcome by Croatian player

Player                                       Incomplete   Out   Offside   Successful   Total Passes   Successful %
Steven N'Kemboanza Mike Christopher Nzonzi            1     0         0           14             15          93.33
Paul Pogba                                            5     0         0           29             34          85.29
Samuel Yves Umtiti                                    2     2         0           17             21          80.95
Raphaël Varane                                        4     1         0           18             23          78.26
Blaise Matuidi                                        4     2         0           18             24          75.00
Antoine Griezmann                                     8     0         0           18             26          69.23
Lucas Hernández Pi                                    9     1         0           22             32          68.75
Hugo Lloris                                           9     0         0           16             25          64.00
N'Golo Kanté                                          4     0         0            7             11          63.64
Benjamin Pavard                                       8     1         1           17             27          62.96
Olivier Giroud                                        8     1         0           14             23          60.87
Corentin Tolisso                                      2     0         0            3              5          60.00
Kylian Mbappé Lottin                                  6     1         0            8             15          53.33
Nabil Fekir                                           3     0         0            1              4          25.00

Table 5 Pass outcome by French player

Figure 13 Progressions into final third by French players
Figure 14 Progression into final third by French players (Pitch)
Figure 15 Progressions into final third by Croatian players
Figure 16 Progression into final third by Croatian players (Pitch)

APPENDIX 3 – INDIVIDUAL PRESSURE ZONES
Figure 17 France individual player pressure
Figure 18 Croatia individual player pressure