Uploaded by Federico Romano Gargarella

CP 2023-2024 Group 296721

[Intro to CP 2023/2024] Group 296721
EUROPEAN SOCCER DATA ANALYSIS
Using Python for Data Insights
Federico Romano Gargarella, Kayihura Herta Keza
OVERVIEW
01 Strategy: Approach to the problem and solution
02 File handling: Understand which modules and datasets to use
03 Query 1, 2 & 3: Objectives, Results
04 Output: Saving data
STRATEGY
DATASET ANALYSIS
Examined the structure of the databases by reading the CSV files.
UNDERSTANDING QUERIES
Defined functions to solve them.
HANDLING LARGE DATASETS
Kept memory use low by reading files line by line and keeping only the rows and fields each query needs.
MAINTAINING DATA INTEGRITY
Implemented validation checks and error handling.
DATA SERIALIZATION
Applied 'pickle' to save data, so that processed results can be stored and reloaded without recomputation.
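The line-by-line reading, selective filtering, and validation described above could look roughly like the sketch below. It is not the group's actual code: the function name `iter_valid_ratings`, the column names (`player_api_id`, `date`, `overall_rating`), and the sample rows are all assumptions made for illustration.

```python
import csv
import io


def iter_valid_ratings(lines, min_rating=0):
    """Yield (player_id, date, rating) tuples one row at a time.

    Rows with a missing or non-numeric rating are skipped instead of
    raising, so one bad record does not abort the whole pass.
    """
    reader = csv.DictReader(lines)
    for row in reader:
        try:
            rating = float(row["overall_rating"])
        except (KeyError, TypeError, ValueError):
            continue  # validation check: drop malformed rows
        if rating >= min_rating:  # selective filtering
            yield row["player_api_id"], row["date"], rating


# In-memory sample standing in for a large CSV file (made-up values).
sample = io.StringIO(
    "player_api_id,date,overall_rating\n"
    "1,2015-01-01,70\n"
    "1,2015-06-01,\n"
    "2,2015-01-01,abc\n"
    "2,2015-06-01,81\n"
)
valid_rows = list(iter_valid_ratings(sample, min_rating=75))
print(valid_rows)
```

Because the function is a generator, only one row is held in memory at a time; the `list(...)` call here is only for showing the result on a small sample.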
FILES HANDLING
01 Open file: with open('Player.csv', 'r') as file:
02 Read file: reader = csv.DictReader(file)
03 Iterate to extract necessary information: for row in reader:
PURPOSE
Parse and extract specific data fields efficiently,
crucial for running our analysis queries
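As a minimal sketch of the three steps above, the loop below builds an ID-to-name lookup from a player table. The column names `player_api_id` and `player_name`, the sample rows, and the variable `player_names` are assumptions; an in-memory string stands in for Player.csv so the example runs on its own.

```python
import csv
import io

# Made-up sample standing in for Player.csv; the column names are assumptions.
player_csv = io.StringIO(
    "player_api_id,player_name,birthday\n"
    "10,Alice Example,1990-01-01\n"
    "11,Bob Example,1992-05-17\n"
)

player_names = {}
reader = csv.DictReader(player_csv)  # first line supplies the field names
for row in reader:
    # Keep only the fields the queries need and ignore the rest.
    player_names[row["player_api_id"]] = row["player_name"]

print(player_names)
```

With a real file, the `io.StringIO` object would be replaced by `open('Player.csv', 'r', newline='')`, which is the form the csv module documentation recommends.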
QUERY 1
Write a Python script that finds the player whose overall rating improved the most between two consecutive timestamps.
Data analysis: Extract player data and ratings over time.
Step 1: Calculate the improvement percentage for each player.
Step 2: Identify the player with the highest improvement percentage.
Result: Display the name of the most improved player, the date range, and the improvement percentage.
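One way the steps for Query 1 might be implemented is sketched below. The function `most_improved`, the tuple layout of `records`, and the sample ratings are hypothetical; the sketch also assumes dates are ISO-formatted strings, so sorting them as text puts them in chronological order.

```python
from collections import defaultdict


def most_improved(records):
    """records: iterable of (player, date, rating) with ISO date strings.

    Returns (player, start_date, end_date, pct) for the largest percentage
    increase between two consecutive timestamps, or None if there is none.
    """
    history = defaultdict(list)
    for player, date, rating in records:
        history[player].append((date, rating))

    best = None
    for player, entries in history.items():
        entries.sort()  # ISO dates sort chronologically as strings
        # Compare each rating with the next one in time.
        for (d0, r0), (d1, r1) in zip(entries, entries[1:]):
            if r0 <= 0:
                continue  # avoid division by zero
            pct = (r1 - r0) * 100 / r0
            if best is None or pct > best[3]:
                best = (player, d0, d1, pct)
    return best


# Made-up ratings for two players.
records = [
    ("A", "2015-01-01", 60.0),
    ("A", "2015-06-01", 66.0),
    ("B", "2015-01-01", 50.0),
    ("B", "2015-06-01", 60.0),
    ("B", "2016-01-01", 58.0),
]
result = most_improved(records)
print(result)
```

In this sample, player B's rise from 50 to 60 is the largest consecutive improvement, so that pair of dates is reported.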
QUERY 1 INSIGHT
QUERY 2
Find the match with the highest number of fouls for each league.
Data analysis: Load league and match data.
Step 1: Calculate the match with the highest number of fouls in each league.
Step 2: Map league IDs to names.
Result: Prepare output and save results.
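The Query 2 steps can be sketched as a single pass that keeps the current maximum per league. This assumes each match has already been reduced to a dictionary with a numeric `fouls` count; how that count is obtained from the raw match data is not shown in the slides, and the field names, the function `max_fouls_per_league`, and the sample values are all hypothetical.

```python
def max_fouls_per_league(matches, league_names):
    """Return {league name: (match_id, fouls)} for the match with the most fouls.

    On a tie, the match seen first is kept.
    """
    best = {}
    for match in matches:
        league_id = match["league_id"]
        if league_id not in best or match["fouls"] > best[league_id]["fouls"]:
            best[league_id] = match
    # Map league IDs to names, falling back to the ID if no name is known.
    return {
        league_names.get(league_id, league_id): (m["match_id"], m["fouls"])
        for league_id, m in best.items()
    }


# Made-up matches and league names.
matches = [
    {"match_id": 101, "league_id": 1, "fouls": 28},
    {"match_id": 102, "league_id": 1, "fouls": 35},
    {"match_id": 201, "league_id": 2, "fouls": 31},
    {"match_id": 202, "league_id": 2, "fouls": 22},
]
league_names = {1: "League A", 2: "League B"}
top_fouls = max_fouls_per_league(matches, league_names)
print(top_fouls)
```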
QUERY 2 INSIGHT
QUERY 3
Determine the season winners for each season in the Bundesliga.
Data analysis: Load Bundesliga match data.
Step 1: Calculate points for each team per season.
Step 2: Determine the team with the highest points for each season.
Result: List the Bundesliga season winners for each season.
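A minimal sketch of the Query 3 point calculation follows, using the standard scoring of 3 points for a win and 1 for a draw. The function `season_winners`, the tuple layout, and the sample matches are assumptions; tie-breaking on goal difference is not handled, so teams level on points are resolved by whichever was seen first.

```python
from collections import defaultdict


def season_winners(matches):
    """matches: iterable of (season, home, away, home_goals, away_goals).

    Returns {season: team with the most points}.
    """
    points = defaultdict(lambda: defaultdict(int))
    for season, home, away, home_goals, away_goals in matches:
        table = points[season]
        if home_goals > away_goals:
            table[home] += 3
            table[away] += 0  # make sure the loser appears in the table
        elif home_goals < away_goals:
            table[away] += 3
            table[home] += 0
        else:
            table[home] += 1
            table[away] += 1
    return {season: max(table, key=table.get) for season, table in points.items()}


# Made-up results for three teams over two seasons.
matches = [
    ("2014/2015", "A", "B", 2, 0),
    ("2014/2015", "B", "C", 1, 1),
    ("2014/2015", "C", "A", 0, 1),
    ("2015/2016", "B", "A", 3, 1),
    ("2015/2016", "A", "C", 0, 0),
    ("2015/2016", "C", "B", 1, 2),
]
winners = season_winners(matches)
print(winners)
```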
QUERY 3 INSIGHT
OUTPUT
PICKLE MODULE
- Used Python's pickle module to serialize the results of the different queries, preserving their Python data structures.
EFFICIENT DATA SERIALIZATION
- Employed 'pickle.dump' to serialize and store the query results.
BINARY MODE
- Opened files in write-binary ('wb') mode, which pickle requires because it writes bytes rather than text.
DATA MANAGEMENT
- Serializing data into .pkl files made the results easy to reload and share.
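The save step described above can be sketched as a round trip with `pickle.dump` and `pickle.load`. The file name `query_results.pkl` and the contents of `results` are made up for illustration, and a temporary directory is used so the example leaves no files behind in the working directory.

```python
import os
import pickle
import tempfile

# Made-up query results; the real structures are not shown in the slides.
results = {
    "query1": ("Player B", "2015-01-01", "2015-06-01", 20.0),
    "query3": {"2014/2015": "Team A"},
}

path = os.path.join(tempfile.mkdtemp(), "query_results.pkl")

# 'wb': pickle writes bytes, so the file must be opened in binary mode.
with open(path, "wb") as f:
    pickle.dump(results, f)

# 'rb': read the bytes back and rebuild the original Python objects.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == results)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.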