Uploaded by Mounir Khaoulaf

Pandas Exercises: Data Manipulation & Analysis

Today 19:56
Got it! Here are exercises purely focused on
practicing **pandas**, covering various
concepts like data manipulation,
aggregation, filtering, and visualization.
These will help you reinforce your crash
course knowledge.

---

### **Exercise 1: Data Exploration**
1. Load the dataset `sample_data.csv` into
a DataFrame using pandas.
2. Perform the following operations:
- Display the first 10 rows.
- Check the shape of the DataFrame.
- Get the data types of each column and
check for null values.
3. Generate summary statistics for
numerical and categorical columns
separately.
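
As a hint, the steps above could be sketched as follows. The column names and rows are hypothetical stand-ins for `sample_data.csv`, which is not shown here, so treat this as one possible approach rather than a reference solution.

```python
import io

import pandas as pd

# Hypothetical stand-in for sample_data.csv (columns and values are made up).
csv_text = """name,age,city,income
Ana,34,Lisbon,52000
Ben,,Paris,48000
Chloe,29,Paris,
Dev,41,Berlin,61000
"""

# With a real file this would be: pd.read_csv("sample_data.csv")
df = pd.read_csv(io.StringIO(csv_text))

first_rows = df.head(10)          # first 10 rows (fewer if the file is short)
shape = df.shape                  # (number of rows, number of columns)
dtypes = df.dtypes                # data type of each column
null_counts = df.isnull().sum()   # number of missing values per column

# Summary statistics, split by column kind.
numeric_summary = df.describe(include="number")
categorical_summary = df.describe(exclude="number")

print(shape)
print(null_counts)
print(numeric_summary)
print(categorical_summary)
```

`df.info()` prints the dtypes and non-null counts in a single call, which can replace the two separate checks.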

---

### **Exercise 2: Data Cleaning**
1. Load the `dirty_data.csv` dataset, which
contains missing values and duplicates.
2. Perform the following operations:
- Drop duplicate rows.
- Replace missing values in numerical
columns with the mean.
- Fill missing values in categorical
columns with the mode.
3. Rename a column (e.g., rename
`old_column_name` to
`new_column_name`).
4. Drop a specific column that is not useful
for analysis.
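
A minimal sketch of these cleaning steps, assuming a small made-up table in place of `dirty_data.csv`. The `id`, `score`, and `notes` columns are hypothetical, and `notes` plays the role of the column that is not useful.

```python
import io

import pandas as pd

# Hypothetical stand-in for dirty_data.csv: one duplicate row, two gaps.
csv_text = """id,old_column_name,score,notes
1,red,10.0,a
2,blue,,b
2,blue,,b
3,,30.0,c
4,red,20.0,d
"""
df = pd.read_csv(io.StringIO(csv_text))

# Drop exact duplicate rows and renumber the index.
df = df.drop_duplicates().reset_index(drop=True)

# Numerical columns: fill gaps with each column's mean.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

# Categorical columns: fill gaps with each column's most frequent value.
categorical_cols = df.select_dtypes(exclude="number").columns
for col in categorical_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0])

# Rename one column and drop another.
df = df.rename(columns={"old_column_name": "new_column_name"})
df = df.drop(columns=["notes"])

print(df)
```

`mode()` can return several values when there is a tie; `.iloc[0]` simply takes the first one.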

---

### **Exercise 3: Data Filtering and Indexing**
1. Load the `sales_data.csv` dataset.
2. Perform the following filtering tasks:
- Extract rows where `sales` are greater
than 5000.
- Filter rows where the `region` column is
`'North'` and `profit` is greater than 10%
(of `sales`, if `profit` is stored as an amount).
- Select only the `product`, `sales`, and
`profit` columns.
3. Use `.iloc` and `.loc` to:
- Select the first five rows of the
DataFrame.
- Extract a subset of rows and columns
based on conditions.
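
One way to approach the filtering tasks is sketched below. The rows are invented, and the code assumes that "profit greater than 10%" means `profit / sales > 0.10`; if your `profit` column already holds a percentage, compare it to 10 directly.

```python
import io

import pandas as pd

# Hypothetical stand-in for sales_data.csv.
csv_text = """product,region,sales,profit
Widget,North,8000,1200
Gadget,South,3000,150
Gizmo,North,6000,300
Doohickey,East,9000,1800
Thingamajig,North,4000,800
Whatsit,West,7000,700
"""
df = pd.read_csv(io.StringIO(csv_text))

# Boolean mask: rows with sales above 5000.
high_sales = df[df["sales"] > 5000]

# Combine conditions with & and wrap each one in parentheses.
# Assumption: the 10% threshold is applied to profit as a share of sales.
margin = df["profit"] / df["sales"]
north_profitable = df[(df["region"] == "North") & (margin > 0.10)]

# Select a subset of columns with a list of names.
subset = df[["product", "sales", "profit"]]

# .iloc is position-based (end excluded); .loc is label-based (end included).
first_five_iloc = df.iloc[:5]
first_five_loc = df.loc[:4]

# .loc with a row condition and a column list in one step.
conditional = df.loc[df["sales"] > 5000, ["product", "sales"]]

print(north_profitable)
print(conditional)
```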

---

### **Exercise 4: Grouping and Aggregation**
1. Load the `employee_data.csv` dataset.
2. Perform the following tasks:
- Group employees by their department
and calculate the average salary for each
department.
- Count the number of employees in each
job role.
- Find the total salary expenditure for each
department.
3. Reset the index of the grouped results for
easier readability.
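
The grouping tasks might look like this. The column names `department`, `job_role`, and `salary` are assumptions about `employee_data.csv`, and the rows are made up.

```python
import io

import pandas as pd

# Hypothetical stand-in for employee_data.csv.
csv_text = """name,department,job_role,salary
Ana,Engineering,Developer,90000
Ben,Engineering,Developer,80000
Chloe,Engineering,Manager,110000
Dev,Sales,Account Executive,60000
Eli,Sales,Manager,100000
"""
df = pd.read_csv(io.StringIO(csv_text))

# Average salary per department; reset_index turns the group keys
# back into an ordinary column.
avg_salary = df.groupby("department")["salary"].mean().reset_index()

# Number of employees in each job role.
role_counts = df.groupby("job_role").size().reset_index(name="employee_count")

# Total salary expenditure per department.
total_salary = df.groupby("department")["salary"].sum().reset_index()

print(avg_salary)
print(role_counts)
print(total_salary)
```

Several aggregations can also be computed in one pass with `df.groupby("department")["salary"].agg(["mean", "sum"])`.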

---

### **Exercise 5: Data Transformation**
1. Load the `transactions.csv` dataset.
2. Perform the following transformations:
- Create a new column `total_price` by
multiplying `quantity` by `price`.
- Convert the `order_date` column to a
datetime format and extract the month.
- Create a column `category` based on
`sales` values:
- `High` for sales > 10,000.
- `Medium` for sales between 5,000 and
10,000 (inclusive).
- `Low` for sales < 5,000.
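
A possible sketch of these transformations on invented rows standing in for `transactions.csv`. It uses `numpy.select` for the `category` column and treats both 5,000 and 10,000 as `Medium`, which is the reading implied by the `High` and `Low` thresholds.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for transactions.csv.
csv_text = """order_id,order_date,quantity,price,sales
1,2023-01-15,2,50.0,12000
2,2023-02-20,1,30.0,5000
3,2023-03-05,4,25.0,4999
4,2023-03-28,3,10.0,10000
"""
df = pd.read_csv(io.StringIO(csv_text))

# Element-wise arithmetic between two columns.
df["total_price"] = df["quantity"] * df["price"]

# Parse the date strings, then pull out the month number via the .dt accessor.
df["order_date"] = pd.to_datetime(df["order_date"])
df["month"] = df["order_date"].dt.month

# Conditions are checked in order; the first match wins.
conditions = [df["sales"] > 10000, df["sales"] >= 5000]
labels = ["High", "Medium"]
df["category"] = np.select(conditions, labels, default="Low")

print(df)
```

`pd.cut` is a common alternative for binning, but its intervals are closed on one side only, so the boundary values need extra care there.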

---

### **Exercise 6: Time Series Data**
1. Load a dataset with a date column (e.g.,
`time_series_data.csv`).
2. Perform the following:
- Set the date column as the index.
- Resample the data to monthly or weekly
intervals and calculate the mean for each
interval.
- Filter data for a specific year (e.g., 2023).
3. Plot the data using pandas built-in
`.plot()`.
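
The time series steps could be sketched like this, assuming hypothetical `date` and `value` columns in `time_series_data.csv`. The plotting call is left as a comment because it needs matplotlib to be installed.

```python
import io

import pandas as pd

# Hypothetical stand-in for time_series_data.csv.
csv_text = """date,value
2022-12-30,10
2022-12-31,20
2023-01-01,30
2023-01-15,50
2023-02-01,70
2023-02-15,90
"""

# parse_dates converts the column to datetime while loading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
df = df.set_index("date")

# "MS" labels each monthly bin by its first day; "W" gives weekly bins.
monthly_mean = df.resample("MS").mean()
weekly_mean = df.resample("W").mean()

# Partial string indexing on a DatetimeIndex selects a whole year.
year_2023 = df.loc["2023"]

print(monthly_mean)
print(year_2023)

# monthly_mean.plot(title="Monthly mean")  # uncomment to draw the line plot
```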

---

### **Exercise 7: Merging and Joining**
1. Load two datasets: `customers.csv` and
`orders.csv`.
2. Perform the following:
- Merge the datasets on the `customer_id`
column to create a combined DataFrame.
- Use an inner join to get only customers
who placed orders.
- Use a left join to get all customers, even
those without orders.
3. Identify customers who haven't placed
any orders.
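
A short sketch of the joins, using two tiny made-up tables in place of `customers.csv` and `orders.csv`. Only `customer_id` comes from the exercise; the other columns are assumptions.

```python
import io

import pandas as pd

# Hypothetical stand-ins for customers.csv and orders.csv.
customers = pd.read_csv(io.StringIO("""customer_id,name
1,Ana
2,Ben
3,Chloe
"""))
orders = pd.read_csv(io.StringIO("""order_id,customer_id,amount
101,1,250
102,1,100
103,3,75
"""))

# Inner join: only customers that appear in both tables.
inner = customers.merge(orders, on="customer_id", how="inner")

# Left join: every customer, with NaN in the order columns when there is
# no match. indicator=True adds a _merge column naming each row's origin.
left = customers.merge(orders, on="customer_id", how="left", indicator=True)

# Customers without orders are the rows found only in the left table.
no_orders = left.loc[left["_merge"] == "left_only", ["customer_id", "name"]]

print(inner)
print(no_orders)
```

Without `indicator=True`, the same customers can be found by checking `left["order_id"].isna()`.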

---

### **Exercise 8: Pivot Tables**
1. Load the `sales_data.csv` dataset.
2. Create a pivot table to:
- Summarize the total sales by `region`
and `product`.
- Calculate the average profit by `category`
and `month`.
3. Use different aggregation functions like
`sum`, `mean`, or `count`.
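
The pivot tables might be built as below. This assumes `sales_data.csv` carries `category` and `month` columns in addition to `region`, `product`, `sales`, and `profit`; the rows are invented.

```python
import io

import pandas as pd

# Hypothetical stand-in for sales_data.csv.
csv_text = """region,product,category,month,sales,profit
North,Widget,Hardware,1,100,10
North,Widget,Hardware,2,200,30
North,Gadget,Software,1,150,20
South,Widget,Hardware,1,300,60
South,Gadget,Software,2,250,50
South,Gadget,Software,2,50,10
"""
df = pd.read_csv(io.StringIO(csv_text))

# Total sales: one row per region, one column per product.
total_sales = df.pivot_table(
    values="sales", index="region", columns="product",
    aggfunc="sum", fill_value=0,
)

# Average profit: one row per category, one column per month.
avg_profit = df.pivot_table(
    values="profit", index="category", columns="month", aggfunc="mean",
)

# A list of functions yields one block of columns per function.
multi_agg = df.pivot_table(
    values="sales", index="region", aggfunc=["sum", "mean", "count"],
)

print(total_sales)
print(avg_profit)
print(multi_agg)
```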

---

### **Exercise 9: Visualization with Pandas**
1. Load any dataset of your choice.
2. Use pandas to create the following plots:
- A line plot for time series data.
- A bar plot showing total sales by
category.
- A histogram of a numerical column (e.g.,
`age`).
3. Customize the plots with titles, labels,
and colors.
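
A rough sketch of the three plots, wrapped in a hypothetical helper named `make_plots` so that matplotlib (which pandas plotting relies on) is only imported when the function is called. The data is invented for illustration.

```python
import pandas as pd

# Hypothetical data; any dataset with a date, a category and numbers works.
df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=6, freq="D"),
    "category": ["A", "B", "A", "C", "B", "A"],
    "sales": [100, 150, 120, 90, 160, 130],
    "age": [23, 35, 41, 29, 35, 52],
})

# Aggregate first; the bar plot shows one bar per category.
sales_by_category = df.groupby("category")["sales"].sum()


def make_plots(frame):
    """Draw a line plot, a bar plot and a histogram side by side."""
    import matplotlib.pyplot as plt  # imported lazily; must be installed

    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    frame.plot(x="date", y="sales", kind="line", ax=axes[0],
               title="Sales over time", color="tab:blue")
    axes[0].set_xlabel("Date")
    axes[0].set_ylabel("Sales")

    totals = frame.groupby("category")["sales"].sum()
    totals.plot(kind="bar", ax=axes[1],
                title="Total sales by category", color="tab:orange")
    axes[1].set_xlabel("Category")
    axes[1].set_ylabel("Total sales")

    frame["age"].plot(kind="hist", bins=5, ax=axes[2],
                      title="Age distribution", color="tab:green")
    axes[2].set_xlabel("Age")

    fig.tight_layout()
    return fig


print(sales_by_category)
```

Calling `make_plots(df)` returns the figure; in a script, follow it with `matplotlib.pyplot.show()` or `fig.savefig("plots.png")` to see the result.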

---

### **Exercise 10: Challenge: EDA with Pandas**
1. Load a dataset like `titanic.csv` or a
dataset of your choice.
2. Perform an exploratory data analysis
(EDA) that includes:
- Summary statistics.
- Distribution of numerical columns (e.g.,
age, fare).
- Analysis of categorical columns (e.g.,
survival rate by gender).
- Visualization of correlations (using
`.corr()` and a heatmap).
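
A starting point for the EDA could look like the following. The lowercase column names and the six rows are hypothetical; real Titanic files often use different names (for example `Survived` or `Sex`), so adjust accordingly.

```python
import io

import pandas as pd

# Hypothetical, tiny stand-in for titanic.csv.
csv_text = """survived,sex,age,fare
1,female,29,80.0
0,male,40,8.0
1,female,35,53.0
0,male,22,7.0
1,male,30,26.0
0,female,50,10.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Summary statistics for the numerical columns.
summary = df.describe()

# Distribution of a numerical column: counts per age bracket.
age_distribution = pd.cut(df["age"], bins=[0, 30, 60]).value_counts().sort_index()

# Categorical analysis: the mean of a 0/1 column is a rate.
survival_by_sex = df.groupby("sex")["survived"].mean()

# Pairwise correlations between numerical columns only.
corr = df.select_dtypes(include="number").corr()

print(summary)
print(age_distribution)
print(survival_by_sex)
print(corr)
```

In a notebook, `corr.style.background_gradient(cmap="coolwarm")` displays the correlation matrix as a simple heatmap (this requires matplotlib and jinja2 to be installed); seaborn's `heatmap` is another common choice.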

---

These exercises focus entirely on
**pandas** functionality, helping you
master its core concepts. Let me know if
you'd like datasets or hints to solve these!