# Pandas Exercises

Got it! Here are exercises focused purely on practicing **pandas**, covering concepts such as data manipulation, aggregation, filtering, and visualization. They will help you reinforce your crash-course knowledge. Starter code sketches for a few of them are included along the way.

---

### **Exercise 1: Data Exploration**

1. Load the dataset `sample_data.csv` into a DataFrame using pandas.
2. Perform the following operations:
   - Display the first 10 rows.
   - Check the shape of the DataFrame.
   - Get the data types of each column and check for null values.
3. Generate summary statistics for numerical and categorical columns separately.

---

### **Exercise 2: Data Cleaning**

1. Load the `dirty_data.csv` dataset, which contains missing values and duplicates.
2. Perform the following operations:
   - Drop duplicate rows.
   - Replace missing values in numerical columns with the mean.
   - Fill missing values in categorical columns with the mode.
3. Rename a column (e.g., rename `old_column_name` to `new_column_name`).
4. Drop a specific column that is not useful for analysis.

---

### **Exercise 3: Data Filtering and Indexing**

1. Load the `sales_data.csv` dataset.
2. Perform the following filtering tasks:
   - Extract rows where `sales` is greater than 5000.
   - Filter rows where the `region` column is `'North'` and `profit` is greater than 10%.
   - Select only the `product`, `sales`, and `profit` columns.
3. Use `.iloc` and `.loc` to:
   - Select the first five rows of the DataFrame.
   - Extract a subset of rows and columns based on conditions.

---

### **Exercise 4: Grouping and Aggregation**

1. Load the `employee_data.csv` dataset.
2. Perform the following tasks:
   - Group employees by department and calculate the average salary for each department.
   - Count the number of employees in each job role.
   - Find the total salary expenditure for each department.
3. Reset the index of the grouped results for easier readability.

---

### **Exercise 5: Data Transformation**

1. Load the `transactions.csv` dataset.
2. Perform the following transformations:
   - Create a new column `total_price` by multiplying `quantity` by `price`.
   - Convert the `order_date` column to a datetime format and extract the month.
   - Create a column `category` based on `sales` values:
     - `High` for sales > 10,000.
     - `Medium` for sales between 5,000 and 10,000.
     - `Low` for sales < 5,000.

---

### **Exercise 6: Time Series Data**

1. Load a dataset with a date column (e.g., `time_series_data.csv`).
2. Perform the following:
   - Set the date column as the index.
   - Resample the data to monthly or weekly intervals and calculate the mean for each interval.
   - Filter data for a specific year (e.g., 2023).
3. Plot the data using pandas' built-in `.plot()`.

---

### **Exercise 7: Merging and Joining**

1. Load two datasets: `customers.csv` and `orders.csv`.
2. Perform the following:
   - Merge the datasets on the `customer_id` column to create a combined DataFrame.
   - Use an inner join to get only customers who placed orders.
   - Use a left join to get all customers, even those without orders.
3. Identify customers who haven't placed any orders.

---

### **Exercise 8: Pivot Tables**

1. Load the `sales_data.csv` dataset.
2. Create a pivot table to:
   - Summarize the total sales by `region` and `product`.
   - Calculate the average profit by `category` and `month`.
3. Use different aggregation functions such as `sum`, `mean`, or `count`.
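If you want a starting point for the cleaning and aggregation exercises, here is a minimal sketch for Exercises 2 and 4. It assumes `dirty_data.csv` has a mix of numeric and text columns and that `employee_data.csv` has `department`, `job_role`, and `salary` columns; those names (and `unused_column`) are placeholders for whatever your files actually contain.

```python
import pandas as pd

# --- Exercise 2: Data Cleaning (column names are placeholders) ---
df = pd.read_csv("dirty_data.csv")

# Drop exact duplicate rows
df = df.drop_duplicates()

# Fill missing values: mean for numeric columns, mode for the rest
num_cols = df.select_dtypes(include="number").columns
cat_cols = df.select_dtypes(exclude="number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())
for col in cat_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0])

# Rename one column and drop one that isn't useful (placeholder names)
df = df.rename(columns={"old_column_name": "new_column_name"})
df = df.drop(columns=["unused_column"])

# --- Exercise 4: Grouping and Aggregation (assumed columns) ---
emp = pd.read_csv("employee_data.csv")

avg_salary = emp.groupby("department")["salary"].mean().reset_index()
headcount = emp["job_role"].value_counts()
total_salary = emp.groupby("department")["salary"].sum().reset_index()

print(avg_salary)
print(headcount)
print(total_salary)
```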
---

### **Exercise 9: Visualization with Pandas**

1. Load any dataset of your choice.
2. Use pandas to create the following plots:
   - A line plot for time series data.
   - A bar plot showing total sales by category.
   - A histogram of a numerical column (e.g., `age`).
3. Customize the plots with titles, labels, and colors.

---

### **Exercise 10: Challenge: EDA with Pandas**

1. Load a dataset such as `titanic.csv` or a dataset of your choice.
2. Perform an exploratory data analysis (EDA) that includes:
   - Summary statistics.
   - Distributions of numerical columns (e.g., age, fare).
   - Analysis of categorical columns (e.g., survival rate by gender).
   - Visualization of correlations (using `.corr()` and a heatmap).

---

These exercises focus entirely on **pandas** functionality, helping you master its core concepts. Let me know if you'd like datasets or hints to solve these!
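And if you'd like a head start on the challenge, here is a minimal EDA sketch covering Exercises 9 and 10. It assumes the common Kaggle-style Titanic column names (`Survived`, `Sex`, `Age`, `Fare`) and that matplotlib is installed; adjust the names to match your copy of `titanic.csv`.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Column names below (Survived, Sex, Age) follow the common Kaggle Titanic
# layout -- they are assumptions, so adapt them to your file.
titanic = pd.read_csv("titanic.csv")

# Summary statistics for numeric and categorical columns
print(titanic.describe())
print(titanic.describe(include="object"))

# Distribution of a numerical column
titanic["Age"].dropna().plot(kind="hist", bins=30, color="steelblue",
                             title="Age distribution")
plt.xlabel("Age")
plt.show()

# Survival rate by gender (categorical analysis + bar plot)
survival_by_sex = titanic.groupby("Sex")["Survived"].mean()
survival_by_sex.plot(kind="bar", title="Survival rate by gender")
plt.ylabel("Survival rate")
plt.show()

# Correlation heatmap over numeric columns only
corr = titanic.select_dtypes(include="number").corr()
plt.imshow(corr, cmap="coolwarm")
plt.xticks(range(len(corr.columns)), corr.columns, rotation=45)
plt.yticks(range(len(corr.columns)), corr.columns)
plt.colorbar()
plt.title("Correlation heatmap")
plt.show()
```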