Lab-6 Data Visualization / Matplotlib Library Usage for Data Visualization

6.1: Objective
To understand Data Visualization
To understand different Types of Charts
To use Matplotlib library for plotting graphs

6.2: Data Visualization:

6.3: Types of Graphs
6.3.1: Histogram
6.3.2: Pie Chart
6.3.3: Bar Chart
6.3.4: Line Graph
6.3.5: Bubble Chart

6.4: Usage of Matplotlib library:

Simple Plot:
    import math
    import numpy as np
    from matplotlib import pyplot as plt

    # Sample the sine function from 0 to 2*pi in steps of 0.05 radians
    x = np.arange(0, math.pi*2, 0.05)
    y = np.sin(x)
    plt.plot(x, y)
    plt.xlabel("angle")
    plt.ylabel("sine")
    plt.title('sine wave')
    plt.show()
Screenshot:

Pie Chart:
Code:
    from matplotlib import pyplot as plt

    fig = plt.figure()
    ax = fig.add_axes([0, 0, 1, 1])
    # Equal aspect ratio so the pie is drawn as a circle
    ax.axis('equal')
    # Languages and the number of students for each
    langs = ['C', 'C++', 'Java', 'Python', 'PHP']
    students = [23, 17, 35, 29, 12]
    # autopct prints each slice's percentage with two decimals
    ax.pie(students, labels=langs, autopct='%1.2f%%')
    # show command for plotting
    plt.show()
Screenshot:
Output:

Histogram:
Code:
    from matplotlib import pyplot as plt
    import numpy as np

    fig, ax = plt.subplots(1, 1)
    # Marks of 15 students
    a = np.array([22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27])
    # Four bins of width 25 covering the range 0 to 100
    ax.hist(a, bins=[0, 25, 50, 75, 100])
    ax.set_title("histogram of result")
    ax.set_xticks([0, 25, 50, 75, 100])
    ax.set_xlabel('marks')
    ax.set_ylabel('no. of students')
    plt.show()
Screenshot:
Output:

Line Graph:
Code:
    import matplotlib.pyplot as plt
    import numpy as np
    import math

    x = np.arange(0, math.pi*2, 0.05)
    # Two plot calls on the same axes draw both curves in one figure
    plt.plot(x, np.sin(x))
    plt.plot(x, np.cos(x))
    plt.xlabel("angle")
    plt.ylabel("sine and cosine")
    plt.title('sine and cosine waves')
    plt.show()
Screenshot:
Output:

Task 1:
1- Draw a histogram of any column of the dataset you selected in the previous lab, using Matplotlib.
Code:
Screenshot:

Task 2
Why are scatter plots used? Provide an example in the form of code using the Matplotlib library. Show screenshots of the results.

Scatter Plot:
A scatter plot (aka scatter chart, scatter graph) uses dots to represent values for two different numeric variables.
The position of each dot on the horizontal and vertical axis indicates values for an individual data point. Scatter plots are used to observe relationships between variables.
Code:
    import matplotlib.pyplot as plt

    # Paired observations: each (x[i], y[i]) becomes one dot
    x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
    y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
    plt.scatter(x, y)
    plt.show()
Screenshot:

Task 4
What is a Box-whisker plot? Provide an example from your area of interest. Use Python code to draw a box-whisker plot.

Box-Whisker plot:
A box-and-whisker plot summarizes the distribution of a numeric variable, which makes it useful when you want to compare one group of values with another. The ends of the box are the lower and upper quartiles, so the box spans the interquartile range. The median is marked by a line inside the box (a horizontal line in the vertical plot that Matplotlib draws by default). By default, Matplotlib extends each whisker to the most extreme data point within 1.5 times the interquartile range of the box and draws points beyond that individually.
Code:
    # Import libraries
    import matplotlib.pyplot as plt
    import numpy as np

    # Creating dataset: 200 samples from a normal distribution (mean 100, std 20)
    np.random.seed(10)
    data = np.random.normal(100, 20, 200)

    fig = plt.figure(figsize=(10, 7))

    # Creating plot
    plt.boxplot(data)

    # show plot
    plt.show()
Screenshot:

Lab#5 Data manipulation in Python using Pandas

5.1: Objective
To understand usage of Pandas
Use of Pandas on dataset in CSV file
To use Pandas for data manipulation
To use Pandas for data analysis

5.2: Pandas Introduction:
Pandas is an open-source Python library providing high-performance data manipulation and analysis tools built on its powerful data structures. The name Pandas is derived from "panel data", an econometrics term for multidimensional data sets.

5.3: Using Pandas with CSV files
CSV (comma-separated values) files are a common file format for transferring and storing data. The ability to read, manipulate, and write data to and from CSV files using Python is a key skill for any data scientist or business analyst. This section goes over what CSV files are, how to read CSV files into Pandas DataFrames, and how to write DataFrames back to CSV files after analysis.
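As a rough illustration of the read, manipulate, and write cycle described above, the following sketch round-trips a small CSV through a DataFrame. The CSV text, the column names, and the derived column are hypothetical; an in-memory string stands in for a file on disk so the example runs without any dataset.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "name,age\nAli,34\nSara,29\n"

# read_csv accepts a file path or any file-like object
frame = pd.read_csv(io.StringIO(csv_text))

# A simple manipulation: add a derived column
frame["age_next_year"] = frame["age"] + 1

# Called without a path, to_csv returns the CSV text as a string;
# index=False leaves the row index out of the output
csv_out = frame.to_csv(index=False)
print(csv_out)
```

With a real file, the same two calls would take a path instead, for example `pd.read_csv("data.csv")` and `frame.to_csv("result.csv", index=False)`, where both file names are placeholders.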
Pandas is one of the most widely used data manipulation packages in Python, and DataFrames are the Pandas data type for storing tabular 2D data.

5.4: Statistics using Pandas
Descriptive statistics is about describing and summarizing data. It uses two main approaches:
The quantitative approach describes and summarizes data numerically.
The visual approach illustrates data with charts, plots, histograms, and other graphs.
You can apply descriptive statistics to one or many datasets or variables. When you describe and summarize a single variable, you are performing univariate analysis. When you search for statistical relationships between a pair of variables, you are doing bivariate analysis. Similarly, multivariate analysis is concerned with multiple variables at once.

5.5: Data Manipulation using Pandas
Python is fast becoming the preferred language in data science, and for good reasons. It provides the larger ecosystem of a programming language and the depth of good scientific computation libraries. If you are starting to learn Python, have a look at a learning path on Python. Among its scientific computation libraries, I found Pandas to be the most useful for data science operations. Pandas, along with Scikit-learn, provides almost the entire stack needed by a data scientist.
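The univariate, bivariate, and multivariate analyses described in section 5.4 can be sketched with a few Pandas calls. This is only a minimal illustration: the `patients` DataFrame, its column names, and its values are made up for the example and do not come from any dataset used in this lab.

```python
import pandas as pd

# Hypothetical measurements for five patients
patients = pd.DataFrame({
    "age": [25, 40, 55, 70, 85],
    "systolic_bp": [110, 120, 130, 140, 150],
    "bmi": [22.0, 27.5, 24.0, 30.0, 26.0],
})

# Univariate analysis: summarize a single variable
age_mean = patients["age"].mean()
age_median = patients["age"].median()
age_summary = patients["age"].describe()  # count, mean, std, min, quartiles, max

# Bivariate analysis: Pearson correlation between two variables
age_bp_corr = patients["age"].corr(patients["systolic_bp"])

# Multivariate analysis: pairwise correlations across all numeric columns
corr_matrix = patients.corr()

print(age_summary)
print(corr_matrix)
```

In this made-up data, blood pressure rises by a fixed amount with each age step, so the two columns are perfectly linearly related and their correlation is 1 up to floating-point rounding.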
This article focuses on providing 12 ways for data manipulation in Python.

Task
Download a dataset of your own choice related to medical studies and apply the following operations using the Pandas library:

1- Show basic statistics about the dataset
Code:
    # Import Pandas
    import pandas as pd

    # Loading the data
    df = pd.read_csv("healthcare-dataset-strokedata.csv")

    # Work on a copy so the original DataFrame stays unchanged
    health = df.copy()
    print(health.head())

    # Mean and median of the numeric columns only
    # (text columns would raise an error in recent Pandas versions)
    print(health.mean(numeric_only=True))
    print(health.median(numeric_only=True))

    # Summary statistics: count, mean, std, min, quartiles, max
    print(health.describe())
    print(health)

2- Remove two columns of your choice, show the rest of the dataset
Code:
    # drop removes rows or columns from the DataFrame loaded from the CSV file
    health.drop(columns=['avg_glucose_level', 'Residence_type'], inplace=True)

    # display
    print("\nCSV Data after deleting the columns:\n")
    print(health)

    # Handling NaN values
    # First we add some NaN values to our health dataset using a constant from the NumPy library
    import numpy as np
    health.iloc[2:4, 2:4] = np.nan
    print(health.head())

3- Fill the non-assigned values with zeros
Code:
    # To fill the NaN entries, the fillna function is used
    # There are several ways to fill the NaN entries
    # Fill a single column:
    health["age"] = health["age"].fillna(0)
    # Or fill every NaN in the DataFrame:
    health = health.fillna(0)

4- Store your resulting CSV file using the name Output in the same folder from where you picked the initial file.
Code:
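One possible way to approach step 4 is sketched here; it is an illustration under stated assumptions, not the lab's reference solution. The helper name `save_next_to_input`, the file name `input.csv`, and the sample values are hypothetical. A temporary folder with a tiny stand-in dataset is used so the example runs without the original CSV file.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_next_to_input(frame: pd.DataFrame, input_path: Path,
                       name: str = "Output.csv") -> Path:
    """Write `frame` as CSV into the folder that contains `input_path`."""
    output_path = input_path.parent / name
    # index=False keeps the row index out of the saved file
    frame.to_csv(output_path, index=False)
    return output_path


# Demonstration in a throwaway folder with a tiny stand-in dataset
with tempfile.TemporaryDirectory() as folder:
    input_path = Path(folder) / "input.csv"
    sample = pd.DataFrame({"age": [67.0, None], "bmi": [36.6, 28.1]})
    sample.to_csv(input_path, index=False)

    # Load, fill missing values with zeros, and save next to the input file
    health = pd.read_csv(input_path).fillna(0)
    output_path = save_next_to_input(health, input_path)

    # Read the saved file back to confirm what was written
    reloaded = pd.read_csv(output_path)

print(reloaded)
```

For the lab itself, `input_path` would point at the dataset file you loaded in step 1, and `health` would be the DataFrame produced by steps 1 to 3.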