Uploaded by sawera.naseer171780276

lab 6

Lab-6
Data Visualization / Matplotlib Library Usage for Data Visualization
6.1: Objective

To understand Data Visualization

To understand different Types of Charts

To use Matplotlib library for plotting graphs
6.2: Data Visualization:
6.3: Types of Graphs

6.3.1: Histogram

6.3.2: Pie Chart

6.3.3: Bar Chart

6.3.4: Line Graph

6.3.5: Bubble Chart
6.4: Usage of Matplotlib library:
Simple Plot:
import math
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0, math.pi*2, 0.05)
y = np.sin(x)
plt.plot(x,y)
plt.xlabel("angle")
plt.ylabel("sine")
plt.title('sine wave')
plt.show()
Screenshot:
Pie Chart:
Code:
from matplotlib import pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_axes([0,0,1,1])
ax.axis('equal')
# Array of Languages
langs = ['C', 'C++', 'Java', 'Python', 'PHP']
students = [23,17,35,29,12]
ax.pie(students, labels=langs, autopct='%1.2f%%')
# show command for plotting
plt.show()
Screenshot:
Output:
Histogram:
Code:
from matplotlib import pyplot as plt
import numpy as np
fig,ax = plt.subplots(1,1)
a = np.array([22,87,5,43,56,73,55,54,11,20,51,5,79,31,27])
ax.hist(a, bins = [0,25,50,75,100])
ax.set_title("histogram of result")
ax.set_xticks([0,25,50,75,100])
ax.set_xlabel('marks')
ax.set_ylabel('no. of students')
plt.show()
Screenshot:
Output:
Line Graph:
Code:
import matplotlib.pyplot as plt
import numpy as np
import math
x = np.arange(0, math.pi*2, 0.05)
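# Plot sine and cosine on the same axes
plt.plot(x, np.sin(x), label="sine")
plt.plot(x, np.cos(x), label="cosine")
plt.xlabel("angle")
plt.ylabel("sine and cosine")
plt.title("sine and cosine waves")
# The legend tells the two curves apart
plt.legend()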
plt.show()
Screenshot:
Output:
Task 1:
1- Draw a histogram of any column of the dataset you selected in the previous lab
using Matplotlib.
Code:
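The task statement above can be sketched as follows. This is only a minimal illustration, not the dataset from the previous lab: the DataFrame `df`, its `age` column, the bin edges, and the output file name `task1_histogram.png` are all assumptions made so that the example runs on its own. Replace them with your own dataset and column.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the dataset chosen in the previous lab
df = pd.DataFrame({"age": [67, 61, 80, 49, 79, 81, 74, 69, 59, 78, 54, 50]})

fig, ax = plt.subplots()
# ax.hist returns the count per bin, the bin edges, and the drawn bars
counts, bin_edges, _ = ax.hist(df["age"], bins=[40, 50, 60, 70, 80, 90])
ax.set_title("histogram of age")
ax.set_xlabel("age")
ax.set_ylabel("no. of patients")

# Save the figure to a file (assumed name); plt.show() would open a window instead
fig.savefig("task1_histogram.png")
```

With a real dataset, load it with `pd.read_csv` first and pass the chosen column to `ax.hist` in the same way.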
Screenshot:
Task 2
Why are scatter plots used? Provide an example in the form of code using the Matplotlib
library, and show screenshots of the results.
Scatter Plot:
A scatter plot (aka scatter chart, scatter graph) uses dots to represent values for two different
numeric variables. The position of each dot on the horizontal and vertical axis indicates values for
an individual data point. Scatter plots are used to observe relationships between variables.
Code:
import matplotlib.pyplot as plt
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
plt.scatter(x, y)
plt.show()
Screenshot:
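The relationship that a scatter plot shows visually can also be summarised by a single number. As a rough sketch, the Pearson correlation coefficient of the same `x` and `y` values used in the scatter plot can be computed with NumPy. The choice of Pearson correlation here is an addition for illustration and is not part of the original task.

```python
import numpy as np

# Same values as in the scatter plot example
x = np.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6])
y = np.array([99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86])

# corrcoef returns a 2x2 matrix; the off-diagonal entry is the correlation of x with y
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson correlation: {r:.2f}")
```

A value close to -1 or +1 indicates a strong linear relationship, and a value near 0 indicates a weak one. A negative sign means that `y` tends to fall as `x` rises.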
Task 4
What is a box-whisker plot? Provide an example from your area of interest. Use Python code to
draw a box-whisker plot.
Box-Whisker plot:
A box-and-whisker plot is useful in the real world when you want to compare the distribution of
one group of values with another. The ends of the box are the lower and upper quartiles, so the
box spans the interquartile range. The median is marked by a line inside the box, and the
whiskers extend from the box toward the smallest and largest values.
Code:
# Import libraries
import matplotlib.pyplot as plt
import numpy as np
# Creating dataset
np.random.seed(10)
data = np.random.normal(100, 20, 200)
fig = plt.figure(figsize =(10, 7))
# Creating plot
plt.boxplot(data)
# show plot
plt.show()
Screenshot:
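The quantities that the box-whisker plot draws can also be computed directly. The sketch below uses the same seeded random data as the code above and derives the quartiles, the interquartile range, and the whisker limits. The 1.5 x IQR rule is Matplotlib's default whisker setting; the variable names are chosen here for illustration.

```python
import numpy as np

# Same dataset as in the box-whisker example
np.random.seed(10)
data = np.random.normal(100, 20, 200)

# Lower quartile, median, and upper quartile
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # height of the box

# By default the whiskers reach the farthest points within 1.5 * IQR of the box
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = data[(data < lower_limit) | (data > upper_limit)]

print("Q1:", q1, "median:", median, "Q3:", q3)
print("IQR:", iqr, "number of outliers:", len(outliers))
```

Points outside the two limits are the ones Matplotlib draws as individual markers beyond the whiskers.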
Lab#5
Data manipulation in Python using Pandas
5.1: Objective

To understand usage of Pandas

To use Pandas on a dataset in a CSV file

To use Pandas for data manipulation

To use Pandas for data analysis
5.2: Pandas Introduction:
Pandas is an open-source Python library providing high-performance data manipulation and
analysis tools built on its powerful data structures. The name Pandas is derived from "panel
data", an econometrics term for multidimensional data.
5.3: Using Pandas with CSV files
CSV (comma-separated value) files are a common file format for transferring and storing data.
The ability to read, manipulate, and write data to and from CSV files using Python is a key skill to
master for any data scientist or business analyst. This section covers what CSV files are,
how to read CSV files into Pandas DataFrames, and how to write DataFrames back to CSV files
after analysis.
Pandas is the most popular data manipulation package in Python, and DataFrames are the
Pandas data type for storing tabular 2D data.
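As a minimal sketch of this round trip, the example below reads CSV text into a DataFrame, adds a column, and writes the result back out as CSV. The column names and values are made up for illustration, and an in-memory `io.StringIO` buffer stands in for a file on disk so the example runs without any external file. With a real file, pass its path to `pd.read_csv` and `to_csv` instead.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "patient_id,age,bmi\n1,67,36.6\n2,61,28.1\n3,80,32.5\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())

# Manipulate the data: add a derived column
df["age_in_months"] = df["age"] * 12

# Write the DataFrame back to CSV; index=False leaves out the row labels
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_out = buffer.getvalue()
print(csv_out)
```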
5.4: Statistics using Pandas
Descriptive statistics is about describing and summarizing data. It uses two main approaches:

The quantitative approach describes and summarizes data numerically.

The visual approach illustrates data with charts, plots, histograms, and other graphs.
You can apply descriptive statistics to one or many datasets or variables. When you describe and
summarize a single variable, you’re performing univariate analysis. When you search for
statistical relationships among a pair of variables, you’re doing a bivariate analysis. Similarly, a
multivariate analysis is concerned with multiple variables at once.
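The three kinds of analysis can be sketched with Pandas as below. The small DataFrame and its column names are invented for illustration only. `describe` gives a univariate summary, `Series.corr` a bivariate statistic, and `DataFrame.corr` a pairwise view of several variables at once.

```python
import pandas as pd

# Hypothetical measurements, made up for this example
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 62],
    "systolic_bp": [118, 121, 130, 135, 142],
    "bmi": [22.1, 27.4, 24.9, 30.2, 26.5],
})

# Univariate: summary of a single variable
age_summary = df["age"].describe()
print(age_summary)

# Bivariate: Pearson correlation between two variables
age_bp_corr = df["age"].corr(df["systolic_bp"])
print("age vs systolic_bp:", age_bp_corr)

# Multivariate: pairwise correlations among all numeric columns
corr_matrix = df.corr()
print(corr_matrix)
```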
5.5: Data Manipulation using Pandas
Python is fast becoming the preferred language in data science, and for good reason. It
provides the larger ecosystem of a general-purpose programming language together with the
depth of good scientific computation libraries.
Among its scientific computation libraries, I found Pandas to be the most useful for data science
operations. Pandas, along with Scikit-learn, provides almost the entire stack needed by a data
scientist. The task below practises several data manipulation operations in Python.
Task
Download a dataset of your own choice related to medical studies and apply the following
operations using the Pandas library:
1- Show basic statistics about the dataset
#!/usr/bin/env python
# coding: utf-8

# In[93]:
# Import Pandas
import pandas as pd
# Load the data
df = pd.read_csv("healthcare-dataset-strokedata.csv")
df

# In[94]:
health = df.copy()
health.head()

# In[95]:
# numeric_only=True skips text columns, which recent pandas versions refuse to average
health.mean(numeric_only=True)

# In[96]:
health.median(numeric_only=True)

# In[97]:
health.describe()
print(health)
2- Remove two columns of your choice and show the rest of the dataset
Code:
# The drop function removes rows or columns from the data loaded from the CSV file
health.drop('avg_glucose_level', inplace=True, axis=1)
health.drop('Residence_type', inplace=True, axis=1)
# display
print("\nCSV Data after deleting the columns:\n")
print(health)

# In[111]:
# Handling NaN values
# First we add some NaN values to our dataset using a constant from the numpy library
import numpy as np
health.iloc[2:4, 2:4] = np.nan
health.head()
# In[112]
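3- Fill the non-assigned values with zeros
Code:
# To fill the NaN entries, the fillna function is used
# There are several ways to fill the NaN entries; here the age column is filled with 0
# Assigning the result back avoids relying on inplace=True on a selected column
health['age'] = health['age'].fillna(0)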
4- Store your resulting CSV file under the name Output in the same folder from which you
picked the initial file.
Code:
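No code was given for this step, so the following is only a sketch under stated assumptions. The small `health` DataFrame stands in for the cleaned dataset, and a temporary folder stands in for the folder that holds the original CSV file, so that the example runs anywhere. `DataFrame.to_csv` is the Pandas call that writes the file.

```python
import os
import tempfile
import pandas as pd

# Hypothetical stand-in for the cleaned `health` DataFrame
health = pd.DataFrame({"id": [1, 2, 3], "age": [67.0, 0.0, 80.0]})

# Assumed location of the input file; a temporary folder is used for this sketch
input_folder = tempfile.mkdtemp()
input_path = os.path.join(input_folder, "healthcare-dataset-strokedata.csv")

# Build the output path in the same folder as the input file
output_path = os.path.join(os.path.dirname(input_path), "Output.csv")

# index=False keeps the row labels out of the file
health.to_csv(output_path, index=False)

# Read the file back to confirm that it was written
reloaded = pd.read_csv(output_path)
print(reloaded)
```

If the input file sits next to the script, `health.to_csv("Output.csv", index=False)` is enough.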