Python Review

Code Formatting
• Python uses indentation to delimit blocks. Indenting incorrectly causes an error.
• # starts a comment.
• In some constructs a colon starts a new block, for example function definitions, if/elif/else clauses, for loops, and while loops.
• Whitespace, including line breaks, is ignored inside () and [].

Example
    x = (1 + 2 +
         3 + 4)
    matrix = [[1, 1, 1],
              [2, 5, 4],
              [2, 1, 5]]   # avoid naming a variable "list": it would hide the built-in list()

Import
• Some Python features must be imported from the libraries that contain them.

Example:
    import matplotlib.pyplot as plt  # Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python
    import numpy as np  # NumPy offers high-level mathematical functions and a multidimensional structure (known as ndarray) for manipulating large data sets

Variables and objects
• Just assign a value to create a variable; there is no need to declare a type.
• Using a name before creating it causes an error (NameError).
    A = 3
    A = [1, 2, 3]
    A = 'Text'
• Assignment creates references, not copies.
    A = [1, 2, 3]
    B = A
    A[0] = 5
    print(B)  # B is [5, 2, 3]
• You can make multiple assignments at the same time.
    x, y = 2, 3
• You can use multiple assignment to swap values.
    x, y = y, x
• You can chain assignments.
    x = y = z = 3

Arithmetic Operations
    x = 1 + 2          # x is 3
    y = 1 - 3          # y is -2
    z = 2 * 2          # z is 4
    l = 3**2           # l is 9
    f = 5 % 2          # f is 1
    g = 5 / 2          # g is 2.5
    h = 5 // 2         # h is 2
    m = 5 / float(2)   # m is 2.5
    n = int(5 / 2)     # n is 2
Numerical types: int, float, complex

Comparison
    Operation   Meaning
    <           less than
    <=          less than or equal
    >           greater than
    >=          greater than or equal
    ==          equal
    !=          not equal
    is          object identity
    is not      negated object identity

Example
    x = [0, 1, 2, 3, 4]
    y = x
    z = x[:]
    x == y   # True
    x is y   # True
    x == z   # True
    x is z   # False, z is a copy of x

Bitwise operators: & (AND), | (OR), ~ (NOT)

Math
    Command name   Description
    abs(a)         absolute value
    ceil(a)        rounds up
    cos(a)         cosine, in radians
    floor(a)       rounds down
    log(a)         logarithm, base e
    log10(a)       logarithm, base 10
    max(a, b)      larger of two values
    min(a, b)      smaller of two values
    round(a)       nearest whole number (ties go to the even neighbor)
    sin(a)         sine, in radians
    sqrt(a)        square root

    Constant       Description
    e              2.7182818...
    pi             3.1415926...

abs, max, min, and round are built in. The other functions and the constants e and pi come from the math module: import math, then call math.sqrt(a) or use math.pi.

Strings
• single or double quotation marks
• triple quotes for multi-line strings
    x = 'statistics for data science'   # single quotes
    y = "statistics for data science"   # double quotes
    Z = "My name is \"Farah\""          # escaped string: the backslash lets you include characters that are otherwise not allowed
    V = 'super long string \
    that has more than one part, \
    but are all written in one line.'   # very long string; a trailing backslash continues it on the next line
    V = '''super long string
    that has more than one part,
    written in many lines.'''           # very long string that keeps its line breaks

Strings cont.
    "\t"          # tab character
    len(string)   # length of string
• Strings are concatenated with + and repeated with *
    s = 3 * 'bla' + 'umm'   # s is 'blablablaumm'
• Two or more string literals next to each other are automatically concatenated
    a = 'Py' 'thon'   # a is 'Python'
    b = a + '3'       # b is 'Python3'

Lists
    int_list = [1, 2, 3]
    mixed_list = ["Farah", 1, False]
    list_of_lists = [int_list, mixed_list]
    len(int_list)              # length of the list is 3
    list_sum = sum(int_list)   # sum of integers in the list is 6

Get the i-th element of a list
    x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    first_element = x[0]            # is 0, lists are 0-indexed
    second_element = x[1]           # is 1
    last_element = x[-1]            # is 9, the last element
    second_to_last_element = x[-2]  # is 8, the second to last element

Get a part of a list
    a = x[1:4]    # [1, 2, 3]
    b = x[:4]     # [0, 1, 2, 3]
    c = x[-2:]    # [8, 9]
    d = x[3:]     # [3, 4, ..., 9]
    e = x[1:-1]   # [1, 2, ..., 8], without first and last
    f = x[:]      # [0, 1, 2, ..., 9]

Lists cont.
Checking for elements in a list
    2 in [1, 2, 3]   # True
    5 in [1, 2, 3]   # False

Concatenating lists
    a = [3, 2, 1]
    b = [6, 5, 4]
    a.extend(b)   # a is [3, 2, 1, 6, 5, 4]

    a = [3, 2, 1]
    b = [6, 5, 4]
    c = a + b     # c is [3, 2, 1, 6, 5, 4] and a stays the same.
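The two list behaviors above that most often surprise people are aliasing (assignment creates a reference, not a copy) and the difference between a.extend(b) and a + b. The sketch below puts them side by side, using only built-in list operations and the same example values as the slides.

```python
# Aliasing versus copying
x = [0, 1, 2, 3, 4]
y = x        # y is a second name for the same list object
z = x[:]     # z is a new list holding the same elements

x[0] = 99    # change the list through the name x
print(y[0])            # 99, y sees the change
print(z[0])            # 0, the copy is unaffected
print(x is y, x is z)  # True False

# a + b builds a new list, a.extend(b) changes a in place
a = [3, 2, 1]
b = [6, 5, 4]
c = a + b
print(a)     # [3, 2, 1], a is unchanged by +
a.extend(b)
print(a)     # [3, 2, 1, 6, 5, 4]
print(a == c, a is c)  # True False: equal contents, different objects
```

Use a slice copy such as x[:] when a function or a later step must not modify the original list.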
Modifying lists
    a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    a[2] = a[2] * 2       # a is [0, 1, 4, 3, 4, 5, 6, 7, 8, 9, 10]
    a[-1] = 0             # a is [0, 1, 4, 3, 4, 5, 6, 7, 8, 9, 0]
    a[2:4] = a[2:4] * 2   # a is [0, 1, 4, 3, 4, 3, 4, 5, 6, 7, 8, 9, 0]; * repeats the slice, it does not double each element
    del a[:2]             # a is [4, 3, 4, 3, 4, 5, 6, 7, 8, 9, 0]
    del a[:]              # a is []

Lists cont.
• Strings can also be indexed and sliced like lists, but they cannot be modified (they are immutable).
    a = 'class'
    b = a[0]           # 'c'
    c = a[:2]          # 'cl'
    d = a[-2:]         # 'ss'
    a[:2] = 'Cl'       # TypeError, because strings are immutable
    a = 'Cl' + a[2:]   # a is now 'Class'

Functions-range()
    x = list(range(2, 5))
    print(x)   # [2, 3, 4]
    for i in range(3): print(i)           # prints 0, 1, 2
    for i in range(2, 6): print(i)        # prints 2, 3, 4, 5
    for i in range(0, 20, 2): print(i)    # prints 0, 2, 4, 6, 8, 10, 12, 14, 16, 18
    for i in range(10, 2, -1): print(i)   # prints 10, 9, 8, 7, 6, 5, 4, 3

    x = ['I', 'love', 'Python']
    for i in range(len(x)):
        print(i, x[i])
    # 0 I
    # 1 love
    # 2 Python

Functions-sort()
• sorted(a): returns a new sorted list without changing the original list
• a.sort(): sorts the original list a in place
    a = [4, 1, 2, 3]
    b = sorted(a)   # b is [1, 2, 3, 4], a is unchanged
    a.sort()        # a is changed to [1, 2, 3, 4]

Modifying sorted()
    # sort the list by absolute value from largest to smallest
    a = [4, -1, 2, -3]
    b = sorted(a, key=abs, reverse=True)   # b is [4, -3, 2, -1]
    # key is a function that is called on each item to produce the value used for comparison.
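The key argument accepts any function of one item, not just abs. The sketch below repeats the absolute-value sort and adds two more keys; the words and pairs lists are made-up illustration data, not part of the slides.

```python
a = [4, -1, 2, -3]

# key=abs compares items by absolute value; reverse=True puts the largest first
by_abs_desc = sorted(a, key=abs, reverse=True)
print(by_abs_desc)   # [4, -3, 2, -1]

# Illustration data (assumed): sort words by their length
words = ['Python', 'I', 'love']
by_length = sorted(words, key=len)
print(by_length)     # ['I', 'love', 'Python']

# Illustration data (assumed): sort (name, score) pairs by the score
pairs = [('b', 2), ('a', 3), ('c', 1)]
by_score = sorted(pairs, key=lambda pair: pair[1])
print(by_score)      # [('c', 1), ('b', 2), ('a', 3)]

# list.sort() sorts in place and returns None, so do not assign its result
result = a.sort()
print(result, a)     # None [-3, -1, 2, 4]
```

A common mistake is writing a = a.sort(), which replaces the list with None; use a = sorted(a) or a bare a.sort() instead.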
if-else
    c = 1                # example value
    if c > 3:            # tested first so that both branches can be reached
        a = "coool"
    elif c > 2:          # elif stands for 'else if'
        a = "cool"
    else:                # when all else fails use else
        a = "Meh"
    print(a)             # prints Meh, because 1 is not greater than 2

Loops
    i = 0
    while i < 10:
        print(i, "is below 10")   # IndentationError if we forget to indent
        i += 1

    for i in range(10):
        if i == 2:
            continue   # go to next iteration
        if i == 6:
            break      # quit the loop
        print(i)       # prints 0, 1, 3, 4, 5

Important Python libraries for data science
• NumPy: handling multidimensional arrays
• pandas: DataFrame, handling labeled tabular data
• Matplotlib: plotting

Reading and Writing Data
From .csv file
    import pandas as pd
    data = pd.read_csv(r'C:\Users\Farah\Desktop\MATH37198 FALL2022\Week1\1B_Height_in_inches.csv')
From Excel: use pd.read_excel()

Saving data to .csv
    data.to_csv('C:/Users/Farah/Desktop/Farah.csv')
To Excel: use data.to_excel()

Plotting
Line graph
• Good for trends.
• Use plt.plot
• You can specify different marker and line styles, colors, etc.
    import numpy as np
    import matplotlib.pyplot as plt
    # plot graphics will appear in your notebook (Jupyter/IPython only)
    %matplotlib inline
    # 12 years, 2000 to 2022, so that years and Population_In_M have the same length
    years = list(range(2000, 2024, 2))
    Population_In_M = [100, 105, 120, 125, 135, 150, 159, 164, 171, 180, 186, 196]
    # create a line chart, years on x-axis, population on y-axis
    plt.plot(years, Population_In_M, color='green', marker='o', linestyle='solid')
    # add a title
    plt.title("Population Growth")
    # add a label to the y-axis
    plt.ylabel("Population in Million")
    # add a label to the x-axis
    plt.xlabel("Year")
    plt.show()

Scatterplots
• Good for visualizing the relationship between two paired sets of data
    # create a scatter plot, years on x-axis, population on y-axis
    plt.scatter(years, Population_In_M)
    # add a title
    plt.title("Population Growth")
    # add a label to the y-axis
    plt.ylabel("Population in Million")
    # add a label to the x-axis
    plt.xlabel("Year")
    plt.show()

Several lines on one chart
    import matplotlib.pyplot as plt
    # plot graphics will appear in your notebook (Jupyter/IPython only)
    %matplotlib inline
    # Population_In_M2 must be a second list with one value per entry in years; its values are not given here
    # create a line chart with both series, years on x-axis, population on y-axis
    plt.plot(years, Population_In_M, 'green', years, Population_In_M2, 'red')
    # add a title
    plt.title("Population Growth")
    # add a label to the y-axis
    plt.ylabel("Population in Million")
    # add a label to the x-axis
    plt.xlabel("Year")
    # legend entries follow the order of the plotted series
    plt.legend(['Polar Bears', 'Brown Bears'])
    plt.show()

Bar charts
• Good for presenting and comparing numbers across a discrete set of items
    # create a bar chart, years on x-axis, population on y-axis
    plt.bar(years, Population_In_M)
    # add a title
    plt.title("Population Growth")
    # add a label to the y-axis
    plt.ylabel("Population in Million")
    # add a label to the x-axis
    plt.xlabel("Year")
    plt.show()

Histogram
• Graphical representation of a frequency distribution table

Create Frequency Distribution
    # read csv
    data = pd.read_csv(r'C:\Users\Farah\Desktop\MATH37198 FALL2022\Week1\1B_Height_in_inches.csv')
    # explore the dataset
    data.head()
    # create the frequency distribution table
    freqdist = data.value_counts()
    print(freqdist)
    # plot the histogram
    plt.hist(data)
    # add a title
    plt.title("Heights")
    # add a label to the y-axis
    plt.ylabel("Count")
    # add a label to the x-axis
    plt.xlabel("Height in inches")
    plt.show()

Output of print(freqdist), as height in inches and count:
    Height   Count
    67       6
    65       5
    69       5
    66       5
    68       4
    71       4
    64       4
    63       3
    73       2
    72       2
    61       1
    62       1
    74       1
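A frequency distribution is a count of how often each value occurs. As a rough sketch of what value_counts() computes, the block below rebuilds the counts with collections.Counter from the standard library, a swapped-in technique that needs neither pandas nor the CSV file. It assumes the numbers printed at the end of the Histogram slide pair up as 13 heights followed by their 13 counts; the names printed_table and heights are made up for this illustration.

```python
from collections import Counter

# Height in inches -> count, read off the printed frequency output
# (assumption: the first 13 numbers are heights, the last 13 their counts).
printed_table = {67: 6, 65: 5, 69: 5, 66: 5, 68: 4, 71: 4, 64: 4,
                 63: 3, 73: 2, 72: 2, 61: 1, 62: 1, 74: 1}

# Expand the table back into one entry per observation.
heights = [h for h, n in printed_table.items() for _ in range(n)]

# Count how often each height occurs.
freq = Counter(heights)

# most_common() lists (value, count) pairs from most to least frequent.
for height, count in freq.most_common():
    print(height, count)

total = sum(freq.values())                        # number of observations
mode_height, mode_count = freq.most_common(1)[0]  # most frequent height
print(total, mode_height, mode_count)             # 43 67 6
```

Passing the heights list to plt.hist(heights) would draw the same kind of histogram as the slide, with one bar per bin rather than per distinct value.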