Comp 116 | 8/19/21 | Day 1 Notes:
● An instance gets launched on Google Cloud and consumes resources that UNC has to pay for, so don't leave it running. EduHelx will "harvest" it after 2 hours of inactivity, and then you can't reconnect. So run daily to get the latest
● Watch the Zoom recording once access is granted
● Notes in Jupyter
● Go through and highlight syllabus
● (find campus printers) (get gg sheets certified)
● Once a week go to the comp 116 folder and download the folder for the week to back up to laptop
● ***Worksheet 1***: go to EduHelx files, go to worksheet unlocker, password given in password file (1st password = 'my-first-worksheet'). Run in worksheet unlocker to unlock the worksheet, then go back to files and your worksheet should be available to fetch. WS1 due 8/25/21 (W). Run cells to complete. Checks exist at certain levels to help check answers. "Run" is near top. Submit cell at end. Then go to Unlocker file to submit.
● (Read. Read. Read. Don't. Bounce.)
● Tests may be able to fetch early but need password so can't open yet
● import = importing libraries to use
● ****Turning things in early (TWO DAYS) gives more credit
● If you collaborate on the solution of a problem set, we expect you to list your collaborators in the space provided at the top of the assignment.
COMP 116 | 24 AUG 2021 | Notes-03
Integers, Strings, and Lists
● Everything in Python is an object (an instance of a class)
● Ways of manipulating information to get an answer
● Floating-point number, Boolean representation
● Everything is a 0 or a 1
Jupyter Notebooks
● Consists of cells
● 3 types of cells but we edit in 2: markdown cells and code cells
○ Markdown cells have instructions on how to display text, images, etc. Not really code, just information, such as what's in the next cell
○ A cell with a blue line means you can't type in it right now, but if you click in the cell and the line turns green you can edit it
○ Every cell is usually editable but in this class some may be uneditable
○ Pounds (#) set the heading size: more pounds make the heading smaller
○ Star
○ Create a new cell by pressing a or b when the cell you are in is blue (not in edit mode). a adds an empty cell above, b adds an empty cell below
○ To delete a cell, exit edit mode (make sure line is blue) and hit d-d
○ Undo delete cell from the menu bar
○ Many notebooks have autosave, but you should SAVE FREQUENTLY w/ the floppy disk icon
○ Hit z to undo a deleted cell
○ Use esc to exit edit mode
Backup w/ File Browser
● Select file and download zip file
Python Numerical Types
● First cell will always say import comp116, a library that we will use a LOT. Run this cell
● Use numbers (integers) to set variables.
○ Example: x=1
○ Variables might be named, such as "bankBalance"
● Everything after a pound is just a comment.
● Always use comments
● Use print(content) to make information appear below the cell when run
○ Example: if x=1, print(x) will make 1 appear
● Floating-point numbers or "floats": numbers with a fraction or decimal
● In the print line, put notes in quotes (' ') to include them in the output
● type(): determines the variable's type
○ float or int for numbers, but can be used on other variables
● Use multiplication or exponential notation to shortcut large numbers
○ * means multiply
○ ** means raise to the power of (like the caret ^ in math notation)
○ Can also use scientific notation/exponential notation w/ e
■ 3.5*10**12 = 3.5e12 = 3.5 trillion
○ Will do sig figs i think
Strings
● Collection of letters and numbers
● Use double quotes or single quotes, but open and close with the same kind. When a cell just evaluates a string, the output includes single quotes
COMP 116 | 25 AUG 2021 | Recitation Notes
● Compilers translate code to another, intermediate language, which interpreters then translate to the computer's own language (differs by chip and operating system: Linux, Mac, etc.)
● Python was written by a Dutch programmer (Guido van Rossum) who needed a language to manipulate lots of data
● Open source movements are funded by various groups
● Variables
○ Ex. when you say "x=1", you're taking the binary representation of 1 and sending it to an address in memory. You're asking the operating system (long software) to find a place to put the 1
● When you just type in a string the single quotes will show, but if you assign the string to a variable and then print the variable, the quotes disappear
● Double equal signs mean "are these really equal"
○ But not always completely reliable with floats: when you ask if 1==1.00...01, it may say False or True depending on how many 0's are included. Precision is always limited, and how reliable the comparison is differs between numbers
● True and False are Python keywords, and the capital letter matters!!! https://www.w3schools.com/python/python_ref_keywords.asp
String Math
● You can use the + operator between strings: it concatenates (links together in a chain or series) the strings.
○ Example:
■ x='string1'
■ y='string2'
■ print(x+y)
■ string1string2
● Have to translate errors sometimes
● len(variable) tells you the length of the string
● Don't assign strings to a variable named "len" or any other built-in function name https://docs.python.org/3/library/functions.html
Indexing
● Get stuff out of a collection of stuff
● Computer science counts starting at 0!
● Brackets index -> print(x[0]) tells you the first character of the variable x (aka the character at index 0)
● [-1] gives you the last character in the string
● You can use len to get the last index
○ The last index is len(x)-1, so print(x[len(x)-1]) will get you the nth character when n=length of the string. print(x[len(x)]) raises an IndexError
Slicing
● Slice parts of a string using numbers separated by a colon. 1st number=starting index, 2nd=stopping index. Doesn't include the stopping character itself
○ Remember it starts from 0! So index 3 is the 4th letter
○ first_ten_integers[:6:2] goes from the 1st element (index 0) up to, but not including, the 7th (index 6), stepping by 2
○ print(first_ten_integers[:3:2]) gives only the elements at index 0 and 2 (1 and 3 if the sequence holds 1 through 10). To get 1 3 5 7 the stop has to be 7 or 8, e.g. [:8:2]
*Slightest difference between the letter "l" and the number "1": the letter is only vertical and straight lines while the number has a sloped hat
Easter Egg: run import this (no quotes)
Lists
● Like strings but not
● Both:
○ Have length
○ Can select elements
○ Can be sliced
● But!
○ You can update elements of a list (aka lists are mutable, strings are immutable)
● Lists are created with brackets w/ elements separated by commas
● Empty list is just [ ] (assign this to a variable and print)
COMP 116 | 26 AUG 2021 | Notes-4
Some interesting operations:
● The `/` division returns a float (real) number. `3/6` returns 0.5
● The `//` (floor) division of two integers returns an integer (whole) number and drops the fractional part of the result.
`3//6` returns 0, `8//6` returns 1
○ Aka cuts off to a whole number, not rounding, so 11//6 is still 1, even though 11/6 would round to 2
● Also, the `%` modulo operator returns the remainder. `3 % 6` returns 3
● /= aka Division Assignment: divides the variable by a value and assigns the result to that variable.
Method vs Function: methods are functions that belong to objects and are called as object.method()
Find Method
● Use the string method find to get the index where a substring first occurs (-1 if it is not found)
● find can search for a substring
● x.find('thing you want to find')
Count Method
● Counts the number of times a substring occurs in a string
○ Ex. how many periods? = how many sentences
○ x.count('.')
● 3 apostrophes (''') allow a long string to continue over multiple lines
.format(id(x))
● id(x) gives the identity of x (its memory address in CPython); .format() inserts that number into a string
Strings Again
● String slicing uses start:stop:step
● Example, to print something backwards do print(x[::-1])
***THIS WILL BE ON THE MIDTERM!** PLUCKING OUT CHARACTERS
_____Notes 5____
● TypeError is raised whenever an operation is performed on an incorrect/unsupported object type
○ Ex. using () instead of [] to index
● At this point, it is not important to know a lot about the differences between these two, but we will explain more as we progress.
● Important to have line numbering on, read the message, and be able to correct it
● Unless you're an expert typist, assume you'll make typographical errors on the quizzes and exams!
Comp 116 | 31 AUG. 2021
● The three primitive Python built-in types:
○ numbers (integers and floats); we will not discuss complex, octal, or hexadecimal numbers.
■ We will also discuss scientific notation format numbers.
○ strings (letters, numbers, etc. enclosed in single or double quotes)
○ booleans (True or False)
● Data structure: a way of collecting, storing, organizing, managing, and using a collection of stuff
Lists vs.
Strings
● Similarities:
○ Lists and strings hold many "objects" in one variable
○ Indexing - you can access individual objects using an index
○ Slicing - you can access a subset of the objects using index slicing
● Differences:
○ Lists can hold any kind of object in the same variable, strings only hold characters
○ Lists are mutable - can be changed after creation, but strings are immutable - cannot be changed after creation
Strings
● Strings are immutable: can't overwrite an element and retain the same string
○ Aka if you assign a new string to the same name as an older string, the name now points to a new object with a different id
● Can only hold characters
Lists
● Have length
● You can select elements
● Can be sliced
● Can hold many things like: integers, strings, other lists, etc.
● *Mutable: elements can be changed
● So if you print a list, and then change it below and print it again (don't erase the first print), then both the old and new versions of the list will print
● del can be used to delete elements
**Basic List Operations: Functions vs. Methods
- Functions carry out a task and use the function() format
- Methods are similar but are applied to objects/classes. Use the .method() format
Lists cont.
● Lists made w/ square brackets w/ elements separated by commas
○ Or: list((element1, element2))
● empty_list=[]
○ use to fill later, reserve space, etc.
○ Ex. fill a list with a bunch of zeros to reserve a bunch of space
**Remember that we start from 0**
● Add elements to a list using the method append
○ Example: x.append(6) adds the element 6 to the end of the list x
■ Only adds 1 element though
○ Or use + to add on
■ Ex.
first_ten_positive_integers_list = first_ten_positive_integers_list + [11, 12] adds 11 and 12 to the list
● The len function returns the number of elements in a list
● The range function takes start, stop, step (separated by commas) to create a range of numbers
○ Does not include the stop number
○ Only thing required is stop
○ The list function can take the range and make a list (example: use when you don't want to type out a bunch of numbers, don't want to miss any, etc.)
■ So: list(range(1,11)) creates a list of the first 10 positive integers
● Multiplying a list makes that list repeat that many times
○ Ex. print([0]*5) prints: [0, 0, 0, 0, 0]
● If you set one list variable equal to another (lst2=lst1), both names refer to the same list, so changing one changes both
○ Unless you make a copy with the .copy method, then only the original or the copy will change (depending on which one you modify)
■ Ex: x=lst1.copy() makes a copy of the list titled "lst1"
Other Stuff
● Indexing: start:stop:step should become second nature before the exam
● Print out intermediate results to validate your progress
● Fun fact: the origin of debugging came from a dead moth that fried itself in an early computer
● Offset: where an element starts
● abs(): function to take the absolute value of a number
Notes-07 - NumPy array
● my_array=np.array(lists_of_lists)
● Build on the capabilities of strings and lists, but are
○ Faster
○ Handle only homogeneous elements (all elements of the same type)
○ Each list must also have the same number of elements
○ Have multiple ways to initialize
○ Allow Boolean selection of elements (this is important, highly useful, but sometimes confusing!)
○ External library (called module) so must import
■ import numpy as np
● np bc shorter
● Module: external content libraries with built-in definitions and functions
● np.array([ , ]) creates an array
● np.arange(start,stop,step): numpy function that generates a numpy array
○ arr = np.arange(10) # Creates an nd-array [0 1 2 3 4 5 6 7 8 9]
○ Notice how the syntax is start,stop,step with *commas*, as opposed to the colons used in slicing
● *length* counts the number of elements, so the count starts at 1 (while indexes start at 0)
● Np-arrays are mutable
● We style all NumPy functions with np.function( )
● Since numpy needs things to be the same type, it will convert when possible
○ Ex. in an array of integers, trying to change an element to be 37.5 will convert that element to 37 instead
○ Some elements can't be converted
● np.zeros: NumPy function that creates an array of zeros
○ zero_arr = np.zeros(10) creates an array of 10 zeros (floats by default)
● When printed, NumPy arrays have spaces between elements while lists have commas
● dtype: argument that sets the type of the elements
○ zero_arr = np.zeros(10, dtype=int)
● np.zeros takes the optional dtype parameter, so it can make an array of boolean Falses
○ Use variable five_falses to create a NumPy array of 5 Falses
○ Comparing two equal arrays with == creates an array of Trues
■ five_falses=np.zeros(5, dtype=bool)
■ another_five_falses=np.array([False] * 5)
■ print(five_falses==another_five_falses)
● == tests if each element is equal
● Can initialize a NumPy array w/ a Python list
● np.any() will tell you if any element of a NumPy array evaluates to True
● np.all() will tell you if all elements of a NumPy array evaluate to True
● np.sum() takes the sum of items in the array
○ np.sum(array, axis = 0) takes the sum of each column (aka collapses the rows into 1 row)
○ np.sum(y, axis = 1) takes the sum of each row (aka collapses the columns into 1 column)
● np.mean() takes the mean of the items in the array.
Note that there is also a np.average()
● np.average() takes the average of the items in the array (and can take optional weights)
● np.std() takes the standard deviation of the items in the array
● np.max() finds the maximum value element in the array
● np.min() finds the minimum value element in the array
● np.prod() computes the product of all the elements in the array
● np.argmax() returns the offset of the maximum value
● np.argmin() returns the offset of the minimum value
● np.count_nonzero() returns the number of elements that are non-zero
● Operations on arrays: you can add two arrays of the same size or add a constant to all elements of an array
● np.diff(): makes an array of the differences between consecutive elements of the original array
Notes 08 - Matplotlib.pyplot for plotting
● Use matplotlib to plot NumPy arrays
● **x and y values are arrays***
● The pyplot function plt.plot plots a line from a numpy array of y-values
○ Can use x-values too, but don't have to when plotting lines of equally spaced data
○ The plot command can take x-values as an argument
○ Plots print out with the figure id so it can be reused
● Just type plt.x and then hit <tab> to see the list of functions available that affect the x values.
● plt.xlabel takes a string that will be the x-axis label
● plt.xticks takes an array of tick positions and, optionally, the labels to put on the ticks of the x-axis
○ plt.xticks(range(len(list_of_labels)), list_of_labels, rotation='vertical')
● plt.ylabel takes a string that will be the y-axis label
● plt.yticks takes an array of tick positions and, optionally, the labels to put on the ticks of the y-axis
● plt.title takes a string that will be the plot's title
● plt.legend makes a legend
○ plt.legend(('Coal', 'Natural gas', 'Petroleum'), loc='upper right')
● The optional parameter fmt is a convenient way for defining basic formatting like color, marker and linestyle.
○ >>> plt.plot(x, y) # plot x and y using default line style and color
○ >>> plt.plot(x, y, 'bo') # plot x and y using blue circle markers
○ >>> plt.plot(y) # plot y using x as index array 0..N-1
○ >>> plt.plot(y, 'r+') # ditto, but with red plusses
● plt.plot(x_values, y_values, 'r.') yields red dots
○ 'b' for blue, 'y' for yellow, 'g' for green
○ '.' for dot, 'D' for diamond, 'd' for thin diamond, '+' for plus, 'o' for circle
● To overlay dots with a line, just call plt.plot again
● gcf (Get Current Figure): plt.gcf() returns the current figure
● plt.bar() creates a bar graph
○ Sometimes, use x_value=np.arange(len(y_value))
Change and Relative Change
● Change or difference = the 2nd number minus the 1st number
○ The change from 100 to 101 is 1. The change from 101 to 102 is 1
○ f-i (final minus initial)
● Relative change = the change divided by the first value
○ The relative change from 100 to 101 is 0.01
Notes-09 R-Plotting
● np.sin: sine of each element (angles in radians)
● np.cos: cosine of each element (angles in radians)
● np.pi: outputs pi
Notes-10 Multi-Arrays
● Aka N-dimensional arrays
● A spreadsheet can be considered a 2-d array, as can any table with rows and columns
● /= (Division Assignment): divides the variable by a value and assigns the result to that variable
Referencing individual elements in 2-d arrays
● Colons for start:stop:step in any dimension/axis
● Commas to separate references to the dimensions/axes (aka to signify a certain row or column)
○ print(array_variable[0,4]) gives you the element in the row w/ offset 0 (1st row) and column w/ offset 4 (5th column)
○ But sometimes you want to instead make a variable equal to the row offset you want, and use that instead of counting the rows, aka use a design pattern so you can reuse it
○ print(array_row[4,]) gives you the 5th row (same as [4,:])
○ print(array_column[:,0]) gives you the 1st column
● .shape: property that describes the number of elements in each dimension
○ Comma delimited
○ Outputs as a tuple: immutable sequence formatted w/ parentheses and commas
○ Ex.
four_by_five_array.shape → (4, 5)
○ .shape[0] gives number of elements in the first dimension, # of rows
○ .shape[1] gives number of elements in the second dimension, # of columns
○ Property: just reference it
■ No parentheses
● Careful: axis 0 goes down each column, axis 1 goes across each row
Notes-11 Boolean Selectors
● Boolean logical operators: and, or, and not
● Booleans resulting from comparisons: ==, !=, >, <, >=, <=
● Boolean element by element operations: &, |, and ~
● Booleans for array indexing: use an array of booleans that is the size of the dimension being indexed
● bool(): function that outputs the boolean value
Conditional Operators
● greater than: >
● less than: <
● greater than or equal to: >=
● less than or equal to: <=
● equal to: ==
● not equal to: !=
Boolean Review
● Either True or False
○ May be treated as zeros (F) and ones (T)
● Count the number of times something is true using np.sum (since True=1)
○ Ex. no_years_decrease=np.sum(nc_coal_change<0) where nc_coal_change is an array of the differences between nc_coal from year to year
Logical Operations
and and or evaluate expressions from left to right.
● Logical AND: True if both the operands are true
○ with and, if all values are truthy, returns the last evaluated value. If any value is falsy, returns the first falsy one.
● Logical OR: True if either of the operands is true
○ or returns the first truthy value. If all are falsy, returns the last value
Membership Operators
● in evaluates to True if it finds a value in a specified sequence and False otherwise.
● not in evaluates to False if it finds a value in a sequence, True otherwise.
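The comparison, counting, logical, and membership operators above can be sketched in a few lines. This is a minimal sketch, assuming NumPy is installed; the array `changes` and every variable name here are made up for illustration and are not course data.

```python
import numpy as np

# Made-up year-to-year changes, only for illustration
changes = np.array([3, -2, 0, -5, 4])

# A comparison on an array gives one boolean per element
decreased = changes < 0

# True counts as 1, so np.sum counts the True elements
num_decreases = int(np.sum(decreased))

# & combines two boolean arrays element by element (parentheses are required)
small_decrease = (changes < 0) & (changes > -3)

# Plain `and` / `or` return one of their operands
and_result = 0 and 5    # 0 is falsy, so `and` stops there
or_result = 0 or 5      # 0 is falsy, so `or` moves on to 5

# Membership operators
has_b = 'b' in 'abc'
missing_z = 'z' not in ['a', 'b', 'c']

print(decreased)
print(num_decreases, small_decrease)
print(and_result, or_result, has_b, missing_z)
```

Note that `and` and `or` work on single values, while `&` and `|` are the ones to use on whole arrays.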
Identity Operators
● is evaluates to True if the variables on either side of the operator point to the same object and False otherwise
● is not evaluates to False if the variables on either side of the operator point to the same object and True otherwise
Finding stuff
● Use np.max() to check for a True element, bc True is the higher value
● Use np.min() to check for a False element
Operators
● & : and
● | : or
● ~ : complement (aka reverse True and False)
○ If a and b are np.arrays: a=np.array([False, True, False, True]) b=np.array([False, False, True, True])
■ a&b = a and b, element by element
● [False False False True]
■ a|b
● [False True True True]
■ ~a
● [True False True False]
Boolean Selectors/Masks
● Use an array of booleans that is the size of the dimension being indexed to index an array
○ abc_boolean_mask = (a_thru_z_arr=='a') | (a_thru_z_arr=='b') | (a_thru_z_arr=='c') is a boolean mask that makes only a, b, and c True
○ a_thru_z_by_numbers[:,abc_boolean_mask] selects only the a, b, and c columns
■ If a_thru_z_arr were the rows, a_thru_z_by_numbers[abc_boolean_mask] selects just the a, b, and c rows
Notes 12 Numpy Examples
range() vs np.arange()
● range(): python sequence type
○ Generates integers from one value to another in steps (start, stop, step)
● np.arange(): numpy function generating an array of numbers (floats or integers) from one value to another in steps (start, stop, step)
● np.median(): returns the middle number of the sorted values of an array
○ Returns average of two middle values if there's an even number of elements
● .split(): string method that makes a list out of a string where a specified separator marks each new element
○ whatever's in the () after ".split" is called the separator
○ Ex.: my_text_as_list = my_text.split(".") makes a list of each sentence in the text
● .pop(): method that removes the last element from a list and returns it
● .squeeze(): method that removes dimensions of length 1 from an array (aka makes it one dimensional if it's previously nx1 or 1xn)
○ ex.
if array.shape is (10,1), the shape of array.squeeze() would be (10,)
○ Apply when setting the variable
● np.diff(): makes an array of the differences between consecutive elements of the original array
● Count the number of times something is true using np.sum (since True=1)
○ Ex. no_years_decrease=np.sum(nc_coal_change<0) where nc_coal_change is an array of the differences between nc_coal from year to year
Notes 13- Functions
Stuff from WS5
● Parts of a function: name(argument)
● Common ones: len, range
● Writing own function syntax:
def functionname(argument1, argument2):
    return function_output
● ^ the body must be indented because it is under the def line. More code may be indented more in more complicated functions
● Might want to set the output expression to a variable, or just put it in next to return
Ex:
def functionname(number):
    output1 = number*5
    output2 = number
    return (output1, output2)
print(functionname(56)[0])
print(functionname(56)[1])
● Global variable: variable defined outside of a function
● Local variable: variable only defined inside of a function
Stuff from Class
● Functions: named/labeled chunks of code that are designed to do a specific job
○ Designed to be reused
○ Are "called" when used in expressions
Defining a function
● Signature: the 1st line of a function that defines its name, return type, and parameters (optional)
○ Starts with the def keyword, then the name
○ Parameters: inputs to the function entered in parentheses after the function name
■ def name(parameters):
■ Parameters are not always necessary
■ Functions may be defined with default parameters
■ Even if no parameters, still need empty ()
● After the signature is usually a comment documenting what the function does
○ Docstring: a special string located at the beginning of a function saying what it does (helps documentation)
■ '''this is a docstring'''
● Returns
○ Functions can have any number of return statements, including 0
○ Return values are written after the keyword return
○ The return statement can return a value or just return
■ print() is a function that just returns (its return value is None)
■ np.array() is a function that returns an array
● *All statements executed when a function is called are indented wrt the signature
○ Wrt: with regard to
● A function ends when indentation returns to the same level as the def statement or when it executes a return statement
● ***Never use global variables in a function!!!****
● *The variable used when calling a function doesn't have to match the name of the parameter*
Notes 14- Functions Practice
Adding Text Files
● Text file: file that ends in .txt
● After the code shown right-> do: my_text = load_text_file() print(my_text)
Iterables
● Iterable: any object, like a list or a string, that can have its members returned one at a time
Notes 15 - For Loops
● for loops are used when you want to repeatedly execute a piece of code a fixed number of times
Control Flow
● Looping in python:
○ For loops
○ While loops
● Conditional statements in python
○ if-elif-else
● In many computations, you want to perform a set of operations on a number of items or a given number of times. Ex:
○ Find the squares of a list of numbers
○ Compute the factorial of n by multiplying the first n integers
○ Read line-by-line from a file until you finish reading all lines in the file.
● A for loop is the control structure that is used to accomplish this.
○ Iterate over a sequence like a list, array, string, lines in a file etc.
○ Think of the for loop as repeating the same process for each member of a list, array, string, etc
● The general syntax for a for loop is:
○ for <var> in <iterable>:
○     block of code
■ #Note the indentation. It is necessary.
■ <var> = the iteration variable that is updated every time through the loop.
■ <iterable> = a python object called an iterable. Anything that has a sequence is an iterable.
■ The indentation marks the block of code to be executed as part of the for loop.
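Two of the examples named above (squares of a list and the factorial of n) can be sketched with the for loop syntax just described. This is a minimal sketch; the names `numbers`, `squares`, and `factorial` are chosen here for illustration.

```python
# Squares of a list of numbers
numbers = [1, 2, 3, 4]
squares = []                  # set things up before the loop
for n in numbers:             # n is the iteration variable
    squares.append(n ** 2)    # indented block runs once per element


def factorial(n):
    '''Return n! for a non-negative integer n.'''
    fact = 1
    # range(1, n + 1) yields 1, 2, ..., n (the stop value is excluded)
    for i in range(1, n + 1):
        fact = fact * i       # after this line, fact equals i!
    return fact


print(squares)
print(factorial(5))
```

Calling `factorial(0)` never enters the loop, so it returns the starting value 1.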
● Loop invariant: logical assertion about a program loop that is true before (and after) each iteration
○ Useful to prove correctness of loops (we won't do that)
● General ideas about for loop creation:
○ You will typically set some things up at the beginning of the loop.
○ You will loop a finite number of times.
○ You will compute something that you will use later.
● Factorial Example:
○ The loop invariant for the factorial for loop is: fact is equal to the factorial of the iteration variable i
● You can nest for loops inside each other
● One word at a time:
● .readline(): a method that reads a line of text from a file
○ Each time you run it, it reads the next line
○ So if you call it in a for loop, e.g. for i in range(n): print(f.readline()) where f is the open file, it will print n lines of the poem
● continue statement: ends this iteration of the loop, and the next statement that gets executed is the top of the loop.
○ Aka, the rest of the loop body is skipped for this iteration, but the variables keep their current values
○ (so if j was originally 0 but is increased with j+=1 over the course of the loop, whatever j equals when the continue statement runs will carry over to the next iteration)
● break statement: terminates the loop and the next statement that gets executed is the one immediately following the loop body. With nested for loops, the break statement terminates only the loop in which it occurs.
● Factorial: that number multiplied by all of the positive integers that precede it (for non-negative integers; 0! is 1)
Notes 16- If-Else
● If statements: block executed if the condition is True
● Else statements: block executed if the if condition evaluates to False
● Elif statement: additional conditions checked after the initial if.
○ If more than one condition applies, only the block higher up in the code will be executed
Notes 17- Axes3D-PDB
● To plot a three dimensional plot on a two dimensional Matplotlib figure use Axes3D:
○ create a pyplot figure,
○ pass that to Axes3D, and
○ use that to plot x, y, and z axes.
fig = plt.figure(figsize=(4,4))
ax = plt.axes(projection='3d')
ax.plot(xline, yline, zline, c='r', marker='o') # xline, yline, and zline are 1D arrays
Notes 18- Read CSV
Reading files of data separated by commas (or other separators)
● Many types of data are tabular.
○ each row can be an observation, and each column is something that is measured
● Many such data sets are stored in a text file with one row of data in each line.
● Each of the values in a row is separated by a separator.
○ comma, ","
○ tab
● Typically each row in the text file has the same number of values
● If a value is missing, the separator for that value is still included
● Such files are called CSV files, for Comma Separated Values
○ If the tab is the separator, such files are called TSV files
○ But CSV is often used generically for all separators
● Very frequently, CSV files include a number of header lines at the beginning of the file
○ provide information about the data
○ A very common header line provides a brief name for each column of data
NumPy loadtxt
● np.loadtxt: numpy function that reads data from a text file into an array
○ Data in file must be the same type
○ np.loadtxt(csv_filename, delimiter=',')
● If the data is separated by commas, specify delimiter=','
Other ways to read data
● Read one line at a time and use .split() with a separator specified to pull the fields apart
Data Science Process
1. Ask an interesting question
2. Get the data
3. Explore the data
4. Model the data
5. Communicate and visualize the results
Notes 19- More PDB
● .strip(): returns a copy of the string with the leading and trailing whitespace (or other specified characters) removed
message = ' Learn Python '
# remove leading and trailing whitespaces
print('Message:', message.strip())
# Output: Message: Learn Python
● PDB files are fixed-width (each field sits in fixed columns)
Assignment 2
● .lower(): converts a string to lowercase
● .isalnum(): returns True if all the characters in a string are letters or numbers
● .isalpha(): returns True if all the characters in a string are letters
● .endswith(): returns True if the string ends with the specified value
Notes-20 Date Objects
● Object-Oriented Programming (OOP)
○ focus is on creating reusable patterns of code, in contrast to procedural programming, which focuses on explicit sequenced instructions.
● Class: a template for an object. This defines a set of attributes that will characterize any object that is instantiated from this class.
● Object: an instance of a class. This is the realized version of the class.
● .reshape(): method to set the dimensions of an array
○ (rows, columns)
○ Ex.
arr=np.arange(12)
arr=arr.reshape((3,4))
print(arr) ->
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
● dir(): function that gives a list of the properties and methods associated with an object
Dates
● from datetime import date: statement that gives us access to the date object
● date(): function that creates a date object
○ (year, month, day)
● .weekday(): method that gives the day of the week as a number
○ 0 is Monday, 1 is Tuesday, etc
○ If using this, can use to find the right index in a list of the days of the week
■ day=array_of_day_names[datetime.weekday()] (where datetime is the name of your date object)
○ Easier to just use .strftime('%A')
● .year: property that gives the year of the date
● .month: property that gives the month
● .day: property that gives the day
● .strftime("%A"): method that turns a date into a formatted string
○ %A=weekday name
○ %B=month name
○ %D=MM/DD/YY
○ %F=yyyy-mm-dd
○ %G=ISO week-based year
○ %Y=year
● How many times a certain date occurred→
● How many days until→
Notes 21- Try Except
● Run-time errors occur during execution
● try: except: blocks to catch a run-time error in your code
○ try: # code you want to execute
○ except: # code to execute if there is an error in the `try` code block.
○ Print the type of error by naming the exception object e and finding its type
○ except Exception as e:
○     print('\tError while computing 1/i :', e)
○     print(type(e))
● Might also put a break at the end to stop the loop if there is an error
● input(): function that shows an entry box that allows the user to type content into the code; it returns what was typed as a string
● To test if something is an integer, do:
○ try: x = int(x)
○ except ValueError: print('Not an integer')
"\" aka the escape character
● "\n" creates a new line in a string
● "\t" creates a tab in a string
Notes 23- Sets
● Sets: python data structure, unordered collection of unique items
○ Usually explained as venn diagrams: some set members are in both sets (set intersection), some are only in one set, and the members that are in one or the other or both make up the set union
○ Can only have ONE of each item
○ Mutable (items can be added and removed), but the items themselves must be immutable
Creating Sets
● Can instantiate a python set
○ set(): function that creates a set
● .add(): method that adds an item to a set
● { , }: the syntax for a set
○ Ex. my_set={0,1,2,3}
● Can turn other data structures into sets
○ Lists, np.arrays, strings
● Remember, it's unordered, may come out in unexpected ways
○ See method 4, how the string got all messy?
Fix this by putting the string in brackets like a list, e.g. set(['abc']), so the whole string is one item
● Unordered, therefore can't subscript into a set, but you can "look through" a set, aka it can be used in a for loop
○ my_set[2] will NOT work
○ for ch in my_set: does work
● sorted(): function that works on many types of collections and returns a sorted list
● Set union: the set of every element that is in either or both sets
○ A∪B
○ .union()
■ set_union_set1_or_set2 = set1.union(set2)
■ print("Union, whales or fish:", fish.union(whales))
● Prints what is in fish, whales, or both
● Set intersection: the set of all the elements that are common to both sets
○ A⋂B
○ .intersection()
■ set1_and_set2 = set1.intersection(set2)
■ print("Intersection, whales and fish:", whales_and_fish) where whales_and_fish = whales.intersection(fish)
● Prints what is in both whales and fish
● .difference(): set method that returns the elements that are only in the first set
○ print(whales.difference(fish))
■ Prints only what is in whales but is not in fish
● \ : lets you continue code on the next line so you don't have to scroll a bunch
● .remove(): removes item from a set
○ If the item isn't in the set, an error (KeyError) occurs
● .discard(): removes item from a set
○ If the item isn't in the set, nothing happens
● np.unique(): returns the sorted unique items of an array
Unpacking
● *: unpacks the items of a collection
○ arr = np.arange(10)
○ print(arr)
■ [0 1 2 3 4 5 6 7 8 9]
○ print(*arr)
■ 0 1 2 3 4 5 6 7 8 9
Comprehensions
● Comprehensions: constructs that allow sequences to be built from other sequences
○ Aka, data structures made from other data structures, using things like for and if clauses
○ List comprehensions: square brackets containing an expression followed by a for clause, then zero or more for or if clauses
■ Ex.
variable = [out_exp for out_exp in input_list if out_exp == 2]
Notes 24- Sets Counting
with-as
● with: keyword that opens a block in which a resource (such as a file) is set up and then cleaned up automatically; a file opened this way is closed when the block ends
● as: gives a name to the object that with produces
  ○ with open(filenm, 'r') as csvfile:
  ○ csv.reader(): function that takes the open file and returns a reader that gives each row of the csv file as a list of strings
A3
● # strings have several methods that you will find useful for this assignment
  ○ s = 'aBc'
  ○ t = s.upper() # make it all upper case
  ○ u = t.lower() # make that all lower case
● # we can convert from strings to integers or floats
  ○ s = '123' # a string with the characters 1, 2, and 3
  ○ i = int(s) # now it is the number 123 (one hundred twenty three)
  ○ # note that they look the same when we print them but their types are different
  ○ t = '3.14' # a string with characters 3, period, 1, and 4
  ○ f = float(t) # a floating point number 3.14
Notes 24- Dictionaries
● Dictionary: collection of objects which are indexed by a key
  ○ Data is organized as a collection of key:value pairs
  ○ Not indexed by position (since Python 3.7 a dictionary keeps insertion order)
  ○ Changeable
● Key: any immutable (unchangeable) object
  ○ Ex. string, tuple, date
  ○ Must be immutable bc keys are hashed to speed things up
● Dictionaries are created by:
  ○ Curly brackets: x = { }
    ■ Ex. dict1 = {'apple':'fruit', 'carrot':'vegetable'}
  ○ dict( ): function that creates a dictionary
● Keys can be used to index the dictionary:
  ○ Ex. dict1['apple'] gives 'fruit'
● You can also create an empty dictionary and add the key:value pairs later
  ○ class_dict = dict()
  ○ class_dict['semester'] = 'Fall'
  ○ class_dict['year'] = 2020
  ○ class_dict['course no'] = 'COMP116'
  ○ print('class_dict=',class_dict)
    ■ class_dict= {'semester': 'Fall', 'year': 2020, 'course no': 'COMP116'}
● .update(): method that lets you add multiple items to a dictionary or a set
● .keys(): method that returns the keys of a dictionary
● Tuple: objects grouped by parentheses
  ○ Immutable
● **In assignment 4 you might make the city name be the key to a dictionary of where complaints were issued.
● **Also in assignment 4 you might make the coordinates the key to where a blocked driveway occurred.
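A minimal sketch of the dictionary ideas above, using a string key (city name) and a tuple key (coordinates). The complaint records are hypothetical and not taken from the assignment 4 dataset:

```python
# Hypothetical (city, (latitude, longitude)) records, invented for illustration.
complaints = [
    ("Chapel Hill", (35.91, -79.05)),
    ("Durham", (35.99, -78.90)),
    ("Chapel Hill", (35.91, -79.05)),
]

# Start with an empty dictionary and add key:value pairs later.
class_dict = dict()
class_dict["semester"] = "Fall"
class_dict.update({"year": 2020, "course no": "COMP116"})  # several pairs at once

# Count complaints per city (string keys) and per location (tuple keys).
by_city = {}
by_location = {}
for city, coords in complaints:
    if city not in by_city:
        by_city[city] = 0
    by_city[city] += 1
    if coords not in by_location:
        by_location[coords] = 0
    by_location[coords] += 1

print(list(class_dict.keys()))
print(by_city)
print(by_location)
```

Tuples work as keys because they are immutable; a list of coordinates in the same spot would raise a TypeError.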
● .get(keyname, value): method that returns the value of the specified key, or, if that key doesn't exist, a value to return instead
  ○ Including a default value is optional; if none is given, .get() returns None when the key doesn't exist
Notes 26- Sets and Dictionaries
● dict.fromkeys(): creates a new dictionary using keys from the provided list and setting all associated values to the provided value
● Shift + Tab : Jupyter trick that brings up a function's signature and description
● Command + / : comments out a big swath of code
● Camelback style (camelCase): naming convention where the first word is lowercase and the first letter of each subsequent word is uppercase
Notes 28- API
● You can put dictionaries in dictionaries using lists
  ○ data = '{"employees":[{"firstName":"John", "lastName":"Doe"},{"firstName":"Anna", "lastName":"Smith"},{"firstName":"Peter", "lastName":"Jones"}]}'
● json.loads(): turns a JSON-formatted string (like data above) into a Python object; json.load() does the same for an open file
Notes 29- Pandas
Intro Pandas
● Pandas: library for data science that builds on NumPy
● Data structures: Series, DataFrames, Index
Series
● Series: 1-D np.array with axis labels
● Generalization of both 1-D numpy arrays and dictionaries
● Can have flexible indices (aka you can easily change the indices using lists):
  ○ data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
  ○ print(data)
    ■ a 0.25
    ■ b 0.50
    ■ c 0.75
    ■ d 1.00
    ■ dtype: float64
● pd.Series(dictionary): turns a different data structure (like a dictionary) into a Series
DataFrame
● DataFrame: 2-D labeled array with both flexible row indices and flexible column names
● DataFrame can be thought of either as a generalization of a NumPy array, or as a specialization of a Python dictionary
● A 2-D numpy array is an ordered sequence of aligned 1-D columns; a DataFrame is a sequence of aligned Series objects
  ○ Aligned: share the same index
● pd.DataFrame(): pandas constructor that creates a DataFrame
● dataframe = pd.DataFrame({'1st column label': 1st column series, '2nd column label': 2nd column series})
  ○ The row labels
are the indices of the 2 series, which should be the same
● You can also create a dataframe from a 2-D np.array:
  ○ dataframe = pd.DataFrame(array, columns=['label 1', 'label 2'], index=['index 1', 'index 2'])
● Use the column label as the index to get a column
● Use the loc indexer attribute to get the rows:
  ○ dataframe.loc['index name']
● .iloc[]: gets the rows using implicit numbering (aka instead of using the labels as indices, use the default number positions)
Pandas Index Object
● Index object can be thought of as either an immutable array or an ordered set
● Index Object: immutable sequence used for indexing and alignment, the basic object storing axis labels for all pandas objects
● Can find intersections and unions
  ○ Reminder:
  ○ .intersection(): method that returns only the items that exist in both/all sets
  ○ .union(): method that returns all the items, excluding duplicates (items in one, the other, or both)
More Pandas
● pd.read_csv(): function that reads csv data into a dataframe
● df.head(n): returns the first n rows of the dataframe
  ○ Defaults to 5
● data_a = dataframe[dataframe['column label 1'] == 'data a in that column']
  ○ data_a.head()
  ○ Outputs the first rows where that column holds data a
● .get_loc(): get integer location, slice or boolean mask for requested label
  ○ Ex: og_dataframe = pd.read_csv('Datasets.csv')
  ○ abc = og_dataframe[og_dataframe['Column label 1'] == 'abc']
  ○ data_start_index = abc.columns.get_loc('a')
  ○ Outputs the integer position of the column labeled 'a' among the columns of abc
● Convert a dataframe to an array using np.array(), might have to use .squeeze() to drop length-1 dimensions
● .set_index(): pandas method to change the index from a sequence of numbers to labels from a column in the dataset
  ○ It's okay if the labels are not unique for each row
  ○ When that column becomes the index it is no longer a column in the dataframe
● .diff(): pandas method for finding the difference between each element and the one before it
● .to_numpy(): pandas method for converting a dataframe to a numpy array
● .pct_change(): pandas method that finds the relative change between the current and a prior
element
● .idxmax(): pandas method that returns the index label of the maximum value
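A minimal sketch tying several of the pandas methods above together. The Series values, labels, and column names are hypothetical, chosen only so the results are easy to check by hand:

```python
import pandas as pd

# Hypothetical data invented for illustration.
population = pd.Series([100, 110, 132], index=["2019", "2020", "2021"])
area = pd.Series([10.0, 10.0, 10.0], index=["2019", "2020", "2021"])

# Build a DataFrame from two aligned Series (they share the same index).
df = pd.DataFrame({"population": population, "area": area})

col = df["population"]     # column label selects a column (a Series)
row = df.loc["2020"]       # .loc selects a row by its label
first = df.iloc[0]         # .iloc selects a row by integer position

change = df["population"].diff()        # difference from the previous row
rel = df["population"].pct_change()     # relative change from the previous row
best_year = df["population"].idxmax()   # index label of the largest value
col_pos = df.columns.get_loc("area")    # integer position of a column label
as_array = df.to_numpy()                # 2-D NumPy array of the values

print(change)
print(rel)
print(best_year, col_pos, as_array.shape)
```

The first entry of both .diff() and .pct_change() is NaN, since the first row has no prior element to compare against.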