Uploaded by jaboyer2003

UNC Comp 116 (Fall 2021) Notes

Comp 116 | 8/19/21 | Day 1
Notes:
● Instance gets launched- google cloud, consumes resources, unc has to pay for it, so
don’t leave it running-- EduHelx after 2 hours of inactivity will “harvest” it- can’t
reconnect. So run daily to get latest
● Watch the zoom recording once access granted
● Notes in jupyter
● Go through and highlight syllabus
● (find campus printers) (get gg sheets certified)
● Once a week go to comp 116 folder and download the folder for the week to backup to
laptop
● ***Worksheet 1***- go to eduhelx files- go to worksheet unlocker- password given in
password file- (1st password = ‘my-first-worksheet’)- run in worksheet unlocker to
unlock worksheet- then go back to files and your worksheet should be available to fetch.
Ws1 due 8/25/21 (W). Run cells to complete. Checks exist at certain levels to help check
answers. “Run” is near top. Submit cell at end. Then go to Unlocker file to submit.
● (Read. Read. Read. Don’t. Bounce.)
● Tests may be able to fetch early but need password so can’t open yet
● import = importing libraries to use
● ****turning things in early (TWO DAYS) gives more credit
● if you collaborate on the solution of a problem set, we expect you to list your
collaborators in the space provided at the top of the assignment.
COMP 116 | 24 AUG 2021 | Notes-03
Integers, Strings, and Lists
● Everything in python is a class or object
● Ways of manipulating information to get an answer
● Floating-point number
Boolean representation
● Everything is a zero or a 1
Jupyter Notebooks
● Consists of cells
● 3 types of cells but we edit in 2- markdown cells and code cells
○ Markdown cells have instructions on how to display text, images, etc. -> not really
code, just information, such as what’s in the next cell
○ Cells with blue line means u can’t edit it, but if you click in the cell and the line
turns green you can edit it
○ Every cell is usually editable but in this class some may be uneditable
○ Pound signs (#) make heading size- more pounds makes the heading smaller
○ Star
○ Create a new cell by pressing a or b when the cell you are in is blue/uneditable. A
adds an empty cell above, b adds an empty cell below
○ To delete a cell, exit edit mode (make sure line is blue) and hit d-d
○ Undo delete cell from the toggle bar
○ Many notebooks have autosave, but you should SAVE FREQUENTLY w/ floppy
disk
○ Hit z to undo a deleted cell
○ Use esc to exit edit mode
Backup w/ File Browser
● Select file and download zip file
Python
Numerical Types
● First cell will always say import comp 116- a library that we will use a LOT. Run this cell
● Use numbers- integers- to set variables.
○ Example: x=1
○ Variables might be named- such as “bankBalance”
● Everything after a pound sign (#) is just a comment. Always use comments
● Use print(content) to make information appear below the cell when run
○ Example, if x=1, print(x) will make 1 appear
● Floating-point number or “floats”- a number with a fraction or decimal
● In the print line, use ‘ ’ to include notes (text inside quotes prints as written)
● type(): determines the variable’s type
○ float or int for numbers, but can be used on other variables
● Use multiplication or exponential notation to shortcut large numbers
○ * means multiply
○ ** means raise to the power of (like the caret, ^)
○ Can also use scientific notation/exponential notation w/ e
■ 3.5*10**12 = 3.5e12 = 3.5 trillion
○ Will do sig figs i think
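Quick sketch of the number notes above. The variable names are made up for illustration, not from class.

```python
# Integers and floats, checked with type()
x = 1          # an integer
price = 2.5    # a float (has a decimal part)
print(type(x), type(price))

# * multiplies, ** raises to a power, e is scientific notation
big = 3.5 * 10**12
same_big = 3.5e12
print(big == same_big)   # both are 3.5 trillion
print(2 ** 10)           # 1024
```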
Strings
● Collection of letters and numbers
● Use double quotes or single quotes (not one of each around the same string). The output will show single quotes
COMP 116 | 25 AUG 2021 | Recitation Notes
● Compilers translate code to another, intermediate language to be interpreted by
interpreters into the computer’s own language (differs by chip and operating system- linux, mac, etc.)
● Python was written by a Dutch programmer, Guido van Rossum
● Open source movements are funded by various groups
● Variables
○ Ex. when u say “x=1”, you're taking the binary representation of 1 and sending it to
an address in memory. You are asking the operating system (long software)
to find a place to put the 1
● When you just type in a string the single quotes will show, but if you use a variable in
place of just the string, and then print the variable, the quotes disappear
● Double equal signs means “are these really equal”
○ But not always completely reliable with floats- when you ask if 1==1.00...01, it may say False
or True depending on how many 0’s are included. Float precision is always limited, but reliability
differs between different numbers
● True and False are python keywords, and the capital letter matters!!!
● https://www.w3schools.com/python/python_ref_keywords.asp
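A small sketch of the == and quotes notes above. The names close, farther, and word are made up, and the tiny offsets are just one way to show where float precision runs out.

```python
# Floats keep only about 16 significant digits
close = 1 + 1e-16     # difference is too small to store, so this is just 1.0
farther = 1 + 1e-15   # difference is big enough to be stored

print(1 == close)     # True
print(1 == farther)   # False

# print() shows a string without quotes; repr() shows it with them
word = 'hello'
print(word)
print(repr(word))
```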
String Math
● You can use the + operator between strings- it concatenates (links together in a chain or
series) the strings. Example:
○ x=’string1’
○ y=’string2’
○ print(x+y)
○ string1string2
● Have to translate errors sometimes
● len(variable) tells you the length of the string
● Don’t use “len” or other function names as variable names
● https://docs.python.org/3/library/functions.html
Indexing
● Get stuff out of a collection of stuff
● Computer science counts starting at 0!
● brackets index -> print(x[0]) tells you the first character of the variable x (aka the
character at index 0)
● [-1] gives you the last character in the string
● You can use len to get the last index
○ So print(x[len(x)-1]) will get you the last character, since indexes run from 0 to len(x)-1 (x[len(x)] gives an IndexError)
Slicing
● Slice parts of a string using numbers separated by a colon. 1st index=starting position,
2nd=stopping index. Doesn’t include the stopping character itself
○ Remember it starts from 0! So index 3 is the 4th letter
○ first_ten_integers[:6:2] goes from the 1st element (index 0) up to, but not including, the 7th (index
6), skipping by 2
○ print(first_ten_integers[:7:2]) → 1 3 5 7 (assuming first_ten_integers holds 1 through 10)
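Sketch of indexing and slicing. It assumes first_ten_integers is a list holding 1 through 10 (the notes don't say exactly what it held), and word is a made-up example string.

```python
word = 'python'
print(word[0])               # first character: p
print(word[-1])              # last character: n
print(word[len(word) - 1])   # same as word[-1]

first_ten_integers = list(range(1, 11))   # assumed contents: 1..10
print(first_ten_integers[2:5])    # indexes 2, 3, 4
print(first_ten_integers[:6:2])   # indexes 0, 2, 4
print(first_ten_integers[:7:2])   # indexes 0, 2, 4, 6
print(word[::-1])                 # whole string backwards
```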
*slightest difference between letter “l” and number “1”- the letter is only vertical and straight lines
while the number has a sloped hat
Easter Egg: run ‘import this’ (no quotes)
Lists
● Like strings but not
● Both:
○ Have length
○ Can select elements
○ Can be sliced
● But!
○ You can update elements of a list (aka lists are mutable, strings are immutable)
● Lists are created with brackets w/ elements separated by commas
● Empty list is just [ ] (assign this to a variable and print)
COMP 116 | 26 AUG 2021 | Notes-4
Some interesting operations:
● The `/` division returns a float (real) number. `3/6` returns 0.5
● The `//` (floor) division rounds the result down to a whole number; with two integers it returns an integer. `3//6` returns 0,
`8//6` returns 1
○ Aka cuts off to whole number- not rounding to nearest, so 11//6 is still 1, even though 11/6
would round to 2
○ With negatives it still rounds down, so -7//2 is -4
● Also, the `%` modulo operator returns the remainder. `3 % 6` returns 3
● /= aka Division Assignment: Divides the variable by a value and assigns the result to that
variable.
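The operators above, as a quick sketch (total is a made-up variable):

```python
print(3 / 6)     # 0.5  (true division always gives a float)
print(3 // 6)    # 0
print(11 // 6)   # 1    (cut off, not rounded to nearest)
print(-7 // 2)   # -4   (floor division rounds down)
print(3 % 6)     # 3
print(11 % 6)    # 5

total = 10
total /= 4       # same as total = total / 4
print(total)     # 2.5
```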
Method vs Function: methods are functions meant to apply to objects
Find Method
● Use the string method find to find an index
● find can search for a substring; it gives the index where the substring first starts, or -1 if it isn't there
● x.find(‘thing you want to find’)
Count Method
● Count the number of times a substring occurs in a string
○ Ex- how many periods?= how many sentences
○ x.count(‘.’)
● 3 quote marks in a row (''' or """) allow a long string to continue over multiple lines
.format(id(x))
● id(x) gives the identity of x (its memory address in CPython); .format() inserts it into a string in a certain format
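Sketch of find, count, triple quotes, and format together. The text is made up for illustration.

```python
text = '''Python is fun. Lists are mutable.
Strings are not.'''

print(text.find('fun'))     # index where 'fun' starts
print(text.find('zebra'))   # -1 means it was not found
print(text.count('.'))      # number of periods, so roughly number of sentences
print('text lives at {}'.format(id(text)))
```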
Strings Again
● String slicing uses start:stop:step indexes
● Example, to print something backwards do print(x[::-1])
● ***THIS WILL BE ON THE MIDTERM!*** PLUCKING OUT CHARACTERS
_____Notes 5____
● TypeError is raised whenever an operation is performed on an
incorrect/unsupported object type
○ Ex. using () instead of []
● At this point, it is not important to know a lot about the differences between
these two, but we will explain more as we progress.
● Important to have line numbering on, read the message, and be able to correct it
● Unless you're an expert typist, assume you'll make typographical errors on the
quizzes and exams!
Comp 116 | 31 AUG. 2021
● The three primitive Python built-in types:
○ numbers (integers and floats); we will not discuss complex, octal, or hexadecimal
numbers.
■ We will also discuss scientific notation format numbers.
○ strings (letters, numbers, etc. enclosed in single or double quotes)
○ booleans (True or False)
● Data structure: form of collecting, storing, organizing, managing, and using a collection
of stuff
Lists vs. Strings
● Similarities:
● Lists and strings hold many "objects" in one variable
● Indexing - you can access individual objects using an index
● Slicing - you can access a subset of the objects using index slicing
● Differences:
● Lists can hold any kind of object in the same variable, strings only hold
characters
● Lists are mutable - can be changed after creation, but strings are
immutable - cannot be changed after creation
Strings
● Strings are immutable- can’t overwrite an element and retain the same string
○ Aka if you assign a new string to the same name as an older string, the name now points to a new object with a different
id
● Can only hold characters
Lists
● Have length
● You can select elements
● Can be sliced
● Can hold many things like: integers, strings, other lists, etc.
● Mutable- elements can be changed
● So if you print a list, and then change it below and print the new list too (don’t
erase first list print), then both the old and new list will print
● del can be used to delete elements
**Basic List Operations:
Functions vs. Methods
-Functions carry out a task and use the function() format
-Methods are similar but are applied to objects/classes. Use the .method() format
Lists cont.
● Lists made w/ square brackets w/ elements
separated by commas
○ Or: list((element1, element2))
● empty_list=[]
○ use to fill later, reserve space, etc.
○ Ex. fill an empty list with a bunch of zeros to reserve a bunch of space
**Remember that we start from 0**
● Add elements to a list using the method append
○ Example: x.append(6) adds the element 6 to the list x
■ Only adds 1 element though
○ Or use + to add on
■ Ex. first_ten_positive_integers_list += [11, 12] adds 11 and 12 to the list
● The len function returns the number of elements in a list
● The range function uses range(start, stop, step), with commas, to create a range of numbers
○ Does not include the stop number
○ Only thing required is stop
○ The list function can take the range and make a list
○ (example, use when you don’t want to type out a bunch of numbers, don’t want to
miss any, etc.)
■ So: list(range(1,11)) creates a list of the first 10 positive integers
● Multiplying a list makes that list repeat that many times
○ Ex. print([0]*5) prints: [0, 0, 0, 0, 0]
● If you assign one list variable to another (lst2 = lst1), both names point to the same list, so changing one changes both of
them
○ Unless you make a copy with the .copy method- then only the original or copy will
change (depending on what you do)
■ Ex: x=lst1.copy() makes a copy of the list titled “lst1”
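Sketch of the list notes above: building with range, append and +, and the alias vs. copy difference. All names here are made up.

```python
numbers = list(range(1, 11))   # 1 through 10 (stop is not included)
numbers.append(11)             # append adds one element
numbers += [12, 13]            # + / += adds on a whole list
print(len(numbers))            # 13

zeros = [0] * 5                # "reserve space" with five zeros
print(zeros)

alias = numbers                # same list under a second name
clone = numbers.copy()         # separate copy
numbers[0] = 99
print(alias[0])                # 99, the alias changed too
print(clone[0])                # 1, the copy did not
```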
Other Stuff
● Indexing- start:stop:step should become second nature before the exam
● print out intermediate results to validate your progress
● Fun fact, the term debugging is popularly traced to a dead moth that was found stuck in an
early computer (the Harvard Mark II)
● Offset: where an element starts
● abs(): function to take the absolute value of a number
Notes-07 - NumPy array
● my_array=np.array(lists_of_lists)
● Build on the capabilities of strings and lists, but are
○ Faster
○ Handle only homogeneous elements (all elements of the same type)
○ Each list must also have the same number of elements
○ Have multiple ways to initialize
○ Allow Boolean selection of elements (this is important, highly useful, but sometimes
confusing!)
○ External library (called module) so must import
■ import numpy as np
● np bc shorter
● Module: external content libraries with built-in definitions and functions
● np.array([ , ]) creates an array
● np.arange(start,stop,step): numpy function that generates a numpy array
○ arr = np.arange(10) # Creates an nd-array [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]
○ Notice how the syntax is start,stop,step with *commas*, as opposed to colons
used in slicing
● *length* counts the number of elements, so starts at 1
● Np-arrays are mutable
● We style all numPy functions with np.function( )
● Since numpy needs things to be the same type, will convert when possible
○ Ex. in an array of integers, trying to change an element to be 37.5 will store
37 in that element instead
○ Some elements can’t be converted
● np.zeros: numPy function that creates an array of zeros
○ zero_arr = np.zeros(10) creates an array of 10 zeros (floats by default)
● When printed, NumPy arrays have spaces between elements while lists have commas
● dtype: argument that sets the type of the elements
○ zero_arr = np.zeros(10, dtype=int)
● np.zeros takes an optional dtype parameter- can make an array of boolean Falses
○ Use variable five_falses to create numPy array of 5 Falses
○ Comparing two equal arrays with == creates an array of Trues
■ five_falses=np.zeros(5, dtype=bool)
■ another_five_falses=np.array([False] * 5)
■ print(five_falses==another_five_falses)
● == tests if each element is equal
● Can initialize a numPy array w/ a Python list
● np.any() will tell you if any element of a Numpy array evaluates to True
● np.all() will tell you if all elements of a Numpy array evaluate to True
● np.sum() takes the sum of items in the array
○ np.sum(array, axis = 0) takes the sum of the columns (aka collapses the rows into 1
row)
○ np.sum(y, axis = 1) takes the sum of the rows (aka collapses the columns into 1
column)
● np.mean() takes the mean of the items in the array. Note that there is also a np.average()
● np.average() takes the average of the items in the array (can also take weights)
● np.std() takes the standard deviation of the items in the array
● np.max() finds the maximum value element in the array
● np.min() finds the minimum value element in the array
● np.prod() computes the product of all the elements in the array
● np.argmax() returns the offset of the maximum value
● np.argmin() returns the offset of the minimum value
● np.count_nonzero() will return the number of values that are non-zero.
● Operations on arrays- You can add two arrays of the same size or add a constant to all
elements of an array
● np.diff(): makes an array of the differences between neighboring elements in the original array
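Sketch of a few of the NumPy functions listed above on a small made-up 2 x 3 array (grid and ints are hypothetical names).

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(np.sum(grid))                 # sum of everything
print(np.sum(grid, axis=0))         # one sum per column
print(np.sum(grid, axis=1))         # one sum per row
print(np.mean(grid), np.max(grid), np.argmax(grid))
print(np.prod(grid))                # product of all elements
print(np.count_nonzero(grid > 4))   # how many elements are bigger than 4
print(grid + 10)                    # add a constant to every element

ints = np.arange(5)                 # integer array 0..4
ints[0] = 37.5                      # converted to fit the integer type
print(ints[0])                      # 37
```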
Notes 08 - Matplotlib.pyplot for plotting
● Use matplotlib to plot Numpy arrays
● **x and y values are arrays**
● The pyplot plt.plot function plots a line from a numpy array of y-values
○ Can use x-values too, but don’t have to when plotting lines of equally spaced
data
○ The plot command can take x-values as an argument
○ Plots print out with the figure id so it can be reused
● just type plt.x and then hit <tab> to see the list of functions available that affect the x
values.
● plt.xlabel takes a string that will be the x-axis label
● plt.xticks takes an array of tick positions and (optionally) the labels for the ticks on the x-axis
○ plt.xticks(range(len(list_of_labels)), list_of_labels, rotation='vertical')
● plt.ylabel takes a string that will be the y-axis label
● plt.yticks takes an array of tick positions and (optionally) the labels for the ticks on the y-axis
● plt.title takes a string that will be the plot's title
● plt.legend makes a legend
○ plt.legend(('Coal', 'Natural gas', 'Petroleum'), loc='upper right')
● The optional parameter fmt is a convenient way for defining basic formatting like color,
marker and linestyle.
○ >>> plt.plot(x, y) # plot x and y using default line style and color
○ >>> plt.plot(x, y, 'bo') # plot x and y using blue circle markers
○ >>> plt.plot(y) # plot y using x as index array 0..N-1
○ >>> plt.plot(y, 'r+') # ditto, but with red plusses
● plt.plot(x_values, y_values, 'r.') yields red dots
○ 'b' for blue, 'y' for yellow, 'g' for green
○ '.' for dot, 'D' for diamond, 'd' for thin diamond, '+' for plus, 'o' for circle
● To overlay dots with a line, just call plt.plot again
● gcf (Get Current Figure): plt.gcf()
● plt.bar() creates a bar graph
○ Sometimes, use x_value=np.arange(len(y_value))
Change and Relative Change
● Change or difference = the 2nd number minus the 1st number
○ The change from 100 to 101 is 1. The change from 101 to 102 is 1
○ f-i (final minus initial)
● Relative change = the change divided by the first value
○ The relative change from 100 to 101 is 0.01
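The same numbers as a sketch with np.diff. The names values, change, and relative_change are made up; dividing by values[:-1] is one way to line each change up with its starting value.

```python
import numpy as np

values = np.array([100.0, 101.0, 102.0])

change = np.diff(values)                  # each value minus the one before it
relative_change = change / values[:-1]    # change divided by the first value of each pair

print(change)
print(relative_change)
```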
Notes-09 R-Plotting
● np.sin(): sine of each element (angles in radians)
● np.cos(): cosine of each element (angles in radians)
● np.pi: the constant pi
Notes-10 Multi-Arrays
● Aka N-dimensional arrays
● A spreadsheet can be considered a 2-d array, and so can any table with rows and columns
● /= (Division Assignment): Divides the variable by a value and assigns the result to that
variable
Referencing individual elements in 2-d arrays
● Colons for start:stop:step in any dimension/axis
● Commas to separate references to the dimensions/axes (aka to signify a certain row or
column)
○ print(array_variable[0,4]) gives you the row w/ offset 0 (1st row) and column w/
offset 4 (5th column)
○ But sometimes you want to instead make a variable equal to the row offset you
want, and use that instead of counting the rows- aka use a design pattern so you
can reuse it
○ print(array_row[4,]) gives you the 5th row
○ print(array_column[:,0]) gives you the 1st column
● .shape: property that describes the number of elements in each dimension
○ Comma delimited
○ Outputs as a tuple- an immutable sequence formatted w/ parentheses and commas
○ Ex. four_by_five_array.shape → (4, 5)
● .shape[0] gives number of elements in first dimension, # of rows
● .shape[1] gives number of elements in the second
dimension, # of columns
● Property- just reference it
○ No parentheses
● Careful, Axis 0 goes down each column, Axis 1 goes across each row
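Sketch of 2-d indexing and .shape. The array contents (0 through 19) and row_offset are made up for illustration.

```python
import numpy as np

four_by_five_array = np.arange(20).reshape((4, 5))   # 4 rows, 5 columns

print(four_by_five_array.shape)       # (4, 5), a tuple; no parentheses after shape
print(four_by_five_array.shape[0])    # number of rows
print(four_by_five_array[0, 4])       # row offset 0, column offset 4
print(four_by_five_array[3, :])       # the whole 4th row
print(four_by_five_array[:, 0])       # the whole 1st column

row_offset = 2                        # name the offset so it can be reused
print(four_by_five_array[row_offset, 1:4])
```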
Notes-11 Boolean Selectors
● Boolean logical operators: and, or, and not
● Booleans resulting from comparisons: ==, !=, >, <, >=, <=
● Boolean element by element operations: &, |, and ~
● Booleans for array indexing: use an array of booleans that is the size of the dimension
being indexed
● bool(): function that outputs the boolean value
Conditional Operators
● greater than: >
● less than: <
● greater than or equal to: >=
● less than or equal to: <=
● equal to: ==
● Not equal to: !=
Boolean Review
● Either True or False
○ May be treated as zeros (F) and ones (T)
● Count the number of times something is true using np.sum (since True=1)
○ Ex. no_years_decrease=np.sum(nc_coal_change<0) where nc_coal_change is
an array of the differences between nc_coal from year to year
Logical Operations
and and or evaluate expressions from left
to right.
● Logical AND: True if both the operands are
true
○ with and, if all values count as True, returns the last evaluated value. If
any value counts as False, returns the first one that does.
● Logical OR: True if either of the operands is true
○ or returns the first value that counts as True. If all count as False, returns the last value
Membership Operators
● in evaluates to True if it finds a variable
in a specified sequence and False
otherwise.
● not in evaluates to False if it finds a variable in a sequence, True
otherwise.
Identity Operators
● is evaluates to True if the variables on
either side of the operator point to the
same object and False otherwise
● is not evaluates to False if the variables
on either side of the operator point to the same object and True otherwise
Finding stuff
● Use np.max() to find True bc it is the higher value
● Use np.min() to find False
Element Operators
● & : and
● | : or
● ~ : complement (aka reverse True and False)
○ If a and b are np arrays: a=np.array([False, True, False, True])
b=np.array([False, False, True, True])
■ a&b = element-by-element “a and b”
● [False False False True]
■ a|b
● [False True True True]
■ ~a
● [True False True False]
Boolean Selectors/Masks
● use an array of booleans that is the size of the dimension being indexed to index an
array
○ abc_boolean_mask = (a_thru_z_arr=='a') | (a_thru_z_arr=='b') |
(a_thru_z_arr=='c') is a boolean mask that makes only a, b, and c True
○ a_thru_z_by_numbers[:,abc_boolean_mask] prints only the a, b, and c columns
■ If a_thru_z_arr were the rows, a_thru_z_by_numbers[abc_boolean_mask]
prints just the a, b, and c rows
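Sketch of a boolean mask on a small made-up example. The arrays letters and counts are hypothetical stand-ins for the a_thru_z arrays from class.

```python
import numpy as np

letters = np.array(['a', 'b', 'c', 'd', 'e'])
counts = np.array([3, 0, 7, 2, 5])

# Each comparison gives an array of booleans; | combines them element by element
abc_mask = (letters == 'a') | (letters == 'b') | (letters == 'c')

print(abc_mask)
print(counts[abc_mask])      # only the counts for a, b, and c
print(counts[~abc_mask])     # everything else
print(np.sum(counts > 2))    # True counts as 1, so this counts the matches
```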
Notes 12 Numpy Examples
range() vs np.arange()
● range(): python sequence type
○ Generates integers from one value to another in steps: range(start, stop, step)
● np.arange(): numpy function generating an array of numbers (floats or integers) from one
value to another in steps: np.arange(start, stop, step)
● np.median(): returns the middle number of an array
○ Returns average of two middle values if there’s an even number of elements
● .split(): string method that makes a list out of a string, where a specified separator marks
each new element
○ whatever’s in the () after “.split” is called the separator
○ Ex.: my_text_as_list = my_text.split(".") makes a list of each sentence in the text
● .pop(): method that removes the last element from a list and returns it
● .squeeze(): method that drops dimensions of size 1, e.g. makes an array one dimensional (if it was previously n x 1 or 1 x n)
○ ex. if array.shape is (10,1), the shape of array.squeeze() would be (10,)
○ Apply when setting the variable
● np.diff(): makes an array of the differences between neighboring elements in the original array
● Count the number of times something is true using np.sum (since True=1)
○ Ex. no_years_decrease=np.sum(nc_coal_change<0) where nc_coal_change is
an array of the differences between nc_coal from year to year
Notes 13- Functions
Stuff from WS5
● Parts of a function: name(argument)
● Common ones: len, range
● Writing own function syntax:
def functionname(argument1, argument2):
    return function_output
● ^ the return line must be indented because it is under the def line. More code may be indented
more in more complicated functions
● Might want to set the output equation to a variable, or just put it in next to return
Ex:
def functionname(number):
    output1 = number*5
    output2 = number
    return (output1, output2)
print(functionname(56)[0])
print(functionname(56)[1])
● Global variable: variable defined outside of a function
● Local variable: variable only defined inside of a function
Stuff from Class
● Functions: named/labeled chunks of code that are designed to do a specific job
○ Designed to be reused
○ Are “called” when used in expressions
Defining a function
● Signature: the 1st line of a function that defines its name and parameters
(optional)
○ Starts with the def keyword, then the name
○ Parameters: inputs to the function, entered in parentheses after the function
name
■ def name(parameters):
■ Parameters are not always necessary
■ Functions may be defined with default parameters
■ Even if no parameters, still need empty ()
● After the signature is usually a comment documenting what the function does
○ Docstring: A special comment located at the beginning of a function saying what
it does (helps documentation)
■ '''this is a docstring'''
● Returns
○ Functions can have any number of return statements- including 0
○ Return values are written after the keyword return
○ The return statement can return a value or just return
■ print() is a function that just returns (it gives back None)
■ np.array() is a function that returns an array
● *all statements executed when a function is called are indented wrt the signature
○ Wrt: with regards to
● Function ends when indentation returns to the same level as the def statement or when it executes a
return statement
***Never use global variables in a function!!!****
*the variable used when calling a function doesn’t have to match the name of the parameter*
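Sketch pulling the function notes together: signature, docstring, a default parameter, a local variable, and a return. The function scale_and_keep is made up for illustration.

```python
def scale_and_keep(number, factor=5):
    '''Return the tuple (number * factor, number).'''
    scaled = number * factor      # local variable, only exists inside the function
    return (scaled, number)

my_value = 56                     # the name used in the call doesn't have to be "number"
result = scale_and_keep(my_value)
print(result[0])
print(result[1])
print(scale_and_keep(2, factor=10))   # overriding the default parameter

nothing = print('hi')             # print just returns, so this stores None
print(nothing)
```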
Notes 14- Functions Practice
Adding Text Files
● Text file: file that ends in .txt
● After the code shown right->
do:
my_text = load_text_file()
print(my_text)
Iterables
Iterable: any object, like a list or a string, that can have its members returned one at a time
Notes 15 - For Loops
● for loops are used when you want to repeatedly execute a piece of code a fixed number
of times
Control Flow
● Looping in python:
○ For loops
○ While loops
● Conditional statements in python
○ if-elif-else
● In many computations, you want to perform a set of operations on a number of items or
a given number of times. Ex.:
○ Find the squares of a list of numbers
○ Compute the factorial of n by multiplying the first n integers
○ Read line-by-line from a file until you finish reading all lines in the file.
● A for loop is the control structure that is used to accomplish this.
○ Iterate over a sequence like a list, array, string, lines in a file etc.
○ Think of the for loop as repeating the same process for each member of a list,
array, string, etc
● The general syntax for a for loop is:
for <var> in <iterable>:
    block of code
■ #Note the indentation. It is necessary.
■ <var> = the iteration variable that is updated every time through the loop.
■ <iterable> = a python object called an iterable. Anything that can hand back its
members one at a time (list, string, array, range, file) is an iterable.
■ The indentation marks the block of code to be executed as part of the for
loop.
● Loop invariant: logical assertion about a program loop that is true before (and after) each
iteration
○ Useful to prove correctness of loops (we won’t do that)
● General ideas about for loop creation:
○ You will typically set some things up before the beginning of the loop.
○ You will loop a finite number of times.
○ You will compute something that you will use later.
● Factorial Example:
○ The loop invariant for the factorial for loop is: fact is equal
to the factorial of the iteration variable i
● You can nest for loops inside each other
● One word at a time:
● .readline(): a file method that reads a line
of text from a file
○ Each time you run it it reads
the next line
○ So if f is the open file and you do a for loop, for i in
range(n): print(f.readline()), it will print the first n
lines of the poem
● continue statement: ends this
iteration of the loop and the next
statement that gets executed is the
top of the loop.
○ Aka, the rest of the loop body is skipped for this pass, but the variables keep
their values
○ (so if j was originally 0 but over the course of the loop is j+=1, whatever j equals
when the continue statement runs will carry over to the next pass)
● break statement: terminates the loop and the next statement that gets executed is the
one immediately following the loop body. With nested for loops, the break statement
terminates only the loop in which it occurs.
● Factorial: that number multiplied by all of the positive integers that precede it (for
non-negative integers; 0! is 1) ---->
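The factorial figure from class isn't in these notes, so here is a minimal sketch of what such a loop could look like, plus continue and break in action. The function and variable names are made up.

```python
def factorial(n):
    '''Return n! for a non-negative integer n.'''
    fact = 1                      # set things up before the loop
    for i in range(1, n + 1):
        fact = fact * i           # loop invariant: fact equals i! here
    return fact

print(factorial(5))

odds = []
for j in range(10):
    if j % 2 == 0:
        continue                  # skip the rest of this pass for even j
    if j > 7:
        break                     # leave the loop entirely once j reaches 9
    odds.append(j)
print(odds)
```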
Notes 16- If-Else
● If statements: block executed if the condition is True
● Else statements: block executed if the if condition evaluates to False
● Elif statement: additional conditions checked
after the initial if.
○ If more than one condition applies (and no and
statement is used), only the branch
higher up in the code will be
executed
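Sketch of if-elif-else where more than one condition could apply. The function size_label and its cutoffs are made up.

```python
def size_label(n):
    '''Label a number as big, medium, or small.'''
    if n > 10:
        return 'big'
    elif n > 5:          # only checked when n > 10 was False
        return 'medium'
    else:
        return 'small'

print(size_label(20))    # 20 > 5 is also True, but the higher branch wins
print(size_label(7))
print(size_label(1))
```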
Notes 17- Axes3D-PDB
● To plot a three dimensional plot on a two
dimensional Matplotlib figure use Axes3D:
○ create a pyplot figure,
○ pass that to Axes3D, and
○ use that to plot x, y, and z axes.
fig = plt.figure(figsize=(4,4))
ax = plt.axes(projection='3d')
ax.plot(xline, yline, zline, c='r', marker='o') # xline, yline, and zline are 1D arrays
Notes 18- Read CSV
Reading files of data separated by commas (or other separators)
● Many types of data are tabular.
○ each row can be an observation, and each column is something that is measured
● Many such data sets are stored in a text file with one row of data in each line.
● Each of the values in a row is separated by a separator.
○ comma, ",".
○ tab
● Typically each row in the text file has the same number of values
● If a value is missing, the separator for that value is still included
● Such files are called CSV files, for Comma Separated Values file
○ If the Tab is the separator, such files are called TSV files
○ But CSV is often used generically for all
separators
● Very frequently, CSV files include a number of
header lines at the beginning of the file
○ provide information about the data
○ A very common header line provides a
brief name for each column of data
NumPy loadtxt
● np.loadtxt: numpy function that reads data from a text file into an array
○ Data in file must be the same type
○ np.loadtxt(csv_filename, delimiter=',')
● If the data is separated by commas, specify delimiter=','
Other ways to read data
● .split() with a separator specified can pull the fields
apart, one line at a time
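Sketch of both ways of reading comma-separated data. To keep it self-contained, the "file" is a made-up string wrapped in io.StringIO instead of a real csv_filename, and skiprows=1 is used to skip the one header line.

```python
import io
import numpy as np

# Made-up data: one header line, then one observation per row
csv_text = "year,coal\n2000,10.5\n2001,9.0\n2002,9.5\n"

# Way 1: np.loadtxt (accepts a filename or a file-like object)
data = np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1)
print(data.shape)
print(data[:, 1])        # the coal column

# Way 2: by hand, splitting into lines and then into fields
rows = []
for line in csv_text.strip().split('\n')[1:]:   # [1:] skips the header
    fields = line.split(',')
    rows.append([float(field) for field in fields])
print(rows)
```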
Data Science Process
1. Ask an interesting question
2. Get the data
3. Explore the data
4. Model the data
5. Communicate and visualize the results
Notes 19- More PDB
● .strip(): returns a copy of the string with the leading and the trailing
whitespace removed (or other characters, if you pass them in)
message = ' Learn Python '
# remove leading and trailing whitespaces
print('Message:', message.strip())
# Output: Message: Learn Python
● PDB files are fixed format (each field sits in fixed columns)
Assignment 2
.lower(): converts a string to lowercase
.isalnum(): returns True if all the characters in a string are letters or numbers
.isalpha(): returns True if all the characters in a string are letters
.endswith(): returns True if the string ends with the specified value
Notes-20 Date Objects
● Object-Oriented Programming (OOP)
○ focus is on creating reusable patterns of code, in contrast to procedural
programming, which focuses on explicit sequenced instructions.
● Class: a template for an object. This defines a set of attributes that will characterize
any object that is instantiated from this class.
● Object: an instance of a class. This is the realized version of the class.
● .reshape(): method to set the dimensions of an array
○ (rows, columns)
○ Ex. arr=np.arange(12)
arr= arr.reshape((3,4))
print(arr)
-> [[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
● dir(): function that gives a list of the properties and methods associated with an object
Dates
● from datetime import date: import statement that gives us access to the date object
● date(): function that creates a date object
○ (Year, month, day)
● .weekday(): method that gives the day of the week as a number
○ 0 is Monday, 1 is Tuesday, etc
○ If using this, can use to find the right index in a list of the days of the week
■ day=array_of_day_names[datetime.weekday()]
○ Easier to just use .strftime('%A')
● .year: attribute that gives the year of the date
● .month: attribute that gives the month
● .day: attribute that gives the day of the month
● .strftime("%A"): method that turns a date into a formatted string
○ %A=weekday name
○ %B=month name
○ %D=MM/DD/YY
○ %F=yyyy-mm-dd
○ %G=ISO week-based year
○ %Y=year
● How many times a certain date
occurred→
● How many days until→
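The class figures for the date examples aren't in these notes, so this is only a sketch of the date tools above. It uses two dates that appear earlier in the notes (Day 1 and the Ws1 due date); the variable names are made up.

```python
from datetime import date

first_day = date(2021, 8, 19)     # (year, month, day)
due_day = date(2021, 8, 25)

print(first_day.year, first_day.month, first_day.day)   # attributes, no parentheses

day_names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']
print(first_day.weekday())                 # 0 is Monday
print(day_names[first_day.weekday()])      # look the name up by index
print(first_day.strftime('%A'))            # or let strftime do it
print(first_day.strftime('%Y-%m-%d'))

# How many days until: subtracting two dates gives a timedelta
print((due_day - first_day).days)
```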
Notes 21- Try Except
● Run-time errors occur during
execution
● try: except: blocks catch
a run-time error
in your code
○ try:
    # code you want to execute
○ except:
    # code to execute if there is an error in the `try` code block.
● Print the type of error by catching it as an object e and
finding its type
○ except Exception as e:
    print('\tError while computing 1/i :', e)
    print(type(e))
● Might also put a break at the end to stop the loop
if there is an error
● input(): function that shows an entry box that
allows the user to type content into the code; it gives back what was typed as a string
● To test if something is an integer, do:
○ try:
    x = int(x)
except ValueError:
    print('Not an integer')
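Sketch of both patterns above. The helper to_int and the list of test values are made up; input() is left out so the example runs without typing anything.

```python
def to_int(text):
    '''Return text converted to an int, or None if it is not an integer.'''
    try:
        return int(text)
    except ValueError:
        print('Not an integer:', text)
        return None

print(to_int('123'))
print(to_int('abc'))

for i in [2, 1, 0]:
    try:
        print(1 / i)
    except Exception as e:
        print('\tError while computing 1/i :', e)
        print(type(e))            # the class of the error that was caught
        break                     # stop the loop after the error
```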
“\” (backslash) aka the escape character
● “\n” creates a new line in a string
● “\t” creates a tab in a string
Notes 23- Sets
● Sets: python data structure, unordered collection of unique
items
○ Usually explained as venn diagrams- some set
members are in both sets (set intersection), some
are only in one set, and some can be in either (aka
can be in one or the other or both) (set union)
○ Can only have ONE of each item
○ mutable (you can add and remove items), but the items themselves must be immutable
Creating Sets
● Can instantiate a python set
○ set(): function that creates a set
● .add(): method that adds items to a set
● { , } : the syntax for a set
○ Ex. my_set={0,1,2,3}
● Can turn other data structures into sets
○ Lists, np.arrays, strings
● Remember, it’s unordered, may come out in unexpected ways
○ See method 4, how the string got all messy? Fix this by putting it in brackets like
a list
● Unordered- therefore can’t subscript (index) into a set, but you can “look through” a set- aka it can
be used in a for loop
○ my_set[2] will NOT work
○ for ch in my_set: does work
sorted(): overloaded function that sorts objects
Set union: the set of every element that is in either or both sets
○ AUB
○ .union()
■ set_union_set1_or_set2 = set1.union(set2)
■ print("Union, whales or fish:", fish.union(whales))
● Prints what is in fish, whale, or both
Set intersection: the set of all the elements that are common to both sets
○ A⋂B
○ .intersection()
■ set1_and_set2 = set1.intersection(set2)
■ print("Intersection, whales and fish:", whales_and_fish)
● Prints what is in both whale and fish
.difference(): set method that returns element that are only in the one set
○ print(whales.difference(fish))
■ Prints only what is in whales but is not in fish
/ : lets you continue code down a line so you don’t have to scroll a bunch
.remove(): removes item from a set
○ If the item isn’t in the set, an error occurs
.discard(): removes item from a set
○ If the item isn’t in the set, nothing happens
np.unique(): only keeps unique items in an array
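A sketch of the set methods above using fish and whales sets. The members are made up for illustration (the overlap is deliberate so the intersection is not empty); they are not the sets from the class notebook.

```python
fish = {'tuna', 'salmon', 'whale shark'}
whales = {'orca', 'beluga', 'whale shark'}   # hypothetical overlap

fish_or_whales = fish.union(whales)          # in either or both
whales_and_fish = whales.intersection(fish)  # in both
only_whales = whales.difference(fish)        # in whales but not in fish

# sorted() gives a predictable order for printing an unordered set
print("Union, whales or fish:", sorted(fish_or_whales))
print("Intersection, whales and fish:", whales_and_fish)
print("Only whales:", sorted(only_whales))

whales.discard('narwhal')        # not in the set: nothing happens
try:
    whales.remove('narwhal')     # not in the set: raises KeyError
    remove_failed = False
except KeyError:
    remove_failed = True
print("remove raised an error:", remove_failed)
```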
Unpacking
● *: unpacks the items of a data structure (here, into separate arguments to print)
○ arr = np.arange(10)
○ print(arr)
■ [0 1 2 3 4 5 6 7 8 9]
○ print(*arr)
■ 0 1 2 3 4 5 6 7 8 9
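A sketch of unpacking with a plain list instead of a numpy array, so it runs without any imports. The function add3 is hypothetical.

```python
values = list(range(5))

print(values)     # prints the list as one object
print(*values)    # same as print(0, 1, 2, 3, 4)


def add3(a, b, c):
    return a + b + c


total = add3(*[1, 2, 3])    # the list is unpacked into a, b, c

first, *rest = values       # * can also collect the leftover items
print(total, first, rest)
```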
Comprehensions
● Comprehensions: constructs that allow sequences to be built from other sequences
○ Aka, data structures made from other data structures, using things like for and if clauses
○ List comprehensions: square brackets containing an expression followed by a for clause, then zero or more for or if clauses
■ Ex. variable = [out_exp for out_exp in input_list if out_exp == 2]
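A sketch of the comprehension pattern above. The input_list values are made up for illustration.

```python
input_list = [1, 2, 3, 2, 4]

# expression, then a for clause, then an optional if clause
twos = [out_exp for out_exp in input_list if out_exp == 2]
squares = [x ** 2 for x in input_list]

# the same idea with curly brackets builds a set (duplicates dropped)
unique_squares = {x ** 2 for x in input_list}

print(twos, squares, sorted(unique_squares))
```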
Notes 24- Sets Counting
with-as
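● with: keyword that starts a block that manages a resource (such as an open file) and cleans it up (closes the file) automatically when the block ends
● as: gives a name to the object the with statement opened
○ with open(filenm, 'r') as csvfile:
○ csv.reader(): function that returns a reader which gives each row of a csv file as a list of strings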
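A sketch of with-as and csv.reader() together. The csv contents are made up, and a temporary file is written first only so the example can run on its own; in class the file would already exist.

```python
import csv
import os
import tempfile

# Write a tiny hypothetical csv file so there is something to read
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write('name,count\nrobin,3\nwren,5\n')
    filenm = tmp.name

# The file is closed automatically when the with block ends
with open(filenm, 'r', newline='') as csvfile:
    rows = list(csv.reader(csvfile))   # each row is a list of strings

file_closed = csvfile.closed
os.remove(filenm)                      # clean up the temporary file

print(rows)
print(file_closed)
```

Note that csv.reader() gives every value back as a string, so numbers still need int() or float().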
A3
● # strings have several methods that you will find useful for this assignment
○ s = 'aBc'
○ t = s.upper() # make it all upper case
○ u = t.lower() # make that all lower case
● # we can convert from strings to integers or floats
○ s = '123' # a string with the characters 1, 2, and 3
○ i = int(s) # now it is the number 123 (one hundred twenty three)
○ # note that they look the same when we print them but their types are different
○ t = '3.14' # a string with characters 3, period, 1, and 4
○ f = float(t) # a floating point number 3.14
Notes 24- Dictionaries
● Dictionary: collection of objects which are indexed by a key
○ Data is organized as a collection of key:value pairs
○ Unordered in the sense that you can’t index by position (since Python 3.7, dictionaries do keep insertion order)
○ Changeable
● Key: any immutable (unchangeable) object
○ Ex. string, tuple, date
○ Must be immutable bc they are hashed to speed things up
● Dictionaries are created by:
○ Curly brackets: x = { }
■ Ex. dict1 = {'apple':'fruit', 'carrot':'vegetable'}
○ dict( ): function that creates a dictionary
● Keys can be used to index the dictionary:
○ Ex. dict1['apple'] returns 'fruit'
● You can also create an empty dictionary and add the key value pairs later
○ class_dict = dict()
○ class_dict['semester'] = 'Fall'
○ class_dict['year'] = 2020
○ class_dict['course no'] = 'COMP116'
○ print('class_dict=',class_dict)
■ class_dict= {'semester': 'Fall', 'year': 2020, 'course no': 'COMP116'}
● .update(): method that lets you add multiple items to a dictionary or a set
● .keys(): method that returns a view of the dictionary’s keys
● Tuple: objects grouped by parentheses
○ Immutable
● **In assignment 4 you might make the city name be the key to a dictionary of where complaints were issued.
● **Also in assignment 4 you might make the coordinates the key to where a blocked driveway occurred.
● .get(keyname, value): method that returns the value of the specified key, or, if that key doesn’t exist, a value to return instead
○ Including a default value is optional-- if none added, will return None if the key doesn’t exist
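A sketch of the dictionary methods above. The class_dict entries follow the notes; the 'room' key, the coordinates, and the blocked_driveways counter are made up for illustration and are not the assignment 4 data.

```python
class_dict = dict()
class_dict['semester'] = 'Fall'
class_dict.update({'year': 2020, 'course no': 'COMP116'})  # add several items at once

keys = list(class_dict.keys())
year = class_dict.get('year')                     # key exists: returns its value
room = class_dict.get('room')                     # key missing, no default: None
room_default = class_dict.get('room', 'unknown')  # key missing: returns the default

# A tuple is immutable, so it can be used as a key (hypothetical coordinates)
blocked_driveways = {}
spot = (40.7, -73.9)
blocked_driveways[spot] = blocked_driveways.get(spot, 0) + 1
blocked_driveways[spot] = blocked_driveways.get(spot, 0) + 1

print(keys, year, room, room_default, blocked_driveways)
```

Using .get(key, 0) + 1 is one common way to count things without first checking whether the key exists.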
Notes 26- Sets and Dictionaries
● dict.fromkeys(): creates a new dictionary using keys from the provided list and setting all associated values to the provided value
● Shift + Tab : trick that brings up a function’s signature and documentation
● Command + / : comments out a big swath of code
● Camelback (camelCase) style: naming convention where the first word is lowercase and the first letter of each subsequent word is uppercase
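A sketch of dict.fromkeys() from the start of this section, with the dictionary named in camelback style. The city list is made up for illustration.

```python
cities = ['Chapel Hill', 'Durham', 'Raleigh']   # hypothetical keys

# every key starts with the same provided value
complaintCounts = dict.fromkeys(cities, 0)
complaintCounts['Durham'] += 1

print(complaintCounts)
```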
Notes 28- API
● You can nest data structures: a dictionary value can be a list of dictionaries
○ data = '{"employees":[{"firstName":"John", "lastName":"Doe"},{"firstName":"Anna", "lastName":"Smith"},{"firstName":"Peter", "lastName":"Jones"}]}'
● json.loads(): turns a JSON formatted string into a python object (json.load() does the same for an open file)
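A sketch of parsing the employees string from the notes. Since data is a string (not a file), this uses json.loads(); the variable names parsed and first_names are made up.

```python
import json

data = ('{"employees":[{"firstName":"John", "lastName":"Doe"},'
        '{"firstName":"Anna", "lastName":"Smith"},'
        '{"firstName":"Peter", "lastName":"Jones"}]}')

parsed = json.loads(data)          # JSON string -> python dictionary

# parsed['employees'] is a list of dictionaries
first_names = [emp['firstName'] for emp in parsed['employees']]
print(type(parsed), first_names)
```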
Notes 29- Pandas Intro
Pandas
● Pandas: library for data science that builds on NumPy
● Data structures- Series, DataFrames, Index
Series
● Series: 1-D np.array with axis labels
● Generalization of both 1-D numpy arrays and dictionaries
● Can have flexible indices (aka you can easily change the indices using lists):
○ data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
○ print(data)
a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64
● pd.Series(dictionary): turns a different data structure (like a dictionary) into a Series
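A sketch of the two ways to build a Series described above, assuming pandas is installed. The scores dictionary is made up for illustration.

```python
import pandas as pd

# Series with custom index labels
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
b_value = data['b']                 # look up by label, like a dictionary

# Series built from a dictionary: keys become the index
scores = pd.Series({'hw1': 90, 'hw2': 85})   # hypothetical values

print(data)
print(scores)
```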
DataFrame
● DataFrame: 2-D labeled array with both flexible row indices and flexible column names
● DataFrame can be thought of either as a generalization of a NumPy array, or as a specialization of a Python dictionary
● A 2-D numpy array is an ordered sequence of aligned 1-D columns, a DataFrame is a sequence of aligned Series objects
○ Aligned: share the same index
● pd.DataFrame(): pandas constructor that creates a DataFrame
● dataframe = pd.DataFrame({'1st column label': 1st column series, '2nd column label': 2nd column series})
○ The row labels are the indices of the 2 series, which should be the same
● You can also create a dataframe from a 2-D np.array:
○ dataframe = pd.DataFrame(array, columns=['label 1', 'label 2'], index=['index 1', 'index 2'])
● Use the column labels as the index to get the column
○ dataframe['label 1']
● Use the loc indexer attribute to get the rows:
○ dataframe.loc['index name']
● .iloc[]: gets the rows using implicit numbering (aka instead of using the labels as indices, use the default number indices)
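A sketch of building a DataFrame both ways and then pulling out a column and rows, assuming pandas and numpy are installed. The people data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Two aligned Series (same index) become two columns
heights = pd.Series([1.6, 1.8], index=['ann', 'bob'])
ages = pd.Series([30, 25], index=['ann', 'bob'])
people = pd.DataFrame({'height': heights, 'age': ages})

age_column = people['age']      # column by label
bob_row = people.loc['bob']     # row by index label
first_row = people.iloc[0]      # row by position

# From a 2-D numpy array, with labels supplied as lists
grid = pd.DataFrame(np.array([[1, 2], [3, 4]]),
                    columns=['label 1', 'label 2'],
                    index=['index 1', 'index 2'])

print(people)
print(grid)
```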
Pandas Index Object
● Index object can be thought of as either an immutable array or an ordered set
● Index Object: immutable sequence used for indexing and alignment, the basic object storing axis labels for all pandas objects
● Can find intersections and unions
○ Reminder:
○ .intersection(): method that returns only the items that exist in both/all sets
○ .union(): method that returns all the items, excluding duplicates (items in one, the other, or both)
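A sketch of the Index object behaving like an immutable array and like a set, assuming pandas is installed. The index values are made up for illustration.

```python
import pandas as pd

ind_a = pd.Index([1, 3, 5, 7, 9])
ind_b = pd.Index([2, 3, 5, 7, 11])

common = ind_a.intersection(ind_b)   # items in both
combined = ind_a.union(ind_b)        # items in one, the other, or both

# Immutable: trying to change an item raises an error
immutable_error = None
try:
    ind_a[0] = 100
except TypeError as e:
    immutable_error = type(e)

print(common)
print(combined)
print(immutable_error)
```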
More Pandas
● pd.read_csv(): function that reads csv data into a DataFrame
● df.head(n): returns the first n rows of the dataframe
○ Defaults to 5
● data_a = dataframe[dataframe['column label 1'] == 'data a in that column']
○ data_a.head()
■ Outputs the first rows whose ‘column label 1’ value matches
● .get_loc(): Index method that gets the integer location, slice or boolean mask for the requested label
○ Ex: og_dataframe = pd.read_csv('Datasets.csv')
○ abc = og_dataframe[og_dataframe['Column label 1'] == 'abc']
○ data_start_index = abc.columns.get_loc('a')
■ Outputs the integer position of column ‘a’ among the columns of abc
● Convert a dataframe to an array using np.array(), might have to use .squeeze() to drop length-1 dimensions
● .set_index(): pandas method to change the index from a sequence of numbers to labels from a column in the dataset
○ It’s okay if the labels are not unique for each row
○ When that column becomes the index it is no longer a column in the dataframe
● .diff(): pandas method for finding the difference between each element and the prior one
● .to_numpy(): pandas method for converting a dataframe to a numpy array
● .pct_change(): pandas method that finds the relative change between the current and a prior element
● .idxmax(): pandas method for finding the index label of the maximum value
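A sketch that strings several of the methods above together, assuming pandas is installed. The prices table is made up for illustration and is built in code rather than read with pd.read_csv() so the example runs on its own.

```python
import pandas as pd

prices = pd.DataFrame({'day': ['mon', 'tue', 'wed'],
                       'price': [100.0, 110.0, 99.0]})   # hypothetical data

price_position = prices.columns.get_loc('price')   # integer position of the column

by_day = prices.set_index('day')         # 'day' becomes the index, not a column
change = by_day['price'].diff()          # difference from the prior row
relative = by_day['price'].pct_change()  # relative change from the prior row
best_day = by_day['price'].idxmax()      # index label of the maximum

as_array = by_day.to_numpy().squeeze()   # (3, 1) array squeezed down to (3,)

print(by_day.head())
print(change.tolist(), relative.tolist(), best_day, as_array)
```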