CSC148 Notes - Lecture notes all
Introduction to Computer Science (University of Toronto)
StuDocu is not sponsored or endorsed by any college or university
Downloaded by michael ayad (michael.maged2014@gmail.com)
1/9/2021

CSC148 Lecture Notes
Diane Horton and David Liu
https://www.teach.cs.toronto.edu/~csc148h/winter/notes/

See the Lectures page on Quercus for the readings assigned each week. You will eventually be responsible for all readings listed here, unless we clearly indicate otherwise.

1. Recapping and Extending Some Key Prerequisite Material
    1.1 The Python Memory Model: Introduction
    1.2 The Python Memory Model: Functions and Parameters
    1.3 The Function Design Recipe
    1.4 Python Type Annotations
    1.5 Testing Your Work
    1.6 Choosing Test Cases
    1.7 Introduction to Property-based Testing
2. Object-Oriented Programming
    2.1 Introduction to Object-Oriented Programming
    2.2 Representation Invariants
    2.3 Designing Classes
    2.4 Inheritance: Introduction and Methods
    2.5 Inheritance: Attributes and Initializers
    2.6 Inheritance: Thoughts on Design
    2.7 The object Class and Python Special Methods
3. Abstract Data Types
    3.1 Introduction to Abstract Data Types
    3.2 Stacks and Queues
    3.3 Exceptions
    3.4 Analysing Program Running Time
4. Linked Lists
    4.1 Introduction to Linked Lists
    4.2 Traversing Linked Lists
    4.3 Mutating Linked Lists
    4.4 Linked Lists and Running Time
5. Introduction to Recursion
    5.1 Motivation: Adding Up Numbers
    5.2 Nested Lists: A Recursive Data Structure
6. Trees and Binary Search Trees
    6.1 Introduction to Trees
    6.2 A Tree Implementation
    6.3 Mutating Trees
    6.4 Introduction to Binary Search Trees
    6.5 Binary Search Tree Implementation and Search
    6.6 Mutating Binary Search Trees
    6.7 Binary Search Trees and Running Time
    6.8 Expression Trees
7. Recursion Wrap-up
    7.1 Recursive Sorting Algorithms
    7.2 Efficiency of Recursive Sorting Algorithms


1.1 The Python Memory Model: Introduction
https://www.teach.cs.toronto.edu/~csc148h/winter/notes/python-recap/memory_model_part1.html

Before we dive into the CSC148 material proper, we’ll review a few fundamental concepts from CSC108. We start with one of the most important ones: how the Python programming language represents data.

Data

All data in a Python program is stored in objects that have three components: id, type, and value. We normally think about the value when we talk about data, but the data’s type and id are also important.

The id of an object is a unique identifier, meaning that no other object has the same identifier. Often Python uses the memory address of the object as its id, but it doesn’t have to; it just has to guarantee uniqueness.
We can see the id of any object by calling the id function:

    >>> id(3)
    1635361280
    >>> id('words')
    4297547872

We can see the type of any object by calling the type function:

    >>> type(3)
    <class 'int'>
    >>> type('words')
    <class 'str'>

An object’s type determines what functions can operate on it. For example, we can call the function round on numeric types (such as int and float), but not on strings:

    >>> round(2)
    2
    >>> round(3.1419)
    3
    >>> round('hello!')
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
    TypeError: type str doesn't define __round__ method

Types also determine the objects on which we can use built-in Python operators. [1] For example, the + operator works on two integers, and even on two strings, but is not defined for adding an integer and a string together:

    >>> 3 + 4
    7
    >>> 'hey' + 'hello'
    'heyhello'
    >>> 3 + 'hello'
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
    TypeError: unsupported operand type(s) for +: 'int' and 'str'
    >>> 'hello' + 'goodbye'
    'hellogoodbye'

    [1] We’ll see later in the course that most of Python’s operators are actually implemented using functions.

Finally, you are already familiar with accessing the value of an object, which we call evaluating the object. For example, this is what happens when we type an object into the Python terminal:

    >>> 3
    3
    >>> 'hello'
    'hello'

Variables

All programming languages have the concept of variables. In Python, a variable is not an object, and so does not actually store data; it stores an id that refers to an object that stores data. This is the case whether the data is something very simple like an int or more complex like a str.
Consider this code:

    >>> x = 3
    >>> x
    3
    >>> type(x)
    <class 'int'>
    >>> id(x)
    1635361280
    >>> word = 'bonjour'
    >>> type(word)
    <class 'str'>
    >>> id(word)
    4385008808

The state of memory after the above piece of code executes is this:

[Figure: memory model diagram]

We write the id and type of each object in its upper-left corner and upper-right corner, respectively. The actual object id reported by the id function has many digits, and its true value isn’t important; we just need to know that each object has a unique identifier. So for our drawings we make up short identifiers such as id92.

Notice that there is no 3 inside the box for variable x. Instead, there is the id of an object whose value is 3. We say that x refers to this object, or that x references this object. The same holds for variable word; it references an object whose value is 'bonjour'.

Here are a couple of other things to notice:

- Since we did not write the code for the class that defines the str type, we know nothing about what data members it uses to store its contents. So we just write the value 'bonjour' inside the box. This is a perfectly fine abstraction.
- We didn’t draw any arrows. Programmers often draw an arrow when they want to show that one thing references another. This is great once you are very confident with a language and how references work. But in the early stages, you are much more likely to make correct predictions if you write down references (you can just make up id values) rather than arrows.

We can reassign a variable so that it refers to a new value.
For instance:

    >>> cat = 'Gilbert'
    >>> id(cat)
    4348845448
    >>> cat = 'Chairman Meow'
    >>> id(cat)
    4355636784

In this example, there is only one variable named cat. At first it contains the id of a str object containing the string Gilbert, and then the id in it changes to be that of a str object containing the string Chairman Meow.

These examples are extremely simple, but having an accurate mental image will be necessary in order to avoid bugs in the much more complex code that we will write this term.

Objects have a type, but variables don’t

We saw above that Python will report to us what type(word) is. But it is really reporting the type of the object that word refers to. The variable word itself has no type. [2] In fact, Python doesn’t mind if we make word refer now to a different type of object, although this is almost surely a bad idea.

    [2] This is different from many other languages, such as Java and C, where every variable has a type.

    >>> word = 'adieu'
    >>> type(word)
    <class 'str'>
    >>> word = 42
    >>> type(word)
    <class 'int'>

A brief aside on assignment statements and evaluating expressions

You’ve written code much more complex than what’s above, but may not have had to think in detail about all the small steps that Python has to undertake to execute even a simple assignment statement. These details are foundational for writing and debugging the more complex code you will work on in CSC148. So let’s pause for a moment and be explicit about two things.

Executing an assignment statement

This is what Python does when an assignment statement is executed:

1. Evaluate the expression on the right-hand side, yielding the id of an object.
2. If the variable on the left-hand side doesn’t already exist, create it.
3. Store the id from the expression on the right-hand side in the variable on the left-hand side.

Evaluating an expression

An assignment statement always has an expression on the right-hand side. Expressions can occur in other places also, for instance as arguments to a function call. When an expression is encountered, it must be evaluated. This always yields a value, which is the id of an object.

This is what Python does when an expression is evaluated:

- If the expression is a variable, find the variable. If it doesn’t exist, this is an error. If it does exist, the value of the expression is the id stored in that variable.
- If the expression is a “literal value”, such as 176.4 or 'hello', create an object of the appropriate type to hold it. The value of the expression is the id of that object.
- If the expression applies an operator, such as + or %, evaluate its two operands, apply the operator to them, and create a new object of the appropriate type to hold the result. The value of the expression is the id of that object.

There are additional rules for other types of expression, but these will do for now.

Mutability and aliasing

Immutable data types

Some data types in Python (e.g., integers, strings, and booleans) are immutable, meaning that the value stored in an object of that type cannot change. For example, suppose we have the following code:

    >>> prof = 'Diane'
    >>> id(prof)
    4405312456
    >>> prof = prof + ' Horton'
    >>> prof
    'Diane Horton'
    >>> # The old str object couldn't change, so Python made a new
    >>> # str object for the variable prof to refer to. Since it's
    >>> # a new object, it has a different id.
    >>> id(prof)
    4405308016

We did not change the value stored in the object (we couldn’t, since strings are immutable) but rather changed what prof refers to, as shown here:

[Figure: memory model diagram]

We will use the convention of drawing a double box around objects that are immutable. Think of it as signifying that you can’t get in there and change anything.

Notice that in the example above we reassigned the variable prof (that is, we made it refer to a new str object) and we could do this even though strings are immutable. Regardless of the mutability of any objects, we can always reassign a variable.

Mutable data types

Many of the more complex data types in Python are mutable, including lists, dictionaries, and user-defined classes. Let’s see what this means with a list:

    >>> x = [1, 2, 3]
    >>> x
    [1, 2, 3]
    >>> type(x)
    <class 'list'>
    >>> id(x)
    50706312

Below, we perform two mutating operations on x, and check that its id hasn’t changed. Note that even changing the list’s size doesn’t change its id!

    >>> x[0] = 1000000
    >>> x
    [1000000, 2, 3]
    >>> id(x)
    50706312
    >>> x.extend([10, 20, 30])
    >>> x
    [1000000, 2, 3, 10, 20, 30]
    >>> id(x)
    50706312

Here’s what’s going on in memory:

[Figure: memory model diagram]

The lines x[0] = 1000000 and x.extend([10, 20, 30]) changed the value of the list object that x refers to. We say that these lines mutate the object that x refers to. (They also cause the creation of four new objects of type int.)

Aliasing

When two variables refer to the same object, we say that the variables are aliases of each other. [3]
    [3] My dictionary says that the word “alias” is used when a person is also known under a different name. For example, we might say “Eric Blair, alias George Orwell.” We have two names for the same thing, in this case a person.

Consider the following Python code:

    >>> x = [1, 2, 3]
    >>> y = [1, 2, 3]
    >>> z = x

x and z are aliases, as they both reference the same object. You should think of the assignment statement z = x as saying “make z refer to the object that x refers to.” After doing so, they have the same id.

    >>> id(x)
    4401298824
    >>> id(z)
    4401298824

In contrast, x and y are not aliases. They each refer to a list object with [1, 2, 3] as its value, but they are two different list objects, stored separately in your computer’s memory. This is again reflected in their different ids.

    >>> id(x)
    4401298824
    >>> id(y)
    4404546056

Here is the state of memory after the code executes:

[Figure: memory model diagram]

Aliasing and mutation

Aliasing is often a source of confusion for beginners, because it allows “action at a distance”: the modification of a variable’s value without explicitly mentioning that variable. Here’s an example:

    >>> x = [1, 2, 3]
    >>> z = x
    >>> z[0] = -999
    >>> x  # What is the value?

The third line mutates the value of z. But without ever mentioning x, it also mutates the value of x! We call this a side effect.
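The side effect can be checked directly. The short sketch below replays the aliasing example as a script and uses only the id function introduced earlier; the variable name same_id_before is ours, added for illustration.

```python
# Replaying the aliasing example: x and z hold the id of the same list object.
x = [1, 2, 3]
z = x

same_id_before = id(x) == id(z)  # True: one object, two variables

z[0] = -999                      # mutate the object, mentioning only z

print(x)                         # [-999, 2, 3]: the change is visible through x
print(id(x) == id(z))            # True: still one and the same object
```

Nothing was ever assigned to x after the first line, yet evaluating x now gives a different result, because the single list object that both variables reference was mutated.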
Imprecise language can lead us into misunderstanding the code. We said above that “the third line mutates the value of z”. To be more precise, the third line mutates the object that z refers to. Of course we can also say that it mutates the object that x refers to: they are the same object! A clear diagram like this can really help:

[Figure: memory model diagram]

The key thing to notice about this example is that just by looking at the third line of code, z[0] = -999, you can’t tell that x has changed; you need to know that on a previous line, z was made an alias of x. This is why you have to be careful when aliasing occurs.

Contrast the previous code with this:

    >>> x = [1, 2, 3]
    >>> y = [1, 2, 3]
    >>> y[0] = -999
    >>> x  # What is the value?

Can you predict the value of x on the last line? Here, the third line mutates the object that y refers to, but because it is not the same object that x refers to, we still see [1, 2, 3] if we evaluate x. Here’s the state of memory after these lines execute:

[Figure: memory model diagram]

Aliasing also exists for immutable data types, but in this case there is never any “action at a distance”, precisely because immutable values can never change. For example, a tuple is an ordered sequence like a list, but it is immutable. In the example below, x and z are aliases of a tuple object; but it is impossible to create a side effect on x by mutating the object that z refers to, since we can’t mutate tuples at all.
    >>> x = (1, 2, 3)
    >>> z = x
    >>> z[0] = -999
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
    TypeError: 'tuple' object does not support item assignment

Changing a reference is not the same as mutating a value

What if we did this instead?

    >>> x = (1, 2, 3)
    >>> z = x
    >>> z = (1, 2, 3, 40)
    >>> x  # What is the value?

Again, we have made x and z refer to the same object. So when we change z on the third line, does x also change? This time, the answer is an emphatic no, and it is because of the kind of change we make on the third line. Instead of mutating the object that z refers to, we make z refer to a new object. This obviously can have no effect on the object that x refers to (or any object). Even if we switched the example from using immutable tuples to using mutable lists, x would be unchanged.

In general, a statement of the form my_var = _____ never mutates the object that my_var refers to; all it can ever do is set my_var to refer to a different object. Keep this rule in mind when you’re writing your own code, as it’s often easy to confuse mutating values with changing references.

Making a copy to avoid side effects

Sometimes it makes sense to make a copy of a data structure so that changes can be made to it without any side effect on the original. Keep in mind, though, that this consumes both space and time resources, and is often unnecessary.
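As a rough illustration, the sketch below copies a list before mutating it. It uses the built-in list.copy() method, which is our choice for this example rather than something the notes prescribe, and the variable names are made up. Note that list.copy() makes a shallow copy: the new list is a separate object, but any mutable objects inside it are still shared with the original.

```python
original = [1, 2, 3]
duplicate = original.copy()           # a new list object with equal contents

duplicate[0] = -999                   # mutates only the copy

print(original)                       # [1, 2, 3]
print(duplicate)                      # [-999, 2, 3]
print(id(original) == id(duplicate))  # False: two separate objects

# A shallow copy does not copy the objects *inside* the list.
nested = [[1, 2], [3, 4]]
shallow = nested.copy()
shallow[0][0] = -999                  # the inner list is aliased, not copied

print(nested)                         # [[-999, 2], [3, 4]]
```

The second half shows why it matters to know what is mutable at each level of a structure: copying the outer list alone does not protect the inner lists from side effects.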
Two types of equality

Let’s look one more time at this code:

    >>> x = [1, 2, 3]
    >>> y = [1, 2, 3]
    >>> z = x
    >>> id(x)
    4401298824
    >>> id(y)
    4404546056
    >>> id(z)
    4401298824

What if we wanted to see whether x and y, for instance, were the same? Well, we’d need to define precisely what we mean by “the same.” We can use the == operator to compare the values stored in the objects they reference. This is called value equality.

    >>> x == y
    True
    >>> x == z
    True

Or, we can use the is operator to compare the ids of the objects they reference. With is, we are asking whether two variables reference the exact same object. This is called identity equality.

    >>> x is y
    False
    >>> x is z
    True

All built-in types have an implementation for == so that we can check for value equality; we’ll later see how to define == for our own classes.

A special case with immutable objects

Because ints are immutable, there isn’t much point in Python creating a separate int object every time your variable needs to refer to, say, 0. They can all refer to the very same object and no harm can be done since the object can never change. This explains the following code:

    >>> x = 43
    >>> y = 43
    >>> z = x
    >>> # Of course we see that all three variables have value
    >>> # equality. They all reference an int object containing
    >>> # 43. Whether or not they are the same int object is
    >>> # irrelevant to "==".
    >>> x == y
    True
    >>> x == z
    True
    >>> # But "is" checks identity equality. We wouldn't have
    >>> # expected x and y to reference the same int object.
    >>> # But now we know that Python feels free to take a
    >>> # short-cut and not create a second int object holding
    >>> # the value 43, and in this case it did:
    >>> x is y
    True
    >>> x is z
    True
    >>> # We can confirm that x and y have the same id:
    >>> id(x)
    4331557184
    >>> id(y)
    4331557184

Python can take this short-cut with any value of any immutable type. For example, here we can observe the short-cut with strings:

    >>> x = 'foo'
    >>> y = 'foo'
    >>> x is y
    True

But in this example, Python doesn’t take the short-cut:

    >>> x = "ice cream"
    >>> y = "ice cream"
    >>> x is y
    False

It turns out that the rules for when Python does and doesn’t take the short-cut are quite complex, and they could even change from one version of Python to the next. But it makes no difference to our code’s behaviour; the only reason we need to be aware of it is so that we are not surprised when we see that two variables unexpectedly have identity equality.


1.2 The Python Memory Model: Functions and Parameters
https://www.teach.cs.toronto.edu/~csc148h/winter/notes/python-recap/memory_model_part2.html

Terminology

Let’s use this simple example to review some terminology that should be familiar to you:

    # Example 1.
    def mess_about(n: int, s: str) -> None:
        message = s * n
        print(message)


    if __name__ == '__main__':
        count = 13
        word = 'nonsense'
        mess_about(count, word)

In the function definition, each variable in the parentheses is called a parameter. Here, n and s are parameters of function mess_about.
When we call a function, each expression in the parentheses is called an argument. The arguments in our one call to mess_about are count and word.

How function calls are tracked

Python must keep track of the function that is currently running, and any variables defined inside of it. It stores this information in something called a stack frame, or just “frame” for short.

Every time we call a function, the following happens:

1. A new frame is created and placed on top of any frames that may already exist. We call this pile of frames the call stack.
2. Each parameter is defined inside that frame.
3. The arguments in the function call are evaluated, in order from left to right. Each is an expression, and evaluating it yields the id of an object. Each of these ids is assigned to the corresponding parameter.

Then the body of the function is executed.

In the body of the function there may be assignment statements. We know that if the variable on the left-hand side of the assignment doesn’t already exist, Python will create it. But with the awareness that there may be a stack of frames, we need a slightly more detailed rule:

If the variable on the left-hand side of the assignment doesn’t already exist in the top stack frame, Python will create it in that top stack frame.

For example, if we stop our above sample code right before printing message, this is the state of memory:

[Figure: memory model diagram]

Notice that the top stack frame, for our call to mess_about, includes the new variable message. We say that any new variables defined inside a function are local variables; they are local to a call to that function.

When a function returns, either due to executing a return statement or getting to the end of the function, the frame for that function call is deleted.
All the variables defined in it (both parameters and local variables) disappear. If we try to refer to them after the function has returned, we get an error. For example, when we are about to execute the final line in this program,

    # Example 2. (Same as Example 1, but with a print statement added.)
    def mess_about(n: int, s: str) -> None:
        message = s * n
        print(message)


    if __name__ == '__main__':
        count = 13
        word = 'nonsense'
        mess_about(count, word)
        print(n)

this is the state of memory, which explains why the final line produces the error NameError: name 'n' is not defined.

[Figure: memory model diagram]

Passing an argument creates an alias

What we often call “parameter passing” can be thought of as essentially variable assignment. In the example above, it is as if we wrote

    n = count
    s = word

before the body of the function.

If an argument to a function is a variable, what we assign to the function’s parameter is the id of the object that the variable references. This creates an alias. As you should expect, what the function can do with these aliases depends on whether or not the object is mutable.

Passing a reference to an immutable object

If we pass a reference to an immutable object, we can do whatever we want with the parameter and there will be no effect outside the function. Here’s an example:

    # Example 3.
    def emphasize(s: str) -> None:
        s = s + s + '!'


    if __name__ == '__main__':
        word = 'moo'
        emphasize(word)
        print(word)

This code prints plain old moo.
The reason is that, although we set up an alias, we don’t (and can’t) change the object that both word and s reference; we make a new object. Here’s the state of memory right before the function returns:

[Figure: memory model diagram]

Once the function is over and the stack frame is gone, the string object we want (with moomoo!) will be inaccessible. The net effect of this function is nothing at all. It doesn’t change the object that s refers to, it doesn’t return anything, and it has no other effect such as taking user input or printing to the screen. The one thing it does do, making s refer to something new, doesn’t last beyond the function call.

If we want to use this function to change word, the solution is to return the new value and then, in the calling code, assign that value to word:

    # Example 4.
    def emphasized(s: str) -> str:
        return s + s + '!'


    if __name__ == '__main__':
        word = 'moo'
        word = emphasized(word)
        print(word)

This code prints out moomoo!. Notice that we changed the function name from emphasize to emphasized. This makes sense when we consider the context of the function call:

    word = emphasized(word)

Our function call is not merely performing some action, it is returning a value. So the expression on the right-hand side has a value: it is the emphasized word.

Passing a reference to a mutable object

If we wrote code analogous to the broken code in Example 3, but with a mutable type, it wouldn’t work either. For example:

    # Example 5.
    from typing import List


    def emphasize(lst: List[str]) -> None:
        lst = lst + ['believe', 'me!']


    if __name__ == '__main__':
        sentence = ['winter', 'is', 'coming']
        emphasize(sentence)
        print(sentence)

This code prints ['winter', 'is', 'coming'] for the same reason we saw in Example 3. Changing a reference (in this case, making lst refer to something new) is not the same as mutating a value (in this case, mutating the list object whose id was passed to the function). This model of memory illustrates:

[Figure: memory model diagram]

The code below, however, correctly mutates the object:

    # Example 6.
    from typing import List


    def emphasize(lst: List[str]) -> None:
        lst.extend(['believe', 'me!'])


    if __name__ == '__main__':
        sentence = ['winter', 'is', 'coming']
        emphasize(sentence)
        print(sentence)

This is the state of memory immediately before function emphasize returns:

[Figure: memory model diagram]

Here are some things to notice:

- When we begin this program, we are executing the module as a whole. We make an initial frame to track its variables, and put the module name in the upper-left corner.
- When we call emphasize, a new frame is added to the call stack. In the upper-left corner of the frame, we write the function name.
- The parameter lst exists in the stack frame. It comes into being when the function is called. And when the function returns, this frame will be discarded, along with everything in it. At that point, lst no longer exists.
- When we pass argument sentence to emphasize, we assign it to lst. In other words, we set lst to id29, which creates an alias.
- id29 is a reference to a list object, which is mutable. When we use lst to access and change that object, the object that sentence references also changes. Of course it does: they are the same object!

Moral of the story

The situation gets trickier when we have objects that contain references to other objects, and you’ll see examples of this in the work you do this term. The bottom line is this: know whether your objects are mutable, at each level of their structure. Memory model diagrams offer a concise visual way to represent that.


1.3 The Function Design Recipe
https://www.teach.cs.toronto.edu/~csc148h/winter/notes/python-recap/design_recipe.html

Often when beginners are tasked with writing a program to solve a problem, they jump immediately to writing code. It doesn’t matter whether the code is correct or not, or even if they fully understand the problem: somehow the allure of filling up the screen with text is too tempting.

In CSC108, we teach the Function Design Recipe as a principled way of approaching problem-solving in Python. It delays writing any code at all until a complete docstring has been written, ensuring that we have thought through exactly what the function needs to do in all circumstances before we set about getting it to do that. Get into the habit of following the design recipe, and your teammates and even your future self will thank you later!

While the Function Design Recipe was taught in CSC108 (and we assume that you will review it on your own time if needed), we want to expand on some important aspects that will be incorporated more heavily in this course: preconditions in the function docstring, type contracts, and testing methodologies.
Preconditions

One of the most important purposes of a function docstring is to let others know how to use the function. After all, we don’t just write code for ourselves, but for other members of our development team or company, or even the world at large if we’re writing a library we think is useful to anyone. The docstring of a function describes not only what the function does (through text and examples) but also the requirements necessary to use the function.

One such requirement is the type contract: this requires that when someone calls the function, they do so with arguments of a specified type. For example, given this function docstring:

    from typing import List

    def decreases_at(numbers: List[int]) -> int:
        """Return the index of the first number that is less than its predecessor.

        >>> decreases_at([3, 6, 9, 12, 2, 1, 8, 5])
        4
        """

we know that decreases_at expects to be called on a list of integers; if we violate the type contract, say by calling it on a single integer or a dictionary, we cannot expect it to work properly.

In practice, we often want to extend this idea beyond specifying the required type of arguments. For example, we might want to say that “this function must be given numbers between 1 and 10” or “the first argument must be greater than the second argument.” A precondition of a function is any property that the function’s arguments must satisfy to ensure that the function works as described. Preconditions are included in a function’s docstring, and form a crucial part of the function’s interface.

For the user of a function, preconditions are extremely important, since they tell you what you have to do to use the function properly. They limit how a function can be used.
On the flip side, preconditions are freeing to the implementor of a function: by specifying a certain property in a precondition, the person writing the body of the function can go ahead and assume that this property is satisfied, which often leads to a simpler or more efficient implementation. Consider a method for searching a list. Binary search is efficient, but depends on having a sorted list. If the search method had to confirm this, the added work would make it slower than linear search! In this case, it makes sense to simply require the caller to provide a sorted list.

The bottom line is that specifying preconditions is part of the design of a function. It is a matter of specifying precisely what service we want to provide to the users of our functions, and what restrictions we want to impose upon them.

1.4 Python Type Annotations

In many programming languages, we cannot use a variable until we have declared its type, which determines the values that can be assigned to it; furthermore, a variable’s type can never change. Python takes a very different approach: only objects have a type, not the variables that refer to those objects; and in fact, a variable can refer to any type of object. Nonetheless, we can’t use a Python variable unless we know what type of object it refers to at the moment; otherwise, how would we know what we can do with it?

Since we need to be aware of the types we are using at any point in our code, it is good practice to document this. In this course, we will document the types of all functions and class instance attributes. We’ll use Python’s relatively new type annotation syntax to do so.
Before we can begin documenting types, we need to learn how to name them.

Primitive types

For primitive types, we can just use their type names. The table below gives the names of the common primitive types that are built into Python. There are other built-in types that are omitted because we tend not to use them in this course.

    Type name    Sample values
    int          0, 148, -3
    float        4.53, 2.0, -3.49
    str          'hello world', ''
    bool         True, False
    None         None

Note that None is a bit special, as we refer to it as both a value and its type.

Compound types

For compound types like lists, dictionaries, and tuples, we can also just use their type names: list, dict, and tuple. But often we need to be more specific. For example, often we want to say that a function takes in not just any list, but only a list of integers; we might also want to say that this function returns not just any tuple, but a tuple containing one string and one boolean value.

If we import the typing module, it provides us with a way of expressing these more detailed types. The table below shows three complex types from the typing module; the capitalized words in square brackets could be substituted with any type. Note that we use square brackets, not round ones, for these types.

    Type                 Description                                                                 Example
    List[T]              a list whose elements are all of type T                                     [1, 2, 3] has type List[int]
    Dict[T1, T2]         a dictionary whose keys are of type T1 and whose values are of type T2      {'a': 1, 'b': 2, 'c': 3} has type Dict[str, int]
    Tuple[T1, T2, ...]   a tuple whose first element has type T1, second element has type T2, etc.   ('hello', True, 3.4) has type Tuple[str, bool, float]

We can nest these type expressions within each other; for example, the nested list [[1, 2, 3], [-2]] has type List[List[int]].
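To see how these type names read in practice, here is a small sketch that attaches them to a few values using Python's variable annotation syntax (name: Type = value). All of the variable names are hypothetical and chosen only for illustration; the values for the first four are the ones used in the text and table above.

```python
from typing import Dict, List, Tuple

# All names below are hypothetical; only the type expressions matter.
scores: List[int] = [1, 2, 3]
nested: List[List[int]] = [[1, 2, 3], [-2]]
counts: Dict[str, int] = {'a': 1, 'b': 2, 'c': 3}
record: Tuple[str, bool, float] = ('hello', True, 3.4)

# Type expressions nest freely: a dictionary from id numbers to
# (description, quantity) pairs.
items: Dict[int, Tuple[str, int]] = {101: ('notebook', 12)}

# A list of tuples, each pairing a name with a list of marks.
marks: List[Tuple[str, List[int]]] = [('Diane', [90, 85]), ('David', [88])]
```

Python does not check these annotations when the program runs; they are documentation that tools such as PyCharm can read.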
Sometimes we want to be flexible and say that a value must be a list, but we don’t care what’s in the list (e.g. it could be a list of strings, a list of integers, a list of strings mixed with integers, etc.). In such cases, we can simply use the built-in types list, dict, and tuple for these types.

Annotating functions

Now that we know how to name the various types, let’s see how we can use this to annotate the type of a function. Suppose we have the following function:

    def can_divide(num, divisor):
        """Return whether num is evenly divisible by divisor."""
        return num % divisor == 0

This function takes in two integers and returns a boolean. We annotate the type of a function parameter by writing a colon and type after it:

    def can_divide(num: int, divisor: int):

We annotate the return type of the function by writing an arrow and type after the close parenthesis, and before the final colon:

    def can_divide(num: int, divisor: int) -> bool:

We can use any of the type expressions discussed above in these function type annotations, including types of lists and dictionaries. Just remember to import the typing module!

    from typing import List, Tuple

    def split_numbers(numbers: List[int]) -> Tuple[List[int], List[int]]:
        """Return a tuple of lists, where the first list contains the numbers
        that are >= 0, and the second list contains the numbers that are < 0.
        """
        pos = []
        neg = []
        for n in numbers:
            if n >= 0:
                pos.append(n)
            else:
                neg.append(n)
        return pos, neg

Annotating instance attributes

To annotate the instance attributes of a class, we list each attribute along with its type directly in the body of the class. By convention, we usually list these at the very top of the class, after the class docstring and before any methods.
    from typing import Dict, Tuple

    class Inventory:
        """The inventory of a store.

        Keeps track of all of the items available for sale in the store.

        Attributes:
            size: the total number of items available for sale.
            items: a dictionary mapping an id number to a tuple with the
                item's description and number in stock.
        """
        size: int
        items: Dict[int, Tuple[str, int]]

        ...  # Methods omitted

Annotating methods

Annotating the methods of a class is the same as annotating any other function, with two notable exceptions:

1. By convention, we do not annotate the first parameter self. Its type is always understood to be the class that this method belongs to.
2. Sometimes we need to refer to the class itself, because it is the type of some other parameter or the return type of a method. Because of a quirk of Python, we can only do so by including a special import statement at the very top of our Python file.

Here is an example (for brevity, method bodies are omitted):

    # This is the special import we need for class type annotations.
    from __future__ import annotations

    from typing import List

    class Inventory:
        # The type of self is omitted.
        def __init__(self) -> None: ...
        def add_item(self, item: str, quantity: int) -> None: ...
        def get_stock(self, item: str) -> int: ...
        def compare(self, other: Inventory) -> bool: ...
        def copy(self) -> Inventory: ...
        def merge(self, others: List[Inventory]) -> None: ...

Four advanced types

Here are four more advanced types that you will find useful throughout the course. All four of these types are imported from the typing module.

Any

Sometimes we want to specify that the type of a value could be anything (e.g., if we’re writing a function that takes a list of any type and returns its first element).
We annotate such types using Any:

    from typing import Any

    # This function could return a value of any type
    def get_first(items: list) -> Any:
        return items[0]

Warning: beginners often get lazy with their type annotations, and tend to write Any even when a more specific type annotation is appropriate. While this will cause code analysis tools (like PyCharm or python_ta) to be satisfied and not report errors, overuse of Any completely defeats the purpose of type annotations! Remember that we use type annotations as a form of communication, to tell other programmers how to use our function or class. With this goal in mind, we should always prefer giving specific type annotations to convey the most information possible, and only use Any when absolutely necessary.

Union

We sometimes want to express in a type annotation that a value could be one of two different types; for example, we might say that a function can take in either an integer or a float. To do so, we use the Union type. For example, the type Union[int, float] represents the type of a value that could be either an int or a float.

    from typing import Union

    def cube_root(x: Union[int, float]) -> float:
        # Assumes x >= 0; in Python 3 a negative x would give a complex result.
        return x ** (1/3)

Optional

One of the most common uses of a “union type” is to say that a value could be a certain type, or None. For example, we might say that a function returns an integer or None, depending on some success or failure condition. Rather than write Union[int, None], there’s a slightly shorter version from the typing module called Optional. The type expression Optional[T] is equivalent to Union[T, None] for all type expressions T. Here is an example:

    from typing import List, Optional

    def find_pos(numbers: List[int]) -> Optional[int]:
        """Return the first positive number in the given list.
        Return None if no numbers are positive.
        """
        for n in numbers:
            if n > 0:
                return n

Callable

Finally, we sometimes need to express that the type of a parameter, return value, or instance attribute is itself a function. To do so, we use the Callable type from the typing module. This type takes two expressions in square brackets: the first is a list of types, representing the types of the function’s arguments; the second is its return type. For example, the type Callable[[int, str], bool] is a type expression for a function that takes two arguments, an integer and a string, and returns a boolean. Below, the type annotation for compare_nums declares that it can take any function that takes two integers and returns a boolean:

    from typing import Callable

    def compare_nums(num1: int, num2: int,
                     comp: Callable[[int, int], bool]) -> int:
        if comp(num1, num2):
            return num1
        else:
            return num2

    def is_twice_as_big(num1: int, num2: int) -> bool:
        return num1 >= 2 * num2

    >>> compare_nums(10, 3, is_twice_as_big)
    10
    >>> compare_nums(10, 6, is_twice_as_big)
    6

1.5 Testing Your Work

The last step of the Function Design Recipe is to test your code. But how? In this section, we discuss the different strategies for testing code that you’ll use during the term, and beyond.

As you write more and more complex programs in this course, it will be vital to maintain good habits to support you in your programming.
One of these habits is developing good tests that will ensure your code is correct, and (often overlooked) using good tools to make those tests as easy to run as possible. You want to get in the habit of writing tests early in the process of programming, and running them as often as possible to detect coding errors as soon as you make them.

Doctests: basic examples in docstrings

Often, beginners test their code by importing their function into the Python interpreter, and then manually copy-and-pasting their examples one at a time and comparing the output with the expected output in the docstring. This approach is both time-consuming and error-prone. It may be good for a quick sanity check, but we can certainly do better.

Our first improvement is to use the Python library doctest, which looks for examples in docstrings and converts them automatically into runnable tests! To use doctest, you can add the following code to the bottom of any Python file:

    if __name__ == '__main__':
        import doctest  # import the doctest library
        doctest.testmod()  # run the tests

Then when you run the file, all of the doctest examples are automatically run, and you receive a report about which tests failed.

Creating test suites with pytest

The problem with doctest and putting examples in our docstrings is that we can’t include all of the test cases we want to without making the docstrings far too long for the reader. So while you should continue to put in a few basic doctests inside docstrings, in this course you will primarily use the pytest library to test your code. This library allows us to write our tests in a separate file, and so include an exhaustive set of tests without cluttering our code files. You’ll see an example of pytest in your first lab, and plenty more throughout the term.
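As a rough sketch of what such a test file might contain, here are a few tests for the can_divide function from section 1.4. The test names are hypothetical, and can_divide is repeated here only to keep the sketch self-contained; in a real project it would be imported from its own module. pytest treats each function whose name starts with "test" as a separate test.

```python
def can_divide(num: int, divisor: int) -> bool:
    """Return whether num is evenly divisible by divisor."""
    return num % divisor == 0


def test_evenly_divisible() -> None:
    """Test a pair where the division leaves no remainder."""
    assert can_divide(10, 5)


def test_not_evenly_divisible() -> None:
    """Test a pair where the division leaves a remainder."""
    assert not can_divide(10, 3)


def test_zero_numerator() -> None:
    """Test that zero is divisible by a non-zero number."""
    assert can_divide(0, 7)


def test_negative_numbers() -> None:
    """Test that the sign of either argument does not matter."""
    assert can_divide(-12, 4)
    assert can_divide(12, -4)
```

Running pytest on a file like this reports each test function separately, so a failure points at one specific scenario.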
There are two important points we want to remind you of when using pytest:

- Each function whose name starts with “test” is a separate test. They are all run independently of each other, so no test should rely on another test having run before it.
- Tests use the assert statement as the actual action that verifies the correctness of your code.

The assert statement is used as follows:

    assert <expression>

The <expression> should be a boolean expression (e.g., x == 3) that tests something about your function. We say that an assertion succeeds (or passes) when its expression evaluates to True, and it fails when its expression evaluates to False. A single test function in pytest can contain multiple assert statements; the test passes if all of the assert statements pass, but it fails when one or more of the assert statements fail.

Choosing test cases

We said earlier that keeping our tests in separate files from our source code enables us to write an exhaustive set of tests without worrying about length. But what exactly do we mean by “exhaustive”? In general, it is actually a pretty hard problem to choose test cases to verify the correctness of your program. You want to capture every possible scenario, while avoiding writing redundant tests. A good rule of thumb is to structure your tests around properties of the inputs. For example:

- integers: 0, 1, positive, negative, “small”, “large”
- lists: empty, length 1, no duplicates, duplicates, sorted, unsorted
- strings: empty, length 1, alphanumeric characters only, special characters like punctuation marks

For functions that take in multiple inputs, we often also choose properties based on the relationships between the inputs.
For example, for a function that takes two numbers as input, we might have a test for when the first is larger than the second, and another for when the second is larger than the first. For an input of one object and a list, we might have a test for when the object is in the list, and another for when the object isn’t.

And finally, keep in mind that these are rules of thumb only; none of these properties will always be relevant to a given function. For a complete set of tests, you must understand exactly what the function does, to be able to identify what properties of the inputs really matter.

Property-based testing

The kinds of tests we’ve discussed so far involve defining input-output pairs: for each test, we write a specific input to the function we’re testing, and then use assert statements to verify the correctness of the corresponding output. (For a function that mutates its input, we use assert statements to verify the correctness of the new state of the input after the function executes.) These tests have the advantage that writing any one individual test is usually straightforward, but the disadvantage that choosing and implementing test cases can be challenging and time-consuming.

There is another way of constructing tests that we will explore in this course: property-based testing, in which a single test typically consists of a large set of possible inputs that is generated in a programmatic way. Such tests have the advantage that it is usually straightforward to cover a broad range of inputs in a short amount of code (using a library like hypothesis, as we’ll see); but it isn’t always easy to specify exactly what the corresponding outputs should be. If we were to write code to compute the correct answer, how would we know that that code is correct?
So instead, property-based tests use assert statements to check for properties that the function tested should satisfy. In the simplest case, these are properties that every output of the function should satisfy, regardless of what the input was. For example:

- The type of the output: “the function str should always return a string.”
- Allowed values of the output: “the function len should always return an integer that is greater than or equal to zero.”
- Relationships between the input and output: “the function max(x, y) should return something that is greater than or equal to both x and y.”

These properties may seem a little strange, because they do not capture precisely what each function does; for example, str should not just return any string, but a string that represents its input. This is the trade-off that comes with property-based testing: in exchange for being able to run our code on a much larger range of inputs, we write tests which are imprecise characterizations of the function’s behaviour. The challenge with property-based testing, then, is to come up with good properties that narrow down as much as possible the behaviour of the function being tested.

Putting it all together

Ideally, we use all three of these types of testing in combination:

- doctest is used to test basic functionality, as well as to communicate what the correct behaviour of the function is.
- test suites (developed using a tool like pytest) are used to fully assess the correctness of our function in a range of carefully chosen test cases that we generate by hand.
- property-based tests (developed using a tool like hypothesis) are used for a more shallow assessment of correctness but on a much larger number of automatically generated test cases.
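The three example properties given earlier in this section (for str, len, and max) can be checked with very little code. The sketch below is a hand-rolled stand-in that uses the standard library's random module rather than hypothesis itself; the function name check_properties, the number of trials, the seed, and the input ranges are all arbitrary choices made for illustration.

```python
import random


def check_properties(trials: int = 200, seed: int = 148) -> None:
    """Check three simple properties on randomly generated inputs.

    This is a hypothetical helper, not part of any testing library.
    """
    rng = random.Random(seed)  # a fixed seed keeps the run repeatable
    for _ in range(trials):
        x = rng.randint(-1000, 1000)
        y = rng.randint(-1000, 1000)
        lst = [rng.randint(-50, 50) for _ in range(rng.randint(0, 10))]

        # Type of the output: str always returns a string.
        assert isinstance(str(x), str)

        # Allowed values of the output: len returns a non-negative integer.
        assert isinstance(len(lst), int) and len(lst) >= 0

        # Relationship between input and output: max(x, y) is at least
        # as large as both x and y.
        assert max(x, y) >= x and max(x, y) >= y


check_properties()
```

Note that none of these assertions would catch a version of str that always returned the same string, which is exactly the imprecision described above.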
1.6 Choosing Test Cases

Testing is incredibly important. Software on its own, without strong evidence of its correctness, is of no value. In fact, in many workplaces, the tools used by professionals to manage groups of software developers working on a shared code base won’t accept a contribution of new or modified code unless it contains, and passes, a thorough test suite.

We’ve talked about using a combination of three strategies for testing: doctest, unit tests (we’ll use pytest to implement these) and property-based tests (we’ll use hypothesis to implement these). We’ve also talked a bit about how to choose test cases for a test suite. Let’s look at this more closely.

An example

Suppose max didn’t exist in Python and we were writing a function to find the largest element in a list of integers. Suppose that we have tested the function on the following test cases, and that it passes them all:

[Figure: the test cases]

Would you be confident that the function works? Maybe not: we only checked 7 cases. What if you were shown that it passed 20 more tests? How about 100 more?

Even if it passes 1,000 test cases, you should be skeptical. That may be a lot of tests, but think about how many possible ways there are to call this function. How do we know the tests don’t omit a scenario that could cause failure?
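To make that worry concrete, here is a hypothetical implementation; the name largest and its bug are invented for illustration, since the notes do not show the function being tested. It passes several plausible tests and still fails on an entire category of inputs.

```python
from typing import List


def largest(numbers: List[int]) -> int:
    """Return the largest element of numbers.

    Precondition: numbers is not empty.

    This is a deliberately buggy, hypothetical implementation.
    """
    biggest = 0  # bug: wrong starting value when every element is negative
    for n in numbers:
        if n > biggest:
            biggest = n
    return biggest


print(largest([3, 9, 2]))     # 9
print(largest([7]))           # 7
print(largest([-4, 6, -1]))   # 6
print(largest([-3, -1, -2]))  # 0, but the correct answer is -1
```

Any number of additional tests that include at least one positive number would pass; only a list of all negative numbers exposes the problem.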
The fundamental problem is that we want to be sure that the code works in all cases but there are too many possible cases to test. In this Venn diagram, each circle represents a possible call to the function (of course there are many more than we could draw). Some of them have been tested.

[Figure: Venn diagram of possible calls, some of them tested]

Making a convincing argument

We may not be able to test every case, but we can still make a convincing argument as follows:

- Divide all possible calls to the function into meaningful categories.
- Pick a representative call from each category.

Our Venn diagram now looks more organized:

[Figure: Venn diagram with calls grouped into categories]

If we choose the categories well, for each category it will be reasonable to extrapolate from that one tested call to all the calls in the category:

[Figure: Venn diagram with each category covered by its tested call]

We now have either demonstrated or reasonably inferred correctness in every case.

How to choose the relevant properties

This kind of argument depends heavily on choosing appropriate categories. We base the categories on properties of the inputs.
For example, extending what we saw in an earlier reading, here are some properties and some values for each property:

- the size of an object (could be a list, string, etc.): 0, 1, larger, even, odd
- the position of a value in an ordered sequence (such as a list or string): beginning, ending, elsewhere
- the relative position of two values in an ordered sequence: adjacent, separated
- the presence of duplicates: yes, no
- ordering: unsorted, non-decreasing, non-increasing
- the value of an integer: 0, 1, positive, negative, “small”, “large”, even, odd
- the value of a string: alphanumeric characters only, special characters like punctuation marks
- the location of whitespace in a string: beginning, ending, elsewhere, multiple occurrences, multiple adjacent whitespace characters, different types of whitespace characters
- and more! Depending on the parameters of a function, there could be many other properties.

Not all of these properties are relevant to any particular function. We decide which are relevant based on knowing what the function does. If we also know how the function does it, that can influence our choices as well. For instance, if the function divides a list in half, odd vs. even size is pretty important!

Judgment is also required in choosing which combinations of these properties to test. There is no right or wrong answer here, but a great way to think of it is this: Try to break the code. If you use a good strategy and can’t break it, you have a good argument that it truly works.

1.7 Introduction to Property-based Testing

Hypothesis is a Python testing library that we’ll use occasionally in this course for exercises and assignments.
It’s already available on the Teaching Lab machines, and you should have installed it on your own computer when you went through the steps of the Software Guide on Quercus.

When writing tests, we often try to identify key properties of the inputs to the function being tested. We then pick representative inputs that meet these properties, and use these inputs to write tests. We can extend this idea to trying to identify key properties of the function itself: central relationships between its inputs and outputs that must hold for all possible inputs. This type of testing is called property-based testing, and the most famous implementation of this type of testing in Python is the hypothesis library.

An example

Let’s see a concrete example of what these property tests might look like. Consider the following function:

    from typing import List

    def insert_after(lst: List[int], n1: int, n2: int) -> None:
        """After each occurrence of <n1> in <lst>, insert <n2>.

        >>> lst = [5, 1, 2, 1, 6]
        >>> insert_after(lst, 1, 99)
        >>> lst
        [5, 1, 99, 2, 1, 99, 6]
        """

We’ll test two properties of this function, which should hold for any valid input:

1. `insert_after` always returns `None`.
2. `insert_after` increases the length of `lst` by the number of times that `n1` occurs in that list.

Our first test is the following:

    from typing import List

    from hypothesis import given
    from hypothesis.strategies import integers, lists

    from insert import insert_after

    @given(lists(integers()), integers(), integers())
    def test_returns_none(lst: List[int], n1: int, n2: int) -> None:
        """Test that insert_after always returns None.
""" assert insert_after(lst, n1, n2) is None The test case (test_returns_none) is preceded by the line @given(lists(integers()), integers(), integers()); what this line does is tell hypothesis to generate “random” inputs of the given types: a list of integers, and then two other integers. These values are then passed to the test function, which then simply calls insert_after on them, and checks that the output is None. The most interesting part is that the “given” line doesn’t just generate one set of random inputs; instead, it generates dozens of them (or even hundreds, depending on how hypothesis is configured), and runs this test function on each one! We call the input specifiers like integers() or lists() a strategy; we’ll see more examples of strategies throughout the term. A more complex property Even though the previous test looked pre y straight-forward, don’t be fooled! Since a property test is just a Python function, we can write pre y complex tests using all of our Python knowledge. For example, to test the second property we mentioned, we’ll need to store both the original length of lst, and the number of times that n1 appeared in it: @given(lists(integers()), integers(), integers()) def test_new_item_count(lst: List[int], n1: int, n2: int) -> None: """Test that the correct number of items is added. """ num_n1_occurrences = lst.count(n1) original_length = len(lst) insert_after(lst, n1, n2) final_length = len(lst) assert final_length - original_length == num_n1_occurrences Further reading https://www.teach.cs.toronto.edu/~csc148h/winter/notes/testing/hypothesis.html Downloaded by michael ayad (michael.maged2014@gmail.com) 2/3 lOMoARcPSD|10729405 1/9/2021 1.7 Introduction to Property-based Testing Hypothesis is a powerful property-based testing library, and we’re only scratching the surface of it here. If you’d like more information, please consult the official officialHypothesis Hypothesis official Hypothesis documentation documentation. 
2.1 Introduction to Object-Oriented Programming

We have seen that every object in a Python program has a type, and that an object’s type governs both its possible values and the operations that can be performed on it. As the data we want to store and manipulate gets more complex, Python’s built-in types begin to feel inadequate. Fortunately, we can create our own custom types.

Consider Twitter

Twitter is a pretty successful application that allows users to broadcast short messages, called tweets. If we wanted to write a program like Twitter, we would certainly need to be able to represent a tweet in our program, including the user who wrote the tweet, when the tweet was created, the contents of the tweet, and how many “likes” the tweet has. How would we do so?

We could store the data associated with a single tweet in a list,

    ['David', '2017-09-19', 'Hello, I am so cool', 0]

or a dictionary,

    {
        'userid': 'David',
        'created_at': '2017-09-19',
        'content': 'Hello, I am so cool',
        'likes': 0
    }

and then pass such objects around from function to function as needed. You might find it interesting to compare the relative merits of the list vs. dictionary approach. But there is a serious problem with using either of them: nothing would prevent us from creating a malformed tweet object. For example, if we used a list, we could:

- Create a malformed tweet, for instance with the values in the wrong order, such as [55, 'Diane', 'Older and even cooler', '2017-09-19'].
- Ruin a well-formed tweet by calling pop, which would remove the record of the number of people who liked the tweet.
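Both list problems are easy to reproduce. In the sketch below the variable names are hypothetical, and the convention that the like count lives at index 3 is an assumption made only for illustration.

```python
# A well-formed tweet under the assumed ordering:
# [userid, created_at, content, likes]
tweet = ['David', '2017-09-19', 'Hello, I am so cool', 0]

# The same kind of data in the wrong order; Python accepts it without
# complaint, because a list places no restrictions on its contents.
malformed = [55, 'Diane', 'Older and even cooler', '2017-09-19']

# Code that expects an integer like count at index 3 now gets a date string.
likes = malformed[3]
print(type(likes).__name__)  # str

# pop removes the last element, so the like count is silently lost.
tweet.pop()
print(len(tweet))  # 3
```

Neither step raises an error, so the mistake would only surface later, far from the line that caused it.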
If we used a dictionary, we could:

- Create a malformed tweet, for instance one that is missing the date:

      {
          'userid': 'Jacqueline',
          'content': 'Has the most dignified cat',
          'likes': 12
      }

- Ruin a well-formed tweet by adding a new key-value pair that has nothing to do with tweets, for example by doing t['units'] = 'centimeters'.

Furthermore, with either a list or a dictionary, nothing would enforce the 280-character limit that Twitter imposes on tweets.

Notice that this objection is one of protecting against errors, and not one of absolute correctness. That is, it is certainly possible to write a perfectly correct program that represents tweets using lists or dictionaries; you’ll just probably make lots of mistakes along the way.

A better solution is to create an entirely new data type for tweets. We do this by defining a class. This will allow us to specify the structure of the data precisely, and to control the operations that are performed on the data so that the data always remains well-formed.

Defining a class: attributes

A class is a block of code that defines a type of data. The built-in Python types that you’re familiar with like int, str, and list are all defined by classes. Suppose we have a class called X. An object whose type is X is called an instance of class X; for example, the object 3 is an instance of class int.

An instance of a class does not have to contain just a single piece of data as an int does; it can hold a collection of data bundled together. Each individual piece of data in an instance is called an instance attribute of the object.
For example, a tweet could possess an instance attribute for the content of the tweet, and another for the user ID of the person who wrote the tweet. (In this course we’ll often shorten “instance attribute” to just “attribute”, but in future study you’ll encounter other kinds of attributes as well.) Classes can have an arbitrary number of attributes, and they can all be of different types: integers, floats, strings, lists, dictionaries, and even instances of other classes.

Let’s now see how to actually do this in Python. First, we pick the name of the class, which is usually a capitalized noun. In this case, we’ll pick Tweet. We then write a docstring for the class, which gives a description of both the class and all the instance attributes of that class.

    class Tweet:
        """A tweet, like in Twitter.

        === Attributes ===
        userid: the id of the user who wrote the tweet.
        created_at: the date the tweet was written.
        content: the contents of the tweet.
        likes: the number of likes this tweet has received.
        """

Documenting attribute types in PyCharm

Below the docstring, we declare the type of every instance attribute; the syntax for doing so is <attribute_name>: <attribute_type>. For example, the first few lines in the Tweet class would be:

    from datetime import date  # We are using a library to represent dates

    class Tweet:
        """A tweet, like in Twitter.

        === Attributes ===
        userid: the id of the user who wrote the tweet.
        created_at: the date the tweet was written.
        content: the contents of the tweet.
        likes: the number of likes this tweet has received.
        """
        # Attribute types
        userid: str
        created_at: date
        content: str
        likes: int

As we discussed in 1.4 Type Annotations, this Python syntax enables programming tools, including PyCharm, to check the types of attributes as we give them values and modify their values throughout our code. Don’t be fooled by the similarity to other programming languages, though! These type annotations do not create the instance variables. In fact, they have no effect when the program runs, and could actually be removed without changing the behaviour of our code. However, it is good practice to include these because, as we said, they can be understood by automated tools.

Notice that we have to document the instance attributes in two places: in the docstring (to specify their meaning) and below it (to specify their types). While this is a little awkward, keep in mind that each form of documentation serves an important purpose. Users must know the meaning of the instance attributes of a class in order to use the class, and the information needs to be in the docstring so that help can find it. Automated tools read the attribute types to help us write our code and detect bugs, and they require that the information be in the class body rather than the docstring.

Creating an instance of a class

After writing only this much in the class body, we have defined a new type! We can import this class and then create an instance of it like this:

    >>> tweet = Tweet()

This creates a new Tweet object and stores a reference to it in the variable tweet.

Defining an initializer

At this point, the new object doesn’t contain any data.
    >>> tweet = Tweet()
    >>> tweet.userid
    AttributeError: 'Tweet' object has no attribute 'userid'

The error makes sense. Remember that a type annotation does not create a variable, so all we have in memory is this:

[Figure: memory model diagram]

In order to create and initialize instance attributes for an instance of a class, we define a special method inside the class called __init__, or in English the initializer. (As we’ll discuss later, we use the term “method” for any function that is defined inside a class.) Here is the header for an initializer method for our Tweet class:

    class Tweet:
        # previous content omitted for brevity

        def __init__(self, who: str, when: date, what: str) -> None:
            """Initialize a new Tweet.
            """

You are likely wondering what the parameter self is for. Every initializer has a first parameter that refers to the instance that has just been created and is to be initialized. By convention, we always call it self. This is such a strong Python convention that most code checkers will complain if you don’t follow it.

To understand how self works, let’s examine how we use the initializer:

    >>> from datetime import date
    >>> t1 = Tweet('Giovanna', date(2017, 9, 18), 'Hello')

Notice that we never mention the initializer __init__ by name; it is called automatically, and the values in parentheses are passed to it. Also notice that we pass three values to the initializer, even though it has four parameters. We never have to pass a value for self; it automatically receives the id of the instance that is to be initialized.
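One way to convince yourself that self really does receive the id of the new instance is to record that id inside the initializer and compare it afterwards. The sketch below uses a pared-down Tweet and a module-level list named observed_ids; both are assumptions made only for this demonstration and are not part of the course's Tweet class.

```python
from datetime import date

# Hypothetical bookkeeping list, used only to observe what self refers to.
observed_ids = []


class Tweet:
    """A pared-down Tweet whose initializer only records the id of self."""

    def __init__(self, who: str, when: date, what: str) -> None:
        """Initialize a new Tweet.
        """
        # Record the id of the object that Python passed in as self.
        observed_ids.append(id(self))


# Three arguments are passed; Python supplies the fourth (self) itself.
t1 = Tweet('Giovanna', date(2017, 9, 18), 'Hello')
print(observed_ids[0] == id(t1))  # True
```

The comparison is True because self and t1 refer to the same object: the one created just before the initializer was called.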
So this is what is happening in memory at the beginning of the initializer:

[Figure: memory model diagram]

The initializer’s job is to create and initialize the instance attributes. Let’s write the code to do this for the attribute userid. In the case of our example, we want to add to the new Tweet object as follows:

[Figure: memory model diagram]

This will require an assignment statement. What will go on the left side? We need to create a new variable called userid, but if we write userid = ... (we will figure out the right side in a moment), this will create a new variable called userid in the stack frame. We need to put it in the new object instead. Fortunately, self refers to the new object, and we can “go into” the object by writing self followed by a dot ‘.’. (This is known as dot notation, and is common to many programming languages.) So our assignment statement will be self.userid = ....

What goes on the right side? We need to get id1, the id of the str object 'Giovanna', into the new attribute. Our parameter who stores that, and we have access to it because it is in our stack frame. So the assignment statement will be self.userid = who.

We have just created an instance attribute! Here is the full initializer method:

    class Tweet:
        # previous content omitted for brevity

        def __init__(self, who: str, when: date, what: str) -> None:
            """Initialize a new Tweet.
            """
            self.userid = who
            self.created_at = when
            self.content = what
            self.likes = 0

By the time the initializer is about to return, we have created four instance attributes in total and this is the state of memory:

[Figure: memory model diagram]

and after we return, we can assign the id of the new object to t1:

[Figure: memory model diagram]

With the new object properly set up and a reference to it stored, we can access each of its attributes by using dot notation.

    >>> from datetime import date
    >>> t1 = Tweet('Giovanna', date(2017, 9, 18), 'Hello')
    >>> t1.userid
    'Giovanna'
    >>> t1.created_at
    datetime.date(2017, 9, 18)
    >>> t1.content
    'Hello'
    >>> t1.likes
    0

Notice that we let the client code choose initial values for the attributes userid, created_at, and content, by passing arguments for who, when, and what to the initializer. We do not give the client code control over the initial value for likes; instead, every Tweet object begins with zero likes. This was simply a design decision. For any initializer you write, you will have to decide which attributes will have an initial value that the client code gets control over.

What really happens when we create a new object

Our initializer differs from the functions you are familiar with in important ways:

- As noted above, an initializer always has a first parameter called self, and we never have to pass a value for self.
- By convention, we omit a type annotation for self. This is because the type of self should always be the class that this method belongs to (in our example, this is Tweet).

As we will see, these differences show up in all methods that we write.
You may also notice that the return type of the initializer is None, and yet a call to the initializer seems to return the new instance. This makes sense once we know that creating a Tweet doesn’t just cause __init__ to be called. It actually does three things (and this is true not just for our Tweet class, but in fact for every class in Python):

1. Create a new Tweet object behind the scenes.
2. Call __init__ with the new object passed to the parameter self, along with the other three arguments (for who, when, and what).
3. Return the new object. This step is where the object is returned, not directly from the call to __init__ in Step 2.

Revisiting the terminology

Once we define the Tweet class, how many Tweet objects can we construct? There is no limit. Each one is an object that is an instance of class Tweet. Suppose we create 25 Tweet objects. How many content variables have we created? 25. There is one for each instance of Tweet. This is why we call it an instance attribute.

A class acts as a blueprint or template: when we define the class, we specify what attributes every single instance of that class will have. This allows us to enforce a common structure on all data of the given type, which is one of the main purposes of having a type!

Defining a class: methods

Now that we have our new data type, we can write functions that take in tweets as arguments, or even create and return a new tweet! Here are two simple examples:

    def like(tweet: Tweet, n: int) -> None:
        """Record the fact that <tweet> received <n> likes.

        Precondition: n >= 0

        >>> t = Tweet('Rukhsana', date(2017, 9, 16), 'Hey!')
        >>> like(t, 3)
        >>> t.likes
        3
        """
        tweet.likes += n


    def retweet(new_user: str, tweet: Tweet, new_date: date) -> Tweet:
        """Create a copy of the given tweet with the new user and date.

        The new tweet has 0 likes, regardless of the number of likes of the
        original tweet.

        >>> t1 = Tweet('Rukhsana', date(2017, 9, 16), 'Hey!')
        >>> t2 = retweet('Calliope', t1, date(2017, 9, 18))
        >>> t2.userid
        'Calliope'
        >>> t2.created_at
        datetime.date(2017, 9, 18)
        >>> t2.content
        'Hey!'
        >>> t2.likes
        0
        """
        return Tweet(new_user, new_date, tweet.content)

While it is certainly possible to accomplish everything that we would ever want to do with our Tweet class by writing functions, there are downsides of doing so: these functions are separate entities from the class itself, and must be imported by any program that wants to make use of them.

Defining methods instead

Think back to how you used Python strings before you knew anything about writing your own classes. You were used to doing things like this:

    >>> word = 'supercalifragilisticexpealidocious'
    >>> word.count('i')
    6

It would be nice to be able to use a Tweet in this way, but we can’t; our current class provides no services other than storage of instance attributes. We can change that by moving the functions inside the class, to make them methods, which is simply the term for functions that are defined within a class.

We have seen one example of a method already: the initializer, __init__, is a special method that performs the crucial operation of initializing the instance attributes of a newly-created instance of a class.
But any function that operates on an instance of a class can be converted into a method by doing the following:

- Indent the function so that it is part of the class body (i.e., underneath class Tweet:).
- Ensure that the first parameter of the function is an instance of the class, and name this parameter self.

For example, we could make like a method of Tweet with the following code:

    class Tweet:
        ...

        def like(self, n: int) -> None:
            """Record the fact that <self> received <n> likes.

            Precondition: n >= 0
            """
            self.likes += n

Notice that we now use parameter self to access instance attributes, just as we did in the initializer.

Calling methods

Now that like is a method of the Tweet class, we do not need to import it separately; importing just the class Tweet is enough. We call it using the same dot notation that we use to access an object’s attributes:

    >>> from datetime import date
    >>> tweet = Tweet('Rukhsana', date(2017, 9, 16), 'Hey!')
    >>> tweet.like(10)  # dot notation!
    >>> tweet.likes
    10

Notice that when we call tweet.like(10) we pass one argument, yet the method has two parameters, self and n. What dot notation does for a method call is automatically pass the value to the left of the dot (in this case, tweet) as the method’s first parameter self.

Again, think back to how you used Python strings before you knew anything about writing your own classes. When you wrote code like word.count('i'), you passed only the string to be searched for, in this case 'i'. How does Python know in what string to search for it? To the left of the dot we said word, so that is the string to search in. If we had written name.count('i') then name would be the string to search in.
The string method count is just like the methods that we write: it has a first parameter called self that refers to the object to operate on.

Referring to methods by their class

A method really is just a function associated with a class, and can be referred to from the class directly, without using an instance. For example, the method count is part of the str class, and its full name is str.count. Using this, we can call it directly, just as we would any other function. The following calls are equivalent:

    # Use dot notation to send word to self.
    >>> word.count('i')
    6
    # Pass word explicitly as the argument for self.
    >>> str.count(word, 'i')
    6

Similarly, now that like is a method of the Tweet class, these are equivalent:

    >>> intro = Tweet('Diane', date(2018, 9, 11), 'Welcome to CSC148!')
    >>> intro.like(10)
    >>> Tweet.like(intro, 10)

Though we have these two alternatives, we almost always call methods on an instance directly, without referring to the class. This is because in object-oriented programming, we elevate the object as the entity of central importance. Every time we use dot notation, we are reminded that it is an object we are working with, whether we are accessing a piece of data bundled with that object or performing an operation on that object. There is another important technical reason we use dot notation with the object, but we’ll defer that discussion until we discuss inheritance.

Methods vs. functions

We just saw that methods in Python are just a special kind of function (ones that are defined within a class). Everything you already know about designing and writing functions applies equally to all methods you’ll write. But how do we decide when to make something a function and when to make it a method?

Here is the main design difference between functions and methods. Methods are part of the very definition of the class, and form the basis of how others can use the class.
They are bundled together with the class, and are automatically available to every instance of the class. In contrast, functions that operate on a class instance must be imported separately before they are used.

So it sounds like functions are “less useful” than methods because you need to do a bit of extra work to use them. Why not make everything a method? When we design a class, we aren’t just designing it for ourselves, but for anyone else who might want to use that class in the future. It is impossible to predict every single thing a person will want to use a class for, and so it is impossible to write every method that could possibly ever be useful. And even if we spent a whole lot of time and energy trying to be comprehensive in defining many methods, this creates the additional problem that anyone who wants to use the class must weed through pages and pages of documentation to find the methods that are actually useful for their purpose.

Here is the rule of thumb we will use. When we write a class, we write methods for behaviours that we think will be useful for “most” users of the class, and functions for the operations that users of the class must implement themselves for their specific needs. This is a design choice, and it is not a black and white choice; judgment is required!

Special methods

We said that the initializer was a special method. This is actually a technical term in Python, and is used to describe a method that we don’t have to call using the regular method call syntax. For example, we do not explicitly call __init__; it happens automatically as part of the three steps for creating a new instance of a class. Double underscores are used around a method name to indicate that it is a special method.
As we’ll soon learn, there are other special methods. For instance, if we define a method called __str__, it will be called automatically any time we print an instance of a class, allowing us to specify how the tweet is reported. For example, this would allow us to write:

    >>> print(t1)
    Giovanna said "Hello" on 2017-09-18 (0 likes)

2.2 Representation Invariants

We now know how to define a class that bundles together related pieces of data and includes methods that operate on that data. These methods provide services to client code, and if we write them sensibly, the client code can be sure that any instances they create will always be in a sensible state. For instance, we can make sure that no data is missing by writing an initializer that creates and initializes every instance attribute. And if, say, one instance attribute must always be greater than another (because that is a rule in the domain of our program), we can ensure that the initializer and all of the methods will never violate that rule.

Let’s return to our Twitter example to consider what writing the methods “sensibly” entails.

Documenting rules with representation invariants

Twitter imposes a 280-character limit on tweets. If we want our code to be consistent with this rule, we must both document it and make sure that every method of the class enforces the rule.

First, let’s formalize the notion of “rule”. A representation invariant is a property of the instance attributes that every instance of a class must satisfy.
For example, we can say that a representation invariant for our Tweet class is that the content attribute is always at most 280 characters long.

We document representation invariants in the docstring of a class, underneath its attributes. While we could write these representation invariants in English, we often prefer concrete Python code expressions that evaluate to True or False, as such expressions are unambiguous and can be checked directly in our program.

    class Tweet:
        """A tweet, like in Twitter.

        === Attributes ===
        userid: the id of the user who wrote the tweet.
        created_at: the date the tweet was written.
        content: the contents of the tweet.
        likes: the number of likes this tweet has received.

        === Representation Invariants ===
        - len(self.content) <= 280
        """
        # Attribute types
        userid: str
        created_at: date
        content: str
        likes: int

Even though this is a new definition, we have seen representation invariants already: every instance attribute type annotation is a representation invariant! For example, the annotation content: str means that the content of a tweet must always be a string.

Enforcing representation invariants

Even though documenting representation invariants is essential, documentation alone is not enough. As the author of a class, you have the responsibility of ensuring that each method is consistent with the representation invariants, in the following two ways:

1. At the beginning of the method body (i.e., right when the method is called), you can always assume that all of the representation invariants are satisfied.
2. At the end of the method (i.e., right before the method returns), it is your responsibility to ensure that all of the representation invariants are satisfied.
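Because the invariants are written as Python expressions, one possible debugging aid is a helper that asserts them and is called right before a method returns. The sketch below is not part of the course's Tweet class: the method name check_invariants is hypothetical, and assert statements are only one of several ways such a check could be written.

```python
from datetime import date


class Tweet:
    """A tweet, like in Twitter, with a hypothetical invariant checker.

    === Representation Invariants ===
    - len(self.content) <= 280
    """
    # Attribute types
    userid: str
    created_at: date
    content: str
    likes: int

    def __init__(self, who: str, when: date, what: str) -> None:
        """Initialize a new Tweet.
        """
        self.userid = who
        self.created_at = when
        self.content = what
        self.likes = 0
        # Right before returning, confirm that the invariants hold.
        self.check_invariants()

    def check_invariants(self) -> None:
        """Raise an AssertionError if a representation invariant is violated.

        This helper is an assumption made for illustration only.
        """
        assert isinstance(self.content, str)
        assert len(self.content) <= 280


ok = Tweet('Rukhsana', date(2017, 9, 16), 'Hey!')
print(len(ok.content) <= 280)  # True
```

A helper like this detects a violation but does not repair it: passing a 281-character message to this initializer would raise an AssertionError rather than produce a valid tweet.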
That is, each representation invariant is both a precondition and postcondition of every method in a class. You are free to temporarily violate the representation invariants during the body of the method (and will often do so while mutating the object), as long as by the end of the method, all of the invariants are restored. The initializer method is an exception: it does not have any preconditions on the attributes (since they haven’t even been created yet), but it must initialize the attributes so that they satisfy every representation invariant.

In our Twitter code, what method(s) may require modification in order to ensure that our representation invariant (len(self.content) <= 280) is enforced? Currently, the initializer allows the user to create a Tweet object with any message they want, including one that exceeds the limit. There are a variety of strategies that we can take for enforcing our representation invariant.

One approach is to process the initializer arguments so that the instance attributes are initialized to allowed values. For example, we might truncate a tweet message that’s too long:

    def __init__(self, who: str, when: date, what: str) -> None:
        """Initialize a new Tweet.

        If <what> is longer than 280 chars, only first 280 chars are stored.

        >>> t = Tweet('Rukhsana', date(2017, 9, 16), 'Hey!')
        >>> t.userid
        'Rukhsana'
        >>> t.created_at
        datetime.date(2017, 9, 16)
        >>> t.content
        'Hey!'
        >>> t.likes
        0
        """
        self.userid = who
        self.created_at = when
        self.content = what[:280]
        self.likes = 0

Another approach is to not change the code at all, but instead specify a precondition on the initializer:

    def __init__(self, who: str, when: date, what: str) -> None:
        """Initialize a new Tweet.

        Precondition: len(what) <= 280.

        >>> t = Tweet('Rukhsana', date(2017, 9, 16), 'Hey!')
        >>> t.userid
        'Rukhsana'
        >>> t.created_at
        datetime.date(2017, 9, 16)
        >>> t.content
        'Hey!'
        >>> t.likes
        0
        """
        self.userid = who
        self.created_at = when
        self.content = what
        self.likes = 0

As we discussed in 1.3 The Function Design Recipe, a precondition is something that we assume to be true about the function’s input. In the context of this section, we’re saying, “The representation invariant will be enforced by our initializer assuming that the client code satisfies our preconditions.” On the other hand, if this precondition is not satisfied, we aren’t making any promise about what the method will do (and in particular, whether it will enforce the representation invariants).

Another example: non-negativity constraints

Look again at the attributes of Tweet. Another obvious representation invariant is that likes must be at least 0; our type annotation likes: int allows for negative integers, after all. Do any methods need to change so that we can ensure this is always true? We need to check the initializer and any other method that mutates self.likes.

First, the initializer sets likes to 0, which satisfies this invariant. The method Tweet.like adds to the likes attribute, which would seem safe, but what if the client code passes a negative number?

Again, we are faced with a choice on how to handle this. We could impose a precondition that Tweet.like be called with n >= 0. Or, we could allow negative numbers as input, but simply set self.likes = 0 if its value falls below 0. Or, we could simply refuse to add a negative number, and simply return (i.e., do nothing) in this case.
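The precondition version of like appeared earlier; the other two choices might look roughly like the sketch below. The class is pared down to the likes attribute, and the method names like_clamped and like_ignoring_negative are hypothetical, chosen only so that both variants can sit side by side.

```python
class Tweet:
    """A pared-down Tweet that keeps only the likes attribute.

    === Representation Invariants ===
    - self.likes >= 0
    """
    # Attribute types
    likes: int

    def __init__(self) -> None:
        """Initialize a new Tweet with zero likes.
        """
        self.likes = 0

    def like_clamped(self, n: int) -> None:
        """Add <n> likes, resetting the total to 0 if it falls below 0.
        """
        self.likes += n  # the invariant may be violated temporarily here
        if self.likes < 0:
            self.likes = 0  # restore the invariant before returning

    def like_ignoring_negative(self, n: int) -> None:
        """Add <n> likes, doing nothing at all when <n> is negative.
        """
        if n < 0:
            return
        self.likes += n


t = Tweet()
t.like_clamped(3)
t.like_clamped(-5)
print(t.likes)  # 0

t.like_ignoring_negative(4)
t.like_ignoring_negative(-1)
print(t.likes)  # 4
```

Note that like_clamped briefly lets likes go negative inside the method body, which is allowed because the invariant is restored before the method returns.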
All of these options change the method’s behaviour, and so whatever we choose, we would need to update the method’s documentation!

Client code can violate representation invariants also

We’ve now learned how to write a class that declares and enforces appropriate representation invariants. We guarantee that whenever client code creates new instances of our class, and calls methods on them (obeying any preconditions we specify), our representation invariants will always be satisfied. Sadly, even being vigilant in implementing our methods doesn’t fully prevent client code from violating representation invariants; we’ll see why in the next section.

2.3 Designing Classes

We have now introduced three elements of a class:

- instance attributes (data)
- methods (operations)
- representation invariants (properties)

Now that we understand the basic mechanics of classes, it’s time to think about the design of classes. In fact, there are a whole host of design questions that you’ll face when designing object-oriented programs. To help guide you in this process, we have prepared a Class Design Recipe, which serves an analogous role to the Function Design Recipe from CSC108. This is a reference document that you aren’t required to follow explicitly, but will be a helpful way to guide your thinking when designing your own classes.

Information hiding

The fundamental themes of the Class Design Recipe are design before coding and information hiding.
Just as a great deal of thought goes into precisely specifying the purpose and expected behaviour of a function before you implement it, so too do you have to think about the design of a class before implementing even a single method. The relationship between the author and client of a class plays a powerful and subtle role in class design. When we design a class, we must think about how another person would use this class. In other words, we design a class to be used by others, whether it’s other team members, colleagues on a different project, or even ourselves when we are writing new code months or years into the future (and only vaguely recall writing the class in the first place). And one of the biggest desires of “other users” is to be able to use our class without having to know at all how it works. Designing classes by separating the public interface of the class from the private implementation details is known as information hiding, and is one of the fundamental elements of object-oriented programming. One of the biggest advantages of designing our programs in this way is that after our initial implementation, we can feel free to modify it (e.g., add new features or make it more efficient) without disturbing the public interface, and rest assured that this doesn’t affect other code that might be using this class. Unfortunately, this course is too small in scope to give you the opportunity to write code for other people, although do keep in mind that you’re always writing your code for your future self. We’ll encourage you to follow the Class Design Recipe and think carefully about a clear separation between what one needs to know to implement a class and what information one needs to know to use that class. 
Private-ness in Python

As we have already discussed, the Class Design Recipe places a great emphasis on the distinction between the public interface of a class and its private implementation. So far, the focus has been on using documentation to define a clear interface: explicitly writing a good class docstring with all public attributes of the class clearly documented, and method docstrings that describe the operations the class supports.

In this section, we’ll discuss another important way to document the attributes and methods that we want to keep private, and then go over pitfalls concerning the very concept of “private-ness” in Python.

Leading underscores

An extremely common Python naming convention is to name anything that is considered private with a leading underscore. A leading underscore on the name of an instance attribute indicates to a programmer writing client code that they should not access the instance variable: they should not use its value, and they certainly shouldn’t change it.

We can mark not only attributes as private, but methods as well. What would be the point of a method that client code shouldn’t call? It could be a private helper for one of the methods that client code is welcome to call.

Python’s “we’re all adults” philosophy

In some other programming languages, when we declare restrictions on which attributes can be accessed outside the class and which cannot, they are enforced as part of the language itself. In Java, for example, attempting to access or modify an attribute that has been marked as private from outside its class leads to an error that prevents the program from running at all.
The Python language takes a different approach to private attributes and methods, which is informed by one of its core philosophies: “We’re all adults here.” The idea is that the language gives its programmers a great deal of freedom when writing code, including allowing programmers to access private attributes and methods of classes from outside the class. While there are some Python language mechanisms for performing further restriction on access, they are beyond the scope of the course, and they are weak mechanisms that can be circumvented.

As a result of the Python philosophy, if someone else wants to use your class, they are ultimately responsible for using it “properly.” And if they do not, well, they’re an adult; if they access a private attribute or method, they should be aware that this might lead to unexpected or disappointing results.

This permissiveness doesn’t mean that we give up on private attributes or methods altogether. Our previous discussion about the philosophy of public vs. private is still valid, and indeed respected by Python programmers. It just means that it is absolutely vital in Python to write good documentation and follow coding conventions.

In particular, this is why it is not enough to implement methods so that they enforce our desired representation invariants. Because a programmer may wish to access and mutate instance attributes directly, our representation invariants must be carefully documented so that the programmer knows (and is responsible for maintaining) these invariants. This way, we give the users of our class enough information to use it the way we intended, and alert them to the things they should not do. And if the user ignores our documentation? That’s up to them, risks and all.
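The following minimal sketch pulls these ideas together: a private attribute, a private helper method, a documented representation invariant, and the fact that Python does not stop client code from reaching in. The class Account and all of its method names are hypothetical, chosen only for this illustration.

```python
class Account:
    """A bank account (hypothetical example).

    === Private Attributes ===
    _balance: the amount of money in this account, in cents.

    === Representation invariants ===
    - _balance >= 0
    """
    _balance: int

    def __init__(self) -> None:
        self._balance = 0

    def deposit(self, amount: int) -> None:
        """Add amount cents to this account.

        Precondition: amount >= 0
        """
        self._balance += amount

    def withdraw(self, amount: int) -> bool:
        """Withdraw amount cents if possible; return whether it succeeded."""
        if self._can_withdraw(amount):
            self._balance -= amount
            return True
        return False

    def balance(self) -> int:
        """Return the balance of this account, in cents."""
        return self._balance

    def _can_withdraw(self, amount: int) -> bool:
        """Private helper: return whether amount cents can be withdrawn."""
        return 0 <= amount <= self._balance


account = Account()
account.deposit(500)
print(account.withdraw(800))  # Refused, so the invariant is preserved.

# Python does not prevent the assignment below. The client who writes it
# is responsible for having broken the documented invariant _balance >= 0.
account._balance = -100
print(account.balance())
```

The public methods alone can never make the balance negative; only code that ignores the underscore convention can.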
Combining classes: composition

A class is almost never defined and used in isolation. It is much more often the case that it belongs to a large collection of classes, which are all related to each other in various ways. One fundamental type of relationship between two classes occurs when the instances of one class have an attribute which refers to one or more instances of the other class. For example, a User object might have a list of Tweets as an attribute. Colloquially, we say that a User “has some” Tweets.

    from typing import List


    class User:
        """A Twitter user.

        === Attributes ===
        userid: the userid of this Twitter user.
        bio: the bio of this Twitter user.
        tweets: the tweets that this user has made.
        """
        # Attribute types
        userid: str
        bio: str
        tweets: List[Tweet]

This type of relationship between classes is called composition, and appears all the time in object-oriented programming, because it arises so naturally. Whenever we have two classes, and one refers to instances of the other, that’s composition!

It is also the case that two classes might be related by composition in more than one way. For example, we might change Tweet so that it has an instance attribute user of type User, rather than just a string for the user’s id. We could even add an extra attribute original_creator of type User as well, representing the distinction between the user who originally wrote the tweet, and another user who retweets it.

Modelling with classes

A common programming task is to take an English description of a problem and design classes that model the problem. The main idea is that the class(es) should correspond to the most important noun(s) in the problem, the attributes should be the information (other nouns or adjectives) associated with these noun(s), and the methods correspond to the verbs.
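As a rough sketch of both ideas, consider the description “a user can post tweets”. The nouns become the classes User and Tweet, the information attached to them becomes attributes, and the verb “post” becomes a method. The Tweet attributes, the initializers, and the method name post are assumptions made for this illustration; they are not the course’s actual Tweet class.

```python
from typing import List


class Tweet:
    """A tweet (simplified; attribute names are assumed for illustration).

    === Attributes ===
    user: the user who wrote this tweet.
    content: the text of this tweet.
    """
    user: 'User'
    content: str

    def __init__(self, user: 'User', content: str) -> None:
        self.user = user
        self.content = content


class User:
    """A Twitter user.

    === Attributes ===
    userid: the userid of this Twitter user.
    tweets: the tweets that this user has made.
    """
    userid: str
    tweets: List[Tweet]

    def __init__(self, userid: str) -> None:
        self.userid = userid
        self.tweets = []

    def post(self, content: str) -> Tweet:
        """Create a new tweet by this user, record it, and return it."""
        tweet = Tweet(self, content)
        self.tweets.append(tweet)
        return tweet


david = User('david')
first = david.post('Hello, world!')
print(first.user.userid)
print(len(david.tweets))
```

Here the two classes are related by composition in both directions: a User has some Tweets, and each Tweet has a User.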
Here are a few examples for you to try out.

People

We’d like to create a simple model of a person. A person normally can be identified by their name, but might commonly be asked about her age (in years). We want to be able to keep track of a person’s mood throughout the day: happy, sad, tired, etc. Every person also has a favourite food: when she eats that food, her mood becomes 'ecstatic'. And though people are capable of almost anything, we’ll only model a few other actions that people can take: changing their name, and greeting another person with the phrase 'Hi ____, it's nice to meet you! I'm ____.'

Rational numbers

It’s slightly annoying for math people to use Python, because fractions are always converted to decimals and rounded, rather than kept in exact form. Let’s fix that! A rational number consists of a numerator and denominator; the denominator cannot be 0. Rational numbers are written like 7/8. Typical operations include determining whether the rational is positive, adding two rationals, multiplying two rationals, comparing two rationals, and converting a rational to a string.

Restaurant recommendation

We want to build an app which makes restaurant recommendations for a group of friends going out for a meal. Each person has a name, current location, dietary restrictions, and some ratings and comments for existing restaurants. Each restaurant has a name, a menu from which one can determine what dishes accommodate what dietary restrictions, and a location. The recommendation system, in addition to actually making recommendations, should be able to report statistics like the number of times a certain person has used the system, the number of times it has recommended each restaurant, and the last recommendation made for a given group of people.
2.4 Inheritance: Introduction and Methods

In this and the next few sections, we’ll learn about a relationship called inheritance that can exist between two classes. We will focus on one particular way of using inheritance in our code design, and through that, will learn how inheritance works in Python.

Consider a payroll system

Suppose we are designing an application to keep track of a company’s employees, including some who are paid based on an annual salary and others who are paid based on an hourly wage. We might choose to define a separate class for these two types of employee so that they could have different attributes. For instance, only a salaried employee has a salary to be stored, and only an hourly-paid employee has an hourly wage to be recorded. The classes could also have different methods for the different ways that their pay is computed.

This design overlooks something important: employees of both types have many things in common. For instance, they all have data like a name, address, and employee id. And even though their pay is computed differently, they all get paid. If we had two classes for the two kinds of employees, all the code for these common elements would be duplicated. This is not only redundant but error prone: if you find a bug or make another kind of improvement in one class, you may forget to make the same changes in the other class. Things get even worse if the company decides to add other kinds of employees, such as those working on commission.

Inheritance to the rescue!

A better design would “factor out” the things that are common to all employees and write them once.
This is what inheritance allows us to do, and here is how:

- We define a base class that includes the functionality that is common to all employees.
- We define a subclass for each specific type of employee. In each subclass, we declare that it is a kind of employee, which will cause it to “inherit” those common elements from the base class without having to define them itself.

Terminology note: if class B is a subclass of class A, we also say that A is a superclass of B.

In our running example of a company with different kinds of employees, we could define a base class Employee and two subclasses as follows (for now, we will leave the class bodies empty):

    class Employee:
        pass


    # We use the "(Employee)" part to mark SalariedEmployee as a subclass of Employee.
    class SalariedEmployee(Employee):
        pass


    # We use the "(Employee)" part to mark HourlyEmployee as a subclass of Employee.
    class HourlyEmployee(Employee):
        pass

Inheritance terminology

It’s useful to know that there are three ways to talk about classes that are in an inheritance relationship:

- base class, superclass, and parent class are synonyms.
- derived class, subclass, and child class are synonyms.

Defining methods in a base class

Now let’s fill in these classes, starting with the methods we want. Once we have that, we can figure out the data that will be needed to implement the methods. To keep our example simple, let’s say that we only need a method for paying an employee, and that it will just print out a statement saying when and how much the employee was paid. Here’s the outline for the Employee class with a pay method.

    from datetime import date


    class Employee:
        """An employee of a company.
        """

        def pay(self, pay_date: date) -> None:
            """Pay this Employee on the given date and record the payment.
            (Assume this is called once per month.)
            """
            pass

If we try to write the body of the pay method, we run into a problem: it must compute the appropriate pay amount for printing, and that must be done differently for each type of employee. Our solution is to pull that part out into a helper method:

    class Employee:
        """An employee of a company.
        """

        def get_monthly_payment(self) -> float:
            """Return the amount that this Employee should be paid in one month.

            Round the amount to the nearest cent.
            """
            pass  # We still have to figure this out.

        def pay(self, pay_date: date) -> None:
            """Pay this Employee on the given date and record the payment.

            (Assume this is called once per month.)
            """
            payment = self.get_monthly_payment()
            print(f'An employee was paid {payment} on {pay_date}.')

Now method pay is complete, but we have the same problem with get_monthly_payment: it has to be different for each type of employee. Clearly, the subclasses must define this method, each in their own way. But we are going to leave the incomplete get_monthly_payment method in the Employee class, because it defines part of the interface that every type of Employee object needs to have. Subclasses will inherit this incomplete method, which they can redefine as appropriate. We’ll see how to do that shortly.

Notice that we did as much as we could in the base class, to avoid repeating code in the subclasses.

Making the base class abstract

Because the Employee class has a method with no body, client code should not make instances of this incomplete class directly. We do two things to make this clear:

- Change the body of the incomplete method so that it simply raises a NotImplementedError.
  We call such a method an abstract method, and we call a class which has at least one abstract method an abstract class.
- Add a comment to the class docstring stating that the class is abstract, so that anyone writing client code is warned not to instantiate it. [1]

[1] While we won’t follow it in this course, a common Python convention you’ll encounter is to put the word “abstract” in the class name, e.g., AbstractEmployee.

Here is the complete definition of our class:

    class Employee:
        """An employee of a company.

        This is an abstract class. Only subclasses should be instantiated.
        """

        def get_monthly_payment(self) -> float:
            """Return the amount that this Employee should be paid in one month.

            Round the amount to the nearest cent.
            """
            raise NotImplementedError

        def pay(self, pay_date: date) -> None:
            """Pay this Employee on the given date and record the payment.

            (Assume this is called once per month.)
            """
            payment = self.get_monthly_payment()
            print(f'An employee was paid {payment} on {pay_date}.')

It is possible for client code to ignore the warning and instantiate this class; Python does not prevent it. But look at what happens when we try to call one of the unimplemented methods on the object:

    >>> a = Employee()
    >>> # This method is itself abstract:
    >>> a.get_monthly_payment()
    Traceback...
    NotImplementedError
    >>> # This method calls a helper method that is abstract:
    >>> a.pay(date(2018, 9, 30))
    Traceback...
    NotImplementedError

Subclasses inherit from the base class

Now let’s fill in class SalariedEmployee, which is a subclass of Employee. Very importantly, all instances of SalariedEmployee are also instances of Employee. We can verify this using the built-in function isinstance:

    >>> # Here we see what isinstance does with an object of a simple built-in type.
    >>> isinstance(5, int)
    True
    >>> isinstance(5, str)
    False
    >>> # Now let's check how it works with objects of a type that we define.
    >>> fred = SalariedEmployee()
    >>> # fred's type is as we constructed it: SalariedEmployee.
    >>> # More precisely, the object that fred refers to has type SalariedEmployee.
    >>> type(fred)
    <class 'employee.SalariedEmployee'>
    >>> # In other words, the object is an instance of SalariedEmployee.
    >>> isinstance(fred, SalariedEmployee)
    True
    >>> # Here's the important part: it is also an instance of Employee.
    >>> isinstance(fred, Employee)
    True

Because Python “knows” that fred is an instance of Employee, this object will have access to all methods of Employee! We say that fred inherits all of the Employee methods. So even if SalariedEmployee remains an empty class, its instances can still call the methods get_monthly_payment and pay, because they are inherited.

    class SalariedEmployee(Employee):
        pass

    >>> fred = SalariedEmployee()
    >>> # fred inherits Employee.get_monthly_payment, and so can call it.
    >>> # Of course, it raises an error when called, but it indeed is accessed.
    >>> fred.get_monthly_payment()
    Traceback...
    NotImplementedError

Completing the subclass

Our SalariedEmployee and HourlyEmployee subclasses each inherit two methods: pay and get_monthly_payment. The method pay is complete as it is, and is appropriate for all types of employees, so we needn’t do anything with it. However, get_monthly_payment needs a new definition that does not raise an error and that defines the behaviour appropriately for the particular kind of employee. We accomplish this simply by defining the method again in the subclass.
We say that this new method definition overrides the inherited definition [2]:

    class SalariedEmployee(Employee):
        def get_monthly_payment(self) -> float:
            # Assuming an annual salary of 60,000
            return round(60000.0 / 12.0, 2)


    class HourlyEmployee(Employee):
        def get_monthly_payment(self) -> float:
            # Assuming a 160-hour work month and a $20/hour wage.
            return round(160.0 * 20.0, 2)

    >>> fred = SalariedEmployee()
    >>> fred.get_monthly_payment()
    5000.0
    >>> jerry = HourlyEmployee()
    >>> jerry.get_monthly_payment()
    3200.0

[2] For now, we will hard-code values in the method, but will generalize this later.

We now have a working version of all three classes, albeit a very limited one. Download and run the code that we’ve written so far. You can experiment with it as you continue reading.

How Python resolves a method name

The interaction above includes the call fred.get_monthly_payment(). Since the name get_monthly_payment could refer to several possible methods (one in class Employee, one in class SalariedEmployee, and one in class HourlyEmployee), we say that the method name must be “resolved”. To understand inheritance, we need to know how Python handles method resolution in general.

This is how it works: whenever code calls a.myMethod(), Python determines what type(a) is and looks in that class for myMethod. If myMethod is found in this class, that method is called; otherwise, Python next searches for myMethod in the superclass of the type of a, and then the superclass of the superclass, etc., until it either finds a definition of myMethod or it has exhausted all possibilities, in which case it raises an AttributeError.
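The lookup order described above can be inspected directly. In this sketch the classes Base and Child and their method names are hypothetical; `__mro__` is a standard attribute of Python classes that lists the classes searched, in order, ending with the built-in class object.

```python
class Base:
    def greet(self) -> str:
        return 'greet from Base'

    def describe(self) -> str:
        return 'describe from Base'


class Child(Base):
    # Overrides Base.describe; greet is inherited unchanged.
    def describe(self) -> str:
        return 'describe from Child'


c = Child()

# The classes Python searches, in order, when resolving a method on c.
print([cls.__name__ for cls in type(c).__mro__])

print(c.describe())  # Found in Child, so the search stops there.
print(c.greet())     # Not in Child; found next in the superclass Base.

try:
    c.missing_method()  # Not found in Child, Base, or object.
except AttributeError as error:
    print('AttributeError:', error)
```

The last call exhausts every class in the chain, which is exactly the situation in which an AttributeError is raised.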
In the case of the call fred.get_monthly_payment(), type(fred) is SalariedEmployee, and SalariedEmployee contains a get_monthly_payment method. So that is the one called.

This method call is more interesting: fred.pay(date(2018, 9, 30)). The value of type(fred) is SalariedEmployee, but class SalariedEmployee does not contain a pay method. So Python next checks in the superclass Employee, which does contain a pay method, so then that is called. Straightforward. But then inside Employee.pay, we have the call self.get_monthly_payment(). Which get_monthly_payment is called? We’re already executing a method (pay) inside the Employee class, but that doesn’t mean we call Employee.get_monthly_payment. [3] Remember the rule: type(self) determines which class Python first looks in for the method. At this point, self is fred, whose type is SalariedEmployee, and that class contains a get_monthly_payment method. So in this case, when Employee.pay calls self.get_monthly_payment(), it gets SalariedEmployee.get_monthly_payment.

[3] Good thing, because it raises an error!

2.5 Inheritance: Attributes and Initializers

Let’s return to the payroll code we wrote and generalize it from hard-coded values to instance attributes. This will allow us to customize individual employees with their own annual salaries or hourly wages.

Documenting attributes

Just as the base class contains methods (even abstract ones!) that all subclasses need to have in common, the base class also documents attributes that all subclasses need to have in common. Both are a fundamental part of the public interface of a class.
We decided earlier that the application would need to record an id and a name for all employees. Here’s how we document that in the base class [1]:

    class Employee:
        """An employee of a company.

        === Attributes ===
        id_: This employee's ID number.
        name: This employee's name.
        """
        id_: int
        name: str

[1] We put an underscore at the end of the attribute id_ in order to distinguish it from the built-in function id.

Defining an initializer in the abstract superclass

Even though abstract classes should not be instantiated directly, we provide an initializer in the superclass to initialize the common attributes.

    class Employee:
        def __init__(self, id_: int, name: str) -> None:
            """Initialize this employee.
            """
            self.id_ = id_
            self.name = name

        def get_monthly_payment(self) -> float:
            """Return the amount that this Employee should be paid in one month.

            Round the amount to the nearest cent.
            """
            raise NotImplementedError

        def pay(self, pay_date: date) -> None:
            """Pay this Employee on the given date and record the payment.

            (Assume this is called once per month.)
            """
            payment = self.get_monthly_payment()
            print(f'An employee was paid {payment} on {pay_date}.')

Inheriting the initializer in a subclass

Because the initializer is a method, it is automatically inherited by all Employee subclasses just as, for instance, pay is.

    >>> # Assuming SalariedEmployee does not override Employee.__init__,
    >>> # that method is called when we construct a SalariedEmployee.
    >>> fred = SalariedEmployee(99, 'Fred Flintstone')
    >>> # We can see that Employee.__init__ was called,
    >>> # and the two instance attributes have been initialized.
    >>> fred.name
    'Fred Flintstone'
    >>> fred.id_
    99

Just as with all other methods, for each subclass, we must decide whether the inherited implementation is suitable for our class, or whether we want to override it. In this case, the inherited initializer is not suitable, because each subclass requires that additional instance attributes be initialized: for each SalariedEmployee we need to keep track of the employee’s salary, and for each HourlyEmployee we need to keep track of their number of work hours per month and their hourly wage.

Certainly we could override and replace the inherited initializer, and in its body copy the code from Employee.__init__:

    class SalariedEmployee(Employee):
        def __init__(self, id_: int, name: str, salary: float) -> None:
            self.id_ = id_          # Copied from Employee.__init__
            self.name = name        # Copied from Employee.__init__
            self.salary = salary    # Specific to SalariedEmployee


    class HourlyEmployee(Employee):
        def __init__(self, id_: int, name: str, hourly_wage: float,
                     hours_per_month: float) -> None:
            self.id_ = id_                          # Copied from Employee.__init__
            self.name = name                        # Copied from Employee.__init__
            self.hourly_wage = hourly_wage          # Specific to HourlyEmployee
            self.hours_per_month = hours_per_month  # Specific to HourlyEmployee

This is not a very satisfying solution because the first two lines of each initializer are duplicated, and for more complex abstract base classes, the problem would be even worse!

Since the inherited initializer does part of the work by initializing the attributes that all employees have in common, we can instead use Employee.__init__ as a helper method. In other words, rather than override and replace this method, we will override and extend it.
As we saw briefly last week, we use the superclass name to access its method [2]:

    class SalariedEmployee(Employee):
        def __init__(self, id_: int, name: str, salary: float) -> None:
            # Note that to call the superclass initializer, we need to use the
            # full method name '__init__'. This is the only time you should write
            # '__init__' explicitly.
            Employee.__init__(self, id_, name)
            self.salary = salary

[2] Python has a much more powerful mechanism for accessing the superclass without naming it directly. It involves the built-in super function, but this is beyond the scope of this course.

In the subclasses, we also need to document each instance attribute and declare its type. [3]

[3] In this course, we also include type annotations from the parent class. For a technical reason, the current version of python_ta sometimes complains when these type annotations are missing.

Here are the complete subclasses:

    class SalariedEmployee(Employee):
        """
        === Attributes ===
        salary: This employee's annual salary

        === Representation invariants ===
        - salary >= 0
        """
        id_: int
        name: str
        salary: float

        def __init__(self, id_: int, name: str, salary: float) -> None:
            # Note that to call the superclass initializer, we need to use the
            # full method name '__init__'. This is the only time you should write
            # '__init__' explicitly.
            Employee.__init__(self, id_, name)
            self.salary = salary

        def get_monthly_payment(self) -> float:
            return round(self.salary / 12, 2)


    class HourlyEmployee(Employee):
        """An employee whose pay is computed based on an hourly rate.

        === Attributes ===
        hourly_wage: This employee's hourly rate of pay.
        hours_per_month: The number of hours this employee works each month.

        === Representation invariants ===
        - hourly_wage >= 0
        - hours_per_month >= 0
        """
        id_: int
        name: str
        hourly_wage: float
        hours_per_month: float

        def __init__(self, id_: int, name: str, hourly_wage: float,
                     hours_per_month: float) -> None:
            Employee.__init__(self, id_, name)
            self.hourly_wage = hourly_wage
            self.hours_per_month = hours_per_month

        def get_monthly_payment(self) -> float:
            return round(self.hours_per_month * self.hourly_wage, 2)

We can see that when we construct an instance of either subclass, both the common instance attributes (name and id_) and the subclass-specific attributes are initialized:

    >>> fred = SalariedEmployee(99, 'Fred Flintstone', 60000.0)
    >>> fred.name
    'Fred Flintstone'
    >>> fred.salary
    60000.0
    >>> barney = HourlyEmployee(23, 'Barney Rubble', 1.25, 50.0)
    >>> barney.name
    'Barney Rubble'
    >>> barney.hourly_wage
    1.25
    >>> barney.hours_per_month
    50.0

We have now completed the second version of the code. Download it so that you can experiment with it as you continue reading.

Subclasses inherit methods, not attributes

It may seem that our two subclasses have “inherited” the attributes documented in the Employee class. [4] But remember that a type annotation does not create a variable. Consider this example:

    >>> fred = SalariedEmployee(99, 'Fred Flintstone', 60000.0)
    >>> fred.name
    'Fred Flintstone'

[4] In many other languages, instance attributes are inherited.

The only reason that fred has a name attribute is because the SalariedEmployee initializer explicitly calls the Employee initializer, which initializes this attribute. A superclass initializer is not called automatically when a subclass instance is created.
If we remove this call from our example, we see that the two attributes name and id_ are missing:

    class SalariedEmployee(Employee):
        def __init__(self, id_: int, name: str, salary: float) -> None:
            # Superclass call commented out:
            # Employee.__init__(self, id_, name)
            self.salary = salary

    >>> fred = SalariedEmployee(99, 'Fred Flintstone', 60000.0)
    >>> fred.name
    Traceback...
    AttributeError

Initializers with different signatures

Notice that the signatures for Employee.__init__ and SalariedEmployee.__init__ are different. SalariedEmployee.__init__ has an additional parameter for the salary. This makes sense. We should be able to configure each salaried employee with their own salary, but it is irrelevant to other types of employee, who don’t have a salary.

Because abstract classes aren’t meant to be instantiated directly, their initializers are considered private, and so can be freely overridden and have their signatures changed in each subclass. This offers flexibility in specifying how subclasses are created, and in fact it is often the case that different subclasses of the same abstract class will have different initializer signatures. However, subclass initializers should always call the initializer of their superclass!

It turns out that Python allows us to change the signature of any method we override, not just __init__. However, as we’ll discuss in the next section, in this course we’ll use inheritance to define interfaces that your subclasses should implement. Because a function signature is a crucial part of its interface, you should not change the signature of any other overridden method when using inheritance in this course.
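Here is a self-contained sketch of the main point of this section: an instance attribute exists only if some initializer assigns it. The Employee and SalariedEmployee classes are trimmed versions of the ones above. The class ForgetfulEmployee and the employee named wilma are hypothetical; the class deliberately omits the superclass call.

```python
class Employee:
    # Type annotations document the attributes but do not create them.
    id_: int
    name: str

    def __init__(self, id_: int, name: str) -> None:
        self.id_ = id_
        self.name = name


class SalariedEmployee(Employee):
    salary: float

    def __init__(self, id_: int, name: str, salary: float) -> None:
        Employee.__init__(self, id_, name)  # Extend the inherited initializer.
        self.salary = salary


class ForgetfulEmployee(Employee):
    salary: float

    def __init__(self, id_: int, name: str, salary: float) -> None:
        # No call to Employee.__init__, so id_ and name are never assigned.
        self.salary = salary


fred = SalariedEmployee(99, 'Fred Flintstone', 60000.0)
print(fred.name)

wilma = ForgetfulEmployee(7, 'Wilma Flintstone', 60000.0)
print(hasattr(wilma, 'salary'))  # Assigned by ForgetfulEmployee.__init__.
print(hasattr(wilma, 'name'))    # Never assigned by any initializer.
```

Both subclasses change the initializer’s signature by adding a salary parameter, but only the one that calls Employee.__init__ ends up with the common attributes.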
2.6 Inheritance: Thoughts on Design

Now that you understand the mechanics of inheritance, let's go back and make some final comments.

Four things we can do with an inherited method

When a subclass inherits a method from its superclass, there are four things we can choose to do in the subclass.

1. Simply inherit an implemented method

If a method has been implemented in the superclass and its behaviour is appropriate for the subclass, then we can simply use this behaviour by choosing not to override the method. For example, HourlyEmployee does not define a pay method, so it simply inherits the pay method from Employee. Any time we call pay on an instance of HourlyEmployee, the Employee.pay method is called.

Of course, we should never do this when the method is abstract, because a subclass must override every abstract method to implement it properly.

2. Override an abstract method to implement it

When a method has not been implemented in the superclass (its body is just raise NotImplementedError), the method must be overridden in the subclass in order to provide an implementation. [1] For example, SalariedEmployee and HourlyEmployee must both implement the abstract get_monthly_payment method.

[1] This is not a requirement of the Python language, but is a feature of the way we are using inheritance. In other uses of inheritance, implementation can be deferred to a subclass of the subclass or a class further down in the inheritance chain.

3. Override an implemented method to replace it

If a method has been implemented in the superclass, but the subclass requires a different behaviour, the subclass can override the method and provide a completely different implementation. This is something we haven't yet seen, but is very simple. For example, we could override the pay method in SalariedEmployee:

    class SalariedEmployee(Employee):
        def get_monthly_payment(self) -> float:
            # Assuming an annual salary of 60,000
            return round(60000 / 12, 2)

        def pay(self, pay_date: date) -> None:
            print('Payment rejected! Mwahahahaha.')

    >>> fred = SalariedEmployee()
    >>> fred.pay(date(2017, 9, 30))
    Payment rejected! Mwahahahaha.

4. Override an implemented method to extend it

Sometimes we want the behaviour that was defined in the superclass, but we want to add some other behaviour. In other words, we want to extend the inherited behaviour. We have witnessed this in the initializers for our payroll system. The Employee initializer takes care of instance attributes that are common to all employees. Rather than repeat that code, each subclass initializer calls it as a helper and then has additional code to initialize additional instance attributes that are specific to that subclass.

We can extend any inherited method, not just an initializer. Here's an example. Suppose at pay time we wanted to print out two messages, the original one from Employee, and also a SalariedEmployee-specific message. Since we already have a superclass method that does part of the work, we can call it as a helper method instead of repeating its code:

    class SalariedEmployee(Employee):
        def pay(self, pay_date: date) -> None:
            Employee.pay(self, pay_date)  # Call the superclass method as a helper.
            print('Payment accepted! Have a nice day. :)')

    >>> fred = SalariedEmployee()
    >>> fred.pay(date(2017, 9, 30))
    An employee was paid 3200 on September 30, 2017.
    Payment accepted! Have a nice day. :)

Using inheritance to define a shared public interface

Our use of inheritance allows client code to do the same thing for all types of employee. Here's an example where we iterate over a list of employees and call method pay on each, without regard to what kind of Employee each one is:

    >>> employees = [
    ...     SalariedEmployee(14, 'Fred Flintstone', 5200.0),
    ...     HourlyEmployee(23, 'Barney Rubble', 1.25, 50.0),
    ...     SalariedEmployee(99, 'Mr Slate', 120000.0)
    ... ]
    >>> for e in employees:
    ...     # At this point, we don't know what kind of employee e is.
    ...     # It doesn't matter, because they all share a common interface,
    ...     # as defined by class Employee!
    ...     # For example, they all have a pay method.
    ...     e.pay(date(2018, 8, 31))

In other words, the client can write code to an interface defined once in the abstract class that will work for any of its subclasses, even ones that we haven't thought of yet! This is very powerful. If we couldn't treat all kinds of employees the same way, we would need an if block that checks what kind of employee we have and does the specific thing that is appropriate for each kind. Much messier, especially if there are more than one or two subclasses!

We say that the Employee class represents the shared public interface of classes SalariedEmployee and HourlyEmployee. The public interface of a class is the way client code interacts with the methods and attributes of the class. It's a "shared" public interface in the sense that it is held in common between SalariedEmployee and HourlyEmployee.
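To make the contrast with the "if block" approach concrete, here is a minimal sketch. The classes below are simplified stand-ins, not the payroll classes from these notes: their initializer parameters and the two helper functions total_with_type_checks and total_polymorphic are assumptions made only for this illustration.

```python
from typing import List


class Employee:
    """Abstract employee (simplified stand-in; not the full class from the notes)."""

    def get_monthly_payment(self) -> float:
        raise NotImplementedError


class SalariedEmployee(Employee):
    def __init__(self, salary: float) -> None:
        self.salary = salary  # annual salary (assumed attribute name)

    def get_monthly_payment(self) -> float:
        return round(self.salary / 12, 2)


class HourlyEmployee(Employee):
    def __init__(self, hourly_wage: float, hours_per_month: float) -> None:
        self.hourly_wage = hourly_wage
        self.hours_per_month = hours_per_month

    def get_monthly_payment(self) -> float:
        return round(self.hours_per_month * self.hourly_wage, 2)


def total_with_type_checks(employees: List[Employee]) -> float:
    """Add up monthly payments by checking each employee's concrete type.

    Every new subclass would force another branch here.
    """
    total = 0.0
    for e in employees:
        if isinstance(e, SalariedEmployee):
            total += round(e.salary / 12, 2)
        elif isinstance(e, HourlyEmployee):
            total += round(e.hours_per_month * e.hourly_wage, 2)
        else:
            raise TypeError('unknown kind of employee')
    return total


def total_polymorphic(employees: List[Employee]) -> float:
    """Add up monthly payments using only the shared public interface."""
    return sum(e.get_monthly_payment() for e in employees)
```

Both functions compute the same total, but only the second keeps working unchanged when a new Employee subclass is written.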
We say that class Employee is polymorphic [2], to signify that it can take different forms: as a SalariedEmployee or an HourlyEmployee.

[2] The roots of this word are poly, which means "many", and morphe, which means "form". So "polymorphic" literally means "taking many forms".

Abstract classes are useful

The Employee class is abstract, and client code should never instantiate it. Is it therefore useless? No, quite the opposite! We've already seen that it defines a shared public interface that client code can count on, and as a result, supports polymorphism. Furthermore, polymorphic client code will continue to work even if new subclasses are written in the future!

Our abstract Employee class is useful in a second way. If and when someone does decide to write another subclass of Employee, for instance for employees who are paid a commission, the programmer knows that the abstract method get_monthly_payment must be implemented. In other words, they must support the shared public interface that the client code counts on. We can think of this as providing helpful guidance for the programmer writing the new subclass.

When to use inheritance

We've seen some benefits of inheritance. However, inheritance isn't perfect for every situation. Don't forget the other kind of relationship between classes that we've seen: composition. For example, to represent people who are car owners, a Person object might have an attribute car which stores a reference to a Car object. We wouldn't use inheritance to represent the relationship between Person and Car!

Composition is commonly thought of as a "has a" relationship. For example, a person "has a" car. Inheritance is thought of as an "is a" relationship. For example, a salaried employee "is an" employee.
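As a small illustration of the "has a" relationship, here is a sketch of the Person and Car example using composition. The attribute names name and make, and the choice to allow a person with no car, are assumptions made for this sketch.

```python
from typing import Optional


class Car:
    """A car, described only by its make (assumed attribute)."""

    def __init__(self, make: str) -> None:
        self.make = make


class Person:
    """A person who may own a car.

    Person does not inherit from Car; it stores a reference to one.
    """

    def __init__(self, name: str, car: Optional[Car] = None) -> None:
        self.name = name
        self.car = car  # composition: a Person "has a" Car


owner = Person('Sadia', Car('Toyota'))
print(owner.car.make)          # Reach the Car through the Person's attribute.
print(isinstance(owner, Car))  # A Person is not a kind of Car.
```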
Of course, the "has a" vs. "is a" categorization is rather simplistic, and not every real-world problem is so clearly defined. When we use inheritance, any change in a superclass affects all of its subclasses, which can lead to unintended effects. To avoid this complexity, in this course we'll stick to using inheritance in the traditional "shared public interface" sense. Moreover, we will often prefer that a subclass not change the public interface of a superclass at all:

- not by changing the interface of any public methods (e.g., adding/removing parameters, or changing their types)
- not by adding new public methods or attributes to a subclass (of course, adding private attributes or methods is acceptable)

As a general programming concept, inheritance has many other uses, and you'll learn about some of them in CSC207, Software Design.

2.7 The object Class and Python Special Methods

Now that we understand inheritance, we can gain a deeper understanding of some of Python's fundamental special methods.

The object superclass

In our very first lecture, we described every piece of data as an object, and have continued to use this term throughout this course. It turns out that "object" is not merely a theoretical concept, but made explicit in the Python language. In Python, we have a class called object, which is an ancestor of every other class, both built-in classes like int and our user-defined classes like Employee. [1]

[1] By "ancestor" we mean either a parent class, or a parent of a parent class, etc.
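The claim that object is an ancestor of every class can be checked directly. The sketch below uses an empty stand-in class named Employee (not the payroll class from these notes) together with the built-in issubclass function and the __mro__ attribute, which lists the chain of classes Python searches when looking up a method.

```python
class Employee:
    """An empty stand-in class that names no superclass explicitly."""


# Built-in classes and user-defined classes alike descend from object.
print(issubclass(int, object))
print(issubclass(Employee, object))

# The ancestor chain of Employee: the class itself, then object.
print(Employee.__mro__)
```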
Inheriting special methods

This object class gives default implementations for many special methods we have seen before, including:

- __init__, which allows us to create instances of a class even when the class body is empty. It's not magic: our classes simply inherit object.__init__! So every time we define __init__ within our own class, we are actually overriding object.__init__.
- __str__, which is called when you call either str or print on an object. The default implementation? Identifying the class name (prefixed by the module it was defined in) and a location in your computer's memory:

      >>> class Donut:
      ...     pass
      ...
      >>> d1 = Donut()
      >>> print(d1)
      <__main__.Donut object at 0x103359828>

- __eq__, whose default implementation simply uses is to compare two objects.

Keep in mind that even though these methods are called "special", overriding them in your classes works in the exact same way as other methods: simply define a method with the specific name of that special method.

3.1 Introduction to Abstract Data Types

In the first few weeks of the course, we have mainly played the role of the class designer and implementer. However, you have actually spent most of your programming career in the opposite role: whenever you use one of Python's built-in functions or data structures, you only worry about what it does, not how it works. In other words, you are a client of the built-in Python libraries and classes.
Some concepts are so general and so useful across many problems that they transcend any specific programming language. An abstract data type (or ADT) defines some kind of data and the operations that can be performed on it. It is a pure interface, with no mention of an implementation; that's what makes it abstract.

In contrast to this, a data structure is a concrete strategy for storing some data. For example, one data structure we could use to store the grades of a class on a series of course activities is a list of lists:

    grades = [['Sadia', 78, 82],
              ['Yuan', 75, 64],
              ['Elise', 80, 71]]
    # To know what the "columns" are for,
    # we could use another list:
    items = ['A1', 'midterm']

An alternative data structure is a dictionary of dictionaries:

    g2 = {'A1': {'Sadia': 78, 'Yuan': 75, 'Elise': 80},
          'midterm': {'Sadia': 82, 'Yuan': 64, 'Elise': 71}}

You can very likely think of other options, and may find it interesting to consider pros and cons of the two data structures above. But the point here is that ADTs are fundamentally concerned with the what: what data is stored, and what we can do with this data. Data structures are concerned with the how: how is that data stored, and how do we actually implement our desired methods to operate on this data?

This distinction is crucial to the kind of abstract thinking you'll develop as programmers: by separating the what from the how, you'll gain substantial benefits, as we are about to learn.

Some famous abstract data types

In this section, we'll briefly describe some of the most common abstract data types in computer science. Now, while computer scientists generally agree on what the "main" abstract data types are, they often disagree on what operations each one actually supports.
You'll notice here that we've taken a fairly conservative approach for specifying operations, limiting ourselves to the most basic ones.

Set
- Data: a collection of unique elements
- Operations: get size, insert a value (without introducing duplicates), remove a specified value, check membership

Multiset
- Data: a collection of elements (possibly with duplicates)
- Operations: same as Set, but the insert operation allows duplicates

List
- Data: an ordered sequence of elements
- Operations: access element by index, insert a value at a given index, remove a value at a given index

Map
- Data: a collection of key-value pairs, where each key is unique and associated with a single value
- Operations: look up a value for a given key, insert a new key-value pair, remove a key-value pair, update the value associated with a given key

Iterable
- Data: a collection of values (may or may not be unique)
- Operations: iterate through the elements of the collection one at a time

Many of these will sound familiar, as you'll have used them in your past programming experience, even if you haven't heard the term "abstract data type" before! For example, you have probably used both a Python list and dict. However, there is a really important distinction between Python's built-in classes and the ADTs we've listed above: list and dict are data structures, not abstract data types. That is, they're concrete implementations, and every necessary decision about how these classes store their data and implement their methods has been made. Indeed, the designers of the Python programming language put a great deal of effort into these implementations so that list and dict operations are extremely fast.

So a dict, for instance, is not itself an ADT.
But it is fair to say that a dict is a natural implementation of the Map ADT. However, there is NOT a one-to-one correspondence between ADTs and data structures, in Python or any other language.

A single ADT can be implemented by many different data structures. For example, although the Python list is a natural implementation of the List ADT, we could implement it instead with a dict in which each key is the index of its item. A list of 3 elements, hello, 42, and goodbye in positions 0, 1, and 2 respectively, would be

    {0: 'hello', 1: 42, 2: 'goodbye'}

On the flip side, each data structure can be used to implement multiple ADTs. The Python list can be used to implement not just the List ADT, but each of the other above ADTs as well. For instance, think about how you would implement the Set ADT with a list, and in particular, how you would avoid duplicates. [1] A dict could also implement any of the ADTs above, and the same is true of the new data structures you will learn in this course.

[1] Beginning Python programmers often implement the Set ADT with a list, but Python has a built-in set class that implements the Set ADT, does all the work of duplicate-avoidance for you, and does it very efficiently.

The value of knowing all the standard ADTs

We've said that each ADT is so general that it transcends any individual problem or program or even programming language. And in fact the ADTs given above are implemented in other programming languages (to varying degrees). While the exact data structures used to implement them vary significantly from language to language, these ADT concepts form a common vocabulary that every professional computer scientist needs to understand.
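Returning to the earlier point that one data structure can implement several ADTs, here is one possible sketch of the Set ADT built on a Python list. The class name ListSet and its method names (size, contains, insert, remove) are choices made for this illustration; the Set ADT itself only specifies the operations, not their names.

```python
from typing import Any, List


class ListSet:
    """A Set ADT implementation that stores its elements in a Python list."""

    def __init__(self) -> None:
        self._items: List[Any] = []

    def size(self) -> int:
        """Return the number of elements in this set."""
        return len(self._items)

    def contains(self, item: Any) -> bool:
        """Return whether item is in this set."""
        return item in self._items

    def insert(self, item: Any) -> None:
        """Add item to this set, unless an equal item is already present."""
        if item not in self._items:
            self._items.append(item)

    def remove(self, item: Any) -> None:
        """Remove item from this set. Do nothing if it is not present."""
        if item in self._items:
            self._items.remove(item)
```

The duplicate check in insert is what turns a plain list into a set; dropping that check would give a Multiset instead. Client code that uses only these four methods would not need to change if the list were later swapped for Python's built-in set.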
3.2 Stacks and Queues

To round out our study of ADTs, we'll learn about two new ADTs this week: the Stack and the Queue. Both of these ADTs store a collection of items, and support operations to add an item and remove an item. However, unlike a Set or Multiset, in which the client code may specify which item to remove, Stacks and Queues remove and return their items in a fixed order; client code is allowed no choice. This might seem very restrictive and simplistic, but you'll soon learn how the power of these ADTs lies in their simplicity. Once you learn about them, you'll start seeing them everywhere, and be able to effectively communicate about these ADTs to any other computer scientist.

The Stack ADT

The Stack ADT is very simple. A stack contains zero or more items. When you add an item, it goes "on the top" of the stack (we call this "pushing" onto the stack) and when you remove an item, it is removed from the top also (we call this "popping" from the stack). [1] The net effect is that the first item added to the stack is the last item removed. We call this Last-In-First-Out (or LIFO) behaviour. To summarize:

[1] The name "stack" is a deliberate metaphor for a stack of cafeteria trays or books on a table.

Stack
- Data: a collection of items
- Operations: determine whether the stack is empty, add an item (push), remove the most recently-added item (pop)

In code:

    from typing import Any


    class Stack:
        """A last-in-first-out (LIFO) stack of items.

        Stores data in last-in, first-out order. When removing an item from the
        stack, the most recently-added item is the one that is removed.
        """

        def __init__(self) -> None:
            """Initialize a new empty stack."""

        def is_empty(self) -> bool:
            """Return whether this stack contains no items.

            >>> s = Stack()
            >>> s.is_empty()
            True
            >>> s.push('hello')
            >>> s.is_empty()
            False
            """

        def push(self, item: Any) -> None:
            """Add a new element to the top of this stack.
            """

        def pop(self) -> Any:
            """Remove and return the element at the top of this stack.

            >>> s = Stack()
            >>> s.push('hello')
            >>> s.push('goodbye')
            >>> s.pop()
            'goodbye'
            """

At this point in CSC148, you may immediately picture implementing this with a Python list. You may be wondering "which end of the list is the top of the stack?" But this is irrelevant when you are using the ADT. You are much better off thinking of a pile of objects stacked up. When you are a client of a stack, you don't need to know the implementation. The reduction in your cognitive load that the abstraction brings is very important. Without it, complex, modern software would not be possible.

The Queue ADT

Another important ADT is a Queue. Like a stack, a queue contains zero or more items, but items come out of a queue in the order in which they entered. In other words, a queue exhibits First-in-First-Out (FIFO) behaviour. The lineup at the corner store is (one hopes) a queue. We call adding an item to a queue an enqueue operation, and the removal of an item a dequeue operation.

Queue
- Data: a collection of items
- Operations: determine whether the queue is empty, add an item (enqueue), remove the least recently-added item (dequeue)

List-based implementation of the Stack ADT

In this section, we'll now implement the Stack ADT using a built-in Python data structure: the list. Note that here, we've chosen to use the end of the list to represent the top of the stack.
This wasn't the only viable option!

    from typing import Any, List


    class Stack:
        """A last-in-first-out (LIFO) stack of items.

        Stores data in last-in, first-out order. When removing an item from the
        stack, the most recently-added item is the one that is removed.
        """
        # === Private Attributes ===
        # _items:
        #     The items stored in the stack. The end of the list represents
        #     the top of the stack.
        _items: List

        def __init__(self) -> None:
            """Initialize a new empty stack.
            """
            self._items = []

        def is_empty(self) -> bool:
            """Return whether this stack contains no items.

            >>> s = Stack()
            >>> s.is_empty()
            True
            >>> s.push('hello')
            >>> s.is_empty()
            False
            """
            return self._items == []

        def push(self, item: Any) -> None:
            """Add a new element to the top of this stack.
            """
            self._items.append(item)

        def pop(self) -> Any:
            """Remove and return the element at the top of this stack.

            >>> s = Stack()
            >>> s.push('hello')
            >>> s.push('goodbye')
            >>> s.pop()
            'goodbye'
            """
            return self._items.pop()

We'll leave a list-based Queue implementation as an exercise for this week's lab.

Abstraction is critical

Abstraction is one of the most powerful concepts in computer science. An ADT is an abstraction so general it transcends any specific programming language. This kind of abstraction has profound implications.

Looking at the class from the outside, a programmer writing client code needs to understand only its public interface. This frees them to focus on what they want to do with the class and ignore everything about how it is implemented. [2]
If a client creates a Stack object in their code, they know there are exactly three operations that can be performed on it: checking whether it's empty, pushing an item onto it, and popping an item from it. This reduces cognitive load for the programmer dramatically. Modern, complex software would be impossible otherwise.

[2] Imagine if every time you wanted to do s.split() you had to think through how your string s was represented and how the split method worked. It would be a huge distraction from your real task.

Looking at the class from the inside, the implementer has complete freedom to change implementation details with no effect on client code. For example, the software can be redesigned to be more efficient, or more elegant (and maintainable). The entire implementation can even change, and every program that uses the class will still work exactly the same as before. We call this "plug-out, plug-in compatibility."

3.3 Exceptions

Right now, our stack implementation raises an unfortunate error when client code calls pop on an empty Stack: [1]

    >>> s = Stack()
    >>> s.pop()
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
      File "...", line 58, in pop
        return self._items.pop()
    IndexError: pop from empty list

[1] Why is this bad from the client code's perspective?

Let's look at some alternatives.

Alternative: fail silently

One simple improvement is to make the code "fail silently", making sure to document this behaviour in the method docstring:

    def pop(self) -> Any:
        """Remove and return the element at the top of this stack.

        Do nothing if this stack is empty.

        >>> s = Stack()
        >>> s.push('hello')
        >>> s.push('goodbye')
        >>> s.pop()
        'goodbye'
        """
        if not self.is_empty():
            return self._items.pop()

Because the client code in this case expects a value to be returned, it could use the "no return value" as a sign that something bad happened. However, this approach doesn't work for all methods; for example, push never returns a value, not even when all goes well, so failing silently would not alert the client code to a problem until potentially hundreds of lines of code later. And in pop, which does return a value, if we treat None as an indication of an error we can never allow client code to push the value None. There may be clients who want to be able to do that, and to recognize it as a legitimate value when it is popped off again.

Alternative: Raise a user-defined exception

A better solution is to raise an error when something has gone wrong, so that the client code has a clear signal. We want the errors to be descriptive, yet not to reveal any implementation details. We can achieve this very easily in Python: we define our own type of error by making a subclass of a built-in class called Exception. For example, here's how to define our own kind of Exception called EmptyStackError:

    class EmptyStackError(Exception):
        """Exception raised when calling pop on an empty stack."""
        pass

We call this a user-defined exception. Here's how we'll use EmptyStackError in our pop method:

    def pop(self) -> Any:
        """Remove and return the element at the top of this stack.

        Raise an EmptyStackError if this stack is empty.

        >>> s = Stack()
        >>> s.push('hello')
        >>> s.push('goodbye')
        >>> s.pop()
        'goodbye'
        """
        if self.is_empty():
            raise EmptyStackError
        else:
            return self._items.pop()

    >>> s = Stack()
    >>> s.pop()
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
      File "...", line 60, in pop
        raise EmptyStackError
    EmptyStackError

When we want an EmptyStackError to happen, we construct an instance of that new class and raise it. We have already seen the raise keyword in the context of unimplemented methods in abstract superclasses. It turns out that this mechanism is very flexible, and can be used anywhere in our code to raise exceptions, even ones that we've defined ourselves.

Notice that the line which is shown to the client is just this simple raise statement; it doesn't mention any implementation details of the class. And it specifies that an EmptyStackError was the problem. Defining and raising our own errors enables us to give descriptive messages to the user when they have used our class incorrectly.

Customizing the error message

One limitation of the above approach is that the name of the error class alone is not necessarily enough to convey a user-friendly error message. We can change this by overriding the inherited __str__ method in our class:

    class EmptyStackError(Exception):
        """Exception raised when calling pop on an empty stack."""

        def __str__(self) -> str:
            """Return a string representation of this error."""
            return 'You called pop on an empty stack. :('

    >>> s = Stack()
    >>> s.pop()
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
      File "...", line 60, in pop
        raise EmptyStackError
    EmptyStackError: You called pop on an empty stack. :(

Exceptions interrupt the normal flow of control

The normal flow of control in a program involves pushing a stack frame whenever a function is called, and popping the current (top) stack frame when we reach a return or reach the end of the function/method.

When an exception is raised, something very different happens: immediately, the function ends and its stack frame is popped, sending the exception back to the caller, which in turn ends immediately, sending the exception back to its caller, and so on until the stack is empty. At that point, an error message specifying the exception is printed, and the program stops.

In fact, when this happens, much more information is printed. For every stack frame that is popped, there was a function/method that had been running and was at a particular line. The output shows both the line number and line of code. For example, here is a module that defines two useful methods and then a very silly one, mess_about, whose sole purpose is to demonstrate how exceptions work:

[Code listing not preserved: the module defining second_from_top and mess_about.]

Because mess_about clears the stack, the call to second_from_top is guaranteed to fail when it tries to pop even one thing from the stack. At the moment of failure, we are executing pop, and beneath it on the call stack are second_from_top, mess_about, and the main block of the module, all on pause and waiting to finish their work. When pop raises an EmptyStackError, we see a full report:

[Error report not preserved: the traceback printed when pop raises EmptyStackError.]

You have undoubtedly seen this kind of error report many times. Now you should be able to use it as a treasure trove of information about what went wrong.
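The chain of paused functions described above can be reproduced with a short sketch. Everything here is an assumption made for illustration: the bodies of second_from_top and mess_about are guesses based only on the prose description, Stack is a cut-down list-based version, and frames_unwound is a hypothetical helper that uses the standard traceback module to list the functions the exception passed through.

```python
import traceback
from typing import Any, List


class EmptyStackError(Exception):
    """Exception raised when calling pop on an empty stack."""


class Stack:
    """A cut-down list-based stack, enough for this demonstration."""

    def __init__(self) -> None:
        self._items: List[Any] = []

    def is_empty(self) -> bool:
        return self._items == []

    def push(self, item: Any) -> None:
        self._items.append(item)

    def pop(self) -> Any:
        if self.is_empty():
            raise EmptyStackError
        return self._items.pop()


def second_from_top(s: Stack) -> Any:
    """Return the item second from the top of s (no error handling here)."""
    hold1 = s.pop()
    hold2 = s.pop()
    s.push(hold2)
    s.push(hold1)
    return hold2


def mess_about(s: Stack) -> Any:
    """Empty s, then ask for its second-from-top item, which must fail."""
    while not s.is_empty():
        s.pop()
    return second_from_top(s)


def frames_unwound(s: Stack) -> List[str]:
    """Return the names of the functions the exception travelled through,
    outermost first, in the same order a printed traceback lists them.
    """
    try:
        mess_about(s)
    except EmptyStackError as error:
        return [frame.name
                for frame in traceback.extract_tb(error.__traceback__)]
    return []
```

Calling frames_unwound on any stack yields one entry per paused function, ending with pop, which is where the exception was raised. Without the try block, the same information would be printed as the error report and the program would stop.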
Handling exceptions more elegantly

Your code can be written in a way that takes responsibility for "catching" and handling exceptions. Catching an exception and taking an appropriate action instead of allowing your code to crash is a much more elegant way of dealing with errors because it shields the user from seeing errors that they should never see, and allows the program to continue.

Although we will go through a few examples to give you an idea of how to catch and handle exceptions, please make sure to read the Python documentation on exceptions. You should carefully read the sections on handling exceptions and defining clean-up actions.

Consider a simple example of asking for input from the user in the form of an integer number, and testing if the number is a divisor of 42. We need to make sure that the input is well-formed. That means that we should make sure that it is indeed an integer, as well as check that the number is not going to result in a division by zero. Here is an example of how to catch and handle exceptions in a graceful way in this context:

    if __name__ == '__main__':
        option = 'y'
        while option == 'y':
            value = input('Give me an integer to check if it is a divisor of 42: ')
            try:
                is_divisor = (42 % int(value) == 0)
                print(is_divisor)
            except ZeroDivisionError:
                print("Uh-oh, invalid input: 0 cannot be a divisor of any number!")
            except ValueError:
                print("Type mismatch, expecting an integer!")
            finally:
                print("Now let's try another number...")
            option = input('Would you like to continue (y/n): ')

In the context of our stack, we can similarly handle an EmptyStackError in a graceful manner.
We do not necessarily have to print a message to the user (although we do in the code below), but we must document this exceptional circumstance in the docstring and we must change the return type from a str to Optional[str]. def second_from_top(s: Stack) -> Optional[str]: """Return a reference to the item that is second from the top of s. Do not change s. If there is no such item in the Stack, returns None. """ hold2 = None try: # Pop and remember the top 2 items in s. hold1 = s.pop() hold2 = s.pop() # Push them back so that s is exactly as it was. s.push(hold2) s.push(hold1) # Return a reference to the item that was second from the top. except EmptyStackError: print("Cannot return second from top, stack empty") return hold2 CSC148 CSC148 CSC148Notes Notes NotesTable Table Tableof of ofContents Contents Contents https://www.teach.cs.toronto.edu/~csc148h/winter/notes/abstract-data-types/exceptions.html Downloaded by michael ayad (michael.maged2014@gmail.com) 6/6 lOMoARcPSD|10729405 1/9/2021 3.4 Analysing Program Running Time 3.4 Analysing Program Running Time Here is an alternate way of implementing the Stack ADT based on lists, using the front of the list to represent the top of the stack. class Stack2: """Alternate stack implementation. This implementation uses the *front* of the Python list to represent the top of the stack. """ # === Private Attributes === # _items: # The items stored in the stack. The front of the list represents # the top of the stack. _items: List def __init__(self) -> None: """Initialize a new empty stack.""" self._items = [] def is_empty(self) -> bool: """Return whether this stack contains no items. >>> s = Stack() >>> s.is_empty() True >>> s.push('hello') >>> s.is_empty() False """ return self._items == [] def push(self, item: Any) -> None: """Add a new element to the top of this stack.""" self._items.insert(0, item) def pop(self) -> Any: """Remove and return the element at the top of this stack. 
Raise an EmptyStackError if this stack is empty. >>> s = Stack() >>> s.push('hello') >>> s.push('goodbye') >>> s.pop() 'goodbye' https://www.teach.cs.toronto.edu/~csc148h/winter/notes/abstract-data-types/efficiency.html Downloaded by michael ayad (michael.maged2014@gmail.com) 1/6 lOMoARcPSD|10729405 1/9/2021 3.4 Analysing Program Running Time """ if self.is_empty(): raise EmptyStackError else: return self._items.pop(0) Even though this implementation seems to be conceptually the same as the first (one uses the back of the list, the other uses the front), it turns out their runtime performance is quite different! A simple time profiler By making use of a built-in Python library called timeit, we can easily get rough estimates of how long our code takes to run. The key function we import is called timeit, 1 which 1 Yes, this function has the same name as the library itself. This is actually fairly common. takes in a piece of Python code to execute, and returns a float representing the amount of time it took to execute it. We illustrate the use of the timeit function in the following example: def push_and_pop(s: Stack) -> None: """Push and pop a single item onto <stack>. This is simply a helper for the main timing experiment. """ s.push(1) s.pop() if __name__ == '__main__': # Import the main timing function. from timeit import timeit # The stack sizes we want to try. STACK_SIZES = [1000, 10000, 100000, 1000000, 10000000] for stack_size in STACK_SIZES: # Uncomment the stack implementation that we want to time. stack = Stack() # stack = Stack2() # Bypass the Stack interface to create a stack of size <stack_size>. # This speeds up the experiment, but we know this violates encapsulation! stack._items = list(range(stack_size)) # Call push_and_pop(stack) 1000 times, and store the time taken in <time>. # The globals=globals() is used for a technical reason that you can ignore. 
            time = timeit('push_and_pop(stack)', number=1000, globals=globals())

            # Finally, report the result. The :>8 is used to right-align the stack size
            # when it's printed, leading to a more visually-pleasing report.
            print(f'Stack size {stack_size:>8}, time {time}')

Running this code on a Stack and a Stack2 instance illustrates a stark difference between these two classes. While the Stack instance seems to take the same amount of time per operation regardless of how many items are on the stack, the Stack2 class seems to have the amount of time grow with the number of items on the stack. In fact, the amount of time required in a Stack2 operation is roughly proportional to the size of the stack, while the amount of time required in Stack is independent of the size of the stack!

Memory allocation for lists in Python

To understand why there's such a dramatic difference, we really need to understand how Python lists are stored in memory. Recall that a variable in Python stores a reference to an object. A Python list is a special type of object that contains an ordered sequence of references to other objects, which we call the elements of the list. Importantly, these references are stored in consecutive blocks of memory, just as we've been drawing them in our memory model diagrams so far. This is what makes accessing list elements so fast: getting the i-th list item can be done just by calculating its address (i.e., location in the computer's memory), based on the address where the list starts, and offset by i addresses. [2]

[2] Think about it like this: suppose you're walking down a hallway with numbered rooms on just one side and room numbers going up by one.
If you see that the first room number is 11, and you're looking for room 15, you can be confident that it is the fifth room down the hall.

To preserve this characteristic, lists must always be contiguous; there can't be any "gaps", or else Python couldn't calculate the memory address of the i-th list item. But this makes insertion and deletion less efficient: for an item to be deleted, all items after it have to be moved down one block in memory, and similarly, for insertion all items after the insertion point are moved up one block. We have a trade-off: we give up fast insertion and deletion in order to gain fast lookup by index.

There is one more important feature of Python lists that you should know, and it makes adding elements at the end of the list very fast: when you create a new list in Python, it actually allocates (assigns) more memory to the list than the list actually requires. If you create a list of 4 elements, you might get enough space to hold 8 elements. The exact numbers are implementation-specific and not important here; the general idea is that there is usually free space at the end of the list that the list can "expand" into. In particular, if you want to add an object to the end of the list, you can simply add a reference to it into that spot. On the other hand, if you want to add a new item to the list and there is no more free space, a new and larger chunk of memory is allocated for the list and every item is copied into it. [3]

[3] You'll learn more about the details of such an implementation in our data structures course, CSC263.

The net effect of these implementation details is that it's much faster to add and remove at the end of the list than at its front!
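The extra space at the end of a list can be observed indirectly. The following is only a rough sketch, assuming CPython, where sys.getsizeof reports the memory currently reserved for the list object itself (its block of references, not the elements they refer to). The helper name allocation_sizes is made up for this illustration, and the exact numbers it produces vary between Python versions.

```python
import sys


def allocation_sizes(num_appends: int) -> list:
    """Return the reported size in bytes of a list after each of num_appends appends.

    Assumption: on CPython, sys.getsizeof(lst) reflects the space currently
    allocated for the list's references, including unused spare capacity.
    """
    lst = []
    sizes = []
    for i in range(num_appends):
        lst.append(i)
        sizes.append(sys.getsizeof(lst))
    return sizes


if __name__ == '__main__':
    sizes = allocation_sizes(32)
    # The size stays flat for several appends in a row, then jumps when the
    # list runs out of spare room and a larger block is allocated.
    print(sizes)
    print('distinct sizes:', len(set(sizes)), 'for', len(sizes), 'appends')
```

If the reported size changed on every append, there would be no spare room; seeing far fewer distinct sizes than appends is consistent with the over-allocation described above.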
Adding at the end usually requires only expanding into the extra space; occasionally there won't be any extra space left, and some time-consuming work has to be done. But on balance, adding at the end is much less time-consuming than adding at the beginning, which always requires shifting every element down a spot. (Removing items is analogous.) Now the reason for the speed difference in our stack example is clear!

Analysing algorithm running time

In evaluating the quality of our programs, we have talked about two metrics so far. The first is correctness: does our code actually work, even in special corner cases, and handle errors appropriately? The second is design: is the code carefully designed and well-documented, so that it is easy for both clients and implementers of the code to understand and work with? From what we've seen about the two different stack implementations, there is certainly another way in which we can determine the quality of our code: how quickly the code runs. We will now study how to assess the running time efficiency of our programs rigorously, and communicate these ideas with other computer scientists.

Observations about runtime

First, recall that for most algorithms, their running time depends on the size of the input: as the input numbers or lists get larger, for example, we expect the running time of algorithms operating on them to increase as well. So when we measure efficiency, we really care about a function describing the amount of time an algorithm takes to run in terms of the size of the input. We can write something like T(n) to denote the runtime of a function on an input of size n (but note that the size isn't always necessarily called n).

How best to measure runtime? We might try to use tools like the timeit function, but there are many factors that influence the time it takes for code to run: how powerful your machine is, how many other programs are running at the same time. As with Schrödinger's cat, even the act of observing the runtime can affect performance!
What about the number of basic steps an algorithm takes? This is a bit better, but still subtly misleading: do all "basic" operations take the same amount of time? What counts as a basic operation? And so on.

This is where Big-Oh comes in: it allows an elegant way of roughly characterizing the type of growth of the runtime of an algorithm, without actually worrying about things like how different CPUs implement different operations, whether a for loop is faster than a while loop, etc. Not that these things aren't important; they are simply at another level of detail. There is no point fussing about this level of detail until we understand the vastly larger differences in growth rate that Big-Oh is designed to describe.

When we characterise the Big-Oh property of a function, we really are thinking about general terms like linear, quadratic, or logarithmic growth. For example, when we see a loop through a list like this:

    for item in lst:
        # do something with item

we know that the runtime is proportional to the length of the list (assuming the work done for each item takes constant time). If the list gets twice as long, we'd expect this algorithm to take twice as long. The runtime grows linearly with the size of the list, and we write that the runtime is O(n), where n is the length of the list.

Ignoring constants, focusing on behaviour as problem size grows

In CSC165/CSC240, you learn about the formal mathematical definition of Big-Oh notation, but this is not covered in this course. Intuitively, Big-Oh notation allows us to analyse the running time of algorithms while ignoring two details:

1. The constants and lower-order terms involved in the step counting: 5n, n + 10, 19n - 10, 0.05n are all O(n); they all have linear growth.
2. The algorithm's running time on small inputs.
The key idea here is that an algorithm's behaviour as the input size gets very large is much more important than how quickly it runs on small inputs. Big-Oh notation therefore captures how running time grows as the problem size grows. We say that Big-Oh deals with asymptotic runtime.

Note that these points mean that Big-Oh notation is not necessarily suitable for all purposes. For example, even though the sorting algorithm mergesort runs in time O(n log n) and the algorithm insertion sort runs in time O(n^2), for small inputs (e.g., lists of size <= 10), insertion sort runs significantly faster than mergesort in practice! And sometimes the constants are important too: even though yet another algorithm called quicksort is (on average) an O(n log n) algorithm, it has smaller constants than mergesort and so typically runs faster in practice. Neither of these practical observations is captured by the sorting algorithms' respective Big-Oh classes!

Terminology and mathematical intuition

Here is a table that summarizes some of the more common Big-Oh classes, and the English terminology that we use to describe their growth:

    Big-Oh      Growth term
    O(log n)    logarithmic
    O(n)        linear
    O(n^2)      quadratic
    O(2^n)      exponential (with base 2)

Notice that we say growth is "exponential" when the variable is in the exponent, as in 2^n (but not n^2).

There is a very nice mathematical intuition that describes these classes too. Suppose we have an algorithm which has running time N0 when given an input of size n, and a running time of N1 on an input of size 2n.
We can characterize the rates of growth in terms of the relationship between N0 and N1:

    Big-Oh      Relationship
    O(log n)    N1 ≈ N0 + c
    O(n)        N1 ≈ 2 N0
    O(n^2)      N1 ≈ 4 N0
    O(2^n)      N1 ≈ (N0)^2

Constant time

There is one more use of Big-Oh notation that we require, which is to capture the case of a function whose asymptotic growth does not depend on its input! For example, consider the constant function f(n) = 10. This function doesn't depend on its input, and so we say that it has constant asymptotic behaviour, writing O(1) to represent this.

In the context of running time, we would say that a particular function or method "runs in constant time" to say that its runtime doesn't depend on the size of its input. For example, our above discussion about index lookup in array-based Python lists can be summarized by saying that it is a constant time operation: looking up lst[i] takes time that does not depend on either len(lst) or i itself!

4.1 Introduction to Linked Lists

We have seen that Python lists are an array-based implementation of the list ADT and that they have some drawbacks: inserting and deleting items in a list can require shifting many elements in the program's memory. For example, we saw that inserting and deleting at the front of a built-in list takes time proportional to the length of the list, because every item in the list needs to be shifted by one spot.

This week, we're going to study a completely different implementation of the List ADT that will attempt to address this efficiency shortcoming. To do so, we'll use a new data structure called the linked list.
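To see where the "proportional to the length" cost of front insertion comes from, here is a small sketch that mimics an array-based list using one fixed block of slots and counts how many references must move on each insertion. The class ToyArrayList and its moves counter are hypothetical teaching aids, not course code; Python's real list is implemented in C and differs in many details.

```python
class ToyArrayList:
    """A toy array-based list that counts element moves (illustration only)."""

    def __init__(self, capacity: int) -> None:
        self._slots = [None] * capacity   # stands in for one contiguous block
        self._size = 0                    # number of slots currently in use
        self.moves = 0                    # total references shifted so far

    def insert(self, index: int, item: object) -> None:
        """Insert item at index, shifting every later item up by one slot.

        Precondition: 0 <= index <= current size < capacity.
        """
        # Shift from the back towards index so that nothing is overwritten.
        for i in range(self._size, index, -1):
            self._slots[i] = self._slots[i - 1]
            self.moves += 1
        self._slots[index] = item
        self._size += 1

    def to_list(self) -> list:
        """Return the items currently stored, in order."""
        return self._slots[:self._size]


if __name__ == '__main__':
    front = ToyArrayList(1000)
    back = ToyArrayList(1000)
    for k in range(1000):
        front.insert(0, k)                   # always insert at the front
        back.insert(len(back.to_list()), k)  # always insert at the end
    print(front.moves, back.moves)           # 499500 0
```

Each front insertion moves every item already in the list, so the work per insertion grows with the list's length, while inserting at the end moves nothing.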
Our goal will be to create a new Python class that behaves exactly the same as the built-in list class, changing only what goes on in the private implementation of the class. This will mean that, ultimately, code such as this:

    for i in range(n):
        nums.append(i)
    print(nums)

will work whether nums is a Python list or an instance of the class we are going to write. We'll even learn how to make list indexing such as nums[3] = 'spider' work on instances of our class!

The concept of "links"

The reason why a Python list often requires elements to be shifted back and forth is that the elements of a Python list are stored in contiguous slots in memory. What if we didn't attempt to have this contiguity? If we had a variable referring to the first element of a list, how would we know where the rest of the elements were? We can solve this easily, if we store along with each element a reference to the next element in the list.

This bundling of data (an element plus a reference to the next element) should suggest something familiar to you: the need for a new class whose instance attributes are exactly these pieces of data. We'll call this class a node, and implement it in Python as follows: [1]

[1] We use a preceding underscore for the class name to indicate that this entire class is private: it shouldn't be accessed by client code directly, but instead is only used by the "main" class described in the next section.

    from __future__ import annotations  # lets _Node refer to itself in annotations
    from typing import Any, Optional


    class _Node:
        """A node in a linked list.

        Note that this is considered a "private class", one which is only meant
        to be used in this module by the LinkedList class, but not by client code.

        === Attributes ===
        item:
            The data stored in this node.
        next:
            The next node in the list, or None if there are no more nodes.
        """
        item: Any
        next: Optional[_Node]

        def __init__(self, item: Any) -> None:
            """Initialize a new node storing <item>, with no next node.
            """
            self.item = item
            self.next = None  # Initially pointing to nothing

An instance of _Node represents a single element of a list; to represent a list of n elements, we need n _Node instances. The references in all of their next attributes link the nodes together into a sequence, even though they are not stored in consecutive locations in memory, and of course this is where linked lists get their name.

A LinkedList class

The second class we'll use in our implementation is a LinkedList class, which will represent the list itself. This class is the one we want client code to use, and in it we'll implement methods that obey the same interface as the built-in list class. Our first version of the class has a very primitive initializer that always creates an empty list.

    class LinkedList:
        """A linked list implementation of the List ADT.
        """
        # === Private Attributes ===
        # _first:
        #     The first node in this linked list, or None if this list is empty.
        _first: Optional[_Node]

        def __init__(self) -> None:
            """Initialize an empty linked list.
            """
            self._first = None

Example: building links

Of course, in order to do anything interesting with linked lists, we need to be able to create arbitrarily long linked lists! We'll see more sophisticated ways of doing this later, but for practice here we'll violate privacy concerns and just manipulate the private attributes directly.
    >>> linky = LinkedList()  # linky is empty
    >>> print(linky._first)
    None
    >>> node1 = _Node(10)     # New node with item 10
    >>> node2 = _Node(20)     # New node with item 20
    >>> node3 = _Node(30)     # New node with item 30
    >>> print(node1.item)
    10
    >>> print(node1.next)     # New nodes don't have any links
    None
    >>> node1.next = node2    # Let's set some links
    >>> node2.next = node3
    >>> node1.next is node2   # Now node1 refers to node2!
    True
    >>> print(node1.next)
    <_Node object at 0x000000000322D5F8>
    >>> print(node1.next.item)
    20
    >>> print(node1.next.next.item)
    30
    >>> linky._first = node1  # Finally, set linky's first node to node1
    >>> linky._first.item     # linky now represents the list [10, 20, 30]
    10
    >>> linky._first.next.item
    20
    >>> linky._first.next.next.item
    30

The most common mistake students make when first starting out with linked lists is confusing an individual node object with the item it stores. So in the example above, there's a big difference between node1 and node1.item: the former is a _Node object containing the number 10, while the latter is the number 10 itself! As you start writing code with linked lists, you'll sometimes want to operate on nodes, and sometimes want to operate on items. Making sure you always know exactly which type you're working with is vital to your success here.

Linked list diagrams

Because each element of a linked list is wrapped in a _Node object, complete memory model diagrams of linked lists are quite a bit larger than those corresponding to Python's array-based lists. For example, the following is a diagram showing a linked list named linky with four elements, in order 109, 68, 71, 3.

[Figure: memory model diagram of linky]
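As a textual stand-in for that diagram, the sketch below builds the same four-element list by hand and renders it as a chain of items. The _Node and LinkedList classes are repeated from this section (without docstrings) so that the block runs on its own. The helper chain_str is not part of the course code; it is assumed here purely for illustration, and it relies on a loop that is explained in the next section.

```python
from __future__ import annotations
from typing import Any, Optional


class _Node:
    item: Any
    next: Optional[_Node]

    def __init__(self, item: Any) -> None:
        self.item = item
        self.next = None


class LinkedList:
    _first: Optional[_Node]

    def __init__(self) -> None:
        self._first = None


def chain_str(lst: LinkedList) -> str:
    """Return the items of lst in order, joined by arrows (hypothetical helper)."""
    parts = []
    curr = lst._first
    while curr is not None:
        parts.append(str(curr.item))
        curr = curr.next
    return ' -> '.join(parts)


linky = LinkedList()
nodes = [_Node(109), _Node(68), _Node(71), _Node(3)]
# Link each node to the one after it; the last node keeps next == None.
for before, after in zip(nodes, nodes[1:]):
    before.next = after
linky._first = nodes[0]
print(chain_str(linky))  # 109 -> 68 -> 71 -> 3
```

Each arrow in the output corresponds to one next reference; linky itself only stores a reference to the node holding 109.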
While memory model diagrams are always a useful tool for understanding subtle memory errors (which certainly come up with linked lists!), they can be overkill if you want a quick and dirty linked list diagram. So below we show two stripped down versions of the memory model diagram, which remove all of the "boilerplate" type and attribute names. The first one keeps the "item" references as arrows to separate memory objects, while the second goes a step further in simplification by writing the numbers directly in the node boxes.

[Figures: the two simplified diagrams of linky]

4.2 Traversing Linked Lists

The final example in the previous section ended with the sequence of expressions linky._first.item, linky._first.next.item, and linky._first.next.next.item to access the linked list's first, second, and third elements, respectively. This is, of course, a very cumbersome way of accessing list elements! In this section, we'll study how to traverse a linked list: that is, how to write code that visits each element of a linked list one at a time, regardless of how long that linked list actually is. The basic structure of this code is quite general, so we will apply it to a bunch of different methods.
This may seem repetitive, but linked list code is one of the most technically challenging and important parts of the course, so spend the time to master it!

Before we write code to traverse a linked list, let's remember how traversal might look for a built-in list, manually using an index variable i to keep track of where we are in the list. [1]

[1] The following code is written to be a nice lead-in to linked list traversal; please keep in mind that there are better ways of iterating through a list in Python!

    i = 0
    while i < len(my_list):
        ... do something with my_list[i] ...
        i = i + 1

This code segment consists of four parts:

1. Initialize the index variable i (0 refers to the start of the list).
2. Check if we've reached the end of the list.
3. Do something with the current element my_list[i].
4. Increment the index.

This method takes advantage of the fact that Python already gives us a way to access elements of a built-in list by index (using square brackets). In a linked list, we don't have this luxury, and so the major difference to this pattern is that we now keep a variable that refers to which _Node object we're on in the loop. Traversing a linked list consists of the exact same steps, except that the temporary variable now refers to a particular _Node object rather than an index. Other than this change, the steps are exactly the same!

    curr = my_linked_list._first  # 1. Initialize curr to the start of the list.
    while curr is not None:       # 2. curr is None if we've reached the end of the list.
        ... curr.item ...         # 3. Do something with the current *element* curr.item.
        curr = curr.next          # 4. "Increment" curr, setting it to refer to the next node.

For example, here is a LinkedList method that prints out every item in a linked list.
    class LinkedList:
        def print_items(self) -> None:
            """Print out each item in this linked list."""
            curr = self._first
            while curr is not None:
                print(curr.item)  # Note: this is the only line we needed to fill in!
                curr = curr.next

And here is a LinkedList method that's a bit more complicated but uses the same traversal template. The goal of this method is to convert a linked list into a built-in Python list, in a non-mutating fashion (i.e., by returning a Python list without changing the original list).

        def to_list(self) -> list:
            """Return a (built-in) list that contains the same elements as this list.
            """
            items = []
            curr = self._first
            while curr is not None:
                items.append(curr.item)
                curr = curr.next
            return items

Our philosophy for "code templates"

You might be surprised about our presentation of a code template for traversing a linked list. After all, aren't templates bad? We shouldn't just copy-and-paste code, right?

But in fact, over the next few weeks of the course, we'll encourage you to use certain code templates to help get started writing and organizing your code. The difference between these code templates and just regular copy-and-pasting of code is that these templates are meant only to provide an overall code structure, and not replace the hard work of actually thinking about how to write code. In other words, we use templates to make it easier to get started writing code.

Consider again our template for iterating through a linked list:

    curr = my_linked_list._first
    while curr is not None:
        ... curr.item ...
        curr = curr.next

Whenever you're starting to write code to iterate through a linked list, your first step should be to copy-and-paste this template into your code. But that's the easy part; the next part involves the thinking required to fill in the ellipsis (...)
and modify the template to suit your particular needs. In the following weeks of this course, you'll get lots of practice with that. :)

4.3 Linked List Mutation

All of the linked list methods we have looked at so far are non-mutating, meaning they did not change the linked list upon which they were called. Here's a reminder of the basic traversal pattern using a while loop: understanding this is critical before moving on!

    # self is a LinkedList
    curr = self._first
    while curr is not None:
        ... curr.item ...
        curr = curr.next

We started with these methods because they're generally easier to understand than their mutating cousins. Now, we're going to look at the two major mutating operations on linked lists: inserting items into and deleting items from a linked list. We'll start with the simplest version of this: appending a new item to the end of a linked list. Before we start, let's remind ourselves how this works for built-in lists:

    >>> lst = []
    >>> lst.append(1)
    >>> lst.append(2)
    >>> lst.append(3)
    >>> lst
    [1, 2, 3]

Linked list append

    class LinkedList:
        def append(self, item: Any) -> None:
            """Add the given item to the end of this linked list."""

Recall that a LinkedList object has only one attribute, a reference to the first node in the list. Unfortunately, this means that we have some work to do to implement append: before adding the item, we need to find the currently last node in the linked list, and then add the item to the end of that. Let's start (as recommended!!)
by using our basic code template:

    def append(self, item: Any) -> None:
        """Add the given item to the end of this linked list."""
        curr = self._first
        while curr is not None:
            ... curr.item ...
            curr = curr.next

This template is a good start, but now our thinking must begin. First: what do we do with curr.item? The answer is "Nothing!": we don't need to actually use any of the existing items in the list, and instead are just going through the list to get to the last node. Unfortunately, there's a problem with the loop: this loop is designed to keep going until we've processed all of the elements of the list, and curr becomes None. But this is actually going too far for our purposes: we want to stop the loop as soon as we reach the last node. [1]

[1] This is actually a subtle instance of the classic "off-by-one" error in computer science: our iteration goes for one too many times.

We modify our loop condition to check whether the current node is the last one by using curr.next is None instead.

    def append(self, item: Any) -> None:
        """Add the given item to the end of this linked list."""
        curr = self._first
        while curr.next is not None:
            curr = curr.next
        # After the loop, curr is the last node in the LinkedList.
        # assert curr is not None and curr.next is None

At this point, the astute reader will point out a flaw in this change: we aren't guaranteed that curr starts off as a node; it could be None. But because we don't want to get bogged down with handling that case right now, we'll add a TODO comment in our code and keep going.

    def append(self, item: Any) -> None:
        """Add the given item to the end of this linked list."""
        curr = self._first
        while curr.next is not None:  # TODO: what if curr starts off as None?
            curr = curr.next
        # After the loop, curr is the last node in the LinkedList.
        # assert curr is not None and curr.next is None

So then after the loop ends, we know that curr refers to the last node in the linked list, and we are finally in a position to add the given item to the linked list. To do so, we need to create a new node and then connect it in.

    def append(self, item: Any) -> None:
        """Add the given item to the end of this linked list."""
        curr = self._first
        while curr.next is not None:  # TODO: what if curr starts off as None?
            curr = curr.next
        # After the loop, curr is the last node in the LinkedList.
        # assert curr is not None and curr.next is None
        new_node = _Node(item)
        curr.next = new_node

And finally, let's handle that TODO. We know from the documentation of our LinkedList class that self._first can only be None if self refers to an empty linked list. But in this case, all we need to do is add the new item to be the first item in the linked list.

    def append(self, item: Any) -> None:
        """Add the given item to the end of this linked list."""
        curr = self._first
        if curr is None:
            new_node = _Node(item)
            self._first = new_node
        else:
            while curr.next is not None:
                curr = curr.next
            # After the loop, curr is the last node in the LinkedList.
            # assert curr is not None and curr.next is None
            new_node = _Node(item)
            curr.next = new_node

Example: a more general initializer

With our append method in place, we can now stop creating linked lists by manually fiddling with attributes, and instead modify our linked list initializer to take in a list of values, which we'll then append one at a time:

    class LinkedList:
        def __init__(self, items: list) -> None:
            """Initialize a new linked list containing the given items.
            The first node in the linked list contains the first item in <items>.
            """
            self._first = None
            for item in items:
                self.append(item)

While this code is perfectly correct, it turns out that it is rather inefficient; we'll leave it as an exercise for now to develop a better approach.

Index-based insertion

Now suppose we want to implement a more general form of insertion that allows the user to specify the index of the list to insert a new item into (analogous to the built-in list.insert method):

    class LinkedList:
        def insert(self, index: int, item: Any) -> None:
            """Insert a new node containing item at position <index>.

            Precondition: index >= 0.

            Raise IndexError if index > len(self).

            Note: if index == len(self), this method adds the item to the end
            of the linked list, which is the same as LinkedList.append.

            >>> lst = LinkedList([1, 2, 10, 200])
            >>> lst.insert(2, 300)
            >>> str(lst)
            '[1 -> 2 -> 300 -> 10 -> 200]'
            >>> lst.insert(5, -1)
            >>> str(lst)
            '[1 -> 2 -> 300 -> 10 -> 200 -> -1]'
            """

As with append, our first step is to traverse the list until we reach the correct index; and if we want the node to be inserted into position index, we need to access the node at position (index-1)! To write the code, we need to modify our code template to store not just the current node, but the current index of that node as well:

    def insert(self, index: int, item: Any) -> None:
        curr = self._first
        curr_index = 0

        while curr is not None and curr_index < index - 1:
            curr = curr.next
            curr_index += 1

This loop condition is a bit more complicated, so it's worth spending some time to unpack. Here, we're saying that the loop should keep going when the current node is not None and when the current index is less than our target index (index - 1).
This means that when the loop is over, the current node is None or the current index has reached the target index (or both!). We therefore need to structure our code into two cases, and handle each one separately:

    def insert(self, index: int, item: Any) -> None:
        curr = self._first
        curr_index = 0

        while curr is not None and curr_index < index - 1:
            curr = curr.next
            curr_index += 1

        # assert curr is None or curr_index == index - 1
        if curr is None:
            pass
        else:  # curr_index == index - 1
            pass

Now, if curr is None then the list doesn't have a node at position index - 1, and so that index is out of bounds. In this case, we should raise an IndexError. On the other hand, if curr is not None, then we've reached the desired index, and can insert the new node using the same strategy as append.

    def insert(self, index: int, item: Any) -> None:
        curr = self._first
        curr_index = 0

        while curr is not None and curr_index < index - 1:
            curr = curr.next
            curr_index += 1

        # assert curr is None or curr_index == index - 1
        if curr is None:
            # index - 1 is out of bounds. The item cannot be inserted.
            raise IndexError
        else:  # curr_index == index - 1
            # index - 1 is in bounds. Insert the new item.
            new_node = _Node(item)
            curr.next = new_node  # Hmm...

Well, almost. The problem with the last else branch is that unlike append, curr might have had other nodes after it! Simply setting curr.next = new_node loses the reference to the old node at position index, and any subsequent nodes after that one.
So before overwriting curr.next, we need to update new_node so that it refers to the old node at position index:

    def insert(self, index: int, item: Any) -> None:
        curr = self._first
        curr_index = 0

        while curr is not None and curr_index < index - 1:
            curr = curr.next
            curr_index += 1

        # assert curr is None or curr_index == index - 1
        if curr is None:
            # index - 1 is out of bounds. The item cannot be inserted.
            raise IndexError
        else:  # curr_index == index - 1
            # index - 1 is in bounds. Insert the new item.
            new_node = _Node(item)
            new_node.next = curr.next  # THIS LINE IS IMPORTANT!
            curr.next = new_node

Warning! Common error ahead! (and solution)

When writing mutating methods on linked lists, we very often update the links of individual nodes to add and remove nodes in the list. We must be very careful when doing so, because the order in which we update the links really matters, and often only one order results in the correct behaviour. For example, this order of link updates in the final else branch doesn't work:

    curr.next = new_node
    new_node.next = curr.next

On the second line, curr.next has already been updated, and its old value lost. The second line is now equivalent to writing new_node.next = new_node, which is certainly not what we want!

The reason this type of error is so insidious is that the code looks very similar to the correct code (only the order of lines is different), and so you can only detect it by carefully tracing through the updates of the links line-by-line.
To mitigate this problem, we'll take advantage of a pretty nice Python feature known as multiple (or simultaneous) assignment:

    a, b = 1, 2  # Assigns 1 to a and 2 to b

The beauty of this approach is that the expressions on the right side are all evaluated before any new values are assigned, meaning that you don't need to worry about the order in which you write them. For example, these two assignment statements are equivalent:

    # Version 1
    curr.next, new_node.next = new_node, curr.next

    # Version 2
    new_node.next, curr.next = curr.next, new_node

In other words, using multiple assignment in this linked list method allows us to ignore the tricky business about the order in which the link updates happen! We strongly recommend using multiple assignment in your own code when working with complex state updating.

Tidying up: don't forget about corner cases!

Our insert implementation has one problem: what if index = 0? In this case, it doesn't make sense to iterate to the (index-1)-th node! This is again a special case which we need to handle separately, by modifying self._first (because in this case, we're inserting into the front of a linked list):

    def insert(self, index: int, item: Any) -> None:
        if index == 0:
            new_node = _Node(item)
            self._first, new_node.next = new_node, self._first
        else:
            curr = self._first
            curr_index = 0

            while curr is not None and curr_index < index - 1:
                curr = curr.next
                curr_index += 1

            # assert curr is None or curr_index == index - 1
            if curr is None:
                # index - 1 is out of bounds. The item cannot be inserted.
                raise IndexError
            else:  # curr_index == index - 1
                # index - 1 is in bounds. Insert the new item.
                new_node = _Node(item)
                curr.next, new_node.next = new_node, curr.next

Exercise: Index-based deletion

The analogue of Python's list.insert is list.pop, which allows the user to remove an item at a specified index in a list. Because this is quite similar to insertion, we won't develop the full code here, but instead outline the basic steps in some pseudo-code:

    class LinkedList:
        def pop(self, index: int) -> Any:
            """Remove and return the item at position <index>.

            Precondition: index >= 0.

            Raise IndexError if index >= len(self).

            >>> lst = LinkedList([1, 2, 10, 200])
            >>> lst.pop(2)
            10
            >>> lst.pop(0)
            1
            """
            # Warning: the following is pseudo-code, not valid Python code!

            # 1. If the list is empty, you know for sure that index is out of
            #    bounds...
            # 2. Else if index is 0, remove the first node and return its item.
            # 3. Else iterate to the (index-1)-th node and update links to remove
            #    the node at position index. But don't forget to return the item!

4.4 Linked Lists and Running Time

To wrap up the discussion of linked lists, we return to our original motivation for studying linked lists: improving the efficiency of some of the basic list operations.

We have already discussed the running time of three operations of array-based lists:

- Looking up an element of the list by its index (e.g., lst[i]) takes constant time, i.e., is independent of the length of the list, or even which index we're looking up. In the language of Big-Oh notation, we write O(1) to represent this time.
- Inserting or removing an element at index i (0 ≤ i < n) in a list of length n takes time proportional to n − i, which is the number of list elements that need to be shifted when this operation occurs. Remember that Big-Oh notation is used to describe "proportional to" relationships, and so we write that this operation takes time O(n − i).

  In particular, if we only consider inserting/removing at the front of an array-based list (so i = 0), this takes time linear in the length of the list: O(n). On the other hand, if we only consider inserting/removing at the end of such a list (i = n), this is a constant time operation: O(1). [1]

[1] You might note that mathematically, n − i = 0 if i = n. However, every operation takes at least one step to run, and so there's an implicit "max(1, ___)" whenever we write a Big-Oh expression to capture the fact that the running time can't drop below 1.

Turning to linked lists

What about the corresponding operations for LinkedList? Let's study our code for LinkedList.insert, first looking at the special cases of inserting into the front and end of a linked list.

    def insert(self, index: int, item: Any) -> None:
        # Create a new node
        new_node = _Node(item)

        # Need to do something special if we insert into the first position.
        # In this case, self._first *must* be updated.
        if index == 0:
            new_node.next = self._first
            self._first = new_node
        else:
            # Get the node at position (index - 1)
            curr_index = 0
            curr = self._first
            while curr is not None and curr_index < index - 1:
                curr = curr.next
                curr_index = curr_index + 1

            if curr is None:
                raise IndexError
            else:
                # At this point, curr refers to the node at position (index - 1)
                curr.next, new_node.next = new_node, curr.next

We insert into the front of the linked list by calling insert with an index argument of 0. In this case, the if branch executes, which takes constant time: neither assignment statement depends on the length of the list.

On the other hand, suppose we want to insert an item at the end of the linked list, and there's at least one element already in the linked list. The else branch executes, and the loop must iterate until it reaches the end of the list, which takes time linear in the length of the list. [2]

[2] Note that the body of the loop, which again consists of assignment statements, takes constant time.

In other words, linked lists have the exact opposite running times as array-based lists for these two operations! Inserting into the front of a linked list takes O(1) time, and inserting into the back of a linked list takes O(n) time, where n is the length of the list.

This may seem disappointing, because now it isn't clear which list implementation is "better." But in fact this is pretty typical of computer science: when creating multiple implementations of a public interface, each implementation will often be good at some operations, but worse at others. In practice, it is up to the programmer who is acting as a client of the interface to decide which implementation to use, based on how they prioritize the efficiency of certain operations over others.
Investigating the subtleties of "input size"

Despite our discussion above, we haven't yet finished the analysis of linked list insert. We've really only looked at two special cases: when index is 0, and when index is the length of the linked list. What about all the other numbers?

The very first thing we need to look at is the running time of each individual line of code. In this case, each individual line (like curr = self._first) takes constant time, i.e., doesn't depend on the size of the inputs. This means that the overall running time depends on the number of lines that execute, and this in turn depends on the number of times the loop runs.

    curr_index = 0
    curr = self._first
    while curr is not None and curr_index < index - 1:
        curr = curr.next
        curr_index = curr_index + 1

So how many times does the loop run? There are two possibilities for when it stops: when curr is None, or when curr_index == index - 1.

- The first case means that the end of the list was reached, which happens after n iterations, where n is the length of the list (each iteration, the curr variable advances by one node).
- The second case means that the loop ran index - 1 times, since curr_index starts at 0 and increases by 1 per iteration.

Since the loop stops when one of the conditions is false, the number of iterations is the minimum of these two possibilities: min(n, index − 1).

Since the total number of steps is proportional to the number of loop iterations, we can conclude that the asymptotic running time of this method is O(min(n, index − 1)), where n is the size of self. But because Big-Oh notation is used to simplify our running-time expressions by dropping smaller terms, we can drop the "-1" and simply write that the Big-Oh running time is O(min(n, index)).
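One way to check this count empirically is to run the same loop on chains of different lengths and record how many times the body executes. The sketch below assumes a minimal _Node class and uses two hypothetical helpers, build_chain and count_iterations, written only for this illustration. It covers index >= 1, since index == 0 is handled before the loop is reached.

```python
from typing import Any, Optional


class _Node:
    """Minimal stand-in for a linked list node (assumed for this example)."""
    item: Any
    next: Optional["_Node"]

    def __init__(self, item: Any) -> None:
        self.item = item
        self.next = None


def build_chain(n: int) -> Optional[_Node]:
    """Hypothetical helper: return the first node of a chain of n nodes."""
    first = None
    for value in range(n - 1, -1, -1):
        node = _Node(value)
        node.next = first
        first = node
    return first


def count_iterations(first: Optional[_Node], index: int) -> int:
    """Run the loop from insert and return how many times its body executes."""
    curr = first
    curr_index = 0
    iterations = 0
    while curr is not None and curr_index < index - 1:
        curr = curr.next
        curr_index = curr_index + 1
        iterations += 1
    return iterations


# Compare the measured count with min(n, index - 1) for several inputs.
for n in range(6):
    for index in range(1, 10):
        measured = count_iterations(build_chain(n), index)
        print(n, index, measured, min(n, index - 1))
```

In every printed row the last two columns agree, which matches the min(n, index − 1) count derived above.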
Special cases

The Big-Oh expression O(min(n, index)) for LinkedList.insert is the most general expression we could give, since we didn't make any assumptions about the relationship between the length of the linked list and index. [3]

[3] Although we are assuming that index >= 0 here!

But now suppose we assume that index <= n, which is plausible since any larger value would raise an IndexError. In this case, the Big-Oh expression simplifies to just O(index), revealing that under this assumption, the running time depends only on index and not on the length of the linked list at all.

This is a bit subtle, so let's say this again. We have a relationship between the running time of insert and the sizes of two of its inputs. But we can simplify this expression by talking about the relationship between these two input sizes. Essentially, we say that if we treat index as small with respect to the size of the list, then the running time of the algorithm does not depend on the size of the list. [4]

[4] The most extreme case of this is when index == 0, so we're inserting into the front of the linked list. As we discussed earlier, this takes constant time, meaning it does not depend on the length of the list.

On the other hand, suppose index is greater than the size of the list; in this case, the Big-Oh expression simplifies to O(n): even though we know this value raises an IndexError, our current implementation requires traversing the entire linked list before realizing that index is out of bounds! [5]

[5] Can you find a way to fix this problem efficiently?
5.1 Motivation: Adding Up Numbers

This week, we're going to learn about a powerful technique called recursion, which we'll be using in various ways for the rest of the course. However, recursion is much more than just a programming technique; it is a way of thinking about solving problems. This new way of thinking can be summarized in this general strategy: identify how an object or problem can be broken down into smaller instances with the same structure.

Let's begin with a series of problems that will demonstrate the need for recursion.

Summing lists and nested lists

Consider the problem of computing the sum of a list of numbers. Easy enough:

    from typing import List

    def sum_list(lst: List[int]) -> int:
        """Return the sum of the items in a list of numbers.

        >>> sum_list([1, 2, 3])
        6
        """
        s = 0
        for num in lst:
            s += num
        return s

But what if we make the input structure a bit more complex: a list of lists of numbers? After a bit of thought, we might arrive at using a nested loop to process individual items in the nested list:

    def sum_list2(lst: List[List[int]]) -> int:
        """Return the sum of the items in a list of lists of numbers.

        >>> sum_list2([[1], [10, 20], [1, 2, 3]])
        37
        """
        s = 0
        for list_of_nums in lst:
            for num in list_of_nums:
                s += num
        return s

And now what happens if we want yet another layer, and compute the sum of the items in a list of lists of lists of numbers?
Some more thought leads to a "nested nested loop":

    def sum_list3(lst: List[List[List[int]]]) -> int:
        """Return the sum of the items in a list of lists of lists of numbers.

        >>> sum_list3([[[1], [10, 20], [1, 2, 3]], [[2, 3], [4, 5]]])
        51
        """
        s = 0
        for list_of_lists_of_nums in lst:
            for list_of_nums in list_of_lists_of_nums:
                for num in list_of_nums:
                    s += num
        return s

Of course, you see where this is going: every time we want to add a new layer of nesting to the list, we add a new layer to the for loop. Note that this is quite interesting from a "meta" perspective: the structure of the data is mirrored in the structure of the code which operates on it.

Simplifying using helpers

You might have noticed the duplicate code above: in fact, we can use sum_list as a helper for sum_list2, and sum_list2 as a helper for sum_list3:

    def sum_list(lst: List[int]) -> int:
        """Return the sum of the items in a list of numbers.
        """
        s = 0
        for num in lst:
            # num is an int
            s += num
        return s

    def sum_list2(lst: List[List[int]]) -> int:
        """Return the sum of the items in a list of lists of numbers.
        """
        s = 0
        for list_of_nums in lst:
            # list_of_nums is a List[int]
            s += sum_list(list_of_nums)
        return s

    def sum_list3(lst: List[List[List[int]]]) -> int:
        """Return the sum of the items in a list of lists of lists of numbers.
        """
        s = 0
        for list_of_lists_of_nums in lst:
            # list_of_lists_of_nums is a List[List[int]]
            s += sum_list2(list_of_lists_of_nums)
        return s

While this is certainly a nice simplification, it does not generalize very nicely.
If we wanted to implement sum_list10, a function which works on lists with ten levels of nesting, our only choice with this approach would be to first define sum_list4, sum_list5, etc., all the way up to sum_list9.

Heterogeneous lists

There is an even bigger problem: no function of this form can handle nested lists with a non-uniform level of nesting among its elements, like

    [[1, [2]], [[[3]]], 4, [[5, 6], [[[7]]]]]

We encourage you to try running the above functions on such a list. What error is raised?

5.2 Nested Lists: A Recursive Data Structure

In the previous section, we ended by articulating a fundamental limitation of our sum_list functions: they cannot handle heterogeneous nested lists like

    [[1, [2]], [[[3]]], 4, [[5, 6], [[[7]]]]]

In this section, we'll overcome this limitation by using a new strategy: breaking down an object or problem into smaller instances with the same structure as the original.

To make this more concrete, let's first identify how the object we are working with, a nested list, can be broken down into smaller instances with the same structure. We will define a new structure that generalizes the idea of "list of lists of lists of ... of lists of ints". We define a nested list as one of two types of values:

- A single integer.
- A list of other nested lists ([lst_1, lst_2, ..., lst_n]). Each sublist is called a sub-nested-list of the outer list.

This is a recursive definition: it defines nested lists in terms of other nested lists. [1]

[1] Another term for "recursive definition" is self-referential definition.

It may seem a bit odd that we include "single integers" as nested lists; after all, isinstance(3, list) is False in Python!
As we'll see a few times in this section, it is very convenient to include this part of our recursive definition, and makes both the rest of the definition and the subsequent code we'll write much more elegant.

The depth of a nested list is the maximum number of times a list is nested inside other lists, and we will define the depth of a single integer to be 0. So [1, 2, 3] has depth 1, and [1, [2, [3, 4], 5], [6, 7], 8] has depth 3.

Summing up a nested list

We can use this definition to guide the design of a function that computes the sum on a nested list of numbers:

    from typing import List, Union

    def sum_nested(obj: Union[int, List]) -> int:
        """Return the sum of the numbers in a nested list <obj>.
        """
        if isinstance(obj, int):
            # obj is an integer
            return obj
        else:
            # obj is a list of nested lists: [lst_1, ..., lst_n]
            s = 0
            for sublist in obj:
                # each sublist is a nested list
                s += sum_nested(sublist)
            return s

This is our first example of a recursive function: a function that calls itself in its body. Just as we defined a recursive data structure (nested lists), we have now defined a recursive function that operates on nested lists. Notice how the structure of the data informs the structure of the code: just as the definition of nested lists separates integers and lists of nested lists into two cases, so too does the function sum_nested. And as the recursive part of the definition involves a list of nested lists, our code involves a loop over a list, binds sublist to each inner nested list one at a time, and calls sum_nested on it to compute the sum.

We call the case where obj is an integer the base case of the code: implementing the function's behaviour on this type of input should be very straightforward, and not involve any recursion.
The other case, in which obj is a list, is called the recursive case: solving the problem in this case requires decomposing the input into smaller nested lists, and calling sum_nested on these individually to solve the problem.

The example above is the simplest type of recursive function, one that has just one base case and one recursive case. Later on, we'll look at more complex recursive data structures and functions.

Partial tracing: reasoning about recursive calls

We say that the call to sum_nested inside the for loop is a recursive function call, since it appears in the body of sum_nested itself. Such function calls are handled in the same way as all other function calls in Python, but the nature of recursion means that a single initial function call often results in many different calls to the exact same function.

When given a function call on a very complex nested list argument, beginners will often attempt to trace through the code carefully, including tracing what happens on each recursive call. Their thoughts go something like "Well, we enter the loop and make a recursive call. That recursive call will make this other recursive call, which will make this other recursive call, and so on." This type of literal tracing is what a computer does, but it's also extremely time-consuming and error-prone for humans to do.

Instead, whenever we are given a recursive function and a particular input we want to trace the function call on, we use the technique of partial tracing, which we'll describe now. There are two cases.

Case 1: The input corresponds to a base case

For example, suppose we want to trace the call sum_nested(5). The input 5 is an integer, the simplest kind of nested list.
In our recursive function, this corresponds to the if condition being true, and so to trace our code we can simply trace the if branch directly, completely ignoring the else branch!

    def sum_nested(obj: Union[int, List]) -> int:
        """Return the sum of the numbers in a nested list <obj>.
        """
        if isinstance(obj, int):
            # obj is an integer
            return obj
        else:
            ...

Tracing this is pretty easy: the line of code return obj simply returns the input, which was 5. This is the correct result: sum_nested(5) should return 5, since the sum of a single integer is just the integer itself.

Case 2: The input corresponds to a recursive case

For example, suppose we want to trace the call

    sum_nested([1, [2, [3, 4], 5], [6, 7], 8])

and verify that the output is the correct value of 36. This input corresponds to the else branch of sum_nested, which we show below:

    else:
        # obj is a list of nested lists: [lst_1, ..., lst_n]
        s = 0
        for sublist in obj:
            # each sublist is a nested list
            s += sum_nested(sublist)
        return s

This code is an instance of the accumulator pattern, in which we loop over obj, and for each value update the accumulator variable s. Naive tracing is challenging, though, because at each loop iteration we'll make a recursive call to sum_nested, which may result in tracing many subsequent recursive calls until we finally can update our accumulator.

So the key idea of partial tracing is the following: we'll trace through the above code, but every time there's a recursive call, instead of tracing into it, we assume it is correct, and simply use the correct return value and continue tracing the rest of our code. [2]

[2] In the PyCharm debugger, this is analogous to using Step Over rather than Step Into when we reach a function call.

To facilitate this, we use a table of values that we build up as follows.

1. First, we take our input obj, [1, [2, [3, 4], 5], [6, 7], 8], and identify each sub-nested-list.
   Note that there are only four of them (we don't count sub-nested-lists of sub-nested-lists).

       sublist
       --------------
       1
       [2, [3, 4], 5]
       [6, 7]
       8

2. Next, beside each one we write down what sum_nested should return on each input. Remember that we aren't doing any tracing here; instead, we're filling this in based on the documentation for sum_nested.

       sublist           sum_nested(sublist)
       --------------    -------------------
       1                 1
       [2, [3, 4], 5]    14
       [6, 7]            13
       8                 8

3. Finally, we trace through the code from the original else block, updating the value of the accumulator s using the above table. We show these updates in tabular form below.

       sublist           sum_nested(sublist)    s
       --------------    -------------------    -----------------
       N/A               N/A                    0 (initial value)
       1                 1                      1 (s += 1)
       [2, [3, 4], 5]    14                     15 (s += 14)
       [6, 7]            13                     28 (s += 13)
       8                 8                      36 (s += 8)

From our table, we see that after the loop completes, the final value of s is 36, and this is the value returned by our original call to sum_nested. It also happens to be the correct value!

Why does partial tracing work?

When students first see the partial tracing technique, they are often suspicious: "What do you mean you can assume that the recursive calls are correct? What if they aren't?!" In other words: yes, this technique is simpler, but why should we trust it?

It turns out that this assumption is valid, as long as both of the following properties hold:

1. You are sure that your base case is correct. (You can usually fully trace the code directly to determine this.)
2. Every time you make a recursive call, it is on a smaller input than the original input.
   [3]

[3] If we are recursing on a nested list, a "smaller" input is one with less depth. Because we recurse on a sub-nested-list of the original, it is guaranteed to have less depth.

Here's how to apply these ideas to show that sum_nested is correct:

1. Since I'm confident in the base case for sum_nested, I now know that sum_nested is correct for all nested lists of depth 0.
2. If I have a nested list of depth 1, every recursive call I make is going to be on a nested list of depth 0. I'm going to assume that they're correct (because of my previous statement). So then using partial tracing, I'm confident that sum_nested is correct for all nested lists of depth 1.
3. If I have a nested list of depth 2, every recursive call I make is going to be on a nested list of depth 0 or 1. I'm going to assume that they're correct (because of my previous two statements). So then using partial tracing, I'm confident that sum_nested is correct for all nested lists of depth 2.
4. And so on.

This idea is formalized in the Principle of Mathematical Induction, a formal proof technique that you'll learn about in CSC165/240. It's beyond the scope of this course, but rest assured that it tells us that partial tracing is a valid technique for reasoning about the correctness of recursive functions.

What this means for testing and debugging

The flip side of this is, if you have a recursive function and it's incorrect (say, failing a test case that you wrote), there can only be one of the following problems:

1. A base case is incorrect.
2. One or more of the recursive calls is not being made on a smaller input.
3. The recursive case is incorrect, even if you assume that every recursive call is correct! In other words, the problem isn't in the recursive call itself, it's in the code surrounding that recursive call.

Design Recipe for recursive functions

1. Identify the recursive structure of the problem, which can usually be reduced to finding the recursive structure of the input.
   Figure out if it's a nested list, or some other data type that can be expressed recursively.

   Once you do this, you can often write down a code template to guide the structure of your code. For example, the code template for nested lists is:

       def f(obj: Union[int, List]) -> ...:
           if isinstance(obj, int):
               ...
           else:
               for sublist in obj:
                   ... f(sublist) ...

2. Identify and implement code for the base case(s). Note that you can usually tell exactly what the base cases are based on the structure of the input to the function. For nested lists, the common base case is when the input is an integer, and if you follow the above template, you won't forget it.

3. Write down a concrete example of the function call on an input of some complexity (e.g., a nested list of depth 3). Then write down the relevant recursive function calls (determined by the structure of the input), and what they output based on the docstring of the function. In other words, write down the first two columns of the table we described above in the section on partial tracing.

4. Take your results from step 3, and figure out how to combine them to produce the correct output for the original call. This is usually the hardest step, but once you figure this out, you can implement the recursive step in your code!

6.1 Introduction to Trees

While the List abstract data type is extremely common and useful, not all data has a natural linear order.
Family trees, corporate organization charts, classification schemes like "Kingdom, Phylum, etc." and even file storage on computers all follow a hierarchical structure, in which each entity is linked to multiple entities "below" it.

In computer science, we use a tree data structure to represent this type of data. Trees are a recursive data structure, with the following definition:

A tree is either [1]

- empty, or
- has a root value connected to any number of other trees, called the subtrees of the tree.

[1] Note the similarity between this definition and the one for nested lists.

We generally draw the root at the top of the tree; the rest of the tree consists of subtrees that are attached to the root. Note that a tree can contain a root value but not have any subtrees: this occurs in a tree that contains just a single item.

Tree Terminology

[Figure: a tree diagram with root A and values labeled A through J]

- A tree is either empty or non-empty. Every non-empty tree has a root, which is connected to zero or more subtrees. [2] The root value of the above tree is labeled A; it is connected to three subtrees.

  [2] Because subtrees are themselves trees, each one has its own subtrees. This sometimes leads to confusion! The term "subtree" is always relative to an outer tree, where each subtree is connected to the root of that outer tree.

- The size of a tree is the number of values in the tree. What's the relationship between the size of a tree and the size of its subtrees?

- A leaf is a value with no subtrees. The leaves of the above tree are labeled E, F, G, J, and I. What's the relationship between the number of leaves of a tree and the number of leaves of its subtrees?

- The height of a tree is the length of the longest path from its root to one of its leaves, counting the number of values on the path. The height of the above tree is 4.
  What's the relationship between the height of a tree and the heights of its subtrees?

- The children of a value are all values directly connected underneath that value. The children of A are B, C, and D. Note that the number of children of a value is equal to the number of its subtrees, but that these two concepts are quite different.

- The descendants of a value are its children, the children of its children, etc. This can be defined recursively as "the descendants of a value are its children, and the descendants of its children." What's the relationship between the number of descendants of a value and the number of descendants of its children?

- Similarly, the parent of a tree value is the value immediately above and connected to it; each value in a tree has exactly one parent, except the root, which has no parent. The ancestors of a value are its parent, the parent of its parent, etc. This too can be defined recursively: "the ancestors of a value are its parent, and the ancestors of its parent."

Note: sometimes, it will be convenient to say that descendants/ancestors of a value include the value itself; we'll make it explicit whether to include the node or not when it comes up. Note that a value is never a child of itself, nor a parent of itself.

6.2 A Tree Implementation

Here is a simple implementation of a tree in Python. [1]

[1] As usual, we'll start with a very bare-bones implementation, and then develop more and more methods for this class throughout the course.
    class Tree:
        """A recursive tree data structure.
        """
        # === Private Attributes ===
        # The item stored at this tree's root, or None if the tree is empty.
        _root: Optional[Any]
        # The list of all subtrees of this tree.
        _subtrees: List[Tree]

        # === Representation Invariants ===
        # - If self._root is None then self._subtrees is an empty list.
        #   This setting of attributes represents an empty tree.
        #
        #   Note: self._subtrees may be empty when self._root is not None.
        #   This setting of attributes represents a tree consisting of just one
        #   node.

        def __init__(self, root: Optional[Any], subtrees: List[Tree]) -> None:
            """Initialize a new tree with the given root value and subtrees.

            If <root> is None, this tree is empty.
            Precondition: if <root> is None, then <subtrees> is empty.
            """
            self._root = root
            self._subtrees = subtrees

        def is_empty(self) -> bool:
            """Return whether this tree is empty.

            >>> t1 = Tree(None, [])
            >>> t1.is_empty()
            True
            >>> t2 = Tree(3, [])
            >>> t2.is_empty()
            False
            """
            return self._root is None

Our initializer here always creates either an empty tree (when root is None), or a tree with a value and the given subtrees. Note that it is possible for root to not be None, but subtrees to still be empty: this represents a tree with a single root value, and no subtrees. As we'll soon see, the empty tree and single value cases are generally the base cases when writing recursive code that operates on trees.

Recursion on trees

There's a reason we keep asking the same question "What's the relationship between a tree's X and the X of its subtrees?" Understanding the relationship between a tree and its subtrees (that is, its recursive structure) allows us to write extremely simple and elegant recursive code for processing trees, just as it did with nested lists earlier in the course.

Here’s a first example: “the size of a non-empty tree is the sum of the sizes of its subtrees, plus 1 for the root; the size of an empty tree is 0.” This single observation immediately lets us write the following recursive function for computing the size of a tree. [2]

[2] Again, note the similarity to nested lists. This will be a consistent refrain throughout this section.

    def __len__(self) -> int:
        """Return the number of items contained in this tree.

        >>> t1 = Tree(None, [])
        >>> len(t1)
        0
        >>> t2 = Tree(3, [Tree(4, []), Tree(1, [])])
        >>> len(t2)
        3
        """
        if self.is_empty():
            return 0
        else:
            size = 1  # count the root
            for subtree in self._subtrees:
                size += subtree.__len__()  # could also do len(subtree) here
            return size

We can generalize this nicely to a template for recursive methods on trees:

    def f(self) -> ...:
        if self.is_empty():
            ...
        else:
            ...
            for subtree in self._subtrees:
                ... subtree.f() ...
            ...

Of course, often the ellipses will contain some reference to self._root as well!

An explicit size-one case

Often when first dealing with trees, students like to think explicitly about the case where the tree consists of just a single item. We can modify our __len__ implementation to handle this case separately by adding an extra check:

    def __len__(self) -> int:
        if self.is_empty():  # tree is empty
            return 0
        elif self._subtrees == []:  # tree is a single item
            return 1
        else:  # tree has at least one subtree
            size = 1  # count the root
            for subtree in self._subtrees:
                size += subtree.__len__()
            return size

Sometimes, this check will be necessary: we’ll want to do something different for a tree with a single item than for either an empty tree or one with at least one subtree. And sometimes, this check will be redundant: the action performed by this case is already handled by the recursive step.
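
As an illustration of the first situation, here is a minimal sketch of a leaf-counting method, where the single-item case cannot be dropped. The method name `num_leaves` and the example tree are assumptions made for this example and are not part of the Tree class in these notes; the class is repeated in stripped-down form only so that the block runs on its own.

```python
from __future__ import annotations
from typing import Any, List, Optional


class Tree:
    """Stripped-down copy of the Tree class, repeated so this runs alone."""
    _root: Optional[Any]
    _subtrees: List[Tree]

    def __init__(self, root: Optional[Any], subtrees: List[Tree]) -> None:
        self._root = root
        self._subtrees = subtrees

    def is_empty(self) -> bool:
        return self._root is None

    def num_leaves(self) -> int:
        """Return the number of leaves in this tree (hypothetical method)."""
        if self.is_empty():
            return 0
        elif self._subtrees == []:
            # A single item is itself a leaf. Without this case, the loop
            # below would not run and the method would wrongly return 0.
            return 1
        else:
            count = 0  # the root has subtrees, so it is not a leaf
            for subtree in self._subtrees:
                count += subtree.num_leaves()
            return count


t = Tree(6, [Tree(4, [Tree(1, []), Tree(2, []), Tree(3, [])]), Tree(5, [])])
print(t.num_leaves())  # 4: the leaves are 1, 2, 3 and 5
```

Here the three branches really do compute three different things, so merging the last two would change the result.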

In the case of __len__, the check is redundant. The single-item case is already correctly handled by the recursive step, which will simply return 1 when there are no subtrees, because the loop does not execute.

However, the possibility of having a redundant case shouldn’t discourage you from starting off by including this case. Treat the detection and coalescing of redundant cases as part of the code editing process. Your first draft might have some extra code, but that can be removed once you are confident that your implementation is correct.

For your reference, here is the three-case recursive Tree code template:

    def f(self) -> ...:
        if self.is_empty():  # tree is empty
            ...
        elif self._subtrees == []:  # tree is a single value
            ...
        else:  # tree has at least one subtree
            ...
            for subtree in self._subtrees:
                ... subtree.f() ...
            ...

Traversing a tree

Because the elements of a list have a natural order, lists are pretty straightforward to traverse, meaning (among other things) that it’s easy to write a __str__ method that will produce a str containing all of the elements. With trees, there is a non-linear ordering on the elements. How might we write a __str__ method for trees?

Here’s an idea: start with the value of the root, then recursively add on the __str__ for each of the subtrees. That’s pretty easy to implement. The base case is when the tree is empty, and in this case the method returns an empty string.

    def __str__(self) -> str:
        """Return a string representation of this tree.
        """
        if self.is_empty():
            return ''
        else:
            # We use newlines (\n) to separate the different values.
            s = f'{self._root}\n'
            for subtree in self._subtrees:
                s += str(subtree)  # equivalent to subtree.__str__()
            return s

Consider what happens when we run this on the following tree structure:

    >>> t1 = Tree(1, [])
    >>> t2 = Tree(2, [])
    >>> t3 = Tree(3, [])
    >>> t4 = Tree(4, [t1, t2, t3])
    >>> t5 = Tree(5, [])
    >>> t6 = Tree(6, [t4, t5])
    >>> print(t6)
    6
    4
    1
    2
    3
    5

We know that 6 is the root of the tree, but it’s ambiguous how many children it has. In other words, while the items in the tree are correctly included, we lose the structure of the tree itself.

Drawing inspiration from how PyCharm (among many other programs) displays the folder structure of our computer’s files, we’re going to use indentation to differentiate between the different levels of a tree. For our example tree, we want __str__ to produce this:

    >>> # (The same t6 as defined above.)
    >>> print(t6)
    6
     4
      1
      2
      3
     5

In other words, we want __str__ to return a string that has 0 indents before the root value, 1 indent before its children’s values, 2 indents before their children’s values, and so on. But how do we do this? We need the recursive calls to act differently: to return strings with more indentation the deeper down in the tree they are working. In other words, we want information from where a method is called to influence what happens inside the method. This is exactly the problem that parameters are meant to solve!

So we’ll pass in an extra parameter for the depth of the current tree, which will be used to add a corresponding number of indents before each value in the str that is returned. We can’t change the API of the __str__ method itself, but we can define a helper method that has this extra parameter:

    def _str_indented(self, depth: int) -> str:
        """Return an indented string representation of this tree.
        The indentation level is specified by the <depth> parameter.
        """
        if self.is_empty():
            return ''
        else:
            s = ' ' * depth + str(self._root) + '\n'
            for subtree in self._subtrees:
                # Note that the 'depth' argument to the recursive call is
                # modified.
                s += subtree._str_indented(depth + 1)
            return s

Now we can implement __str__ simply by making a call to _str_indented:

    def __str__(self) -> str:
        """Return a string representation of this tree.
        """
        return self._str_indented(0)

    >>> t1 = Tree(1, [])
    >>> t2 = Tree(2, [])
    >>> t3 = Tree(3, [])
    >>> t4 = Tree(4, [t1, t2, t3])
    >>> t5 = Tree(5, [])
    >>> t6 = Tree(6, [t4, t5])
    >>> print(t6)
    6
     4
      1
      2
      3
     5

Technical note: optional parameters

One way to customize the behaviour of functions is to make a parameter optional by giving it a default value. This can be done for any function, recursive or non-recursive, inside or outside a class. The syntax for doing so is quite simple; we use it in this revised version of _str_indented to give a default value for depth:

    def _str_indented(self, depth: int = 0) -> str:
        """Return an indented string representation of this tree.

        The indentation level is specified by the <depth> parameter.
        """
        if self.is_empty():
            return ''
        else:
            s = ' ' * depth + str(self._root) + '\n'
            for subtree in self._subtrees:
                # Note that the 'depth' argument to the recursive call is
                # modified.
                s += subtree._str_indented(depth + 1)
            return s

In this version of _str_indented, depth is an optional parameter that can either be included or not included when this method is called. So we can call t._str_indented(5), which sets its depth parameter to 5, as we would expect. However, we can also call t._str_indented() (no argument for depth given), in which case the method is called with the depth parameter set to 0.
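
The two calls just described can be tried on a small tree. This is only a sketch: it repeats a stripped-down Tree class so that it runs on its own, the example tree is made up for illustration, and it assumes one space per indentation level, as in the code above.

```python
from __future__ import annotations
from typing import Any, List, Optional


class Tree:
    """Stripped-down copy of the Tree class, repeated so this runs alone."""

    def __init__(self, root: Optional[Any], subtrees: List[Tree]) -> None:
        self._root = root
        self._subtrees = subtrees

    def is_empty(self) -> bool:
        return self._root is None

    def _str_indented(self, depth: int = 0) -> str:
        if self.is_empty():
            return ''
        else:
            s = ' ' * depth + str(self._root) + '\n'
            for subtree in self._subtrees:
                s += subtree._str_indented(depth + 1)
            return s


t = Tree(4, [Tree(1, []), Tree(2, [])])
print(repr(t._str_indented()))   # no argument: depth defaults to 0
print(repr(t._str_indented(5)))  # explicit argument: depth is 5
```

Calling with no argument behaves exactly like passing 0, while passing 5 shifts every line of the result five spaces to the right.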

Optional parameters are a powerful Python feature that allows us to write more flexible functions and methods to be used in a variety of situations. Two important points to keep in mind, though:

- All optional parameters must appear after all of the required parameters in the function header.
- Do NOT use mutable values like lists as default values for your optional parameters. (A default value is created only once, when the function is defined, and is then shared by every call, so the code will appear to work, until it mysteriously doesn’t. Feel free to ask more about this during office hours.) Instead, use optional parameters with immutable values like integers, strings, and None.

6.3 Mutating Trees

Now that we have some experience working with trees, let’s talk about mutating them. There are two fundamental mutating operations that we want to perform on trees: insertion and deletion. We’ll only cover deletion in this section; you’ll implement an insertion algorithm in this week’s lab. Our goal is to implement the following method:

    def delete_item(self, item: Any) -> bool:
        """Delete *one* occurrence of <item> from this tree.

        Return True if <item> was deleted, and False otherwise.
        Do not modify this tree if it does not contain <item>.
        """

We’ll start by filling in the code template, as usual. For this case, we’ll use the three-branch version, which explicitly separates the size-one case. [1]

[1] As we work through the code for each case, draw an example tree so that you can trace what happens to it.

    def delete_item(self, item: Any) -> bool:
        """Delete *one* occurrence of <item> from this tree.

        Return True if <item> was deleted, and False otherwise.
        Do not modify this tree if it does not contain <item>.
        """
        if self.is_empty():
            ...
        elif self._subtrees == []:
            ...
        else:
            ...
            for subtree in self._subtrees:
                ... subtree.delete_item(item) ...
            ...

The base cases of when this tree is empty and when it has a single value are rather straightforward to implement:

    def delete_item(self, item: Any) -> bool:
        if self.is_empty():
            return False  # item is not in the tree
        elif self._subtrees == []:
            if self._root != item:  # item is not in the tree
                return False
            else:  # resulting tree should be empty
                self._root = None
                return True
        else:
            ...
            for subtree in self._subtrees:
                ... subtree.delete_item(item) ...
            ...

In the recursive step, we’re going to first check whether the item is equal to the root; if it is, then we only need to remove the root, and if not, we need to recurse on the subtrees to look further for the item.

    def delete_item(self, item: Any) -> bool:
        if self.is_empty():
            return False  # item is not in the tree
        elif self._subtrees == []:
            if self._root != item:  # item is not in the tree
                return False
            else:  # resulting tree should be empty
                self._root = None
                return True
        else:
            if self._root == item:
                self._delete_root()  # delete the root
                return True
            else:
                for subtree in self._subtrees:
                    subtree.delete_item(item)

Deleting the root when there are subtrees is a little bit challenging, so we’ll defer that until later. We can use the common strategy of writing a call to a helper method (_delete_root) that doesn’t actually exist yet. The call will remind us to implement the helper later.

The final else branch may look done, but it has serious problems:

1. It doesn’t return anything, violating this method’s type contract.
2. If one of the recursive calls successfully finds and deletes the item, no further subtrees should be modified (or even need to be recursed on).

The solution to both of these problems lies in the fact that our current loop doesn’t store the value of the recursive calls anywhere.
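
Both problems listed above can be observed directly. The sketch below repeats a stripped-down Tree class together with the unfinished draft; the method name `delete_item_flawed`, the `NotImplementedError` placeholder, and the example tree are assumptions made for illustration only.

```python
from __future__ import annotations
from typing import Any, List, Optional


class Tree:
    """Stripped-down copy of the Tree class, repeated so this runs alone."""

    def __init__(self, root: Optional[Any], subtrees: List[Tree]) -> None:
        self._root = root
        self._subtrees = subtrees

    def is_empty(self) -> bool:
        return self._root is None

    def delete_item_flawed(self, item: Any) -> bool:
        """The draft with the unfinished final branch (hypothetical name)."""
        if self.is_empty():
            return False
        elif self._subtrees == []:
            if self._root != item:
                return False
            else:
                self._root = None
                return True
        else:
            if self._root == item:
                # _delete_root has not been written yet at this point.
                raise NotImplementedError
            else:
                for subtree in self._subtrees:
                    subtree.delete_item_flawed(item)
                # Falling off the end here implicitly returns None.


t = Tree(10, [Tree(2, []), Tree(2, [])])
result = t.delete_item_flawed(2)
print(result)                               # None, not True (problem 1)
print([s.is_empty() for s in t._subtrees])  # [True, True]: both 2s deleted (problem 2)
```

The caller gets None back even though the item was found, and two occurrences were removed although the docstring promises to delete only one.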

The key insight is that we should use the return value of each recursive call to determine whether an item was deleted, and whether to continue on to the next subtree:

    def delete_item(self, item: Any) -> bool:
        if self.is_empty():
            return False  # item is not in the tree
        elif self._subtrees == []:
            if self._root != item:  # item is not in the tree
                return False
            else:  # resulting tree should be empty
                self._root = None
                return True
        else:
            if self._root == item:
                self._delete_root()  # delete the root
                return True
            else:
                for subtree in self._subtrees:
                    deleted = subtree.delete_item(item)
                    if deleted:
                        # One occurrence of the item was deleted, so we're done.
                        return True
                    else:
                        # No item was deleted. Continue onto the next iteration.
                        # Note that this branch is unnecessary; we've only shown it
                        # to write comments.
                        pass

                # If we don't return inside the loop, the item is not deleted from
                # any of the subtrees. In this case, the item does not appear
                # in <self>.
                return False

Next, let’s deal with the one piece we deferred: implementing _delete_root. Note that all it needs to do is delete the root value of the tree, and restructure the tree so that the root value is not None. [2]

[2] Why mustn’t we leave a None behind? Hint: Look at the representation invariants for the Tree class.

There are many, many ways of doing this. Here’s one where we just pick the rightmost subtree, and “promote” its root and subtrees by moving them up a level in the tree.

    def _delete_root(self) -> None:
        """Remove the root item of this tree.

        Precondition: this tree has at least one subtree.
        """
        # Get the last subtree in this tree.
        chosen_subtree = self._subtrees.pop()

        self._root = chosen_subtree._root
        self._subtrees.extend(chosen_subtree._subtrees)

This may not be very satisfying, because while the result certainly is still a tree, it feels like we’ve changed around a lot of the structure of the original tree just to delete a single element. We encourage you to explore other ways to delete the root of a tree.

The problem of empty trees

We’re not quite done. In our current implementation of delete_item, suppose we delete an item that is a leaf of the given tree. We’ll successfully delete that item, but the result of doing so is an empty tree, so its parent will contain an empty tree in its subtrees list! For example:

    >>> t = Tree(10, [Tree(1, []), Tree(2, []), Tree(3, [])])  # A tree with leaves 1, 2, and 3
    >>> t.delete_item(1)
    True
    >>> t.delete_item(2)
    True
    >>> t.delete_item(3)
    True
    >>> t._subtrees
    [<__main__.Tree object at 0x081B4770>, <__main__.Tree object at 0x081B49F0>, <__main__.Tree object at 0x0845BB50>]
    >>> t._subtrees[0].is_empty() and t._subtrees[1].is_empty() and t._subtrees[2].is_empty()
    True

Our tree t now has three empty subtrees! This is certainly unexpected, and depending on how we’ve written our Tree methods, this may cause errors in our code. At the very least, these empty subtrees are taking up unnecessary space in our program, and make it slower to iterate through a subtree list.

Fixing the problem

So instead, if we detect that we deleted a leaf, we should remove the now-empty subtree from its parent’s subtree list.
This actually involves only a very small code change in delete_item:

    def delete_item(self, item: Any) -> bool:
        if self.is_empty():
            return False  # item is not in the tree
        elif self._subtrees == []:
            if self._root != item:  # item is not in the tree
                return False
            else:  # resulting tree should be empty
                self._root = None
                return True
        else:
            if self._root == item:
                self._delete_root()  # delete the root
                return True
            else:
                for subtree in self._subtrees:
                    deleted = subtree.delete_item(item)
                    if deleted and subtree.is_empty():
                        # The item was deleted and the subtree is now empty.
                        # We should remove the subtree from the list of subtrees.
                        # Note that mutating a list while looping through it is
                        # EXTREMELY DANGEROUS!
                        # We are only doing it because we return immediately
                        # afterwards, and so no more loop iterations occur.
                        self._subtrees.remove(subtree)
                        return True
                    elif deleted:
                        # The item was deleted, and the subtree is not empty.
                        return True
                    else:
                        # No item was deleted. Continue onto the next iteration.
                        # Note that this branch is unnecessary; we've only shown it
                        # to write comments.
                        pass

                # If we don't return inside the loop, the item is not deleted from
                # any of the subtrees. In this case, the item does not appear
                # in <self>.
                return False

Note that the code for removing a now-empty subtree is within a loop that iterates through the list of subtrees. In general it is extremely dangerous to remove an object from a list as you iterate through it, because this interferes with the iterations of the loop that is underway. We avoid this problem because immediately after removing the subtree, we stop the method by returning True.

Implicit assumptions are bad! Representation invariants are good!

Up to this point, you’ve probably wondered why we need a base case for an empty tree, since it seems like if we begin with a non-empty tree, our recursive calls would never reach an empty tree. But this is only true if we assume that each _subtrees list doesn’t contain any empty trees! While this may seem like a reasonable assumption, if we don’t make it explicit, there is no guarantee that this assumption will always hold for our trees.

Even though we recognized and addressed this issue in our implementation of delete_item, this is not entirely satisfying: what about other mutating methods? Rather than having to always remember to worry about removing empty subtrees, we can make this assumption explicit as a representation invariant for our Tree class:

    class Tree:
        # === Private Attributes ===
        _root: Optional[Any]
        _subtrees: List[Tree]

        # === Representation Invariants ===
        # - If self._root is None then self._subtrees is an empty list.
        #   This setting of attributes represents an empty tree.
        #
        #   Note: self._subtrees may be empty when self._root is not None.
        #   This setting of attributes represents a tree consisting of just one value.
        #
        # - (NEW) self._subtrees does not contain any empty trees.

With this representation invariant written down, future people working on the Tree class won’t have to remember a special rule about empty subtrees. Instead, they’ll just need to remember to consult the class’s representation invariants.

Exercises

1. Currently, the size-one case in delete_item is not redundant; however, it is possible to modify _delete_root so that this case and the recursive step can be merged, by allowing _delete_root to take a non-empty tree that has no subtrees. Modify the current implementations of _delete_root and delete_item to achieve this.
2. Write a method delete_item_all that deletes every occurrence of the given item from a tree. Think carefully about the order in which you check the root vs. other subtrees here.

6.4 Introduction to Binary Search Trees

Next, we’re going to learn about a new data structure called the Binary Search Tree (or BST). Binary search trees make certain operations fast, and are the basis of advanced data structures you’ll learn about in CSC263 that are even more efficient.

The Multiset ADT

Our goal this week is to take trees and use them to implement the Multiset ADT [1], which supports the following behaviours:

- check whether the collection is empty
- check whether a given item is in the collection
- add a given item to the collection
- remove a given item from the collection

[1] Also referred to as the Collection ADT.

Notice that this ADT offers a bit more flexibility than the Container-based ADTs such as Stack and Queue that we have seen previously in the course, as it allows the user to choose which item to remove, rather than using a fixed order of removal. Because removing an item requires searching the collection to make sure that the item is present, this ADT also supports __contains__, which searches within the collection by value, rather than by position. It is this “search” behaviour that we will consider first.

Searching in lists

To search a list, the obvious iterative algorithm (for both Python lists and linked lists) is to loop through all items in the list and stop when the item is found.
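
The loop just described can be sketched for a Python list as follows. The function name `linear_search` is an assumption made for this example; Python’s built-in `in` operator on lists gives the same answers.

```python
from typing import Any, List


def linear_search(lst: List[Any], item: Any) -> bool:
    """Return whether <item> appears in <lst>, checking items in order."""
    for value in lst:
        if value == item:
            return True  # stop as soon as the item is found
    return False  # every item was checked without finding a match


print(linear_search([5, 3, 8], 3))  # True: found at the second position
print(linear_search([5, 3, 8], 7))  # False: all three items were checked
```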
When the item is not in the list, all items must be checked, making this a linear time operation: the time taken for an unsuccessful search grows proportionally with the length of the list.

It turns out that the Tree.__contains__ method has the same behaviour: if the item is not in the tree, every item in the tree must be checked. [2] So just switching from lists to trees isn’t enough to do better!

[2] In the code, the recursive case must loop through every subtree and make a recursive call.

However, one of the great insights in computer science is that adding some additional structure to the input data can enable new, more efficient algorithms. You have seen a simple form of this called augmentation in previous labs, but we’ll look at more complex “structures” imposed on data here.

In the case of Python lists, if we assume that the list is sorted, then we can use the binary search algorithm to greatly improve the efficiency of search. If you need a refresher on binary search, please check out the “binary search” videos from this CSC108 playlist.

But because this is still based on built-in array-based lists, we suffer the same drawbacks for insertion and deletion we encountered previously in Section 3.4. So the question is: can we achieve efficient search, insertion, and deletion all at once? Yes we can!

Binary search trees: definitions

To do this, we will combine the branching structure of trees with the idea of binary search to develop a notion of a “sorted tree”, which we will call a Binary Search Tree (BST).

A binary tree is a tree in which every item has at most two subtrees.
An item in a binary tree satisfies the binary search tree property if its value is greater than or equal to all items in its left subtree, and less than or equal to all items in its right subtree. [3]

[3] Note that duplicates of the root are allowed in either subtree in this version.

A binary tree is a binary search tree if every item in the tree satisfies the binary search tree property (the “every” is important: in general, it’s possible that some items satisfy this property but others don’t).

Binary search trees naturally represent sorted data. That is, even if the data doesn’t arrive for insertion in sorted order, the BST keeps track of it in a sorted fashion. This makes BSTs extremely efficient in doing operations like searching for an item; but unlike sorted Python lists, they can be much more efficient at insertion and deletion while maintaining the sortedness of the data!

6.5 Binary Search Tree Implementation and Search

Our implementation of a BinarySearchTree class is heavily based on Tree, but with a few important differences. First, because we know there are only two subtrees, and the left/right ordering matters, we use explicit attributes to refer to the left and right subtrees:

    from __future__ import annotations
    from typing import Any, Optional


    class BinarySearchTree:
        """Binary Search Tree class.

        This class represents a binary tree satisfying the Binary Search Tree
        property: for every node, its value is >= all items stored in its left
        subtree, and <= all items stored in its right subtree.
        """
        # === Private Attributes ===
        # The item stored at the root of the tree, or None if the tree is empty.
        _root: Optional[Any]
        # The left subtree, or None if the tree is empty.
        _left: Optional[BinarySearchTree]
        # The right subtree, or None if the tree is empty.
        _right: Optional[BinarySearchTree]

Another difference between BinarySearchTree and Tree is in how we distinguish between empty and non-empty trees. In the Tree class, an empty tree has a _root value of None, and an empty list [] for its list of subtrees. In the BinarySearchTree class, an empty tree also has a _root value of None, but its _left and _right attributes are set to None as well. Moreover, for BinarySearchTree, an empty tree is the only case where any of the attributes can be None; when we represent a non-empty tree, we do so by storing the root item (which isn’t None) at the root, and storing BinarySearchTree objects in the _left and _right attributes. The attributes _left and _right might refer to empty binary search trees, but this is different from them being None!

Any method we add to the BinarySearchTree class (a) can rely upon these properties, and (b) must maintain these properties, since the other methods rely upon them. This is so important that we document them in our representation invariants, along with the BST property itself.

    # === Representation Invariants ===
    # - If _root is None, then so are _left and _right.
    #   This represents an empty BST.
    # - If _root is not None, then _left and _right are BinarySearchTrees.
    # - (BST Property) All items in _left are <= _root,
    #   and all items in _right are >= _root.

Here are the initializer and is_empty methods for this class, which are based on the corresponding methods for the Tree class:

    def __init__(self, root: Optional[Any]) -> None:
        """Initialize a new BST containing only the given root value.

        If <root> is None, initialize an empty BST.
        """
        if root is None:
            self._root = None
            self._left = None
            self._right = None
        else:
            self._root = root
            self._left = BinarySearchTree(None)   # self._left is an empty BST
            self._right = BinarySearchTree(None)  # self._right is an empty BST

    def is_empty(self) -> bool:
        """Return whether this BST is empty.
        """
        return self._root is None

Note that we do not allow client code to pass in left and right subtrees as parameters to the initializer. This is because binary search trees have a much stronger restriction on where values can be located in the tree, and so a separate method is used to insert new values into the tree that will ensure the BST property is always satisfied.

But before we get to the BST mutating methods (inserting and deleting items), we’ll first study the most important BST non-mutating method: searching for an item.

Searching a binary search tree

Recall that the key insight of the binary search algorithm in a sorted list is that by comparing the target item with the middle of the list, we can immediately cut in half the remaining items to be searched. An analogous idea holds for BSTs.

For general trees, the standard search algorithm is to compare the item against the root, and then search in each of the subtrees until either the item is found, or all the subtrees have been searched. When item is not in the tree, every item must be searched.

In stark contrast, for BSTs the initial comparison to the root tells you which subtree you need to check. That is, only one recursive call needs to be made, rather than two!

    def __contains__(self, item: Any) -> bool:
        """Return whether <item> is in this BST.
        """
        if self.is_empty():
            return False
        else:
            if item == self._root:
                return True
            elif item < self._root:
                return item in self._left   # or, self._left.__contains__(item)
            else:
                return item in self._right  # or, self._right.__contains__(item)

While this code structure closely matches the empty-check for the general Tree class, we can also combine the two levels of nested ifs to get a slightly more concise version:

    def __contains__(self, item: Any) -> bool:
        """Return whether <item> is in this BST.
        """
        if self.is_empty():
            return False
        elif item == self._root:
            return True
        elif item < self._root:
            return item in self._left   # or, self._left.__contains__(item)
        else:
            return item in self._right  # or, self._right.__contains__(item)

6.6 Mutating Binary Search Trees

Now that we have seen how searching works on binary search trees, we will study the two mutating methods of the Multiset/Collection ADT: insertion and deletion. Insertion is covered in lab, so here we’ll only discuss deletion.

The basic idea is quite straightforward: given an item to delete, we take the same approach as __contains__ to search for the item. If we find it, it will be at the root of a subtree (possibly a very small one, even just a leaf), where we delete it:

    def delete(self, item: Any) -> None:
        """Remove *one* occurrence of <item> from this BST.

        Do nothing if <item> is not in the BST.
        """
        if self.is_empty():
            pass
        elif self._root == item:
            self.delete_root()
        elif item < self._root:
            self._left.delete(item)
        else:
            self._right.delete(item)

    def delete_root(self) -> None:
        """Remove the root of this tree.

        Precondition: this tree is *non-empty*.
        """

Note that we are again using the strategy of defining a helper method, delete_root, to pull out part of the required functionality that’s a little tricky. This keeps our methods from growing too long, and also helps us break down a larger task into smaller steps.

We now need to work on delete_root. One thing we might try is to set self._root = None. Certainly this would remove the old value of the root, but this only works if the tree consists of just the root (with no children), so removing it makes the tree empty. In this case, we need to make sure that we also set _left and _right to None, to ensure the representation invariant is satisfied.

    def delete_root(self) -> None:
        if self._left.is_empty() and self._right.is_empty():
            self._root = None
            self._left = None
            self._right = None

What about the case when the tree has at least one other item? We can’t just set self._root = None, leaving a root value of None and yet a child that isn’t None; this would violate our representation invariant. We can think of it as leaving a “hole” in the tree.

We can analyse the tree structure to detect two “easy” special cases: when exactly one of the subtrees is empty. In these cases, we can simply “promote” the other subtree up.

    def delete_root(self) -> None:
        if self._left.is_empty() and self._right.is_empty():
            self._root = None
            self._left = None
            self._right = None
        elif self._left.is_empty():
            # "Promote" the right subtree.
            self._root, self._left, self._right = \
                self._right._root, self._right._left, self._right._right
        elif self._right.is_empty():
            # "Promote" the left subtree.
            self._root, self._left, self._right = \
                self._left._root, self._left._left, self._left._right

Finally, we need to handle the case that both subtrees are non-empty. Rather than restructure the entire tree, we can fill the “hole” at the root by replacing the root item with another value from the tree (and then removing that other value from where it was). The key insight is that there are only two values we could replace it with and still maintain the BST property: the maximum (or, rightmost) value in the left subtree, or the minimum (or, leftmost) value in the right subtree. We’ll pick the left subtree here.

    def delete_root(self) -> None:
        if self._left.is_empty() and self._right.is_empty():
            self._root = None
            self._left = None
            self._right = None
        elif self._left.is_empty():
            # "Promote" the right subtree.
            self._root, self._left, self._right = \
                self._right._root, self._right._left, self._right._right
        elif self._right.is_empty():
            # "Promote" the left subtree.
            self._root, self._left, self._right = \
                self._left._root, self._left._left, self._left._right
        else:
            self._root = self._left.extract_max()

    def extract_max(self) -> object:
        """Remove and return the maximum item stored in this tree.

        Precondition: this tree is *non-empty*.
        """

We’ve once again kicked out the hard part to another helper, extract_max. Finding the maximum item is easy: just keep going right to bigger and bigger values until you can’t anymore. And removing that maximum is much easier than our initial problem of BST deletion because that maximum has at most one child, on the left. (How do we know that?)
Here’s the method:

    def extract_max(self) -> object:
        """Remove and return the maximum item stored in this tree.

        Precondition: this tree is *non-empty*.
        """
        if self._right.is_empty():
            max_item = self._root
            # Once again, "Promote" the left subtree.
            self._root, self._left, self._right = \
                self._left._root, self._left._left, self._left._right
            return max_item
        else:
            return self._right.extract_max()

The single base case here is actually handling two scenarios: one in which self has a left (but no right) child, and one in which it has no children (i.e., it is a leaf). Confirm for yourself that both of these scenarios are possible, and that the single base case handles both of them correctly.

One deletion exercise

Try implementing delete_all, which is similar to delete except that it deletes all occurrences of an item from a BST. Think carefully about how to handle duplicate elements!

6.7 Binary Search Trees and Running Time

Now we return to the reason we started talking about binary search trees in the first place: we wanted a more efficient implementation of the Collection ADT, which supports search, insertion, and deletion.

The implementations of __contains__, insert, and delete for BSTs all have the same structure, in that they all make just one recursive call inside the recursive step (they each use the BST property to decide which subtree to recurse into). Let’s focus on __contains__ here.

    def __contains__(self, item: Any) -> bool:
        """Return whether <item> is in this BST.
        """
        if self.is_empty():
            return False
        else:
            if item == self._root:
                return True
            elif item < self._root:
                return item in self._left   # or, self._left.__contains__(item)
            else:
                return item in self._right  # or, self._right.__contains__(item)

Each recursive call that is made goes down one level into the tree, so the maximum number of recursive calls that can be made when we perform a search in a tree is equal to the height of the tree plus 1, where the extra call comes because our implementation also recurses into the empty subtree of a leaf.

Since each line of code inside __contains__ except the recursive call runs in constant time (i.e., doesn’t depend on the size of the tree), the total running time is proportional to the number of recursive calls made.

Because we argued that the maximum number of recursive calls is roughly the height of the tree, we could say that the running time is O(h), where h is the height of the tree. However, this is only partially correct. In fact, if we “get lucky” and search for the root item of the tree, it doesn’t matter how tall it is: we’ll only ever make one comparison before returning True!

Worst-case vs. best-case running time

So far in this course, we have mainly focused on how the running time of a function or method depends on the size of its inputs. However, it is very often the case that even for a fixed input size, the running time varies depending on some other properties of the inputs: searching for the root item of a BST vs. searching for an item that is very deep in a BST, for example. It is incorrect to say that the time taken to search for an item of a BST is always proportional to h + 1 (where h is the height of the tree); really, this quantity h + 1 is just the maximum of a set of possible running times.
We define the worst-case running time of an algorithm as a function WC(n), which maps an input size n to the maximum possible running time for all inputs of size n.

What we colloquially refer to as the “worst case” for an algorithm is actually a family of inputs, one per input size, where each input is one that results in the maximum running time for its size. For example, we could say that the “worst case” for BST __contains__ is when “the item we’re searching for causes us to recurse down to the deepest leaf in the tree, and then search one of its empty subtrees.” This is a description of not just one input of a fixed size, but rather a set of inputs that all have a property in common.

Since the worst-case running time is a function, we can describe it using our Big-Oh notation. We can say that for BST search, the worst-case running time is O(h), where h is the height of the tree.

Similarly, the best-case running time is a function that maps input size to the minimum possible running time for that input size. We can say that the “best case” for BST search is when we search for the root item in the BST; note again that we are not limiting this description to one input size. Since __contains__ returns immediately if it verifies that the root is equal to the item we’re searching for, we can say that the best-case running time of the method is O(1), i.e., independent of the height of the tree.

When defining a worst case or best case situation for an algorithm, don’t make any assumptions about the size of the input! Students often say that “the best case is when the BST is empty”; but this is incorrect, since it only considers one input size (0). Whatever description or properties you give for the “worst case” or “best case” should make sense for any input size.

Tree height and size

You might look at O(h) and recall that we said searching through an unsorted list takes O(n) time, where n is the size of the list.
Since both of these expressions look linear, it might seem that BSTs are no better (in terms of Big-Oh) than unsorted lists. This is where our choice of variables really matters. We can say that BST search is, in the worst case, proportional to the height of the BST; but remember that the height of a BST can be much smaller than its size!

In fact, if we consider a BST with n items, its height can be as large as n (in this case, the BST just looks like a list). However, it can be as small as log(n)! Why? Put another way, a tree of height h can have at most 2^h - 1 items (draw some examples to convince yourself of this), so if we have n items to store, we need a height of at least roughly log(n) to store all of them.

So if we can guarantee that BSTs always have height roughly log(n), then in fact all three Collection operations (search, insert, delete) have a worst-case running time of O(h) = O(log n), where h is the height of the BST and n is its size. Even sorted lists, for which we can use binary search to find items in O(log n) time in the worst case, are still limited by insertion and deletion at the front, as we discussed earlier in the course. BSTs aren’t: what matters is not where the item needs to be inserted, but rather the overall height.

Unfortunately, neither the insertion nor the deletion algorithm we have covered in this course will guarantee that when we modify the tree, its height remains roughly logarithmic in its size. (One example you explored in the lab is when you insert items into a BST in sorted order.)
However, in the later course CSC263, Data Structures and Analysis, you will explore more sophisticated insertion and deletion algorithms that do ensure that the height is always logarithmic, thus guaranteeing the efficiency of these operations!

6.8 Expression Trees

To wrap up our study of tree-based data structures in this course, we’re going to look at one particularly rich application of trees: representing programs. Picture a typical Python program you’ve written: a few classes, more than a few functions, and dozens or even hundreds of lines of code. As humans, we read and write code as text, and we take for granted the fact that we can ask the computer to run our code to accomplish pretty amazing things.

But what actually happens when we “run” a program? Another program, called the Python interpreter, is responsible for taking our file and running it. But as you’ve experienced firsthand by now, writing programs that work directly with text is hard; reading strings of characters and extracting meaning from them requires a lot of fussing with small details. There’s a deeper problem with working directly with text: strings are fundamentally a linear structure, but programs (in Python and other programming languages) are much more complex, and in fact have a naturally recursive structure. For example, we can nest for loops and if statements within each other as many times as we want, in any order that we want.

So the first step that the Python interpreter takes when given a file to run is to parse the text from the file, and create a new representation of the program, called an Abstract Syntax Tree (AST).
The “Tree” part is significant: given the recursive nature of Python programs, it is natural that we’ll use a tree-based data structure to represent them! (Note: describing the interpreter as creating a single AST is, in fact, a simplification: given the complex nature of parsing and Python programs, there is usually more than one kind of tree that is created during the execution of the program, representing different “phases” of the process. You’ll learn about this more in a course on programming languages or compilers.)

This week, we’re going to explore the basics of modeling programs using tree-based data structures. Of course, we aren’t going to be able to model all of the Python language in such a short period of time, and so we’ll focus on a relatively straightforward part of the language: some simple expressions to be evaluated.

The Expr class

In Python, an expression is a piece of code which is meant to be evaluated, returning the value of that expression. (Note: this is in contrast with statements, which represent some kind of action like variable assignment or return, and with definitions, using keywords like def and class.) Expressions are the basic building blocks of the language, and are necessary for computing anything.

But because of the immense variety of expression types in Python, we cannot use just one single class to represent all types of expressions. Instead, we’ll use different classes to represent each kind of expression, but use inheritance to ensure that they all follow the same fundamental interface. This will set our implementation of “expression trees” apart from other kinds of tree representations we have seen in the course so far.

To begin, here is an abstract class.

    from typing import Any, Union

    class Expr:
        """An abstract class representing a Python expression.
        """
        def evaluate(self) -> Any:
            """Return the *value* of this expression.

            The returned value should be the result of how this expression
            would be evaluated by the Python interpreter.
            """
            raise NotImplementedError

Notice that we haven’t specified any attributes for this class! Every type of expression will use a different set of attributes to represent the expression. Let’s make this concrete by looking at two expression types.

Num: numeric constants

The simplest type of Python expression is a literal constant like 3 or 'hello'. We’ll start just by representing numeric constants (ints and floats). As you might expect, this is a pretty simple class, with just a single attribute representing the value of the constant.

    class Num(Expr):
        """A numeric constant literal.

        === Attributes ===
        n: the value of the constant
        """
        n: Union[int, float]

        def __init__(self, number: Union[int, float]) -> None:
            """Initialize a new numeric constant."""
            self.n = number

        def evaluate(self) -> Any:
            """Return the *value* of this expression.

            The returned value should be the result of how this expression
            would be evaluated by the Python interpreter.

            >>> number = Num(10.5)
            >>> number.evaluate()
            10.5
            """
            return self.n  # Simply return the value itself!

You can think of constants as being the base cases, or leaves, of an abstract syntax tree. Next, we’ll look at one way of combining these constants together in larger expressions.

BinOp: arithmetic operations

The obvious way to combine numbers together is through the standard arithmetic operations. In Python, an arithmetic operation is an expression that consists of three parts: a left and right subexpression (the two operands of the expression), and the operator itself.
We’ll represent this with the following class. (Note: for simplicity, we restrict the possible operations to only + and * for this example.)

    class BinOp(Expr):
        """An arithmetic binary operation.

        === Attributes ===
        left: the left operand
        op: the name of the operator
        right: the right operand

        === Representation Invariants ===
        - self.op == '+' or self.op == '*'
        """
        left: Expr
        op: str
        right: Expr

        def __init__(self, left: Expr, op: str, right: Expr) -> None:
            """Initialize a new binary operation expression.

            Precondition: <op> is the string '+' or '*'.
            """
            self.left = left
            self.op = op
            self.right = right

Note that the BinOp class is basically a binary tree! Its “root” value is the operator name (stored in the attribute op), while its left and right “subtrees” represent the two operand subexpressions.

For example, we could represent the simple arithmetic expression 3 + 5.5 in the following way:

    BinOp(Num(3), '+', Num(5.5))

But of course, the types of the left and right attributes aren’t Num, they’re Expr, so either of these can be BinOps as well:

    # ((3 + 5.5) * (0.5 + (15.2 * -13.3)))
    BinOp(
        BinOp(Num(3), '+', Num(5.5)),
        '*',
        BinOp(
            Num(0.5),
            '+',
            BinOp(Num(15.2), '*', Num(-13.3))))

Now, it might seem like this representation is more complicated, and certainly more verbose. But we must be aware of our own human biases: because we’re used to reading expressions like ((3 + 5.5) * (0.5 + (15.2 * -13.3))), we take it for granted that we can quickly parse this text in our heads to understand its meaning. A computer program like the Python interpreter, on the other hand, can’t do anything “in its head”: a programmer needs to have written code for every action it can take! And this is where the tree-like structure of BinOp really shines.
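To make the correspondence between the nested objects and the familiar text concrete, here is a minimal sketch that walks an expression tree and rebuilds the fully parenthesized string. The classes are pared-down copies of Num and BinOp (docstrings and type-annotated attributes omitted), and the function to_text is a hypothetical helper introduced only for this illustration.

```python
from typing import Union


class Expr:
    """Pared-down stand-in for the abstract Expr class."""


class Num(Expr):
    def __init__(self, number: Union[int, float]) -> None:
        self.n = number


class BinOp(Expr):
    def __init__(self, left: Expr, op: str, right: Expr) -> None:
        self.left = left
        self.op = op
        self.right = right


def to_text(expr: Expr) -> str:
    """Return a fully parenthesized string for <expr> (hypothetical helper)."""
    if isinstance(expr, Num):
        # A constant is a leaf: there is nothing further to visit.
        return str(expr.n)
    elif isinstance(expr, BinOp):
        # Convert both operands first, then join them with the operator.
        return f'({to_text(expr.left)} {expr.op} {to_text(expr.right)})'
    else:
        raise TypeError(f'Unsupported expression type: {type(expr).__name__}')


tree = BinOp(
    BinOp(Num(3), '+', Num(5.5)),
    '*',
    BinOp(Num(0.5), '+', BinOp(Num(15.2), '*', Num(-13.3))))
print(to_text(tree))  # ((3 + 5.5) * (0.5 + (15.2 * -13.3)))
```

A standalone function with isinstance checks keeps the sketch short; the same behaviour could instead be written as a method defined in each subclass, in the style of evaluate.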
To evaluate a binary operation, we first evaluate its left and right operands, and then combine them using the specified arithmetic operator. The code is among the simplest we’ve ever written!

    class BinOp(Expr):
        def evaluate(self) -> Any:
            """Return the *value* of this expression.
            """
            left_val = self.left.evaluate()
            right_val = self.right.evaluate()
            if self.op == '+':
                return left_val + right_val
            elif self.op == '*':
                return left_val * right_val
            else:
                raise ValueError(f'Invalid operator {self.op}')

The subtle recursive structure of expression trees

Even though the code for BinOp.evaluate looks simple, it actually uses recursion in a pretty subtle way. Notice that we’re making pretty normal-looking recursive calls self.left.evaluate() and self.right.evaluate(), matching the tree structure of BinOp. But… where’s the base case?

This is probably the most significant difference between our expression tree representation and the tree-based classes we’ve studied so far in this course. Because we are using multiple subclasses of Expr, there are multiple evaluate methods, one in each subclass. Each time self.left.evaluate and self.right.evaluate are called, they could either refer to BinOp.evaluate or Num.evaluate, depending on the types of self.left and self.right.

In particular, notice that Num.evaluate does not make any subsequent calls to evaluate, since it just returns the object’s n attribute. This is the true “base case” of evaluate, and it happens to be located in a completely different method than BinOp.evaluate! So fundamentally, evaluate is still an example of structural recursion, just one that spans multiple Expr subclasses.

Looking ahead

Of course, Python programs consist of much, much more than simple arithmetic expressions!
In this course, we’re really only scratching the surface of the full set of classes we would need to completely represent any valid Python code. But even though a complete understanding is beyond the scope of this course, the work that we’re doing here is not merely theoretical, but is actually a concrete part of the Python language itself, and of tools which operate on Python programs.

It turns out that there is a built-in Python library called ast (short for “abstract syntax tree”) that uses the same approach we’ve covered here, but of course is comprehensive enough to cover the entire spectrum of the Python language. If you’re interested in reading more about this, feel free to check out some excellent documentation at https://greentreesnakes.readthedocs.io.

7.1 Recursive Sorting Algorithms

This week we’re switching gears a little bit. For the bulk of the course we’ve talked about different abstract data types and the data structures we use to implement them. Now, we’re going to talk about one very specific data-processing operation, which is one of the most fundamental in computer science: sorting. Just as we saw multiple data structures that could be used to represent the same ADT, we’ll look at a few different ways to implement sorting.

You’ve studied sorting before in CSC108; all of the basic sorting algorithms you probably saw (bubblesort, selection sort, insertion sort) were iterative, meaning they involved multiple loops through the list.
(Note: you may wish to review the CSC108 videos on these algorithms.) You probably also talked about how their running time was quadratic in the size of the list, so each of these algorithms sorts a list of size n in O(n^2) steps. (Why? Briefly, each involves n different loops, where each loop has between 1 and n iterations, and 1 + 2 + 3 + ⋯ + n = n(n + 1)/2.)

In this lecture, we’re going to use recursion to develop two faster sorting algorithms, mergesort and quicksort. These are both recursive divide-and-conquer algorithms, which is a general class of algorithms that use the following steps:

1. Split up the input into two or more parts.
2. Recurse on each part separately.
3. Combine the results of the previous step into a single result.

Where these two algorithms differ is in the splitting and combining: mergesort does the “hard” (algorithmically complex) work in the combine step, and quicksort does it in the divide step.

Mergesort

The first algorithm we’ll study is called mergesort, and takes the “divide-and-conquer” philosophy very literally. The basic idea of this algorithm is that it divides its input list into two halves, recursively sorts each half, and then merges the two sorted halves into the final sorted list.

    from typing import Any, List, Tuple  # used by the functions in this section

    def mergesort(lst: List) -> List:
        """Return a sorted list with the same elements as <lst>.

        This is a *non-mutating* version of mergesort; it does not mutate the
        input list.
        """
        if len(lst) < 2:
            return lst[:]
        else:
            # Divide the list into two parts, and sort them recursively.
            mid = len(lst) // 2
            left_sorted = mergesort(lst[:mid])
            right_sorted = mergesort(lst[mid:])

            # Merge the two sorted halves.
            return _merge(left_sorted, right_sorted)

The merge operation

While this code looks very straightforward, we’ve hidden the main complexity in the helper function _merge, which needs to take two lists and combine them into one sorted list.

For two arbitrary lists, there isn’t an “efficient” way of combining them. (Note: we’ll discuss what we mean by “efficient” here in the next reading.) For example, the first element of the returned list should be the minimum value in either list, and to find this value we’d need to iterate through each element in both lists.

But if we assume that the two lists are sorted, this changes things dramatically. For example, to find the minimum item of lst1 and lst2 when both lists are sorted, we only need to compare lst1[0] and lst2[0], since the minimum must be one of these two values. We can generalize this idea so that after every comparison we make, we can add a new element to the sorted list. This is the key insight that makes the _merge operation efficient, and which gives the algorithm mergesort its name.

    def _merge(lst1: List, lst2: List) -> List:
        """Return a sorted list with the elements in <lst1> and <lst2>.

        Precondition: <lst1> and <lst2> are sorted.
        """
        index1 = 0
        index2 = 0
        merged = []
        while index1 < len(lst1) and index2 < len(lst2):
            if lst1[index1] <= lst2[index2]:
                merged.append(lst1[index1])
                index1 += 1
            else:
                merged.append(lst2[index2])
                index2 += 1

        # Now either index1 == len(lst1) or index2 == len(lst2).
        assert index1 == len(lst1) or index2 == len(lst2)
        # The remaining elements of the other list
        # can all be added to the end of <merged>.
        # Note that at most ONE of lst1[index1:] and lst2[index2:]
        # is non-empty, but to keep the code simple, we include both.
        return merged + lst1[index1:] + lst2[index2:]

Quicksort

While quicksort also uses a divide-and-conquer approach, it takes a different philosophy for dividing up its input list. Here’s some intuition for this approach: suppose we’re sorting a group of people alphabetically by their surname. We do this by first dividing up the people into two groups: those whose surnames start with A-L, and those whose surnames start with M-Z. This can be seen as an “approximate sort”: even though each smaller group is not sorted, we do know that everyone in the A-L group should come before everyone in the M-Z group. Then after sorting each group separately, we’re done: we can simply take the two groups and then concatenate them to obtain a fully sorted list.

The formal quicksort algorithm uses exactly this idea:

1. First, it picks some element in its input list and calls it the pivot.
2. It then splits up the list into two parts: the elements less than or equal to the pivot, and those greater than the pivot. This is traditionally called the partitioning step.
3. Next, it sorts each part recursively.
4. Finally, it concatenates the two sorted parts, putting the pivot in between them.

(Note: the implementation we’ll show below always chooses the first element in the list to be the pivot. This has some significant drawbacks, which we’ll discuss in lecture and the next reading.)

    def quicksort(lst: List) -> List:
        """Return a sorted list with the same elements as <lst>.

        This is a *non-mutating* version of quicksort; it does not mutate the
        input list.
        """
        if len(lst) < 2:
            return lst[:]
        else:
            # Pick pivot to be first element.
            # Could make lots of other choices here (e.g., last, random)
            pivot = lst[0]

            # Partition rest of list into two halves
            smaller, bigger = _partition(lst[1:], pivot)

            # Recurse on each partition
            smaller_sorted = quicksort(smaller)
            bigger_sorted = quicksort(bigger)

            # Return!
            # Notice the simple combining step
            return smaller_sorted + [pivot] + bigger_sorted

It turns out that implementing the _partition helper is simpler than the _merge helper above: we can do it just using one loop through the list.

    def _partition(lst: List, pivot: Any) -> Tuple[List, List]:
        """Return a partition of <lst> with the chosen pivot.

        Return two lists, where the first contains the items in <lst>
        that are <= pivot, and the second is the items in <lst> that are > pivot.
        """
        smaller = []
        bigger = []

        for item in lst:
            if item <= pivot:
                smaller.append(item)
            else:
                bigger.append(item)

        return smaller, bigger

7.2 Efficiency of Recursive Sorting Algorithms

Iterative algorithms are typically quite straightforward to analyse, as long as you can determine precisely how many iterations each loop will run. Recursive code can be trickier, because not only do you need to analyse the running time of the non-recursive part of the code, you must also factor in the cost of each recursive call made.

You’ll learn how to do this formally in CSC236, but for this course we’re going to see a nice visual intuition for analysing simple recursive functions.
Mergesort

Recall the mergesort algorithm:

    def mergesort(lst):
        if len(lst) < 2:
            return lst[:]
        else:
            mid = len(lst) // 2  # This is the midpoint of lst
            left_sorted = mergesort(lst[:mid])
            right_sorted = mergesort(lst[mid:])

            return _merge(left_sorted, right_sorted)

Suppose we call mergesort on a list of length n, where n ≥ 2. We first analyse the running time of the code in the else branch other than the recursive calls themselves:

- The “divide” step takes linear time, since the list slicing operations lst[:mid] and lst[mid:] each take roughly n/2 steps to make a copy of the left and right halves of the list, respectively.
- The _merge operation also takes linear time, that is, approximately n steps (why?).
- The other operations (calling len(lst), arithmetic, and the act of returning) all take constant time, independent of n.

(Note: while the slicing occurs in the same line as a recursive call, the slicing itself is considered non-recursive, since it occurs before making the recursive calls.)

Putting this together, we say that the running time of the non-recursive part of this algorithm is O(n), or linear with respect to the length of the list.

But so far we have ignored the cost of the two recursive calls. Since we split the list in half, each new list on which we make a call has length n/2, so the merge and slice operations in each of those calls take approximately n/2 steps. These two recursive calls in turn make four recursive calls, each on a list of length n/4, etc.
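The halving pattern just described can be tabulated level by level. The sketch below is only an illustration under a simplifying assumption: n is a power of 2, so every split is exactly in half. The function level_costs is a hypothetical helper, not part of the course code.

```python
from typing import List, Tuple


def level_costs(n: int) -> List[Tuple[int, int, int]]:
    """Return one (number of calls, list length, total length) tuple per level.

    Assumes <n> is a power of 2, so that every split is exactly in half.
    """
    levels = []
    num_calls = 1
    size = n
    while size >= 1:
        # Every call at this level works on a list of the same length.
        levels.append((num_calls, size, num_calls * size))
        num_calls *= 2  # each call makes two recursive calls...
        size //= 2      # ...each on a list half as long
    return levels


for calls, size, total in level_costs(8):
    print(f'{calls} call(s) on lists of length {size}: {total} elements in total')
```

For n = 8 this prints four levels (1, 2, 4, and 8 calls), and the last number on every line is 8: the number of calls doubles while the list length halves.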
We can represent the total cost as a big tree, where at the top level we write the cost of the merge operation for the original recursive call, at the second level are the two recursive calls on lists of size n/2, and so on until we reach the base case (lists of length 1).

Note that even though it looks like the tree shows the size of the lists at each recursive call, what it’s actually showing is the running time of the non-recursive part of each recursive call, which just happens to be (approximately) equal to the size of the list!

The height of this tree is the recursion depth: the number of recursive calls that are made before the base case is reached. Since for mergesort we start with a list of length n and divide the length by 2 until we reach a list of length 1, the recursion depth of mergesort is the number of times you need to divide n by 2 to get 1. Put another way, it’s the number k such that 2^k ≈ n. Remember logarithms? This is precisely the definition: k ≈ log n, and so there are approximately log n levels. (Note: remember that when we omit the base of a logarithm in computer science, we assume the base is 2, not 10.)

Finally, notice that at each level, the total cost is n. With approximately log n levels, each costing n, the total cost of mergesort is O(n log n), which is much better than the quadratic n^2 runtime of insertion and selection sort when n gets very large!

You may have noticed that this analysis only depends on the size of the input list, and not the contents of the list; in other words, the same work and recursive calls will be made regardless of the order of the items in the original list. The worst-case and best-case Big-Oh running times for mergesort are the same: O(n log n).

The Perils of Quicksort

What about quicksort?
Is it possible to do the same analysis? Not quite. The key difference is that with mergesort we know that we’re splitting up the list into two equal halves (each of size n/2); this isn’t necessarily the case with quicksort!

Suppose we get lucky, and at each recursive call we choose the pivot to be the median of the list, so that the two partitions both have size (roughly) n/2. Then the problem is solved: the analysis is the same as for mergesort, and we get the n log n runtime. [3]

[3] Something to think about: what do we need to know about _partition for the analysis to be the same? Why is this true?

But what if we’re always extremely unlucky with the choice of pivot: say, we always choose the smallest element? Then the partitions are as uneven as possible, with one having no elements, and the other having size n − 1. We get the following tree: [4]

[4] Note that the “1” squares down the tree represent the cost of making the recursive call on an empty list.

Here, the recursion depth is n (the size decreases by 1 at each recursive call), so adding the cost of each level gives us the expression

    (n − 1) + [n + (n − 1) + (n − 2) + ... + 1] = (n − 1) + n(n + 1)/2,

which makes the runtime quadratic.

This means that for quicksort, the choice of pivot is extremely important, because if we repeatedly choose bad pivots, the runtime gets much worse! Its best-case running time is O(n log n), while its worst-case running time is O(n^2).

Quicksort in the “real world”

You might wonder whether quicksort is truly used in practice, given that its worst-case performance is so much worse than mergesort’s.
Keep in mind that we’ve swept a lot of details under the rug by saying that both _merge and _partition take O(n) steps. The Big-Oh expression ignores constant factors, and so 5n and 100n are both treated as O(n) even though they are quite different numerically.

In practice, the non-recursive parts of quicksort can be significantly faster than the non-recursive parts of mergesort. For most inputs, quicksort has “smaller constants” than mergesort, meaning that it takes fewer computer operations, and so runs faster.

In fact, with some more background in probability theory, we can even talk about the performance of quicksort on a random list of length n; it turns out that the average performance is O(n log n), with a smaller constant than mergesort’s O(n log n), indicating that the actual “bad” inputs for quicksort are quite rare. You will see this proven formally in CSC263/5.
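To make the gap between quicksort's worst case and a typical input concrete, the sketch below counts the comparisons made while partitioning. It is only an illustration under stated assumptions: it always picks the first item as the pivot, its `_partition` returns two new lists plus a comparison count, and the names `quicksort_count`, `worst_steps`, and `typical_steps` are hypothetical. The quicksort used in these notes may differ.

```python
import random


def _partition(lst, pivot):
    """Return (smaller, bigger, comparisons).

    smaller holds the items <= pivot, bigger holds the items > pivot, and
    comparisons is the number of items compared against the pivot.
    (Assumed implementation, for illustration only.)
    """
    smaller, bigger = [], []
    for item in lst:
        if item <= pivot:
            smaller.append(item)
        else:
            bigger.append(item)
    return smaller, bigger, len(lst)


def quicksort_count(lst):
    """Return (sorted copy of lst, total comparisons made while partitioning)."""
    if len(lst) < 2:
        return lst[:], 0
    pivot = lst[0]  # Assumption: always choose the first item as the pivot.
    smaller, bigger, steps = _partition(lst[1:], pivot)
    smaller_sorted, smaller_steps = quicksort_count(smaller)
    bigger_sorted, bigger_steps = quicksort_count(bigger)
    total_steps = steps + smaller_steps + bigger_steps
    return smaller_sorted + [pivot] + bigger_sorted, total_steps


n = 200
already_sorted = list(range(n))       # the first item is always the smallest
shuffled = already_sorted[:]
random.Random(148).shuffle(shuffled)  # fixed seed so the run is repeatable

_, worst_steps = quicksort_count(already_sorted)
sorted_copy, typical_steps = quicksort_count(shuffled)

print(worst_steps)    # 19900, which is n(n - 1)/2 for n = 200
print(typical_steps)  # far smaller for a shuffled list
```

With this pivot rule, an already-sorted list triggers the unlucky case described earlier: every partition is as uneven as possible, and the comparison count grows quadratically. A shuffled list of the same length needs far fewer comparisons.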