Introduction to the module
COMP5180 Algorithms, Correctness and Efficiency
Week 9-1
Anna Jordanous a.k.jordanous@kent.ac.uk

Today's lecture
• Introduction to the module
  • Summary
  • Assessments
  • Reading
  • Teaching staff
  • Schedule
  • Drop-in sessions
• Motivation for the module
• What you did last year in relevant modules
  • Algorithms, maths*, Java
  • * Next lecture: Maths recap in a little more detail

Introduction to the module
In this module, we shall explore various algorithms and their underlying data structures, their correctness and their efficiency. We shall take forward our understanding from previous algorithm modules (COMP3830) by exploring:
• data structures like lists, balanced trees and graphs;
• recursive algorithms, and the recursion tree method for the analysis of recursive algorithms;
• improvements upon recursive algorithms, like backtracking and dynamic programming;
• tools for analysis of the computational complexity of algorithms, like the O() notation and the functions used therein.
Finally, we will briefly look at classes of algorithms categorised based on run time:
• in particular, the complexity classes P, NP, NP-hard and NP-complete, and the possibility of proving P = NP or not.

Assessments
The module will be assessed via:
• 50% coursework:
  • Two programming assessments A1 and A2, each with 25% of the total weight of the module
  • Dates TBC soon once approved
  • A1 – covering content from Anna's topics
  • A2 – covering content from Sergey's topics
  • Further details will be provided in the Moodle Assessment section when assessments are set.
• 50% written examination at the end of the year

Reading
• Algorithms are fundamental to computing theory and implementation.
• There are many excellent books on the topic, some of which are included in the reading list.
• The reading or reference material for each lecture or topic will be specified by the respective lecturer.
• The same topic may be explored in several books with different levels of exposition. The more the exposure, the better for learning.
• See the Reading list link from Moodle

Teaching staff
The lectures will be delivered by Anna Jordanous (Convenor) and Sergey Ovchinnik. Contact Anna or Sergey with issues regarding the module.
Drop-ins will be run by Joseph Kearney.
Classes will be supervised by: Ben Alison, Vanessa Bonthuys, Matthew Hibbin, Joseph Kearney, Antonin Rapini

Teaching Schedule
• Lectures (weeks 9-14 and 16-20): There will be two 1-hour lectures every week.
• Classes (weeks 10-14 and 16-20): Every week, a student will attend one terminal session or class that is scheduled for the small group of students they belong to. See your timetable.
• (No classes this week!)

Schedule
Week 9  (26/9/22):  Intro to module, recap – Anna
Week 10 (3/10/22):  Algorithms basics and examples – Anna
Week 11 (10/10/22): O notation – Anna
Week 12 (17/10/22): Balanced Binary Search Trees – Anna
Week 13 (24/10/22): Graphs – Anna
Week 14 (31/10/22): Heaps (Anna) / Recursions (Sergey) – Anna/Sergey
Week 15 (7/11/22):  Project week – no lectures or classes
Week 16 (14/11/22): Recursions, Recurrences – Sergey
Week 17 (21/11/22): Solving Recurrences, Sorting – Sergey
Week 18 (28/11/22): Backtracking – Sergey
Week 19 (5/12/22):  Dynamic Programming – Sergey
Week 20 (12/12/22): Complexity classes – Sergey

[What happened to week 1??]
• Short answer: KentVision
• Who'd like to know the long answer?
Drop-in sessions
• Extended learning / drop-in consolidation optional sessions
• In addition to your compulsory lectures and classes, you might like to make use of two optional sessions each week, if you feel that you need extra support.
• Drop-ins are held during weeks 9-14 and 16-20 on Mondays and Fridays
  • Except not this morning
• Run by Joseph Kearney

Motivation for the module
• Getting our software to work as well as possible
  • "Every program depends on algorithms and data structures, but few programs depend on the invention of brand new ones." -- Kernighan & Pike
• Understanding how things work
• Advancing what you learned last year
• People in industry care that you know about this
  • E.g. 'Hello from Google' … (email conversation, 2014): "…tips for a successful Google interview: The interview will include topics such as coding, data structures, algorithms, computer science theory, and systems design."

Example: quicksort
From the 'Hello from Google' email for interview prep: "…Sorting: Know how to sort. Don't do bubblesort. You should know the details of at least one n*log(n) sorting algorithm, preferably two (say, quick sort and merge sort)…"

Quicksort: empirical analysis
Running time estimates:
• Home PC executes 10^8 compares/second.
• Supercomputer executes 10^12 compares/second.

                        N = thousand   N = million   N = billion
insertion sort (N²)
  home computer         instant        2.8 hours     317 years
  supercomputer         instant        1 second      1 week
mergesort (N log N)
  home computer         instant        1 second      18 min
  supercomputer         instant        instant       instant
quicksort (N log N)
  home computer         instant        0.6 sec       12 min
  supercomputer         instant        instant       instant

Lesson 1. Good algorithms are better than supercomputers.
Lesson 2. Great algorithms are better than good ones.

Learning outcomes (comp-specific)
• On successfully completing the module students will be able to:
• 8.1 specify, test, and verify program properties;
• 8.2 analyse the time and space behaviour of simple algorithms;
• 8.3 use known algorithms to solve programming problems;
• 8.4 make informed decisions about the most appropriate data structures and algorithms to use when designing software.

Learning outcomes (generic)
On successfully completing the module students will be able to:
• 9.1 demonstrate an understanding of trade-offs when making design decisions;
• 9.2 make effective use of existing techniques to solve problems;
• 9.3 demonstrate an understanding of how programs (can fail to) match a specification;
• 9.4 analyse and compare solutions to technical problems.

Pre-requisites
• COMP5200: Further Object-Oriented Programming
  • (and COMP3200 Introduction to Object-Oriented Programming)
• COMP3250: Foundations of Computing II
  • (and COMP3220 Foundations of Computing I)
• COMP3830: Problem Solving with Algorithms

Pre-requisites: Java (up to COMP5200)
object-oriented program design and implementation, a range of fundamental data structures and algorithms, advanced features of object-orientation, such as:
• interface inheritance,
• abstract classes,
• nested classes,
• functional abstractions,
• exceptions

Pre-requisites: Algorithms (COMP3830) – what you learned
introductory algorithms, algorithm correctness, algorithm runtime, big-O notation, essential data structures such as arrays, lists and trees, algorithmic programming skills such as searching and sorting, recursion, and divide and conquer.
Pre-requisites: Maths (COMP3250/3220)
matrices, logic, functions, vectors, differential calculus, probability, algebra, reasoning and proof, set theory, statistics, computer arithmetic
We'll recap some relevant maths in the next lecture

Today's lecture
• Introduction to the module
  • Summary
  • Assessments
  • Reading
  • Teaching staff
  • Schedule
  • Drop-in sessions
• Motivation for the module
• What you did last year in relevant modules
  • Algorithms, maths*, Java
  • * Next lecture: Maths recap in a little more detail

Which maths do we need?
COMP5180 Algorithms, Correctness and Efficiency
Week 9-2
Anna Jordanous a.k.jordanous@kent.ac.uk

Today's lecture
• Introduction: What maths and why
• Polynomials
• Exponential functions
• Factorial
• Logarithm
• Ceiling/floor function
• Modular arithmetic

Meta-Comment
• Most of this is not new; it is a recap of COMP3220/COMP3250, but put into context
• The content of this lecture serves mainly as a reference for later, i.e. if you come across some befuddling maths later on, chances are it is explained here
• Some textbooks on the subject have a similar structure: they start with the maths they need before commencing with algos

What do we use maths for (in COMP5180)?
• We want to describe the performance of algorithms, mostly w.r.t. runtime, sometimes w.r.t. memory consumption
• For that description we deploy some standard mathematical functions, some of which we will recap here
• Generally (there is an exception) we do not use these functions in our programs; they just give us a performance model
  • exception: spreadsheets, measuring performance
• We also need a form of abstraction over these functions that allows the description of parameterised programs and is hardware-independent (O-notation and friends)
• Not all of the maths is directly tied to performance description though

What functions and why?
• We will be looking at which functions are commonly used here, but also…
• …why they are commonly used
• In addition, this is a refresher in notation, and a clarification of jargon
• When describing performance etc. we typically want to do this not for one instance of the problem, but for all of them
  • we characterise a class of problem by a number of "measurements" and describe the performance as a function of those
  • very often just one measurement "n", typically the "size"

Polynomials
• Any function you can build by using addition* and multiplication** to combine constants, variables and exponents, e.g. x² + 4x + 7
  • * subtraction is possible via negative constants, ** division is possible via fractional constants
  • the exponentiation here is just used for notation: the same function can be written 7 + x(4 + x); NB you might also see it written as 7 + x·4 + x·x, or as x² + 4·x + 7
• Special cases: linear, quadratic, cubic (polynomials of degree 1, 2, 3, e.g. x, x², x³)
• These examples are not polynomials: 2^x, x!, log x, 1/x
• Sometimes we describe a function as polynomial if it is bounded by a polynomial, e.g. x·log₂(x) is not a polynomial itself but it is bounded by one

Why polynomials?
• addition: if you have a sequence of statements S1; S2; where the "time" needed to run statement Si is ti, then the time for the sequence is t1 + t2
• multiplication: if you have a for-loop repeated N times, e.g.

    for(int i=0; i<N; i++) S;

  and the cost for a single loop iteration is t, then the overall cost for the loop is N × t
• Arbitrary polynomials then arise through sequences of nested loops
• (We will see these being useful for Big O notation – there is a small illustration below, and a worked example on the next slide)
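As a concrete illustration of the addition and multiplication rules above, here is a minimal runnable Java sketch (not from the slides; the class and variable names are made up) that counts one unit of work per inner iteration of a nested loop, so the count that emerges is exactly the polynomial n²:

    public class LoopCost {
        public static void main(String[] args) {
            for (int n : new int[]{10, 100, 1000}) {
                long steps = 0;
                for (int i = 0; i < n; i++)       // outer loop: n iterations (multiplication)
                    for (int j = 0; j < n; j++)   // inner loop: n iterations per outer one
                        steps++;                  // one unit of work per iteration (addition)
                System.out.println("n = " + n + ", steps = " + steps);  // steps = n^2
            }
        }
    }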
Example: bubble sort pseudocode

    for i = 0 to N - 2
        for j = 0 to N - 2
            if (A(j) > A(j + 1))
                temp = A(j)
                A(j) = A(j + 1)
                A(j + 1) = temp
            end-if
        end-for
    end-for

• addition: for the body of the inner loop, T = 1 + 1 + 1 = 3 (if we assume the time ti for each line here is 1)
• multiplication: each for-loop multiplies the cost of its body by its number of iterations
• A polynomial emerges: N × N × 3 = 3N²
• (in practice, we usually ignore the constant '3', and refer to bubble sort as O(N²)) – why?
  • recall: addition – a sequence of statements S1; S2 costs t1 + t2
  • multiplication – a for-loop over S costs N × t

Objection
• What if the cost of the loop body is not constant, but depends on the loop variable?
• Often it suffices for our purposes to over-approximate the cost by a common value
• But if that is too crude, then: if ti is the time for the i-th iteration and there are N iterations, the overall time is t(0) + t(1) + … + t(N−1)
• This does not appear tremendously helpful; however, certain patterns are common, e.g. 0 + 1 + 2 + … + (N−1) = N(N−1)/2
• More generally, if ti is a (degree d) polynomial over the variable i, then the sum can be expressed as a (degree d+1) polynomial over N.

Exponential functions
• Functions of the form c^n, where c is a constant such that c > 1
• These grow eventually faster than any polynomial
• Jargon: "exponential" is not synonymous with "very bad"; we merely have exponential functions as performance bounds
• Algorithms with exponential performance are usually bad news, because they do not scale at all; however there are "bad algorithms" and "bad problems", i.e. there is a big distinction between
  • badly coded problem solutions, and
  • problems for which there is no efficient solution

Example of exponential growth
• The spread of COVID-19 (pre-vaccine)
https://twitter.com/GaryWarshaw/status/1240302653764059136/photo/1
Image credit @garywarshaw @SignerLab

Example 2: bad maximum function

    int max(int[] arrayA) {
        return max(arrayA, arrayA.length-1);
    }
    int max(int[] arrayA, int to) {
        if (to==0) return arrayA[0];
        if (arrayA[to] > max(arrayA, to-1))
            return arrayA[to];
        else
            return max(arrayA, to-1);
    }

Explanation
• Because the recursive call max(arrayA, to-1) is not stored in a variable, it could be called 2 times,
• and max(arrayA, to-2) could be called 2 times for each of those 2 calls (so 4 times overall) [2² = 4] …
• and max(arrayA, to-10) called 1024 times overall [2^10]
• and max(arrayA, to-100) will be called too many times for the lifetime of the computer [2^100]
• This will specifically happen if the numbers in the array are in reverse order, e.g.

    int arrayA[] = new int[100];
    for(int i=0; i<100; i++) arrayA[i] = 100-i;   // 100, 99, …, 2, 1
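As an aside (not on the slides): one way to fix the bad maximum function is to store the result of the recursive call in a local variable, so each call recurses exactly once. A minimal sketch:

    // Same recursive structure as above, but the recursive call is made
    // once and its result reused, so the 2^n calls collapse to n calls.
    int max(int[] arrayA, int to) {
        if (to == 0) return arrayA[0];
        int rest = max(arrayA, to - 1);              // computed once, stored
        return (arrayA[to] > rest) ? arrayA[to] : rest;
    }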
Example 3: create truth table

    boolean vals[] = new boolean[VARS];
    void truthtable(int v) {
        if (v==VARS) { produceRow(); return; }  // row complete: emit it, stop recursing
        vals[v]=true;  truthtable(v+1);
        vals[v]=false; truthtable(v+1);
    }

Explanation
• The method truthtable produces (a portion of) a truth table
• Parameter v tells us for how many variables we already have a value
• If we have one for all (v==VARS) we can add a row to the table
• Otherwise, add a value and recurse
• This is also exponential (2^VARS), but here we cannot do better than that, because a truth table is exponential in the number of variables

(Non-recursive version of code for creating a truth table)

    boolean nextrow() {
        for (int i=VARS-1; i>=0; i--) {
            if (!vals[i]) { vals[i]=true; return true; }
            else { vals[i]=false; }
        }
        return false;
    }
    void truthtable() {
        do { produceRow(); } while(nextrow());
    }

Factorial
• n! = 1 × 2 × 3 × ⋯ × (n−1) × n
• Number of permutations of n distinct elements
• Grows faster than exponential functions (but slower than n^n)
• We do not normally consider algorithms with such a bad time complexity, but they can arise
  • e.g. generate-and-test algorithms can in bad scenarios act like that

Generate-and-test?
• We do not always have a constructive way to solve problem X
• but we may have a way of testing whether a candidate solution is an actual solution
• so we can repeatedly produce candidate solutions and test them, until we find the real deal; but if there are a lot of candidates…
• Example: a naïve sorting algorithm ('bogo sort') would be to permute the elements of an array randomly; if the result is in order we are done; otherwise repeat
  • on average this will require n!/2 many iterations
• Generate-and-test is also a popular naïve approach to computational creativity / creative AI (see COMP6590, Stage 3)

Logarithm
• log_c x, the logarithm of x to base c, is defined as the inverse to exponentiation to base c (constraint: c > 0, x > 0)
• If c^x = y then log_c y = x
  • log_c(c^x) = x
  • e.g. 2³ = 8, so log2 8 = 3
• Common cases for c: 2, 10, e (natural logarithm), φ
  • (why these common cases? See the next slides)
• Algebraic laws:
  • log_c(xy) = log_c x + log_c y
  • log_c(x/y) = log_c x − log_c y
  • c^(log_c x) = x
  • x^y = c^(y · log_c x), hence log_c(x^y) = y · log_c x
• NB in the world of algorithms we often just use log without a base, because logs of different bases differ only by a constant factor

How do logarithms arise in algorithms? Here are some ways
• Remember the generic cost of a for-loop, t(1) + t(2) + … + t(N)? If t(i) = 1/i then this sums up approximately to log_e N.
• The height of a randomly-built binary search tree (with n elements) is on average 2 log_e n (≈ 1.38 log2 n)
• Various kinds of trees that are "balanced" are so because the height is bounded by log_c n for some c:
  • c = φ for AVL trees, and also weight-balanced union/find structures
• The natural log (log_e, or ln) is useful for exponential growth and decay models, such as algorithms for economics models
• If we are trying to find the index position of a value v in a sorted array, we can do that in log2 n iterations, where n is the length of the array…:

Binary search

    int find(int v, int arrayB[]) {
        int low=0;
        int high=arrayB.length-1;
        while(low<=high) {
            int mid=(low+high)/2;
            if(v==arrayB[mid]) return mid;
            else if(v<arrayB[mid]) high=mid-1;
            else low=mid+1;
        }
        return -1; //not found
    }

Binary search is O(log n) – i.e. an algorithm with logarithmic complexity, meaning run time grows proportionally to the log of the size of the input. More on Big O notation later in this module.
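A small usage sketch (not from the slides; the array contents are made up for illustration) wrapping the find method above so it can be run directly:

    public class BinarySearchDemo {
        static int find(int v, int[] arrayB) {
            int low = 0, high = arrayB.length - 1;
            while (low <= high) {
                int mid = (low + high) / 2;       // halve the search range each iteration
                if (v == arrayB[mid]) return mid;
                else if (v < arrayB[mid]) high = mid - 1;
                else low = mid + 1;
            }
            return -1; // not found
        }
        public static void main(String[] args) {
            int[] sorted = {2, 4, 6, 7, 8, 10};   // binary search needs a sorted array
            System.out.println(find(7, sorted));  // prints 3 (the index of 7)
            System.out.println(find(5, sorted));  // prints -1 (5 is absent)
        }
    }

For an array of length 6, the loop runs at most three times before low crosses high – consistent with the log2 n bound above.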
Fibonacci and φ
Image credit: arbyreed (Flickr)
• the golden ratio is the number φ = (1+√5)/2 ≈ 1.618
• Fibonacci numbers: F(0) = 0, F(1) = 1, F(n+2) = F(n+1) + F(n)
  • 0, 1, 1, 2, 3, 5, 8, 13, 21, …
• for large n we have F(n+1)/F(n) ≈ φ, e.g. 34/21 ≈ 1.619
• approximation (from below): F(n) ≈ φ^n/√5
https://www.invisionapp.com/insidedesign/golden-ratio-designers/
• log_φ makes it into the description of some algorithms, because it is (nearly) the inverse of the Fibonacci function,
• i.e. given N, to find the n such that F(n) = N, we might as well compute log_φ N
Interested in Fibonacci? Here's an episode of the BBC podcast 'In Our Time' on "The Fibonacci Sequence"

Example: AVL trees
AVL trees are binary search trees such that the heights of the left and right child can differ by at most 1, and every child is itself an AVL tree.
• How many nodes does the most sparsely populated AVL tree of height n have? Let's call that sequence of numbers a(n)
• Clearly: a(0) = 0, a(1) = 1; there is not even a choice at those heights
• What about height n+2? One child must have height n+1; for the other we only need height n.
• Make both most sparsely populated and we get: a(n+2) = a(n+1) + a(n) + 1
• These are the Leonardo numbers (closely related to Fibonacci numbers):
  • 1, 1, 3, 5, 9, 15, 25, 41, 67, 109, …
  • L(n) = 2F(n+1) − 1
• Maximum height of an AVL tree with m elements: log_φ(m + 1)
[We'll explore this more in a few weeks]

Aside: ceiling/floor function
• When converting from real numbers to integers, we have a choice between rounding, rounding up (ceiling), and rounding down (floor)
• the ceiling function ⌈x⌉ gives rounding up,
  • e.g. ⌈1.08⌉ = 2, ⌈3.72⌉ = 4, ⌈5⌉ = 5
• the floor function ⌊x⌋ rounds the other way,
  • e.g. ⌊1.08⌋ = 1, ⌊3⌋ = 3
• Generally, if x is a real number then ⌊x⌋ and ⌈x⌉ are integers such that: x − 1 < ⌊x⌋ ≤ x ≤ ⌈x⌉ < x + 1
• On the previous slide, we wanted to know the maximal height of a tree, but heights of trees are integers, so…

Modular arithmetic
• Sometimes we compute arithmetic operations in Computing modulo a number p,
• i.e. instead of computing a + b and ab, we are really computing (a + b) % p and (ab) % p
• This is even true for built-in integers (type int) in Java: p = 2^32
• Modular equivalence: we write a ≡ b (mod p) iff a % p = b % p
  • e.g. 5 ≡ 9 (mod 4), because 5 % 4 = 9 % 4
• Generally, if we have a ≡ b (mod p) and c ≡ d (mod p), then we also have a + c ≡ b + d (mod p) and ac ≡ bd (mod p)
• Generally (for any p) there is an additive inverse of a, which is p − a
  • because a + (p − a) = p ≡ 0 (mod p) [a multiplicative inverse exists when p is prime]

Example Application
• In cryptography, one typically encrypts a message m (a large integer) as m^e modulo some integer k, where e is also a fairly large integer
• m^e is monstrously huge, and would take ages to compute – but we do not need it as an intermediate result
• We can exploit:
  • m^0 ≡ 1 (mod k)
  • m^(2j) ≡ m^j × m^j (mod k)
  • m^(2j+1) ≡ m^j × m^j × m (mod k)
Want to read more about how modular arithmetic and exponents are used in public key cryptography? See Chapter 4, "9 Algorithms that Changed the Future" (John MacCormick).
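A minimal sketch of this square-and-multiply idea (not from the slides); it assumes k is small enough that (k−1)² fits in a long – for cryptographic sizes you would use java.math.BigInteger, whose modPow method does the same job:

    // Computes (m^e) % k without ever forming the huge intermediate m^e,
    // using the identities above: m^0 ≡ 1, m^(2j) ≡ m^j · m^j, m^(2j+1) ≡ m^j · m^j · m.
    static long modPow(long m, long e, long k) {
        if (e == 0) return 1 % k;                       // m^0 ≡ 1 (mod k)
        long half = modPow(m, e / 2, k);                // m^(e/2) mod k, computed once
        long sq = (half * half) % k;                    // the squaring handles the 2j part
        return (e % 2 == 0) ? sq : (sq * (m % k)) % k;  // odd e: one extra factor of m
    }

Only about log2 e recursive calls are made, so even astronomically large exponents are cheap.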
Further additional reading
• COMP3250 / COMP3220 notes plus recommended reading from COMP3250 / COMP3220
• COMP3830 notes plus recommended reading from COMP3830
• Algorithms in Java (Sedgewick) Ch 2
• Data Structures & Problem Solving Using Java (Weiss) Ch 5, 19
• Introduction to Algorithms (Cormen et al): relevant parts of Ch 1, 2, 3, 13.3

Glossary
• Polynomials: any function you can build by using addition and multiplication to combine constants, variables and exponents
• Exponential functions: functions of the form c^n, where c is a constant such that c > 1
• Factorial: n! = 1 × 2 × 3 × ⋯ × (n−1) × n – the number of permutations of n distinct elements
• Generate-and-test: repeatedly produce candidate solutions and test them, until we find an actual working solution
• Logarithm: log_c x, the logarithm of x to base c, is defined as the inverse to exponentiation to base c (constraint: c > 0, x > 0)
• Golden ratio: the number φ = (1+√5)/2 ≈ 1.618
• Fibonacci numbers: F(0) = 0, F(1) = 1, F(n+2) = F(n+1) + F(n). Sequence 0, 1, 1, 2, 3, 5, 8, 13, 21, …
• Ceiling function: rounding up a number
• Floor function: rounding down a number
• Modular arithmetic: compute arithmetic operations modulo a number p

Today's lecture
• Introduction: What maths and why
• Polynomials
• Exponential functions
• Factorial
• Logarithm
• Ceiling/floor function
• Modular arithmetic

Idea of a Data Structure
COMP5180 Algorithms, Correctness and Efficiency
Anna Jordanous a.k.jordanous@kent.ac.uk

Today's lecture
• Goals of a data structure
• Data structures – flexible vs fixed
  • Examples
  • Pros and cons
  • Hybrid approaches for the examples
• Data structures for algorithms
• Case study: Dynamic arrays (Array Lists)
  • Description and representation
  • Performance analysis

What are the goals of a Data Structure?
• We want to store data in it
• We want to access and manipulate the data efficiently, via dedicated methods
• We want to use memory effectively

How is the data organised?
This varies, but there are two substantially different approaches.
1. Flexible sized memory, flexible structure:
  • The data structure is made up of similarly behaving "tiles", linked together by object references (pointers)
  • Adding data just adds another tile; sometimes tiles are rewired for long-term performance gain
2. Fixed size memory, rigid structure:
  • A fixed amount of memory is associated with the structure
  • When accessing parts of the data, a small amount of processing info needs to be maintained
  • When adding data we may need to clone & destroy the old structure

Example (type 1), top-to-bottom tree

    class Tree {
        int data;
        Tree left, right;
    }
    Tree x, y, z;
    x = new Tree(); y = new Tree(); z = new Tree();
    x.data = 4; y.data = 3; z.data = 7;
    x.left = y; x.right = z;

[diagram: node 4 with left child 3 and right child 7]

Example (type 2), tree with fixed positioning

    int SIZE = 3;
    int data[] = new int[SIZE];
    data[0] = 4; data[1] = 3; data[2] = 7;

Note: the tree is the same as before. We use the convention that the left child of node 0 is in position 1, and the right child in position 2. General convention: the left child of node n is at 2n+1, the right child at 2n+2.
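A small sketch (not from the slides) turning the fixed-positioning convention into helper methods; the parent formula is just the inverse of the two child formulas:

    class FixedTree {
        int[] data = {4, 3, 7};                        // same tree: root 4, children 3 and 7

        int leftChild(int n)  { return 2 * n + 1; }    // e.g. leftChild(0) == 1  (the 3)
        int rightChild(int n) { return 2 * n + 2; }    // e.g. rightChild(0) == 2 (the 7)
        int parent(int n)     { return (n - 1) / 2; }  // integer division undoes both, for n > 0
    }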
Pros and Cons
• flexible approach:
  • pros: substructures can be moved as a whole, shape is flexible, incremental structure growth is easy
  • cons: a substantial proportion of memory is dedicated to object references (here: 75%)
• fixed approach:
  • pros: links are implicit, and go both ways; efficient use of memory when full
  • cons: no substructures, empty substructures need to be encoded, inefficient use of memory when nearly empty, growth may run into stop-the-world (a necessary pause to free up/reorganise memory)

Which is better?
• This depends on the application:
• Binary search trees [old+new]: the flexible approach is better, because
  1. we can do balancing [new] efficiently, as we can move entire subtrees
  2. even if we do not balance, the worst-case memory footprint of the rigid structure is exponential in the tree size, but linear for flexible
• Binary heaps [new]: the rigid approach is better, because
  1. trees can/should be kept rigidly balanced all the time anyway
  2. we need to traverse the tree up as well as down
  3. between 50% and 100% of memory is payload (contains the actual data) (instead of 25%)

Hybrid approaches between those
• Hash tables typically use a rigid structure at the top, with a flexible structure underneath – kind of an orchard of trees
  • no stop-the-world when the (rigid part of the) table becomes full
  • performance deterioration when full is minimal
• Trees/forests over a fixed number of data points that have a unique (if any) parent are often represented as int arrays or finite maps
  • links remain explicit; what to do if data points aren't fixed?
  • used in Dijkstra's algorithm (finding shortest path), union/find

What are we investigating when looking at a Data Structure and its Algorithms?
• we typically characterise performance when n pieces of elementary data are represented, i.e. the following is relative to n, or O(n)
• how much computer memory do we need for all that data?
  • this is a question we often ignore, e.g. when rivalling implementations barely differ in that respect (typical for rivalling flexible approaches)
• how much time does it take to perform operations X and Y
  • …in the worst-case scenario
  • …in a random scenario (on average)
  • …on average over the lifetime of a data structure (amortized time complexity)

Case Study: Array Lists
• arrays are a primitive (in most PLs) data structure that is rigid at least at creation time (at compilation time in some PLs)
• dynamic arrays (= array lists) are a way to modify arrays to incorporate incremental growth somehow
• main difference in usage (other than syntax):
  • the only op to store data in arrays is "set": array[index]=data; arraylist.set(index,data) is disabled unless previous indexes are assigned
  • the main op to store data in array lists is "add": arraylist.add(data); the corresponding op for arrays does not exist, but can be faked with additional fields

How do dynamic arrays work?
[diagram: a dynamic array object holding a reference to its internal array]

Representation of Dynamic Array
• a dynamic array has two/three fields:
  1. an actual array that contains all the data
  2. an integer index that points to the next free cell in the array
  3. an integer field giving the length of the array (the "capacity"); in Java this comes for free with the array, but not in all programming languages (PLs)
• the idea on the previous slide was: the dynamic array contained 4 elements, so position 4 was the first free position in the array
• if the array is full, the index is equal to the length

How does add work?
• It may fit… then the new element goes into the next free cell, and the index moves on.
• It may not fit… So if the new element added does not fit:
  • a fresh bigger array is created (bigger by a factor of about 1.5)
  • this becomes the new internal array of the dynamic array
  • all the data is copied across from the old array to the new array
  • now there is room for the extra element
  • the old internal array will be picked up by the garbage collector at some point
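Putting the representation and the add operation together, here is a minimal sketch (not the real java.util.ArrayList, and int-only for brevity):

    class DynamicArray {
        private int[] data = new int[4];  // the internal array; data.length is the "capacity"
        private int index = 0;            // next free cell, which is also the current size

        void add(int value) {
            if (index == data.length) {   // full: grow by a factor of about 1.5
                int[] bigger = new int[data.length + data.length / 2];
                System.arraycopy(data, 0, bigger, 0, index);  // copy everything across
                data = bigger;            // the old array is left for the garbage collector
            }
            data[index++] = value;        // room guaranteed: store and advance the index
        }
    }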
Performance?
• In the best case O(1) – we check whether there is room, there is, we put the data in
• In the worst case O(n) – we check whether there is room, there is not, we create a fresh array, copy n elements across, put the data in
• On average, over the lifetime of the data structure? Hm. This is the interesting bit.
To get good performance it is important that the array is grown by a factor, not just a fixed amount!

Average performance of add
• To compute the "amortized" time complexity we have to add up all the work that happens over the lifetime (n add operations) of the dynamic array and divide it by n. How?
• We have the cost of storing all those n elements themselves; over the lifetime that amounts to O(n)
• We have the cost of copying all those cells each time the array is replaced. How many are copied over all?
• For simplicity, suppose we grow the array by a factor of 2. In the worst case, all cells would have been copied once, half of those a second time, half of those a third time, …; in the limit: 2n copies, an average of 2 per element.
• The cost for n add ops is O(n), so O(1) for a single op.

Not examinable
ArrayLists themselves grow by k = 1.5
• On average, cells are copied 3 times (worst case). Why?
• in general, when the growth factor is k, cells will be copied 1/(1 − 1/k) times. This derives from the formula 1 + x + x² + … = 1/(1 − x) for x < 1, which applies here for x = 1/k.
• thus: k = 1.5, 1/k = 2/3, 1 − 1/k = 1/3, 1/(1 − 1/k) = 3

Other ops on dynamic arrays
• what about other operations on dynamic arrays?
• we could look at the implementation, but the representation itself tells us what is possible at best
• for example, contains (checking whether an object is in the structure) is surely O(n):
  • if it is there, we will need n/2 comparisons on average to find it
  • if not, we need n comparisons to make sure it is absent
  • either case is O(n)

An intriguing case is removing an element
• there are two issues of concern here:
  1. how is the element to be removed identified? By itself, or by an index position of the dynamic array?
  2. what promises, if any, do we make about index positions of other elements in the array, before and after?
• the actual Java API for ArrayList "remove" has two versions, for either kind of removal
• both of these are necessarily O(n); but with a weaker promise on index positions, remove(int i) could be realised as O(1).

Further additional reading
• Algorithms in Java (Sedgewick 2003) – Ch 3
• Cracking the Coding Interview: 150 programming questions and solutions (McDowell 2013) – Ch 1, 2
• Data Structures & Problem Solving Using Java (Weiss 2010) – Ch 2, 15
• Introduction to Algorithms (Cormen et al 2009) – Ch 3, also the intro to Section III
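To see the amortized argument empirically, here is a sketch (not from the slides) that instruments add to count how many cell copies happen over n operations; the copies-per-element ratio stays bounded by a constant (at most about 3 for growth factor 1.5, as derived above), which is why add is O(1) amortized:

    public class AmortizedDemo {
        static int[] data = new int[4];
        static int index = 0;
        static long copies = 0;

        static void add(int value) {
            if (index == data.length) {
                int[] bigger = new int[data.length + data.length / 2];  // grow by ~1.5
                System.arraycopy(data, 0, bigger, 0, index);
                copies += index;          // every existing cell is copied once more
                data = bigger;
            }
            data[index++] = value;
        }

        public static void main(String[] args) {
            int n = 1_000_000;
            for (int i = 0; i < n; i++) add(i);
            System.out.println("copies per element: " + (double) copies / n);
        }
    }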
Glossary
• stop-the-world – a pause needed to perform garbage collection, e.g. when adding to a fixed data structure
• garbage collection – freeing up/reorganising memory
• payload – the part of transmitted data that is the actual intended message
• substructure – a structure within a bigger structure
• amortized time complexity – average time complexity, based on the total cost of operations over the lifetime of a data structure
• primitive data structure – an inbuilt data structure in a programming language
• dynamic arrays – modified arrays that allow incremental growth during runtime (= array lists)
• performance – run time measurements of an algorithm, or of operations on a data structure
• operation – a function that gets performed, e.g. on a data structure
• PL = programming language

Today's lecture
• Goals of a data structure
• Data structures – flexible vs fixed
  • Examples
  • Pros and cons
  • Hybrid approaches for the examples
• Data structures for algorithms
• Case study: Dynamic arrays (Array Lists)
  • Description and representation
  • Performance analysis

[image: a complete binary tree in nature]

Balanced Binary Search Trees
COMP5180 Algorithms, Correctness and Efficiency
Anna Jordanous a.k.jordanous@kent.ac.uk

Today's lecture
• Using trees for searching data and keeping it sorted
• Binary tree and binary search tree recap
• Tree height and number of nodes
• Balanced trees and complete trees
• Constructing balanced trees
  • AVL trees
  • Red-Black trees
  • [2-3 trees *]
  • [B-Trees *]
* = Non-examinable content. We won't cover details of 2-3 trees or B-Trees. Slides on 2-3 trees and B-Trees are kept in this powerpoint for reference only, if you are interested. Details of 2-3 trees and B-Trees are not examinable – you are expected to know that these are two other ways of constructing balanced trees, but not how they work.

Let's first recap Binary Trees
A binary tree is a linked data structure where each node has links to two other nodes (left and right).
[diagram: a binary tree with root node 7, showing left/right links, null links and leaf nodes]
• The entry node is called the root node, and left leads to a subtree of values that come before the root node. Similarly, right refers to a subtree of nodes that come after the root node.
• A node which has empty left and right links (i.e. a node which doesn't point to any other nodes) is called a leaf node.
• We use family hierarchy terms such as parent, child, sibling, ancestor, descendant, to describe relationships between nodes.

Searching through the tree to find items
If we are looking for an item in a binary tree, we have two ways to search through the tree:
1. Depth-first search
2. Breadth-first search

A binary search tree example (using integer keys)
[diagram: a BST with root node 7; keys 2, 4, 5, 6 in the left subtree and 8, 9, 10 in the right subtree]
• In a binary search tree, a left child node's value is always < its parent node's value
• a right child node's value is always > its parent node's value
• And if we have two nodes with the same value? (either don't allow duplicate keys, or make a decision to always put them left or right, or use hash functions to generate unique keys)

Searching a BST
• we don't need to choose between depth-first or breadth-first search; we are guided by the values:
• if the tree is empty then the key cannot be found.
• Otherwise compare the key we are looking for (k) with the key in the root node (root_k):
  • If equal, then the search is over.
  • If k < root_k then ignore the right subtree; return the result of searching the left subtree.
  • If k > root_k then return the result of searching the right subtree.
http://algs4.cs.princeton.edu/home/
What a Tree class looks like
• The key field, used for comparing nodes, is an integer, to keep things simple. It could just as easily be a String.
• We have called the node class Tree since it can be the root of a whole tree.

    class Tree {
        private int key;
        private Tree left;
        private Tree right;
        // any other fields
    };
    // functions for
    // manipulating trees

Issues with (not) using null as the empty tree
• the empty tree issue – there is a choice:
• we can use null for the empty tree (most common, not very OO)
  • makes recursive code a bit awkward: use static methods instead, or lots of null-checks
• we can use a dedicated EmptyTree class; issues:
  • makes loops awkward
  • if the empty tree is not shared, we waste half the memory
• we can use a dedicated object of the Tree class as "I am empty"
  • half-OO = a compromise between the two alternatives above

Pure OO version

    abstract class Tree {
        abstract Tree search(int key);
        abstract Tree insert(int key);
    }
    class EmptyTree extends Tree {
        Tree search(int k) { return this; }
        Tree insert(int k) { return new NETree(this,k,this); }
    }
    class NETree extends Tree {
        private int key;
        private Tree left,right;
        // constructor...
        Tree search(int k) {
            if (k==key) return this;
            if (k<key) return left.search(k);
            else return right.search(k);
        }
        Tree insert(int k) { ... }
    }

Half-OO, faking an Empty tree class

    class Tree {
        private Tree left,right;
        private int key;
        // constructor...
        private final static Tree empty = new Tree(null,0,null);
        Tree search(int k) {
            Tree cur=this;
            while (cur!=empty) {
                if (k==cur.key) return cur;
                cur = k<cur.key ? cur.left : cur.right;
            }
            return null;
        }
        Tree insert(int k) {
            if (this==empty) return new Tree(this,k,this);
            if (key==k) return this;
            if (k<key) left=left.insert(k);
            else right=right.insert(k);
            return this;
        }
    }

Tree traversal
[diagram: a BST with root 7, left subtree 4 (children 2 and 6), right subtree 10 (left child 8)]
• inorder tree traversal
  • visit left node (recursive), then visit root node, then visit right node (recursive)
  • 2, 4, 6, 7, 8, 10 – a SORTED LIST if we traverse a BST inorder
• preorder tree traversal
  • visit root node, then visit left node (recursive), then visit right node (recursive)
  • 7, 4, 2, 6, 10, 8
• postorder tree traversal
  • visit left node (recursive), then visit right node (recursive), then visit root node
  • 2, 6, 4, 8, 10, 7

How efficient are binary search trees?
• The algorithms to search, traverse and insert in a tree move one step down the tree at each iteration (or each recursive method call).
• This means that the time taken is limited by the height of the tree.
• A tree's height is the maximum distance from the root to a leaf node (that is, a node with no subtrees).
• The amount of data stored in a tree (its size) is the number of nodes.
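A runnable sketch (not from the slides) of the inorder traversal just described, using a minimal node class with public fields and null for empty subtrees; on the example tree above it prints the keys in sorted order:

    class Node {
        int key;
        Node left, right;   // null represents an empty subtree
        Node(int key, Node left, Node right) {
            this.key = key; this.left = left; this.right = right;
        }

        static void inorder(Node t) {
            if (t == null) return;           // empty: nothing to visit
            inorder(t.left);                 // 1. visit the left subtree
            System.out.print(t.key + " ");   // 2. visit the root
            inorder(t.right);                // 3. visit the right subtree
        }

        public static void main(String[] args) {
            Node root = new Node(7,
                new Node(4, new Node(2, null, null), new Node(6, null, null)),
                new Node(10, new Node(8, null, null), null));
            inorder(root);                   // prints: 2 4 6 7 8 10
        }
    }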
e.g. BSTs vs Hash Tables
[diagram: a hash table with entries Alan 43333, Fred 8822, Mary 4227, John 3929 stored at scattered indexes, null elsewhere]
• Binary search trees and hash tables provide alternative ways to implement a lookup-table abstract data type.
• Hash tables are potentially faster
  • Search time is the same however many records are stored
  • Hash tables are the usual way to implement a lookup table where speed is very important
  • Generally, there is a trade-off between table size and speed
• BSTs are more flexible and often easier
  • you can create one off the top of your head if you understand the principles
  • you can search for more than just an exact match… (see next slide)

Flexible Tree Searching Example
• Suppose we want to write a program to keep track of lengths of timber in a store.
• If someone needs a 7.3 metre length then we want to find the piece that is nearest to 7.3 metres but is not smaller.
  • we can trim it to size (with some wastage) but cannot make it longer.
• We'd do this by amending the search...
• How would you go about this?

Today's lecture
• Using trees for searching data and keeping it sorted
• Binary tree and binary search tree recap
• Tree height and number of nodes
• Balanced trees and complete trees
• Constructing balanced trees
  • AVL trees
  • Red-Black trees
  • [2-3 trees *]
  • [B-Trees *]

What is the relationship between the height of a tree and its size?
• It depends on the shape of the tree.
• A tree in which each left subtree is null will effectively be a linked list, and its height h will be equal to its size (number of nodes) n: h = n
• Same if each right subtree is null
• But this doesn't look like a very efficient tree… Can we do better?
• (We'll need to know about balanced trees)
• A tree is said to be balanced if the heights of the two subtrees of every node never differ by more than 1
• NB the height of an empty tree is undefined… but for the purposes of balancing trees, assume the height of an empty tree is −1

What is the best we can do?
• The best (most efficient) tree of a given size is the one with the least height.
• It is easier to turn the problem round and ask: how many nodes can we fit into a tree with a given height?
  • a tree of height h=0 has up to n=1 nodes (1 root); this is the same as saying n = 2^0 when h=0
  • a tree of height h=1 can have up to n=3 nodes (1 root + 2 children); this is the same as saying n = 2^0 + 2^1 when h=1
  • a tree of height h=2 can have up to n=7 nodes (1 root + 2 children + 4 grandchildren); this is the same as saying n = 2^0 + 2^1 + 2^2 when h=2
  • etc…

Tree height
• A tree of height h can contain up to 1 + 2 + 4 + ... + 2^h nodes:
    h:  0  1  2
    n:  1  3  7
• This is a geometric progression, and so n (# of nodes) = 2^(h+1) − 1
• A tree is balanced if the heights of the two subtrees of every node never differ by more than 1
• If we have a balanced tree with n nodes and height h:
  • 2^h − 1 < n ≤ 2^(h+1) − 1
  • So 2^h < n+1 ≤ 2^(h+1)
• If we take logs we get: h < log2(n+1) ≤ h+1
• So we know now: h < log2(n+1) and h+1 ≥ log2(n+1)
• So: h < log2(n+1) and h ≥ log2(n+1) − 1
• Let's simplify this a bit: h is roughly log2(n+1). Once n gets big, there's not much point distinguishing between n and n+1. So let's ditch the +1s:
• h is about log2 n – now we can calculate the best possible height

What is the height of a tree with a million nodes? [Remember: n = 1,000,000]
• 2^h < n+1 ≤ 2^(h+1)
• h is approximately log2 n
• log2 1,000,000 = 19.931568569
• So h is approximately 20 (to the nearest whole number)
• The best (most efficient) tree of a given size is the one with the least height.
• So a tree with 1,000,000 nodes can have any height between 20 (minimum, balanced) and 1,000,000 (maximum)
• We ideally want our tree to be much closer to height 20 than 1,000,000. A tree of height 20 will be much faster to search than one of height 1,000,000.
• Unfortunately, if we start with 1,000,000 keys that have been sorted into order, and then build a tree by adding them one at a time, we will end up with a tree that has height 1,000,000. This is a problem when we come to binary search trees.
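That calculation as a quick sketch (not from the slides); Java's Math.log is the natural log, so we change base by dividing:

    public class TreeHeight {
        static double log2(double x) { return Math.log(x) / Math.log(2); }

        public static void main(String[] args) {
            int n = 1_000_000;
            System.out.println(log2(n + 1));  // ~19.93: a balanced tree has height about 20
            System.out.println(n);            // the degenerate (linked-list) tree has height n
        }
    }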
Complete trees
• Binary tree: empty, or a node with links to left and right binary trees.
• Complete tree: perfectly balanced, except for the bottom level.
[diagram: a complete tree with N = 16 nodes (height = 4)]
• A complete binary tree is a binary tree with all levels except the last level completely filled, and with all the leaves in the last level to the left side.
• Every complete binary tree is balanced… but this is not necessarily true the other way around.
• Below, which of the balanced trees are also complete? Which are not complete?
• Can you understand why the unbalanced trees cannot be complete trees?
• (Think about it for a bit…)

Tree-height questions
• Suppose we have nodes in random order and we build a tree by inserting them one at a time; how high will the tree be on average?
  • the answer is about 1.38 log2 n for n nodes
  • (proof – see Weiss's "Data Structures…", Theorem 19.1, p. 704)
  • this means a random tree with 1,000,000 nodes will have height about 27 (which is good).
• Suppose that you want to build a binary search tree out of nodes that are (or might be) already in order; can you do it in such a way that the tree is guaranteed not to be high?

Today's lecture
• Using trees for searching data and keeping it sorted
• Binary tree and binary search tree recap
• Tree height and number of nodes
• Balanced trees and complete trees
• Constructing balanced trees
  • AVL trees
  • Red-Black trees
  • [2-3 trees *]
  • [B-Trees *]

Are these BSTs? Are these trees balanced or not?
Reminder: a tree is said to be balanced if the heights of the two subtrees of every node never differ by more than 1.
NB the height of an empty tree is undefined… but for the purposes of balancing trees, assume the height of an empty tree is −1.
Example adapted from Tomas Petricek's material

Constructing balanced BSTs
• Suppose that you want to build a binary search tree out of nodes that are (or might be) already in order; can you do it in such a way that the tree is guaranteed not to be high?
• (in other words, keep it as balanced as possible?)
• There are ways to build a binary search tree so that it stays pretty* balanced whatever order the nodes are added:
  • 2-3 trees
  • AVL trees
  • Red-black trees
  • B-trees

Example: AVL Trees
• There are ways to build a binary search tree so that it stays pretty* balanced whatever order the nodes are added.
• One way is to build an AVL tree (named after the inventors Adelson-Velskii and Landis).
• An AVL tree has the following properties:
  • the root's left and right subtrees differ in height by at most 1
  • the root's left and right subtrees are both AVL trees
• An AVL tree can be built by adding nodes and then rebalancing when it loses the AVL property.
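A sketch (not from the slides) of checking those two properties recursively, reusing the Node class from the traversal sketch above; height(null) is −1, matching the convention for empty trees:

    class AvlCheck {
        static int height(Node t) {
            if (t == null) return -1;                      // an empty tree has height -1
            return 1 + Math.max(height(t.left), height(t.right));
        }
        static boolean isAVL(Node t) {
            if (t == null) return true;                    // an empty tree is trivially AVL
            int tilt = height(t.left) - height(t.right);   // magnitude > 1 means unbalanced
            return Math.abs(tilt) <= 1 && isAVL(t.left) && isAVL(t.right);
        }
    }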
AVL Trees
• in AVL trees we distinguish between nodes that are balanced, or have a slight left tilt, or a slight right tilt
• a tilt is introduced by height changes
  • slight tilt: heights differ by 1
  • tilt that leaves the tree unbalanced: heights differ by >1
• fix tilt problems by "tree rotation" (of subtrees)

Rotations
https://www.cs.odu.edu/~zeil/cs361/latest/Public/avl/index.html
https://en.m.wikipedia.org/wiki/Tree_rotation

Example: Red-Black trees
There are ways to build a binary search tree so that it stays pretty* balanced whatever order the nodes are added. One way is to build a Red-Black tree. A Red-Black tree has the following properties:
• A node can be red or black
  • normal (black) nodes and overflow (red) nodes
• The root node is always black
• Null leaves of the tree are black
• If a node is red, then its children are black

Red-Black Trees (continued)
• in red-black trees we need to store data to distinguish between the normal (black) nodes and the overflow (red) nodes
  • additional data: one bit (red/black)
• red nodes are only allowed to occur underneath black nodes
• invariant: any path from the root to an empty tree passes through exactly the same number of black nodes: the black height
• fix problems by rotations (of subtrees)
CLR p. 309: "We call the number of black nodes on any simple path from, but not including, a node x down to a leaf the black-height of the node, denoted bh(x). … We define the black-height of a red-black tree to be the black-height of its root."

A red-black tree example
[diagram: a red-black tree with black root 7 and keys 2, 4, 5, 6, 8, 9, 11, 14; normal = black, overflow = red]
RULES for red-black trees:
• A node can be red or black
• The root node is always black
• Null leaves are black
• If a node is red, then its children are black
• red nodes are only allowed to occur underneath black nodes
• invariant: any path from the root to an empty tree passes through exactly the same number of black nodes (the black height)

Insertion into a red-black tree
• new data is entered into the tree as an overflow node (red)
  • so the black height is not affected (yet)
• if a red node appears underneath a red node (violation of the invariant), this part of the tree is rotated, pushing a red node up
• several cases need to be considered; we will look through two of those in the slides
  • summary: https://www.youtube.com/watch?v=5IBxA-bZZH8
• if a red node reaches the top, it is turned black

Insertion into a red-black tree: the cases
• 0: Z = root -> colour Z black
• 1: Z.uncle = red -> flip the colour of parent, uncle and grandparent
• 2: Z.uncle = black (triangle – Z is a left child and its parent is a right child, or vice versa) -> rotate so Z becomes the parent
• 3: Z.uncle = black (line – Z is a left child and its parent is a left child, or both are right children) -> rotate so Z's parent becomes the grandparent, then recolour as needed
• NB Michael Sambol gives the pseudocode from CLR
CONFUSED? See p. 316 of CLR for an alternative representation of cases 1-3, and the Rob Edwards video at the end for another alternative representation.
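A sketch of a single left rotation (not from the slides; again using the hypothetical Node class from earlier). The key point is that BST order is preserved: everything that was between x and y, namely y's old left subtree, is re-hung as x's right subtree:

    // Left rotation: pulls x's right child y up above x.
    static Node rotateLeft(Node x) {
        Node y = x.right;   // y moves up
        x.right = y.left;   // y's old left subtree now hangs to the right of x
        y.left = x;         // x becomes y's left child
        return y;           // y is the new root of this subtree
    }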
A red-black tree example: worked insertions
[diagram: the red-black tree example above, with black root 7; normal = black, overflow = red]

Example: inserting 1
• 1 is entered as an overflow (red) node, so the black height is not affected; no red node ends up underneath a red node, so nothing needs repairing.

Example: inserting 10, a red node under 9
• 9 is itself red, so we now have a red node underneath a red node: a violation of the invariant.

Step 1: repair the tree (that was) rooted at 8
• rotate so that Z's parent becomes the grandparent, then recolour as needed.

Step 2: repair the tree rooted at 7
• flip the colour of parent, uncle and grandparent.

Final step: blacken the root node
• colour Z black: if a red node reaches the top, it is turned black.
(CLR pp. 316-7) Red-black trees: fixes for red-red violations
• Case 1: z's uncle y is red -> flip colours above z
• Case 2: z's uncle y is black and z is a right child -> left rotate at z (NB this leads into Case 3)
• Case 3: z's uncle y is black and z is a left child -> right rotate (at the parent of z)
• Finish by recolouring the root black as needed

Observations
• If the black height of a red-black tree is k, then...
• ...its ordinary height is O(k)
• minimum number of nodes in a tree of black height k (when there are no overflow nodes): 2^k − 1
• maximum number of nodes (when all black nodes have 2 red children): 4^k − 1
• thus the ordinary height of a red-black tree with n elements is between log2 n and 2·log2 n, and so it is O(log n).

Comparison: AVL vs Red-Black
• AVL trees are more balanced than red-black trees
• But AVL trees are slightly more costly to administer
• Minimum number of nodes in a tree of height 20 (& 40):
  • AVL: 17,710 (267,914,295)
  • Red-Black: 4,090 (4,194,298)
• Which one to use?
  • If it is more important to keep the tree balanced, use AVL
  • Red-black trees are a nice compromise if you want to keep the tree fairly balanced without spending lots of extra time doing rotations
  • In other words, do you want to incur extra performance cost while building the tree (AVL trees) or while using the tree (red-black trees)?

* = Non-examinable
Example: 2-3 trees
There are ways to build a binary search tree so that it stays pretty* balanced whatever order the nodes are added. One way is to build a 2-3 tree.

2-3 tree
• Allow 1 or 2 keys per node.
  • 2-node: one key, two children.
  • 3-node: two keys, three children.
• Symmetric order: inorder traversal yields keys in ascending order.
• Perfect balance: every path from the root to a null link has the same length. How to maintain?
[diagram: a 2-3 tree with 3-node E J; its children are AC (smaller than E), H (between E and J), and L (larger than J), alongside nodes M, R, P, SX]
Taken from https://algs4.cs.princeton.edu/lectures/keynote/33BalancedSearchTrees.pdf

Insertion into a 2-3 tree
Insertion into a 2-node at the bottom:
• Add the new key to the 2-node to create a 3-node.
• e.g. insert G: the 2-node H becomes the 3-node GH.
Taken from https://algs4.cs.princeton.edu/lectures/keynote/33BalancedSearchTrees.pdf

Insertion into a 3-node at the bottom:
• Add the new key to the 3-node to create a temporary 4-node.
• Move the middle key of the 4-node into the parent.
• Repeat up the tree, as necessary.
• If you reach the root and it's a 4-node, split it into three 2-nodes.
• e.g. insert Z: SX becomes a temporary 4-node, and its middle key moves up into the parent, giving RX.
https://www.youtube.com/watch?v=bhKixY-cZHE
Taken from https://algs4.cs.princeton.edu/lectures/keynote/33BalancedSearchTrees.pdf

2-3 tree: implementation?
Direct implementation is complicated, because:
• Maintaining multiple node types is cumbersome.
• Multiple compares are needed to move down the tree.
• We need to move back up the tree to split 4-nodes.
• There is a large number of cases for splitting.
"Beautiful algorithms are not always the most useful." — Donald Knuth
fantasy code:

    public void put(Key key, Value val) {
        Node x = root;
        while (x.getTheCorrectChild(key) != null) {
            x = x.getTheCorrectChild(key);
            if (x.is4Node()) x.split();
        }
        if (x.is2Node()) x.make3Node(key, val);
        else if (x.is3Node()) x.make4Node(key, val);
    }

Bottom line: could do it, but there's a better way.
Taken from https://algs4.cs.princeton.edu/lectures/keynote/33BalancedSearchTrees.pdf
* = Non-examinable
How to implement 2-3 trees with binary trees?
Challenge: how to represent a 3-node?
• Approach 1: regular BST.
  • No way to tell a 3-node from a 2-node.
  • Cannot map from the BST back to the 2-3 tree.
• Approach 2: regular BST with "glue" nodes.
  • Wastes space, wasted link.
  • Code probably messy.
• Approach 3: regular BST with red "glue" links.
  • Widely used in practice.
  • Arbitrary restriction: red links lean left.
Taken from https://algs4.cs.princeton.edu/lectures/keynote/33BalancedSearchTrees.pdf

* = Non-examinable
Example: B-trees
There are ways to build a binary search tree so that it stays pretty* balanced whatever order the nodes are added. One way is to build a B-tree. B-trees are:
• A multi-way tree structure suitable for external (disk file) lookup.
• Uses large nodes which can be a disk block
• Each node has alternating pointers and data items
• All nodes but the root node are always at least half full.
• Leaf nodes are all at the same level

Inserting in a B-tree
• New items are always inserted in leaf nodes
• When a leaf node fills up, it is split in two and a new entry is made in the layer above.
• When the root node fills up, it is split and the tree grows in height.
• e.g. https://www.youtube.com/watch?v=coRJrcIYbF4

B-tree example
[diagram: a B-tree containing the keys 3, 7, 12, 16, 18, 24, 27, 35, 42, 52, 63]

B-tree after adding 30
[diagram: the same B-tree after adding 30 – the leaf holding 18, 24, 27, 35 has split, with 27 moving up a level]

Searching in a B-tree
• Start at the root.
• Find the interval for the search key and take the corresponding link.
• The search terminates in an external node.
[diagram: searching for E in a B-tree set (M = 6): follow the link for keys between * and K, then between D and H, then find E in the external node B C D E F]
Taken from https://algs4.cs.princeton.edu/lectures/keynote/33BalancedSearchTrees.pdf

Strengths of B-trees
• Provide fast expandable external lookup for strings.
• Cannot become seriously unbalanced.
• The large nodes mean that entries can be found with few disk accesses.
• The root node, and possibly its immediate children, can easily be kept in memory

Further additional reading
• Cracking the Coding Interview: 150 programming questions and solutions (McDowell 2013) – 4.1, also Approach V on p. 34
• Data Structures & Problem Solving Using Java (Weiss 2010) – Ch 19, 18, 20.6
• Introduction to Algorithms (Cormen et al 2009) – Ch 12, 13, 18
• https://www.youtube.com/watch?v=cv_KDQzZpHs [useful summary of binary search trees]
• https://algs4.cs.princeton.edu/lectures/keynote/33BalancedSearchTrees.pdf
• https://www.cs.odu.edu/~zeil/cs361/latest/Public/avl/index.html [AVL trees]
• https://www.youtube.com/watch?v=qvZGUFHWChY [Red-Black trees]
  • and follow-on videos: https://www.youtube.com/watch?v=5IBxA-bZZH8 etc.
• https://www.youtube.com/watch?v=v6eDztNiJwo [full worked e.g. of Red-Black trees, Rob Edwards]
• https://en.m.wikipedia.org/wiki/Tree_rotation
Glossary
• binary tree – a linked data structure where each node links to 2 other nodes (left & right, which could be empty)
• leaf – a tree node with empty child nodes (no links to left or right nodes)
• root – the first node in a binary tree
• binary search tree – a binary tree where every left child node's value < its parent node's value, and every right child node's value > its parent node's value
• traversal – visiting every node of a tree in turn
• height – number of levels in a binary tree (or the length of the longest path from the root to a leaf)
• size – number of nodes in a tree
• balanced – a tree for which the heights of the two subtrees of every node never differ by more than 1
• complete – a binary tree with all levels except the last level completely filled, and with all the leaves in the last level to the left side
• AVL trees / Red-black trees / 2-3 trees / B-trees – self-balancing binary search trees (structures optimised to stay as balanced as possible)
• black-height (red-black trees) – the number of black nodes on any simple path from (but not including) the root down to a leaf
• rotation – an operation conducted on a BST to rebalance it

Today's lecture
• Using trees for searching data and keeping it sorted
• Binary tree and binary search tree recap
• Tree height and number of nodes
• Balanced trees and complete trees
• Constructing balanced trees
  • AVL trees
  • Red-Black trees
  • [2-3 trees *]
  • [B-Trees *]
* = Non-examinable content. We won't cover details of 2-3 trees or B-Trees. Slides on 2-3 trees and B-Trees are kept in this powerpoint for reference only, if you are interested. Details of 2-3 trees and B-Trees are not examinable – you are expected to know that these are two other ways of constructing balanced trees, but not how they work.

O-notation
COMP5180 Algorithms, Correctness and Efficiency
Anna Jordanous a.k.jordanous@kent.ac.uk

Today's lecture
• Motivation for why O-notation is important
• O-notation for classifying inputs
  • Objections…
• Formalising
• Examples
• Dealing with earlier objections
• Manipulating and computing with O-notation
• Alternatives to (Big) O-notation

O-notation: why it's important
• you have seen O-notation before, e.g. in COMP3830
• in itself, O-notation has nothing to do with algorithms
• what it does: classify (and order) mathematical functions
• sometimes we encounter mathematical functions of which we have limited knowledge
  • e.g. exact input values
• that knowledge may be enough to classify the function, and the classification allows us to predict the function's values, with some degree of accuracy
• the runtime of a program, when given certain characteristics of its input, is an example of such a limited-knowledge function

Classification Goals, General Ideas
• we want to have a relatively simple (but ultimately mathematically precise) way to classify and compare functions, and permit an element of uncertainty
• tools to cope with uncertainty for classification:
  1. provide a way of ignoring outliers, e.g. the behaviour of the function for all but finitely many inputs
  2. allow function values to vary a little bit, e.g. by a constant factor
• (sub-)programs that process data
• they have access to that data, are likely not to ignore it, and may often need to read all of it
• it takes some time to compute – either to compute an output or to perform some responsive behaviour (or both)
• this gives a function from data to time
• sub-programs: procedures, methods, code-blocks
• we are not talking about programs that generally run forever, like web browsers, operating systems, read-eval-print loops
• though even these will have subtasks of this ilk

Comparing parametric programs
• we can view the performance of a parametric program* as a mathematical function: mapping program input to the performance data of the program when run on that particular input
*a "parametric program" is a program that takes parameters
for example, instead of asking: what is the result of sorting [4,76,1,-8,987,3,2]?
we ask: how long does it take to sort [4,76,1,-8,987,3,2]?

Classifying inputs
so, our runtime function maps input data… …to the time it takes to run the program on that data
• Calculating exact runtimes per input for a mathematical function is not tremendously useful when talking about the program's behaviour, because we need to supply the full input (ok-ish for length 7, but…)
• so, to know the runtime of bubblesort on a certain array of 100,000 elements, we would need to supply all that data – all those 100,000 elements
• instead, we generalise – we throw inputs that exhibit sufficiently similar behaviour into one category, and then look at the runtime of that category
• e.g., instead of supplying the 100,000 elements, we just supply that number: 100,000
• assuming/hoping/claiming that bubblesort shows similar behaviour for all arrays of that length

Objections
So we should treat runtime behaviour as a function...? Some objections one can make to the points above:
1. run time depends on your hardware, i.e. the computer you run the program on
2. running the same program on the same input does not give you the exact same runtime every time
3. programs can be non-deterministic, e.g. when using concurrency or random numbers
• For now, we ignore these objections…
• Later we'll see whether, or to what extent, our computational model helps with these issues

Today's lecture
• Motivation for why O-notation is important
• O-notation for classifying inputs
• Objections…
• Formalising
• Examples
• Dealing with earlier objections
• Manipulating and Computing with O-notation
• Alternatives to (Big) O-notation

Format of O-notation
• generally: O(f(n1, …, nk)), where the ni are "measurements" of our inputs and f is the growth function
• most of the time:
  • just one measurement (k = 1)
  • often called "n"; most commonly: the size of the input
• the growth function f
  • describes how the program runtime changes for different n values
  • is put together using various arithmetic functions, e.g. addition, multiplication, exponentiation, logarithm, factorial, constants
Formally
O(f(n)) is a set of growth functions:
O(f(n)) = { g | ∃c. ∃n0. c > 0 ∧ n0 > 0 ∧ ∀n. n ≥ n0 → 0 ≤ g(n) ≤ c·f(n) }
(NB the raised dot · represents multiplication)
in words: O(f(n)) is the set of all functions g such that, for all sufficiently large inputs n, g(n) is at most c·f(n), for some fixed constant c

Let's unwind this:
• O-notation classifies functions, so each class O(blah) is a set of functions that match that "blah" description, e.g. O(log(n))
• blah itself is a function, and always belongs to that class; generally: f ∈ O(f)
• the variable n in the description ranges over all values bigger than some fixed number n0 – this is dealing with outliers
• a function g is in the set O(f) if and only if g(n) is bounded by c·f(n), where c is a fixed constant – this is dealing with uncertainty

Common Examples (from fast to slow)
O(1) [no growth – having an upper bound]
O(log(n)) [logarithmic growth]
O(n) [linear growth]
O(n·log(n)) [loglinear growth]
O(n²) [quadratic growth]
O(n³) [cubic growth]
O(2ⁿ) [exponential growth]
O(3ⁿ) [also exponential growth, but genuinely worse]
O(n!) [factorial growth]
(a short Java illustration of some of these shapes follows the worked examples below)

Example (i)
• take the set O(1)
• This is the set of functions that take '1' operation, no matter how big n is
• plugging into the definition of O(f(n)) gives us, in this case:
  { g | ∃c. ∃n0. c > 0 ∧ n0 > 0 ∧ ∀n. n ≥ n0 → 0 ≤ g(n) ≤ c }
• so these are all "growth functions" that have a fixed upper bound
• for describing runtime performance, people often read this as constant time, but it really means: bounded time
• examples of O(1) time: assignments, print statements (of fixed strings), finite sequences of such statements, ...

Example (ii)
• take the set O(n²)
• This is the set of functions that grow at a rate of (roughly) n²
• besides n² itself, the class contains quadratic growth functions that are bounded by c·n² for some constant c > 0, e.g. 7n² or 6n² + 8·log₂(n)
• also functions that "occasionally" show quadratic growth, e.g. g(n) = if even(n) then n² else n
• ...and functions that grow strictly more slowly than quadratic, e.g. g(n) = 3n
• technically, O(n) ⊆ O(n²), etc.

Follow-on example, positive
• let's show that (n ↦ 2n² + 4) ∈ O(n²)
• (NB we use ↦ [mapsto] to emphasise that we're working with a function)
• we need to show 2n² + 4 ≤ c·n² for sufficiently large n and some constant c
• choosing c = 4 turns the inequation into 2n² + 4 ≤ 4n², which we can simplify to 2 ≤ n²
• this is not always true, just almost always – it is true if n ≥ 2, so we can set the cutoff constant that deals with outliers to n0 = 2

Follow-on example, negative
• n³ ∉ O(n²)
• otherwise, n³ ≤ c·n² for all sufficiently large n, and a fixed c
• in particular, we would need this to hold for n = c + 1
• this would give us the goal (c+1)³ ≤ c·(c+1)²
• i.e. (c+1)·(c+1)² ≤ c·(c+1)²
• i.e. c + 1 ≤ c
• (remember, in our formal definition, c > 0)
• we can simplify this to the equivalent 1 ≤ 0, which is just false
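As promised above, here is a small, hedged Java illustration (the class and method names are my own, not from the slides): three methods whose runtimes on an array of length n fall into O(1), O(n) and O(n²) respectively.

class GrowthShapes {
    // O(1): a fixed number of operations, no matter how big n is
    static int first(int[] a) {
        return a[0];
    }

    // O(n): one pass over the input – linear growth
    static long sum(int[] a) {
        long s = 0;
        for (int x : a) s += x;
        return s;
    }

    // O(n²): nested passes over the input – quadratic growth
    static int countEqualPairs(int[] a) {
        int count = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = i + 1; j < a.length; j++)
                if (a[i] == a[j]) count++;
        return count;
    }
}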
Earlier objections…
• we can now discharge some of the issues raised earlier w.r.t. this performance model:
1. hardware differences* will normally be within a constant factor
   (*let's just ignore things like quantum computing for now…)
2. variations between different runs are small, and within a small constant factor
• some remaining issues are not as easily dismissed:
A. Non-determinism? What non-determinism?
B. when classifying a group of inputs together (e.g. by input size), they may not be within a constant factor of each other in behaviour (e.g. bubble sort, quicksort)

Order?
• In a convoluted way, this also gives us a notion of order between those functions:
• f ≤ g ↔ f ∈ O(g)
• this order is reflexive and transitive
• but it is not guaranteed to be antisymmetric:
• different functions f and g could exhibit both f ≤ g and g ≤ f, in which case…
• those functions are equivalent in this notation: they grow at approximately the same rate

Hard-to-compare example
• Imagine two functions for the same task:
  randomFunction1(n, m) – O(n·m²) versus randomFunction2(n, m) – O(n²·m)
• here we have two measurements of our inputs, n and m...
• the first growth function is linear in n and quadratic in m, and for the second it's the other way around
• which should we use…?
(Note: if the measurement is cheap to make, we could create a hybrid algorithm that first makes the measurement and, based on that, decides which algorithm to deploy)

Today's lecture
• Motivation for why O-notation is important
• O-notation for classifying inputs
• Objections…
• Formalising
• Examples
• Dealing with earlier objections
• Manipulating and Computing with O-notation
• Alternatives to (Big) O-notation

Manipulating O-notation
• to describe the O-notation characteristic of a growth function f, we often want the simplest growth function g such that O(f) = O(g)
• this would give us a clear (possibly unique) identification of the "blob" generated by f – its "name", so to speak
• this involves:
• ordinary algebraic manipulation
• eliminating constant factors
• eliminating slower-growing summands

Eliminating constant factors, examples
• O(3x²) = O(x²)
• justification: if g is bounded by c·3x², then it is also bounded by (3·c)·x²
• O(log₂(x)) = O(log_b(x)) for any base b > 1, since log₂(x) = log₂(b)·log_b(x), and log₂(b) is a constant factor
• note: this is why we typically just say log(n) inside O-notation
• similarly, O(3^(x+1)) = O(3^x), because [algebraic rule] 3^(x+1) = 3·3^x, and 3 is a constant factor
• however, O(x^(x+1)) ≠ O(x^x): we can do the same algebraic manipulation (x^(x+1) = x·x^x), but x is not a constant factor

Eliminating slower-growing summands
• if f ∈ O(g) then O(f + g) = O(g)
• why does this work?
• g ≤ f + g ≤ c·g + g = (c+1)·g (for sufficiently large inputs)
• hence: O(g) ⊆ O(f + g) ⊆ O((c+1)·g)
• now c is a constant, hence so is c+1, and we already know that we can eliminate constant factors, therefore O(g) = O((c+1)·g)
• example: O(5n² + 8n) = O(n²)
• generally, polynomials can be reduced to the term with the largest degree

Where do all of these operations come from?
• program analysis!
• sequences of statements: cost is the sum of the costs of all the statements
• loops: running a statement k times is k times as expensive as running it once; nested loops -> polynomials
• divide-and-conquer searches give rise to logarithmic costs: you can split a search space of size n only log_b(n) times into b sections of equal size, continuing the search within one section only (see the binary search sketch below)
• exhaustive trial-and-error searches often give exponential performance (worst case)
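As a concrete instance of the divide-and-conquer point above, here is a standard iterative binary search (a routine textbook example, not taken from the slides): each iteration halves the remaining search space, so a sorted array of length n is exhausted after at most about log₂(n) iterations, giving O(log(n)) runtime.

class BinarySearch {
    // Returns an index of key in the sorted array a, or -1 if absent.
    static int search(int[] a, int key) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;          // avoids overflow of (lo + hi)
            if      (key < a[mid]) hi = mid - 1;   // continue in the left half
            else if (key > a[mid]) lo = mid + 1;   // continue in the right half
            else return mid;                        // found
        }
        return -1;  // search space empty: key not present
    }
}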
Computing with O-notation
• this arises when we analyse code: e.g. if the runtime for statementA is O(a) and the runtime for statementB is O(b), then the runtime for the sequence
  statementA
  statementB
is O(a) + O(b), which is O(a+b).
• note that most of the time we will be able to simplify O(a+b)
• multiplications occur when we look at loops

Cost of if-then-else
We can assign an O(...) cost to an if statement
  if (cond) stA; else stB;
...if we know the cost of its parts: if cond can be evaluated in O(f(n)), stA in O(g(n)) and stB in O(h(n)), then the whole thing is O(f(n) + max(g(n), h(n))).
Note: very often, checking the condition is O(1), and then we can simplify the expression. Even more often, we can also simplify O(max(g(n), h(n))).

Loops
cost of:
  for (int i=0; i<n; i++) statement
We run statement n times. If the cost for (a single execution of) statement is O(f), then the overall cost is n×O(f), which we can express as O(n×f).
Note: the cost of the loop infrastructure (increasing and checking the i) has been ignored here. This is safe (for n>0), since all statements are at least O(1).
• this is not automatic, though, for for-each loops, because their infrastructure is user-programmable

Method Calls and Recursion
• the cost of a method call is the cost of its method body ("plus 1", to tax the passing of parameters and results)
• for recursive methods this is more subtle:
• define a performance function T for the method
• ...and define it recursively at the places where the method is recursive
• ideally, try to find a non-recursively-defined version that satisfies the recurrence equations of T; you can mix guessing with equation-solving
• example: T(2n) = 2·T(n) + c·n
• solution: T(n) = O(n·log(n))
• (a short unrolling of this recurrence is sketched after the bubblesort example below)

Example, bubblesort loop
for (int i=0; i<K; i++) {
  for (int j=1; j<M; j++) {
    if (a[j]<a[j-1]) {
      int aux=a[j];
      a[j]=a[j-1];
      a[j-1]=aux;
    }
  }
}

Inside-out analysis, inner part
if (a[j]<a[j-1]) {
  int aux=a[j];
  a[j]=a[j-1];
  a[j-1]=aux;
}
This part is O(1): 3 assignments, each with O(1) cost. O(1)+O(1)+O(1) = O(3) = O(1) (3 is a constant factor).
The condition is O(1) [not a method call], so the cost of the if-statement is O(1 + max(1, 0)) = O(1+1) = O(2) = O(1).

Middle part
for (int j=1; j<M; j++) { … }
The loop body is run M−1 times, and (as we have seen) it has O(1) cost, so the overall cost is O(M−1) = O(M).

Example, bubblesort loop
for (int i=0; i<K; i++) { inner part }
We know the cost of the inner part is O(M). We run the outer loop exactly K times. Overall cost: O(K×M).
For normal bubblesort we have K = M, where K is the length of the array, giving us a runtime quadratic in the length of the array.

Today's lecture
• Motivation for why O-notation is important
• O-notation for classifying inputs
• Objections…
• Formalising
• Examples
• Dealing with earlier objections
• Manipulating and Computing with O-notation
• Alternatives to (Big) O-notation
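To see where the quoted recurrence solution comes from, here is a brief sketch (my own working, not from the slides): rewrite T(2n) = 2T(n) + cn in the equivalent form T(n) = 2·T(n/2) + c·n (the substitution changes c only by a constant factor), and assume n is a power of 2 with T(1) constant:

\begin{aligned}
T(n) &= 2\,T(n/2) + c\,n\\
     &= 4\,T(n/4) + c\,n + c\,n = 4\,T(n/4) + 2\,c\,n\\
     &= 2^{k}\,T(n/2^{k}) + k\,c\,n \qquad \text{after } k \text{ unrolling steps}\\
     &= n\,T(1) + c\,n\,\log_{2} n \qquad \text{at } k = \log_{2} n
\end{aligned}

The n·T(1) summand grows more slowly than n·log₂(n), so (eliminating it and the constant factors) T ∈ O(n·log(n)).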
Variations on big-O
• O(f) ["big O"] gives a class of growth functions for which c·f is an upper bound:
  { g | ∃c. ∃n0. c > 0 ∧ n0 > 0 ∧ ∀n. n ≥ n0 → 0 ≤ g(n) ≤ c·f(n) }
• (we already saw this)
• there are various other notations around, e.g.:
• Ω(f) is the dual (c·f is a lower bound): g ∈ Ω(f) ⟷ f ∈ O(g)
• Θ(f) has f as both upper and lower bound, i.e. Θ(f) = O(f) ∩ Ω(f)
• while O(f) provides an upper bound, there is also o(f) ["little o"], providing a strict upper bound:
  { g | ∀c. ∃n0. c > 0 ∧ n0 > 0 ∧ ∀n. n ≥ n0 → 0 ≤ g(n) < c·f(n) }
• Big-O vs little-o: 2n² = O(n²), but 2n² ≠ o(n²); by contrast, 2n = o(n²)

How problems increase

N     | log(N) | N·log(N) | N²      | 2^N         | N!
1     | 0.00   | 0.00     | 1       | 2           | 1
2     | 0.69   | 1.39     | 4       | 4           | 2
3     | 1.10   | 3.30     | 9       | 8           | 6
4     | 1.39   | 5.55     | 16      | 16          | 24
5     | 1.61   | 8.05     | 25      | 32          | 120
10    | 2.30   | 23.03    | 100     | 1024        | 3628800
50    | 3.91   | 195.60   | 2500    | 1.1259E+15  | 3.04141E+64
100   | 4.61   | 460.52   | 10000   | 1.26765E+30 | 9.3326E+157
200   | 5.30   | 1059.66  | 40000   | 1.60694E+60 | #NUM! (spreadsheet overflow)
1000  | 6.91   | 6907.76  | 1000000 | 1.0715E+301 | #NUM! (spreadsheet overflow)

(log here is the natural logarithm)
• Lifetime of the universe: about 4E+10 years (40 billion years) = approx. 1E+18 seconds. At 5 peta (5E+15) FLOPS we get 5E+33 instructions per universe lifetime.
• With a graph of 200 nodes, an algorithm taking exactly exponential time means we need about 3E+26 universe lifetimes to solve the problem.

Further additional reading
• Introduction to algorithms (Cormen et al 2009) – Ch 2, 3
• Data structures & problem solving using Java (Weiss 2010) – Ch 5
• Algorithms in Java (Sedgewick 2003) – Ch 2
• (Algorithms (Sedgewick and Wayne, 2011) – p. 206-207)
• (Cracking the coding interview (McDowell 2013) – questions relating to finding optimally efficient solutions)

Glossary
O-notation – a notation to classify and order the performance of (sub)programs, using mathematical functions to specify an upper bound for the growth of a (sub)program
upper bound – maximum limit
non-deterministic – where the program does not necessarily run the same every time, e.g. when using concurrency or random numbers
parametric program – a program that takes parameters
growth function – a function which describes how a (sub)program's runtime changes for different input sizes (different 'n')
input size – number of items given as input to a (sub)program
sub-programs – procedures, methods, code-blocks, etc.
reflexive – a property/relation which relates an input to itself, i.e. x relates to x; e.g. equality: 2 = 2
transitive – a property/relation where, if x relates to y and y relates to z, then x relates to z; e.g. less than or equal to: if 2 <= 3 and 3 <= 4 then 2 <= 4
symmetric – a property/relation where, if x relates to y then y relates to x; e.g. addition: 2 + 4 links 2 and 4 in the same way as 4 + 2
asymmetric / anti-symmetric – opposite of symmetric, e.g. subtraction: 2 − 4 does not link 2 and 4 in the same way as 4 − 2
Today's lecture
• Motivation for why O-notation is important
• O-notation for classifying inputs
• Objections…
• Formalising
• Examples
• Dealing with earlier objections
• Manipulating and Computing with O-notation
• Alternatives to (Big) O-notation

Graphs
COMP5180 Algorithms, Correctness and Efficiency
Anna Jordanous a.k.jordanous@kent.ac.uk
(Not this type of graph… …this type of graph)

Today's lecture
• Graphs – a recap
• Definitions and representations
• Graph traversal (BFS/DFS)
• Weighted Graph Algorithms (Prim's/Kruskal's algorithms)
• Motivations
• Additional graph types, terms, concepts
• Density/sparsity and alternative representations
• Paths, cycles and DAGs
• Connectivity
• Bipartite graphs
• Isomorphism
• Additional graph algorithms
• Shortest path

Graphs – a recap
• A graph is a data structure that consists of a set of nodes and a set of edges that connect the nodes together
• What you've seen before:
• Graph: Definitions
• Representing Graphs
• Graph Traversal: Breadth-First Search, Depth-First Search
• Weighted Graph Algorithms: Prim's Algorithm, Kruskal's Algorithm

Recap
• Graph G = (V, E):
• set of nodes V (alt: points, vertices)
• set of edges E (alt: arcs, connections, links)
• All edges have a node at each end: e_ij = (v_i, v_j)
Some possible variations of graphs:
• labelled/unlabelled; weighted edges
• directed/undirected
• loops/cycles; multi-edges
Nodes have degrees (indegree/outdegree in a directed graph)

Recap – Representing Graphs
• Representations for internal computer data structures and for common file formats
• Also required for mathematical reasoning
• The method we use will depend on:
• Type/characteristics of the graph
• Speed of access required
• Memory taken up
• Maintenance (frequency of insertions/deletions)
• Traversal needs
• Programming language restrictions
• Adjacency lists vs adjacency matrices (more later; a small Java sketch combining an adjacency-list representation with DFS follows the traversal recap below)

Adjacency list
List of reachable neighbours:
A: C, F
B: D, E
C: D, F
D: E
E: C
Equivalently, as a list of edges: A→C, A→F, B→D, B→E, C→D, C→F, D→E, E→C

Adjacency matrix
• The matrix indicates which nodes connect
• In this case we have a binary matrix (no multi-edges; 1 or 0 edges from node x to node y)

    1  2  3  4
1   0  1  1  0
2   1  0  0  1
3   0  0  0  1
4   1  0  0  0

Graph traversal
• Start at one vertex of a graph (the start vertex)
• Process the data contained at that vertex
• Move along an edge to process a neighbour
• To avoid entering a repetitive cycle, we need to mark each vertex as it is processed
• (If edges are represented separately, we also mark the edge as traversed)
• When the traversal finishes, all the vertices we can reach* from the start vertex are processed.
* NB Some of the vertices may not be processed, because there is no path to them from the start vertex

What guides our choice of which edge to explore next?
A1: Depth first Search (same principle as for trees)
• Put unvisited vertices on a stack.
• DFS (to visit a vertex s):
  • Mark s as visited.
  • For all unmarked vertices v adjacent to s: DFS(v) (recursively use DFS to visit all vertices v)

A2: Breadth first Search (same principle as for trees)
• Put unvisited vertices on a queue.
• Shortest path: find the path from s to t that uses the fewest number of edges.
• BFS (from source vertex s):
  • Put s onto a FIFO queue.
  • Repeat until the queue is empty:
    • remove the least recently added vertex v
    • add each of v's unvisited neighbours to the queue, and mark them as visited.
• Property. BFS examines vertices in increasing distance from s.
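As promised above, here is a minimal, hedged Java sketch (the class and method names are my own choices) of an adjacency-list digraph together with the recursive DFS just described: mark a vertex, process it, then recurse into each unmarked neighbour.

import java.util.ArrayList;
import java.util.List;

class Digraph {
    private final List<List<Integer>> adj;   // adjacency lists, one per vertex
    private final boolean[] marked;          // marks vertices as processed

    Digraph(int n) {
        adj = new ArrayList<>();
        for (int v = 0; v < n; v++) adj.add(new ArrayList<>());
        marked = new boolean[n];
    }

    void addEdge(int v, int w) { adj.get(v).add(w); }  // directed edge v -> w

    // DFS (to visit a vertex s): mark s, then recursively
    // visit all unmarked vertices adjacent to s.
    void dfs(int s) {
        marked[s] = true;
        System.out.println("processing vertex " + s);
        for (int v : adj.get(s))
            if (!marked[v]) dfs(v);
    }
}

Running dfs(0) on a graph built with addEdge calls processes exactly the vertices reachable from vertex 0, matching the traversal property noted above.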
Breadth First or Depth First for graph traversal?
• Breadth first ensures that all the nearest possibilities have been explored
• Depth first keeps going as far as it can, then goes back to look at other options
• Some applications can use either (e.g. a connectivity test); most applications are best with one or the other
• In general, travelling around the nodes of a graph is a key operation in many cases. We'll explore this more.

Listing all elements – (minimum) spanning tree
• Can we draw a tree over the graph that includes all the nodes in the graph (so we can list all elements)?
• Yes, if the graph is connected (see later re connected); weights matter once we want a minimum spanning tree
• Such a tree (one that contains all the vertices of a graph) is called a spanning tree
• Spanning trees always have N−1 edges, where N is the number of nodes in the graph
• A minimum spanning tree (MST) is a spanning tree whose weight (the sum of the weights of its edges) is no larger than the weight of any other spanning tree
• You covered algorithms to find a minimum spanning tree: Prim's Algorithm and Kruskal's Algorithm

Prim's Algorithm to find the MST (minimum spanning tree)
• (Although named after Prim in 1957, it is now credited to Jarník in 1930)
• Start from an arbitrary node and choose the edge with least weight to jump to the next node
• O(N²) time complexity

Example of Prim
[diagram: weighted graph on nodes A–E; per the worked trace, the edge weights are AB=2, AE=3, AD=4, BD=1, DC=7, CE=12, BC=13, DE=5]
Start from an arbitrary node (here A) and keep choosing the least-weight edge to a new node:

MST so far       can 'see'
{A}              B(2), E(3), D(4)   → add AB
AB               D(1), E(3), C(13)  → add BD
AB, BD           E(3), C(7)         → add AE
AB, BD, AE       C(7)               → add DC
AB, BD, AE, DC   …                  → found MST

MST = AB, BD, AE, DC; COST = 13

Kruskal's Algorithm to find the MST (minimum spanning tree)
• Given a weighted undirected graph, sort the edges according to their weights and keep selecting edges with the smallest weights that do not form a cycle, until N−1 edges are selected for the MST.
• Requirements:
• We need to take the edges in order (a sort is needed)
• Sorting is the most expensive operation, so the time complexity is O(E log E), where E is the number of edges in the graph
• We need to detect cycles (see the union-find sketch below)

Kruskal Example
[diagram: weighted graph on nodes A–E; per the worked trace, the edge weights are BE=1, AB=2, AE=3, CE=5, AC=8, BD=9, DE=11]
Order of edges:
BE 1 – Add
AB 2 – Add
AE 3 – Don't Add (would form a cycle)
CE 5 – Add
AC 8 – Don't Add (would form a cycle)
BD 9 – Add (and Finish: N−1 edges selected)
(DE 11 – never considered)
MST = BE, AB, CE, BD; COST = 17
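The cycle test in Kruskal is usually done with a union-find (disjoint-set) structure. Here is a hedged, minimal sketch (my own version, without the usual rank/path-compression refinements, and not necessarily how the slides envisage it): "does this edge form a cycle?" becomes "are its endpoints already in the same set?".

class UnionFind {
    private final int[] parent;

    UnionFind(int n) {
        parent = new int[n];
        for (int v = 0; v < n; v++) parent[v] = v;  // each vertex starts in its own set
    }

    int find(int v) {                  // walk up to the set's representative
        while (parent[v] != v) v = parent[v];
        return v;
    }

    // Returns false if v and w were already connected (the edge would form a
    // cycle); otherwise merges their sets and returns true (edge joins the MST).
    boolean union(int v, int w) {
        int rv = find(v), rw = find(w);
        if (rv == rw) return false;
        parent[rv] = rw;
        return true;
    }
}

// Kruskal skeleton: sort the edges by weight, then for each edge (v, w) in
// increasing weight order: if (uf.union(v, w)) add the edge to the MST;
// stop once N-1 edges have been added.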
Prim or Kruskal algorithm to find the Minimal Spanning tree?
• Kruskal is greedy
• tests the best edges first
• better for sparse graphs, because its time complexity is based on the number of edges
• Prim grows a solution
• builds up the solution gradually, based on the current best partial solution
• better for dense graphs, because its time complexity is based on the number of nodes
(more on sparse/dense graphs later)

Today's lecture
• Graphs – a recap
• Definitions and representations
• Graph traversal (BFS/DFS)
• Weighted Graph Algorithms (Prim's/Kruskal's algorithms)
• Motivations
• Additional graph types, terms, concepts
• Density/sparsity and alternative representations
• Paths, cycles and DAGs
• Connectivity
• Bipartite graphs
• Isomorphism
• Additional graph algorithms
• Shortest path

Motivations: why do we care about graphs?
• so far we have seen data structures that have a particular implementation and a particular purpose
• graphs can be like that too...
• ...but often we use them as an abstraction tool, to answer questions about "interconnected data"
How/Why are graphs useful to us in applications? Here are three examples (there are hundreds!):
1. Route finding
2. Network analysis/visualisation
3. Understanding meaningful links between things

Application 1: Route finding

Application 2: Network analysis/visualisation
• Social networks are made up of links between people, e.g. messages, friends, follows, connections, groups etc.
• We can represent these links as graphs, with people as the nodes/vertices and the links between them as edges
Network analysis, e.g. musical social networks
Graph of music genres:
• nodes = genres
• nodes connected if a musician makes a track in one genre, then a track in another genre
• (Results – EDM / Urban / 'other')
• Also – who interacts with whom? https://www.youtube.com/watch?v=BQz2IQ_uHZY
From the Valuing Electronic Music research project http://valuingelectronicmusic.org

Application 3: meaningful links
If Things have meaningful links connecting them, we can make graphs of Things (nodes) and links (edges)
Knowledge graphs / Semantic Web / Linked Data: https://lod-cloud.net/
Internet of Things: http://www.information-age.com/graph-databases-making-meaning-internet-things-123458606/

Today's lecture
• Graphs – a recap
• Definitions and representations
• Graph traversal (BFS/DFS)
• Weighted Graph Algorithms (Prim's/Kruskal's algorithms)
• Motivations
• Additional graph types, terms, concepts
• Density/sparsity and alternative representations
• Paths, cycles and DAGs
• Connectivity
• Bipartite graphs
• Isomorphism
• Additional graph algorithms
• Shortest path

Dense vs. sparse graphs
Dense – a graph with many connections between nodes
Sparse – a graph with only a few connections between nodes
Rough guide: in a sparse graph we have E = O(N); in a dense (simple) graph we have E = O(N²)
[network diagrams from the Valuing Electronic Music research project, http://valuingelectronicmusic.org – still not sparse…?]
When does a graph turn from sparse to dense?
• No objectively correct answer for a single graph
• We can study the ratio (number of edges)/(number of nodes) of a graph over time:
• if this ratio grows proportionally to the number of nodes, the graphs are dense
• if it does not go up, we have sparse graphs
• if in between: grey area
Dense vs. sparse graphs – remember…
Adjacency matrix: matrix operations are possible, allowing quick algorithms and easy parallelism (good for dense graphs)

    1  2  3  4
1   0  1  1  0
2   0  0  0  1
3   0  0  0  1
4   1  0  0  0

Adjacency list: the adjacency matrix is inefficient for sparse graphs; an adjacency list is better
1: 2, 3
2: 4
3: 4
4: 1

Alternative graph representations
Many different ways to represent a graph! Remember, our choice of representation is based on things such as our graph's characteristics (e.g. density/sparseness) and how we want to implement and analyse our graphs (e.g. speed of access, memory taken up, maintenance, traversal).
Let's see a few more options for how to represent a graph:
• Object model
• Object model with redundancy
• Set based

Object model
Node and edge objects. Like an adjacency list, but with actual edge objects storing the links (instead of the links being implicit in the nodes list):
NODES: n1:A, n2:B, n3:C, n4:D
EDGES: e1: n1→n2, label X; e2: n2→n3, label Y; e3: n3→n4, label Z; e4: n4→n1, label Q
[diagram: A –X→ B –Y→ C –Z→ D –Q→ A]
• Can hold arbitrary node and edge information (e.g. the labels could represent weights)
• Modification of the graph is fairly easy
• But… complex and not intuitive
• Finding all neighbouring nodes of a node is hard

Object model with redundancy
• Here, the edges hold information about which nodes they connect...
• ...and the nodes hold information about which edges go in or out:
NODES: n1:1 (in: e4; out: e1), n2:7 (in: e1, e5; out: e2), n3:2 (in: e2; out: e3), n4:2 (in: e3; out: e4, e5)
EDGES: e1: n1→n2, e2: n2→n3, e3: n3→n4, e4: n4→n1, e5: n4→n2
As the object model, except:
• advantage of quick neighbour finding
• disadvantage of maintaining the redundant structures

Set Based
G = (N, E)
N = {1, 2, 3, 4}
E = {(1,2), (1,3), (2,3), (3,4), (4,1)}
(Only) if we need labels or weights, then we add functions between the node/edge sets and a domain of labels/weights, e.g. w: E → ℕ with
w(1,2) = 3, w(1,3) = 4, w(2,3) = 1, w(3,4) = 4, w(4,1) = 1
• Good for mathematical reasoning
• Can be specific about what a graph is
• Does not specify how a graph is stored, only the structure of a graph

Today's lecture
• Graphs – a recap
• Definitions and representations
• Graph traversal (BFS/DFS)
• Weighted Graph Algorithms (Prim's/Kruskal's algorithms)
• Motivations
• Additional graph types, terms, concepts
• Density/sparsity and alternative representations
• Paths, cycles and DAGs
• Connectivity
• Bipartite graphs
• Isomorphism
• Additional graph algorithms
• Shortest path

Paths and cycles
• Path = a sequence of vertices V0, V1, … Vn such that each adjacent pair of vertices Vi, Vi+1 (i.e. vertices next to each other in the path) is connected by an edge
• NB route or walk = a path in a directed graph
• A cycle in a graph G is a closed path: all the edges are different and all the intermediate vertices are different.

Special type of directed graph: Directed Acyclic Graphs
• We know what a directed graph is; a directed graph is also known as a digraph
• A directed, acyclic graph (DAG) is a directed graph that contains no directed cycles
• (in other words: after leaving any vertex v, you can never get back to v by following edges along the arrows)
• (a small cycle-detection sketch follows the next slide)
DAGs can be very useful, e.g. in modelling project dependencies:
• Task i has to be done before tasks j and k, which have to be done before m.

Complete graphs
• In a complete graph, every vertex is adjacent to every other vertex.
• If there are N vertices, there will be N·(N−1) edges in a complete directed graph and N·(N−1)/2 edges in a complete undirected graph.
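As flagged above, here is a hedged sketch of checking the "no directed cycles" property (my own code, using the standard three-colour DFS technique rather than anything prescribed by the slides): a directed cycle exists exactly when DFS meets a vertex that is still on the current recursion path.

import java.util.List;

class DagCheck {
    // colours: 0 = unvisited, 1 = on the current DFS path, 2 = finished
    static boolean isDag(List<List<Integer>> adj) {
        int[] colour = new int[adj.size()];
        for (int v = 0; v < adj.size(); v++)
            if (colour[v] == 0 && hasCycleFrom(v, adj, colour)) return false;
        return true;
    }

    private static boolean hasCycleFrom(int v, List<List<Integer>> adj, int[] colour) {
        colour[v] = 1;                        // v is on the current path
        for (int w : adj.get(v)) {
            if (colour[w] == 1) return true;  // back edge: directed cycle found
            if (colour[w] == 0 && hasCycleFrom(w, adj, colour)) return true;
        }
        colour[v] = 2;                        // finished: v is not on any cycle
        return false;
    }
}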
Today's lecture
• Graphs – a recap
• Definitions and representations
• Graph traversal (BFS/DFS)
• Weighted Graph Algorithms (Prim's/Kruskal's algorithms)
• Motivations
• Additional graph types, terms, concepts
• Density/sparsity and alternative representations
• Paths, cycles and DAGs
• Connectivity
• Bipartite graphs
• Isomorphism
• Additional graph algorithms
• Shortest path

Connectivity
• A graph G is connected if there is a path between each pair of vertices, and disconnected otherwise.
• Note that every complete graph is necessarily connected (the path between each pair of vertices is just the edge between those vertices)
• But… connected graphs are not necessarily complete (for instance, every tree is a connected graph, but a tree is not in general a complete graph)

Connectivity
Every disconnected graph can be split up into a number of connected subgraphs, called components.
NB We often distinguish between digraphs and undirected graphs:
• Use connected to talk about undirected graphs where there is a path between every two nodes.
• Use strongly connected to talk about directed graphs where there is a route (directed path) between every two nodes.

Example: find the strongly connected components of a directed graph
• explanation: in a directed graph, paths are directed too, following arrows in the same direction only
• thus we can have a directed path from A to B without having one that goes from B to A
• a strongly connected component is a (full) subgraph in which there are directed paths between any two vertices, each way
• Adaptable to many problems, e.g. motif finding in biological networks (when combined with isomorphism – see later): identifying over-represented patterns in a network of data
"Network motifs are defined as over-represented small connected subgraphs in networks" – Kim, W., Li, M., Wang, J. et al. Biological network motif detection and evaluation. BMC Syst Biol 5, S5 (2011)

Example
[diagram: example digraph on nodes A–G]
Example with added edge
[diagram: the same digraph with one added edge, changing its strongly connected components]

Today's lecture
• Graphs – a recap
• Definitions and representations
• Graph traversal (BFS/DFS)
• Weighted Graph Algorithms (Prim's/Kruskal's algorithms)
• Motivations
• Additional graph types, terms, concepts
• Density/sparsity and alternative representations
• Paths, cycles and DAGs
• Connectivity
• Bipartite graphs
• Isomorphism
• Additional graph algorithms
• Shortest path

Bipartite graph
• In a bipartite graph there are two different types of vertex
• A bipartite graph G = (V, E) has the set V split into V1 and V2 (e.g. top right: Jobs and People)
• What makes a bipartite graph special is that every edge (i, j) in E has i in V1 and j in V2
• i.e. you can never have an edge between two nodes of the same type
• Edges must go from one type of node to the other
Good for tasks such as allocating resources to people, or pairing up two different types of thing

Today's lecture
• Graphs – a recap
• Definitions and representations
• Graph traversal (BFS/DFS)
• Weighted Graph Algorithms (Prim's/Kruskal's algorithms)
• Motivations
• Additional graph types, terms, concepts
• Density/sparsity and alternative representations
• Paths, cycles and DAGs
• Connectivity
• Bipartite graphs
• Isomorphism
• Additional graph algorithms
• Shortest path

Similarity of two graphs
Remember, a graph G = (V, E). This doesn't tell us how to draw it…
• So the same graph can be drawn in different ways. (Are you happy that these are the same graph?)
• Also, two graphs may look similar but represent different graphs. (Can you see why these are different graphs?)
Similarity of two graphs: isomorphism
• AB is an edge of the 2nd graph, but not of the 1st one.
• Although the graphs contain essentially the same information, they are not the same.
• However, by relabelling the second graph, we can reproduce the first graph.
• We say that two graphs G and H are isomorphic to each other if H can be obtained by relabelling the vertices of G.
Actually, no efficient general algorithm is known for testing whether two graphs are isomorphic. But some basic checks help us rule it out:
• Two isomorphic graphs must have
• the same number of nodes and edges,
• the same degree sequence.
• Two graphs cannot be isomorphic if one of them contains a subgraph that the other does not.

Today's lecture
• Graphs – a recap
• Definitions and representations
• Graph traversal (BFS/DFS)
• Weighted Graph Algorithms (Prim's/Kruskal's algorithms)
• Motivations
• Additional graph types, terms, concepts
• Density/sparsity and alternative representations
• Paths, cycles and DAGs
• Connectivity
• Bipartite graphs
• Isomorphism
• Additional graph algorithms
• Shortest path

Shortest path
• The path between two vertices with the lowest cost is called the shortest path
• For an unweighted graph, cost of a path = number of edges in the path
• For a weighted graph, cost of a path = sum of the weights of each edge in the path
• Knowing the shortest path between two vertices helps with applications such as route finding

Example – unweighted graph
[diagram: unweighted graph on nodes 1–8]
• Find the shortest path between nodes 1 and 8
• Keep a queue of nodes and a note of which nodes we have visited, and how we got to them
• Breadth first search

Shortest Path Outline
• This algorithm passes along each edge at most once, so it has time complexity O(|E|)
Repeat until the target is found or the queue is empty:
  add the unvisited neighbours of the head of the queue to the back of the queue
  remove the head of the queue
Remember – BFS (from vertex s):
  Put s onto a FIFO queue.
  Repeat until the queue is empty:
    remove the least recently added vertex v
    add each of v's unvisited neighbours to the queue, and mark them as visited.

Shortest unweighted path algorithm
G = (V, E)
p[a] holds the predecessor of a, initially NIL
current is a queue

PROCEDURE SHORTEST n m
  p[n] = n
  add n to current
  while current is not empty
    remove the head of current, x
    for each neighbour y of x
      if p[y] == NIL
        p[y] = x
        if y == m then return SUCCESS
        add y to current
  return FAIL

In the SUCCESS case, the path can be found in p[], going backwards from m.
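Here is a hedged Java rendering of the SHORTEST procedure above (the method name, 0-based vertex numbering and the use of -1 for NIL are my own choices): it returns the predecessor array, from which the path is read backwards from m.

import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

class ShortestUnweighted {
    // Returns p[] with p[y] = predecessor of y on a fewest-edges path from n,
    // or null if m is unreachable from n. NIL is represented by -1.
    static int[] shortest(List<List<Integer>> adj, int n, int m) {
        int[] p = new int[adj.size()];
        Arrays.fill(p, -1);
        p[n] = n;                                 // start node is its own predecessor
        Queue<Integer> current = new ArrayDeque<>();
        current.add(n);
        while (!current.isEmpty()) {
            int x = current.remove();             // remove the head of current
            for (int y : adj.get(x)) {
                if (p[y] == -1) {                 // y not reached before
                    p[y] = x;
                    if (y == m) return p;         // SUCCESS: read path backwards from m
                    current.add(y);
                }
            }
        }
        return null;                              // FAIL: no path from n to m
    }
}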
[worked trace: running the algorithm from node 1 on the example graph, recording each node's predecessor as it is first reached]
Path is 1 2 5 6 8

Shortest Path
• To find the shortest path between two nodes in a weighted graph: Dijkstra's algorithm
• To implement it, we will need the priority queue (PQ) data structure

Priority queue
• A queue where each element also has a priority assigned to it
• The priority determines the order in which items are held in the queue

  data:      C  G  A
  priority:  5  3  2
  (new item for the PQ: T, priority 4)

• Higher priority items can 'jump the queue'
• For items of the same priority, normal queue ordering applies
• The priority can change as nodes are added to the PQ
• Dijkstra: the element with greatest priority is the one closest to the start node

Dijkstra outline
• Finding the shortest path from node x to (all) other nodes
• The data structures needed:
• a PQ of nodes (priority is the cost of the path to the start node)
• for each node, maintain the following information:
• the predecessor node – all the predecessor info together gives us a tree of the shortest paths found so far; NIL initially
• the (so far found) cost of reaching that node; initially 0
• a boolean flag recording whether the cheapest path has been found
• Dijkstra has time complexity O((E + N) log N) with a standard implementation (PQ as a binary heap – see next week); more sophisticated heap variants bring this down to O(E + N log N)

Algorithm
p[m] holds the predecessor of m, initially NIL
cheapest[m] marks whether this forms part of the cheapest path (initially false for all nodes)

Add the start node, x, to the PQ; cost[x] = 0
Repeat until the PQ is empty:
• Remove the node with greatest priority from the PQ; call it n
• set cheapest[n] to true
• for every neighbour m of n, with the m–n edge costing cm, if cheapest[m] == false:
    if p[m] != NIL and cost[m] <= cost[n] + cm, continue the inner loop
    else set p[m] = n and cost[m] = cost[n] + cm; add/update m in the PQ
For a known target node y: stop with SUCCESS after the second bullet point if n == y; report FAILURE after the loop.

Example of Dijkstra
• Find the shortest path from A to H
[diagram: weighted graph on nodes A–H; algorithm as above]
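A hedged Java sketch of the algorithm above, using java.util.PriorityQueue ordered by current cost (so "greatest priority" = smallest cost). Since that class has no efficient update operation, this version re-adds nodes and skips stale entries via the cheapest[] flag – a common workaround of my choosing, not something the slides prescribe.

import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class Dijkstra {
    record Edge(int to, int cost) {}   // weighted directed edge

    // Fills p[] (predecessors, -1 = NIL) and returns cost[] of cheapest paths from x.
    static int[] shortestPaths(List<List<Edge>> adj, int x, int[] p) {
        int n = adj.size();
        int[] cost = new int[n];
        boolean[] cheapest = new boolean[n];
        Arrays.fill(cost, Integer.MAX_VALUE);
        Arrays.fill(p, -1);
        cost[x] = 0;
        p[x] = x;
        PriorityQueue<int[]> pq =                          // entries: {node, cost}
            new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        pq.add(new int[]{x, 0});
        while (!pq.isEmpty()) {
            int v = pq.remove()[0];                        // node with smallest cost
            if (cheapest[v]) continue;                     // stale entry: skip
            cheapest[v] = true;
            for (Edge e : adj.get(v)) {
                if (!cheapest[e.to] && cost[v] + e.cost < cost[e.to]) {
                    cost[e.to] = cost[v] + e.cost;
                    p[e.to] = v;
                    pq.add(new int[]{e.to, cost[e.to]});   // "update" by re-adding
                }
            }
        }
        return cost;
    }
}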
Algorithm in action?
• there are good animated versions of Dijkstra out there on the WWW
• click like a monkey (Japan)
• table-based animation (US)
• Also this demo – Computerphile (UK): https://www.youtube.com/watch?t=180&v=GazC3A4OQTE&feature=youtu.be

Today's lecture
• Graphs – a recap
• Definitions and representations
• Graph traversal (BFS/DFS)
• Weighted Graph Algorithms (Prim's/Kruskal's algorithms)
• Motivations
• Additional graph types, terms, concepts
• Density/sparsity and alternative representations
• Paths, cycles and DAGs
• Connectivity
• Bipartite graphs
• Isomorphism
• Additional graph algorithms
• Shortest path

Further additional reading
• Data structures & problem solving using Java (Weiss 2010) – Ch 14
• Introduction to algorithms (Cormen et al 2009) – Ch 22, 24, B4
• Algorithms (Sedgewick and Wayne, 2011) – Ch 4
• Algorithms in Java (Sedgewick 2003) – Ch 1.2, Ch 3.7, Ch 5, Pt 5
• Introduction to algorithms (Cormen et al 2009) – Part VI Ch 22-23, Appendix B.4
• Cracking the coding interview: 150 programming questions and solutions (McDowell 2013) – Ch 4
NB For terms used in the 'Graphs – a recap' section, please see the COMP3830 lectures

Glossary of new terms
Dense – a graph with many connections between nodes
Sparse – a graph with only a few connections between nodes
Path – a sequence of vertices/nodes in a graph that are connected
Route / walk – a path in a directed graph
Cycle – a closed walk (all edges and all vertices are visited at most once, except for the start vertex, which is also the end vertex)
Digraph – directed graph
DAG / directed acyclic graph – a directed graph that contains no cycles
Complete graph – where every vertex is connected (adjacent) to every other vertex
Connected – a graph with a path between every two nodes
Strongly connected – a directed graph with a route between every two nodes
Component – a connected set of nodes within a graph (subgraph)
Bipartite – a graph which connects nodes of one type to nodes of a second type
Isomorphic – two graphs are isomorphic if they share the same underlying structure/degrees of nodes, such that all you have to do to obtain one graph is relabel the vertices (if necessary)

Glossary
• complete graph – where every vertex is adjacent to every other vertex
• adjacent – connected by an edge
• connected graph – a graph with a path between each pair of vertices (disconnected otherwise)
• connected – usually used for undirected graphs with a path between every two nodes
• strongly connected – for directed graphs with a route (directed path) between every two nodes
• strongly connected component – a (full) subgraph in which there are directed paths between any two vertices, each way
• bipartite graph – has nodes split into V1 and V2 such that every edge (i,j) that exists in E has i in V1 and j in V2
• isomorphism – same underlying structure: two graphs are isomorphic to each other if one can be obtained by relabelling the nodes of the other
• (degree sequence – all nodes' indegree and outdegree data)
• shortest path – the path between two vertices with the lowest cost
• Priority queue – a queue where each element also has a priority assigned to it

Heaps
COMP5180 Algorithms, Correctness and Efficiency
Anna Jordanous a.k.jordanous@kent.ac.uk
Some slides in this lecture are taken from Algorithms (Sedgewick and Wayne) material: https://algs4.cs.princeton.edu/home/

Today's lecture
Heap – data structure and algorithms, including sort
• What is a heap? (tree and array format)
• Insert, remove and heapify operations
• Heapsort
• Priority queues using heaps
• Max heap and min heap

Heap Data Structure
[diagram: heap X, T, O, G, S, M, N, A, E, R, A, I drawn as a binary tree, nodes numbered 1–12]
• Another tree data structure that is useful is the heap
• A binary heap is a special type of binary tree (organised differently to a binary search tree) that is easy to store as an array

Heap Data Structure
• Binary tree such that each parent element is larger than its two children. Thus the largest element is at the root.
• Represent it as an array (NB indices start at 1):

  k     1  2  3  4  5  6  7  8  9  10  11  12
  A[k]  X  T  O  G  S  M  N  A  E  R   A   I

Heap condition / heap property: for all nodes i [excluding the root node]: Parent(i).value > i.value

• Easy to get from a vertex to its child/parent:
• Parent of vertex k is in position k/2
• Children of vertex k are in positions 2k and 2k+1
• Each generation has at most double the nodes of the previous one
• At most log₂N generations
• No explicit pointers – they are implicit in the array representation

Today's lecture
Heap – data structure and algorithms, including sort
• What is a heap? (tree and array format)
• Insert, remove and heapify operations
• Heapsort
• Priority queues using heaps
• Max heap and min heap

Heap Algorithms
• All algorithms on heaps operate along some path from the root to the bottom of the heap.
• For N items there are ≤ log₂N nodes in every path through the heap.
• Heap algorithms all:
• change the heap so that the heap condition is violated, then
• travel through the heap, modifying it so as to restore the heap condition
• (This second step is sometimes referred to as a 'heapify' operation)
COMPLEXITY: O(log N) (logarithmic) – the basic heap operations, insert and remove, all require fewer than 2·log₂N comparisons when performed on a heap of N elements

Insert (for example, add P to the heap)
• Add the new node to the end of the list
• Keep exchanging it with its parent if the parent is smaller
• To restore the heap condition: the HEAPIFY algorithm
[diagram: P added at position 13; exchange P with M, then with O]
(Sometimes referred to as the "bottom-up method", because you work from the bottom and bring elements up to restore the heap condition)

Remove (for example, let's remove X)
• Remove the element by overwriting the element to be removed with the last item in the heap...
[diagram: X overwritten by I, the last item]
• …then move down the heap, restoring the heap condition (heapify)
[diagram: I sinks down, swapping with T, then S, then R]
NB as you move the element down, swop with the largest of the children – why do this..?

Heapify – the most important part of heap algorithms
If inserting an item into a heap, we then heapify up.
Heapify-Up (Array A, position i)
  // trying to work out which is the larger of A(i) and A(parent)
  parent = Parent(i)
  if i > 1 and A(i) > A(parent) then
    swop_values(A(i), A(parent))  // restore the heap condition, then look again at
    Heapify-Up(A, parent)         // the heap, from where that parent value was
  end if

Parent(position i) = floor(i/2)   // parent of node i is in position i/2

Heapify
If removing an item from a heap, we then heapify down.

Heapify-Down (Array A, position i)
  left = Left(i), right = Right(i)
  // trying to work out which is the largest of A(i), A(left) and A(right)
  if left <= heap-size(A) and A(left) > A(i) then largest = left
  else largest = i
  end if
  if right <= heap-size(A) and A(right) > A(largest) then largest = right
  end if
  // get the largest value into the correct (highest) position in the heap, then look
  // again at the heap, from the position where that largest value was
  if largest != i then
    swop_values(A(i), A(largest))
    Heapify-Down(A, largest)
  end if

Left(position i) = 2i; Right(position i) = 2i + 1   // children of node i are in positions 2i, 2i+1

Pseudocode for inserting or removing an item in a heap

Insert (A, key)        // insert new item 'key' into the heap held in array A
  n = size(A)          // find position of the final item in A
  A[n+1] = key         // add new item after the final item in A
  Heapify-Up(A, n+1)   // heapify up from the added element

Remove (A, pos)        // remove item at position 'pos' from the heap in array A
  n = size(A)          // find position of the final item in A
  A[pos] = A[n]        // overwrite 'pos' with the final item in A
  A[n] = null
  Heapify-Down(A, pos) // heapify down from pos

Demo of 4 operations on the heap, done one after another:
1. Insert element S
2. Remove the maximum element from the heap
3. Remove the (new) maximum element from the heap
4. Insert element S

Time complexity
• The basic heap operations of insert and remove all require fewer than 2·log₂N comparisons when performed on a heap of N elements
• (Do you understand why?)
[diagram: the example heap again]

Today's lecture
Heap – data structure and algorithms, including sort
• What is a heap? (tree and array format)
• Insert, remove and heapify operations
• Heapsort
• Priority queues using heaps
• Max heap and min heap
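Before moving on to heapsort, here is a hedged Java translation of the insert/remove/heapify pseudocode above (the class name, fixed capacity and int keys are my own choices; 1-based indexing is simulated by leaving index 0 of the array unused):

class MaxHeap {
    private final int[] a;   // a[1..size] holds the heap; a[0] is unused
    private int size = 0;

    MaxHeap(int capacity) { a = new int[capacity + 1]; }

    void insert(int key) {             // add after the final item, then heapify up
        a[++size] = key;
        heapifyUp(size);
    }

    int removeMax() {                  // overwrite the root with the final item,
        int max = a[1];                // then heapify down
        a[1] = a[size--];
        heapifyDown(1);
        return max;
    }

    private void heapifyUp(int i) {
        int parent = i / 2;
        if (i > 1 && a[i] > a[parent]) {
            swop(i, parent);           // restore the heap condition…
            heapifyUp(parent);         // …then continue from the parent's position
        }
    }

    private void heapifyDown(int i) {
        int left = 2 * i, right = 2 * i + 1, largest = i;
        if (left <= size && a[left] > a[largest]) largest = left;
        if (right <= size && a[right] > a[largest]) largest = right;
        if (largest != i) {            // swop with the largest of the children
            swop(i, largest);
            heapifyDown(largest);
        }
    }

    private void swop(int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}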
Sorting using heaps: Heapsort
Basic plan for an in-place sort:
・View the input array as a complete binary tree.
・Heap construction: build a max-heap with all N keys.
・Sortdown: repeatedly remove the maximum key.
[diagrams: heap construction and sortdown traces (successive exch/sink operations) on the keys S O R T E X A M P L E, from the arbitrary starting order, through the heap-ordered array, to the sorted result A E E L M O P R S T X]

First build the heap by inserting items one at a time; then, one by one, remove the maximum element in the heap and heapify.
Watch this twice – the second time, watch what happens in the array representation.
This is an in-place sort (because it happens inside the array).
NB Alternative demos: https://www.youtube.com/watch?v=D_B3HN4gcUA https://youtu.be/mAO8LpQ6uGQ or, for a faster view with less talking…: https://www.youtube.com/watch?v=MtQL_ll5KhQ

Heapsort pseudocode

Heapsort(Array A)
  BuildHeap(A)
  for i = length(A) down to 2
    exchange(A[1], A[i])
    heap-size(A) = heap-size(A) - 1   // as A[i] is now ignored
    Heapify-Down(A, 1)
  end for

BuildHeap(Array A)
  heap-size(A) = length(A)
  for i = floor(length/2) down to 1
    Heapify-Down(A, i)
  end for

Heapsort: Java implementation

public class Heap {
  public static void sort(Comparable[] a) {
    int N = a.length;
    for (int k = N/2; k >= 1; k--)
      sink(a, k, N);
    while (N > 1) {
      exch(a, 1, N);
      sink(a, 1, --N);
    }
  }
  // as before, but made static (passing the array and size as arguments):
  private static void sink(Comparable[] a, int k, int N) { /* as before */ }
  private static boolean less(Comparable[] a, int i, int j) { /* as before */ }
  // as before, but converting from 1-based indexing to 0-based indexing:
  private static void exch(Object[] a, int i, int j) { /* as before */ }
}

Comparing heapsort to other sorting algorithms: mathematical analysis
Proposition. Heap construction uses ≤ 2N compares and ≤ N exchanges.
Proposition. Heapsort uses ≤ 2N·lg N compares and exchanges.
(with advanced tricks for improving, the algorithm can be brought to ~ 1N·lg N)
Significance. An in-place sorting algorithm with an N log N worst case:
・Mergesort: no – linear extra space (an in-place merge is possible, but not practical).
・Quicksort: no – quadratic time in the worst case (an N log N worst-case quicksort is possible, but not practical).
・Heapsort: yes!
Bottom line. Heapsort is optimal for both time and space, but:
・its inner loop is longer than quicksort's,
・it makes poor use of cache,
・it is not stable.

Today's lecture
Heap – data structure and algorithms, including sort
• What is a heap? (tree and array format)
• Insert, remove and heapify operations
• Heapsort
• Priority queues using heaps
• Max heap and min heap

Priority queue
• A queue where each element also has a priority assigned to it
• The priority determines the order in which items are held in the queue

  data:      C  G  A
  priority:  5  3  2
  (new item for the PQ: T, priority 4)

• Higher priority items can 'jump the queue'
• For items of the same priority, normal queue ordering applies
• The priority can change as nodes are added to the PQ

Priority Queues using heaps
• A priority queue is a queue data structure with additional information on each node's priority, such that the priority of a node decides what position it takes in the queue (and FIFO for nodes of equal priority)
• Heaps are a useful way of representing priority queues
• The higher the priority, the higher it goes up the heap
• Ordering is based on priority
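For comparison, Java's standard library already ships a heap-backed priority queue. A small, hedged example (the values mirror the slide's C/G/A/T illustration; the class and ordering choices are mine): note that java.util.PriorityQueue is a min-heap by default, so a reversed comparator gives the highest-priority-first behaviour described above.

import java.util.Comparator;
import java.util.PriorityQueue;

class PQDemo {
    public static void main(String[] args) {
        // Max-first queue of {priority, data} pairs
        PriorityQueue<int[]> pq =
            new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]).reversed());
        pq.add(new int[]{5, 'C'});
        pq.add(new int[]{3, 'G'});
        pq.add(new int[]{2, 'A'});
        pq.add(new int[]{4, 'T'});   // the new item 'jumps the queue' past G and A
        while (!pq.isEmpty()) {
            int[] e = pq.remove();
            System.out.println((char) e[1] + " (priority " + e[0] + ")");
        }
        // prints: C (5), T (4), G (3), A (2)
    }
}

One caveat: unlike the definition on the slide, java.util.PriorityQueue does not promise FIFO order for items of equal priority; adding a sequence number as a tiebreaker is the usual fix.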
e.g. the Dijkstra algorithm
• Finding the shortest path from node x to (all) other nodes in a weighted graph
• The data structures needed:
• a PQ of nodes (priority is the cost of the path to the start node)
• Dijkstra: the element with greatest priority is the one closest to the start node
• for each node, maintain the following extra info:
• p – the predecessor node: all the predecessor info together gives us a tree of the shortest paths found so far; NIL initially
• cost – the (so far found) cost of reaching that node; initially 0
• cheapest – a boolean flag recording whether the cheapest path has been found; false initially

Example of Dijkstra: Find the shortest path from A to H
[diagram: the weighted graph on nodes A–H from the Graphs lecture, with the PRIORITY QUEUE table (node, p, cost, cheapest?) and its HEAP representation traced alongside; the algorithm is as given in the Graphs lecture]
This Computerphile video is a nice demo of how the Dijkstra algorithm uses a PQ: https://www.youtube.com/watch?t=180&v=GazC3A4OQTE&feature=youtu.be
• When we add a node to the PQ, we order it based on the priority
• The highest priority for Dijkstra is the node closest to the start node (lowest cost)
• The higher the priority, the higher it goes up the heap
• When we change a node's priority, we may have to heapify

Today's lecture
Heap – data structure and algorithms, including sort
• What is a heap? (tree and array format)
• Insert, remove and heapify operations
• Heapsort
• Priority queues using heaps
• Max heap and min heap
From the Dijkstra example of priority queues: there, we want the nodes with the smallest costs to be at the top of the heap. This is an example of a min heap…

MAX HEAP and MIN HEAP
• We had been looking at max heaps
• Heap condition: a node's value >= its children's values
• The root node has the largest (max) value
• We could also consider min heaps, which are the reverse of max heaps
• Heap condition: a node's value <= its children's values
• The root node has the smallest (min) value
• Min heaps are exactly the same as max heaps, except you do things in reverse
• (e.g. swap a node with its parent if the node's value <= its parent's value)
• DEMO: https://youtu.be/hfA6q1pf4sk
[diagrams: the example max heap and the corresponding min heap]

Today's lecture
Heap – data structure and algorithms, including sort
• What is a heap? (tree and array format)
• Insert, remove and heapify operations
• Heapsort
• Priority queues using heaps
• Max heap and min heap

Further additional reading
• Algorithms in Java (Sedgewick 2003) – Ch 9
• Data structures & problem solving using Java (Weiss 2010) – Ch 21 (23)
• Introduction to algorithms (Cormen et al 2009) – Ch 6
• Algorithms (Sedgewick and Wayne 2011) – Ch 2.4
• Cracking the coding interview: 150 programming questions and solutions (McDowell 2013) – p. 36, Ch 20
• https://www.youtube.com/watch?v=c1TpLRyQJ4w&list=PLTxllHdfUq4fMXqS6gCDWuWhiaRDVGsgu [A good set of videos explaining heaps]

Glossary
• (binary) (max) heap – binary tree such that each parent element is larger than its two children
• Heaps are usually max heaps
• min heaps – the reverse of max heaps, such that each parent element is smaller than its two children
• Heap condition / heap property – for all nodes i [excluding the root node]: Parent(i).value > i.value
• heapify – stepping through the heap, modifying it level by level so as to restore the heap condition
• Heapsort / heap sort – a sorting algorithm based on building a heap and then outputting the heap's nodes in order
• Sort down – the step of repeatedly outputting the root in heapsort
• Priority Queue – a queue data structure where each element also has a priority assigned to it, and higher priority items can 'jump the queue'