WUCT103 Algorithms & Problem Solving
Session 12: Revision
Wollongong College Australia, Diploma in Information Technology
Sherine Antoun - 2007

Types of Problems
• Well-structured: the details, goals and possible actions are all clearly stated. Example: 3x = 2, solve for x
• Poorly-structured: there is uncertainty in the details, the goals or the possible actions. Examples: cook dinner; build a successful career

Problem solving method: 5 steps
1. Analyse and define the problem
2. Generate alternate solutions
3. Evaluate the alternate solutions and choose one
4. Implement the chosen solution
5. Evaluate the outcome and learn from the experience

Analyse & define the problem
• Understand & analyse
• Organise & represent the information
• Look at the problem from different perspectives if necessary
• Define (formulate) the problem

Analyse & define the problem: understand & analyse
• Facts, constraints, conditions, assumptions
• Knowledge, experience, skills
• Gather/acquire new knowledge
• Current state & goals
• Permissible actions

Definition of an algorithm
A step-by-step method for solving a problem or doing a task
[Figure: Input List -> ALGORITHM -> Output List]

Improving your algorithm
• DEFINE ACTIONS: must explain what is done in each step
• REFINE: use consistent terms and language; ensure the algorithm knows where to ‘start’
• GENERALISE: can you use it to handle a wider variety of similar problems?
Three constructs
It has been proven that only 3 “constructs” are needed to express any algorithm in computer science:
• SEQUENCE: an ordered ‘list’ (sequence) of instructions
• DECISION (selection): may need to test whether some condition is true or not
• REPETITION: may need to repeat some sequence of instructions

Representing an algorithm
• Visually: flowchart
• Pseudocode
• Plain ‘English’

The three constructs shown visually
[Figure: three flowcharts. Sequence: Action 1, Action 2, ..., Action n. Decision: a test whose true and false branches each lead to a sequence of actions. Repetition: a while-condition test that runs a sequence of actions while true and exits when false.]

The three constructs shown in pseudo code
sequence:
    action 1
    action 2
    ...
    action n
decision:
    IF (condition)
    THEN
        action
        action ...
    ELSE
        action
        action ...
    END IF
repetition:
    WHILE (condition)
        action
        action ...
    END WHILE

A better definition of an algorithm
An ordered set of unambiguous steps that produces a result and terminates in a finite time
• Ordered set
• Unambiguous steps
• Produce a result
• Terminate (halt) in a finite time

PROBLEM SOLVING STRATEGIES
The main basic strategies that are widely employed:
• Brute Force
• Analogy
• Trial & error
• Heuristics
• Divide & Conquer
• Greedy

Finite State Machine (wikipedia)
An abstract machine that has only a finite, constant amount of memory. The internal states of the machine carry no further structure.
FSMs are very widely used in modelling of application behaviour, design of hardware digital systems, software engineering, and the study of computation and languages

Finite State Machines, also known as Finite State Automata (ai-depot.com)
At their simplest, they model the behaviour of a system that has a limited number of defined conditions or modes, where mode transitions change with circumstance.
Finite state machines consist of 4 main elements:
states which define behaviour and may produce actions state transitions which are movement from one state to another rules or conditions which must be met to allow a state transition input events which are either externally or internally generated, which may possibly trigger rules and lead to state transitions A finite state machine must have an initial state which provides a starting point, and a current state which remembers the product of the last state transition. Received input events act as triggers, which cause an evaluation of some kind of the rules that govern the transitions from the current state to other states. The best way to visualize a FSM is to think of it as a flow chart of states Finite State Machine (wikipedia) Can be represented using a state diagram. There are finitely many states, and each state has transitions to states. There is an input string that determines which transition is followed (some transitions may be from a state to itself). Finite State Machines There are several types of finite state machines including Acceptors (a.k.a. Recognizers) They either accept/recognize their input or do not. Transducers They generate output from given input. Mealy machines have actions (outputs) associated with transitions Moore machines Have actions (outputs) associated with states Variables However, in computer science a variable is a “place in memory” where we can store something Imagine we have a box called “my box of stones” … we can add stones to the box, we can take stones out of the box and we can look inside the box and check how many stones are there … we can say that “my box of stones” has a certain value at any given time! 
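Returning to the finite state machines described above: a minimal sketch of an acceptor, assuming the transitions are stored in a dictionary keyed by (current state, input event). The machine itself (EVEN_ONES, which accepts binary strings containing an even number of 1s) and the function name accepts are hypothetical examples, not taken from the slides.

```python
def accepts(transitions, start, accepting, inputs):
    """Run an acceptor FSM over the input string and report accept/reject."""
    state = start  # the initial state is the starting point
    for symbol in inputs:
        key = (state, symbol)
        if key not in transitions:
            # No rule allows a transition on this input event: reject.
            return False
        state = transitions[key]  # the current state remembers the last transition
    return state in accepting


# Hypothetical machine: two states, tracking whether the count of 1s is even or odd.
EVEN_ONES = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}

print(accepts(EVEN_ONES, "even", {"even"}, "1001"))
print(accepts(EVEN_ONES, "even", {"even"}, "1011"))
```

The states, transitions, rules and input events from the list of 4 main elements map onto the dictionary values, the dictionary entries, the key lookup and the characters of the input string respectively.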
Sets
Sets are used to describe groups of variables
In mathematics, a set is simply an unordered collection of elements (things)
In computer science we don’t have any data structures that are directly equivalent to sets, but we do have structures that are similar

Arrays
But if all five student marks are collected together under one name, how do we refer to the mark of any one particular student in the array, or do calculations with it?
We need to identify each of the ‘cells’ in the array, usually with numbers
These cell numbers are correctly called INDEXES or INDICES
So the value (mark) for the 3rd student is students[3]
If that student just scored another 6 marks from a quiz, then that mark is now:
    students[3] = students[3] + 6

students   [1]  [2]  [3]  [4]  [5]
           51   72   12   95   2

Linked Lists
Each ‘box’ (node) has some extra information that tells you where to find the next one, so they are “joined” in a linear chain
[Figure: Head pointer -> (data, link) -> (data, link) -> (data, link)]

Linked Lists
Can look a little like one-dimensional arrays …

students   [1]  [2]  [3]  [4]  [5]
           51   72   12   95   2

But there are significant differences …

Linked Lists: different to Arrays
• An Array uses just one single chunk of memory
• A Linked List is a number of chunks ‘chained’ together
• Arrays allow direct access to the content of the cells. Eg: you can go directly to the cell “students[4]”
• In a Linked List you have to start at the first node and move through the list one-by-one until you find what you are looking for
• To change the size of an Array, you have to construct a new array, copy the old data into it (‘recast’), then delete the old array
• To change the size of a Linked List, you just add some more nodes or remove some nodes

Linked Lists
Two common types of specialized (or ‘restricted’) linked lists are:
• LIFO (last in, first out), also called a stack
• FIFO (first in, first out), also called a queue

Singly linked lists
An ordered collection of data in which each element contains the
location of the next element

Singly linked lists
[Figure: Head -> node -> node -> node, where each element (node) holds a “next” link; an empty list has Head = null (pList = null)]

Linked lists
Data in a node can be:
• A number
• Text
• An array
• A record
[Figure: a record used as node data, with fields key, field1, field2, field3, …, fieldn]

Singly linked lists: basic operations
Create a list, insert node, delete node & retrieve node
Example: a singly linked list of numbers

Create an empty list
    Head = null

Insert node: general case
Step 1: a list: Head -> 39
Step 2: create new node: New = 72
Step 3: set New.next = Head
        set Head = New
Result: Head -> 72 -> 39

Insert into a null (empty) list
Step 1: an empty list: Head = null
Step 2: create new node: New = 72
Step 3: set New.next = Head
        set Head = New
Result: Head -> 72

Insert at beginning
    New.next = Head
    Head = New
[Figure: the list before and after the new node is linked in at the head]

Insert in middle
    New.next = Pre.next
    Pre.next = New
[Figure: the list before and after the new node 52 is linked in after the node Pre]

Delete node
Delete first node:
    Kill = Head
    Head = Kill.next
    recycle(Kill)
General delete case:
    Kill = Pre.next
    Pre.next = Kill.next
    recycle(Kill)
[Figure: the list 75, 52, 39 with the pointers Head, Pre and Kill marked]

Arrays
An array is a fixed-size, sequenced collection of elements of the same data type
[Figure: an array named “numbers” with cells labelled element 0, element 1, element 2, …, element 6]
• The length or size of an array is the number of elements in the array
• Each element in the array is identified by an integer called the index, which indicates the position of the element in the array
• For a zero-based array, the index runs from 0 to the length minus 1
• The upper bound of an array is the index of the last element
Q: how many elements in this array? There are 7: indexes “0” to “n” give (n+1) elements, and here n = 6

Two Dimensional Arrays
Size of A: 5x4
Access an element: A[row_index][col_index], e.g. A[2][1]
Why do you think images are usually stored in 2D arrays?

Records
A collection of related elements, possibly of different types, having a single name
Each element in a record is called a field.
A field has a type and is the smallest accessible element
[Figure: a record named employee with the fields emp_no, emp_name, pay_rate, hours_worked]
The elements in a record can be of the same or different types. But all elements in the record must be related.

Records: accessing records
Use a period (.)
• Individual field: record_variable_name.fieldname
• Whole record: record_variable_name
student1 and student2 are records of the same type: student
    student1.id = 102345
    student1.name = “john”
    student1.gradePoint = 85
Example: set student1 to student2
Safe way: copy it field by field
    student1.id = student2.id
    student1.name = student2.name
    student1.gradePoint = student2.gradePoint

Graphs
The nodes of a graph are called VERTICES
The links are called ARCS
One type of graph is a DIRECTED GRAPH or “DIGRAPH”, where each ARC has a specific direction

Graphs: some terms
If there is no direction in the ARCS, then the graph is an UNDIRECTED GRAPH, and the arcs are called EDGES.
A PATH is a sequence of vertices where each vertex is adjacent to (next to) the next one. It is a bit like having a Linked List as part of the Graph.
A CYCLE is a path of at least 3 vertices that starts and ends with the same vertex
[Figure: a path of 3 vertices that forms a CYCLE]

Graphs: some more terms
Two Vertices are CONNECTED if there is a path between them
A Graph is CONNECTED if there is some pathway between any two vertices in the graph (in at least one direction). That is: you can “get to anywhere”, somehow
A Graph is DISJOINT if it is not CONNECTED. Eg: if there is some node ‘floating’ off by itself
The DEGREE of a vertex is the number of EDGES (or ARCS) that connect to it

Graphs: even more terms
A DIGRAPH (one with arrows) is STRONGLY CONNECTED if there is a path from each vertex to every other vertex
A connected DIGRAPH that is not strongly connected is WEAKLY CONNECTED
Imagine a city street that is BOTH a “one-way” AND a “dead-end”. Then, you’d get trapped at the end wouldn’t you!
UNDIRECTED GRAPHS, if they are connected (ie: not ‘disjoint’) are always STRONGLY CONNECTED, because there is no directionality, so you can eventually get to any node from any other node … so there is no real need to talk about “weak” or “strong” with UNDIRECTED GRAPHS, only whether they are connected or not. Graphs – terms continued The OUTDEGREE of a vertex in a digraph is the number of arcs that leave it The INDEGREE of a vertex in a digraph is the number of arcs that enter it This vertex has OUTDEGREE = 1 and INDEGREE = 2 Graphs - weighting A WEIGHTED GRAPH or network is a graph that has numerical weights attached to each edge (or arc) Imagine a roadmap, where the lines joining each town are marked with their distance, or perhaps with the travel-time Imagine a circuit diagram where each connection is shown with its resistance Imagine a map of airline flights between cities, with the ticket price of each flight marked on them A NEURAL NETWORK is a very special type of weighted graph. A programme can ADJUST the weightings on the edges and make certain edges “easier”. In this way, a neural network can “learn” to solve certain types of problems by itself! Every time it gets a successful result, it reinforces that pathway by adjusting the weights to make it more preferred. This is how we believe the human brain learns! Graphs Many problems that are modelled by weighted graphs (networks) require that the shortest path (‘cheapest’ or ‘quickest’) be found between 2 particular vertices The SHORTEST PATH between two vertices is defined as the path that connects those vertices with the lowest possible total weight .. ie: no other path between them has less weight. 
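The shortest-path definition above can be sketched in code. This is a minimal illustration using Dijkstra's algorithm, which the slides do not name; it assumes all weights are non-negative. The function name shortest_path and the four-town road map are hypothetical.

```python
import heapq


def shortest_path(graph, start, goal):
    """Return (total_weight, path) for the lowest-weight path, or (inf, []) if none exists."""
    # Priority queue of (weight so far, vertex, path taken to reach it).
    frontier = [(0, start, [start])]
    settled = set()
    while frontier:
        dist, vertex, path = heapq.heappop(frontier)
        if vertex == goal:
            return dist, path
        if vertex in settled:
            continue  # a cheaper route to this vertex was already processed
        settled.add(vertex)
        for neighbour, weight in graph.get(vertex, {}).items():
            if neighbour not in settled:
                heapq.heappush(frontier, (dist + weight, neighbour, path + [neighbour]))
    return float("inf"), []


# Hypothetical weighted, undirected road map: weights are distances between towns.
roads = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 4, "C": 2, "D": 5},
    "C": {"A": 1, "B": 2, "D": 8},
    "D": {"B": 5, "C": 8},
}

print(shortest_path(roads, "A", "D"))  # (8, ['A', 'C', 'B', 'D'])
```

Note that the direct-looking routes A-B-D and A-C-D both weigh 9, so the path with the most edges is the one with the lowest total weight here.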
Trees
A TREE is an ACYCLIC GRAPH, also connected & directed
That simply means a graph with no CYCLES in it, where there are no ‘lonely’ nodes, and the links are directed ‘downwards’
Unlike real natural trees, which have their roots at the bottom and their leaves at the top, computer science trees are usually drawn upside down!

Trees
Trees are extremely useful data structures, so it is worth spending quite a bit of time on them
Trees are very good at setting out classification systems (taxonomies), such as:
• Your family members over a few generations
• The library catalogue system
• Organisation of an army
• Possible moves in games
Tree structures are used extensively in AI (artificial intelligence)

Trees
Trees are probably the dominant form of data structure in computer science, and even the file-system of your computer is set out in a tree structure
Trees can even be used for more complex ideas, such as representing equations in a structure called an EXPRESSION TREE
[Figure: an example expression tree built from the operators + and * and the operands a, b, c, d]
Its meaning may change depending on the method of tree traversal

Trees: EXPRESSION TREE
[Figure: expression tree with root +, whose children are * and d; the * node has children a and +; that + node has children b and c]
Infix … a*(b+c)+d
Prefix … +*a+bcd
Postfix … abc+*d+

Trees
Trees are made up of NODES, joined by edges which we call BRANCHES
We talk about the LEVEL of the nodes, starting from level 0 at the top and working down
[Figure: a tree with its rows labelled Level 0, Level 1, Level 2, Level 3]

Trees: terms
A node that is connected by a single branch to a node at a lower number level (ie: the next row up) is said to be the CHILD of that higher node
A node that is connected by a single branch to a node at a higher number level (ie: the next row down) is said to be the PARENT of that lower node
Nodes that have the same parent are said to be SIBLINGS (like “brothers & sisters”)

Trees: more terms
A node that is connected by a number of branches to a node at a lower number level (a few levels up) is said to be its DESCENDANT
A node that is connected by a number of branches to a node at a higher number
level (a few levels down) is said to be that node’s ANCESTOR
DESCENDANT = child of a child of a …
ANCESTOR = parent of a parent of a …
This is like “grandfather” or “great grandfather” etc and “grandchild” or “great grandchild” etc

Trees: even more terms
A ROOTED tree has a special node, called the ROOT NODE, which is the ANCESTOR of all the other nodes in the tree
Most of the trees we see in Computer Science are ROOTED TREES

Trees: terms continued
Nodes in a tree may be referred to as INTERNAL NODES, meaning that they have at least one child
Nodes with no children are called LEAF NODES
The HEIGHT of a node is the number of edges in the longest path from the node to a LEAF
The DEPTH of a node is the number of edges in the path from the ROOT to that node
[Figure: a tree annotated with: Root node (depth = 0); Internal node: HEIGHT = 2, DEPTH = 1; Leaf node: HEIGHT = 0, DEPTH = 2]

Minimum Spanning Tree
[Figure: a weighted graph G and its minimum spanning tree]

Binary Trees
BINARY TREES are trees where each node can have AT MOST 2 children
These are called the LEFT CHILD & the RIGHT CHILD

Binary Trees: Child-Sibling Trees
Whilst a binary tree can only have up to 2 children per node, a general tree can have any number of children from each node
We can convert general trees into binary trees by representing them as child-sibling trees

Binary Trees: Child-Sibling Trees
[Figure: a general tree with nodes A to H in which node A has 3 children, so it is NOT binary, re-arranged into an equivalent child-sibling tree of binary form]
Join PARENT to LEFT CHILD, then to NEXT SIBLING ..
Then each node has AT MOST 2 edges going out

Suppose “0” is not a valid item for insertion in the tree
Complete the tree by adding this ‘invalid’ value to the empty nodes
[Figure: the tree with nodes R, C, Z, A, K, T, D, N, S, W, shown before and after its empty positions are filled with 0]

Constructing The Tree
We need a rule to determine where to place new nodes in the tree
This will depend on your application
The tree on the previous slide is actually built in alphabetical order
• The node on the left of each subtree appears before its parent in the alphabet
• The node on the right of each subtree appears after its parent in the alphabet
We can use a binary tree to SORT data
[Figure: the items R C K N Z T S D A W and the resulting tree, whose levels read R / C Z / A K T / D N S W]

Binary Tree Sort
Algorithm: binarytreesort
Purpose: sort items that can be ordered
Input: items to be sorted, root of tree
Output: binary tree with items in order
Pre: items must be able to be sorted
Post: inorder traversal of the tree produces items in order
Return:
BEGIN binarytreesort
    set inserted to false
    IF the tree is empty THEN
        set the root to item
        set inserted to true
    ENDIF
    WHILE not inserted DO
        IF item < root THEN
            IF root has left child THEN
                set root to left child
            ELSE
                set left child to item
                set inserted to true
            ENDIF
        ELSE (item >= root)
            IF root has right child THEN
                set root to right child
            ELSE
                set right child to item
                set inserted to true
            ENDIF
        ENDIF
    ENDWHILE
END binarytreesort

Depth-first traversal of a binary tree
PREORDER traversal (visiting order: root 1, left 2, right 3)
    Visit the root of the tree
    Preorder traverse the left subtree
    Preorder traverse the right subtree
INORDER traversal (visiting order: left 1, root 2, right 3)
    Inorder traverse the left subtree
    Visit the root of the tree
    Inorder traverse the right subtree
POSTORDER traversal (visiting order: left 1, right 2, root 3)
    Postorder traverse the left subtree
    Postorder traverse the right subtree
    Visit the root of the tree

[Figure: an example tree with the nodes A, B, C, D, E, F, used for the three traversals below]
Preorder traversal of a binary tree: A B C D E F
Inorder traversal of a binary tree: C B D A E F
Postorder traversal of a binary tree: C D B F E A

IntuitiveSort
Space: need an extra array of the same size as the input one, to store the sorted data
Operations: for N numbers, need N^2 comparisons
[Figure: IntuitiveSort copies the input 3,9,1,32,11,50,7 into a second array one smallest value at a time, e.g. 1,3,7,x,x,x,x part-way through]

Selection sort
[Figure: array positions 0 … N-2, N-1 divided into a sorted part and an unsorted part, shown as 1,3,7,9,32,11,50 and then 1,3,7,9,11,32,50]
FindSmallest: find the smallest
Swap smallest: swap the smallest with the first element in the unsorted part
3,9,1,32,11,50,7   Find the smallest number & swap it with the first element
1,9,3,32,11,50,7   Find the smallest number among the numbers from the second to the last & swap it with the second element (or the first element in the unsorted part)
1,3,9,32,11,50,7   Find the smallest number among the numbers from the third to the last & swap it with the third element (or the first element in the unsorted part)
1,3,7,32,11,50,9

Selection Sort
Space: no extra array is needed
Number of operations: (N-1) + (N-2) + (N-3) + … + 1 = (N-1)*(N-1+1)/2 = N(N-1)/2
Worst case: 21 comparisons for 7 numbers, e.g. if input is 50,32,11,9,7,3,1, and at most N-1 = 6 swaps (one swap per pass)
Best case: if input is 1,3,7,9,11,32,50: 21 comparisons, but no swaps

Insertion sort
3,9,1,32,11,50,7   Take the next number of the unsorted part (1), and “hold it” …
3,9,_,32,11,50,7   Then find the correct place to put it in, in the sorted part
3,_,9,32,11,50,7   Compare 1 to 9: 9 is bigger than 1, so move 9 “up”
_,3,9,32,11,50,7   Compare 1 to 3: 3 is bigger than 1, so move 3 “up”
1,3,9,32,11,50,7   Insert the “held over” 1 into the space you just made
NB: this is not really swapping! ..
We are actually “holding” the next number, and “shuffling” the others “up” until we find the right place to insert it
3,9,1,32,11,50,7   Transferred the first element into the sorted part
3,9,1,32,11,50,7   Transferred the second element into the sorted part and placed it in order
3,9,1,32,11,50,7   Transferred the third element into the sorted part and placed it properly
1,3,9,32,11,50,7

Insertion Sort
Space: no extra array is needed
Operations:
• Worst case: N(N-1)/2 comparisons & “moves”
• Best case: N comparisons, no moves

Bubble sort
3 9 1 32 11 50 7
3 9 1 32 11 7 50
3 9 1 32 7 11 50
3 9 1 7 32 11 50
3 9 1 7 32 11 50
3 1 9 7 32 11 50
1 3 9 7 32 11 50
1 3 9 7 32 11 50

Bubble Sort
Space: no extra array is needed
Operations (comparisons & swaps):
• Worst case: N(N-1)/2 comparisons & swaps
• Best case: N comparisons, 0 swaps

Sequential Search
Operations:
• Worst case: N comparisons
• Best case: 1 comparison

How Can We Search Faster?
Sort the array first!
Target given: 11
3,9,1,32,11,50,7   Find its location
1,3,7,9,11,32,50   After sorting
We can now stop searching if we find a match, OR if we see a number BIGGER than the one we’re searching for

Binary Search: even better!
Target given: 11
3,9,1,32,11,50,7   Find its location
1,3,7,9,11,32,50   After sorting

Shell’s Sort (ShellSort)
We chop the array into smaller pieces, separate out some elements and sort them
Example: what if we just looked at every 3rd element?

ShellSort
Divide the list into h segments using increment h, and apply Insertion Sort to each segment.
The list will be sorted in an efficient way by applying a sequence of diminishing increments, h_n, h_(n-1), …, h_1, 1.

ShellSort
Consider an array with 16 elements
Let’s ‘chop’ it into 2 chunks: the “front 8” & the “back 8”
Let’s look at the first elements of EACH chunk [that gives us a “group of two” (sub-array) to consider, one from each chunk]
Group of two elements to consider: [1] = 4 & [9] = 3
Is this group-of-elements sorted? .. NO!
we need to sort this group! (because 3 is smaller than 4)

[1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16]
 4   7  10   0   3   9   3   1   3   18    2   23    8    2    0    6

ShellSort
Now you may feel that this was MORE work than doing a simple Insertion Sort on the whole array from the start!
Intuitively the ShellSort algorithm feels like it shouldn’t work any better, yet it does!
In fact, it cannot do any worse than Insertion Sort (or Bubble Sort) and almost always actually does considerably better!
NB: ShellSort is heavily studied by mathematicians, who have STILL not quite found all its secrets!

So what did we actually do here?
With 16 elements:
• We examined every 8th element (cut in half)
• Then every 4th element (cut into quarters)
• Then every 2nd element (cut into eighths)
• Then finally every 1st element (cut into sixteenths)
Each step made us do bigger calculations, on bigger sub-arrays
BUT each step got easier because the sub-arrays were partially sorted from the previous step!

MergeSort (John von Neumann)
A classic example of recursive “divide & conquer”
• Split the list into left and right halves
• Sort each half (recursively)
• Merge the sorted left half and right half into a sorted list

Mergesort Performance
Needs extra memory for the merge
On average, Mergesort (~ n log n) is even better than Shellsort

QuickSort: basic concept
A little like MergeSort, but this time we do the hard work deciding how to split the array, THEN it’s easy to “merge” it
In a way, this is “conquer & divide”
The relationship between QuickSort & MergeSort is a little bit like the relationship between InsertionSort & SelectionSort

QuickSort: basic concept
Take the first element, call it the PIVOT
Then “slide along right” from the 2nd element, looking for anything larger than PIVOT
Then “slide along left” from the end, looking for anything smaller than PIVOT
If you find ‘stopping points’ then SWAP them, and keep sliding again …
Then when the “lo point” and the “hi point” meet, you move the PIVOT to the middle point.
You get an array with “lower” points on the left & “higher” points on the right
Then you do the same thing to the left side, and to the right side, and recursively continue doing this till you are just looking at single elements …
THEN you simply concatenate (join) all these single elements together and you have the finished sorted array

Basic concept
[Figure: the array x,x,x,…,x,y,x,…,x split around the pivot y into Left and Right]
Suppose there is a special element, y, such that y is greater than or equal to all elements on its left side AND y is smaller than or equal to all elements on its right side
Use this special element, y, to split the array into Left and Right, both excluding y:
    sort Left
    sort Right
    merge sorted Left and Right (concatenation only: the merge is simple here!)

Quicksort: two questions
• Base cases: if the number of elements is 0 or 1
• How to partition the list using a pivot

Partitioning algorithm
• Select any element, y, in the list as a pivot
• Move it to its correct position, say Iy, in the list
• Partition the list at Iy
[Figure: the list 4,2,6,9,15,76,36 with pivot 4 moved to position Iy = 1, giving x,4,x,x,x,x,x with Left and Right parts]

Selecting an element to be a pivot: there are some PROBLEM cases!
What if the array is already sorted? You STILL have to do all that horrible work!
Sometimes the array is already partially sorted
So, what if you choose the biggest or smallest element as PIVOT?

Selecting an element to be a pivot
Each choice is illustrated on the list 4,2,6,9,15,76,36:
• Lowest or highest element (never use these!)
• Middle element
• Median of first, middle and last elements

Quicksort Complexity
No extra space (memory) needed
Operations:
• Average ~2N lg N
• Worst ~N^2
Comments:
• Most well studied sorting algorithm
• Pivot selection: never select the first or last element; the middle element is reasonable; Median-of-Three (first, middle and last) partitioning is better

Binary Search
If the array is SORTED, it makes searching a LOT easier!
2 5 6 9 11 16 24 32 66 79
Array is sorted, small to large (left to right)
Guess the middle position. Is it the number we want?
• YES: we are finished
• NO: THEN ASK:
    IS THE NUMBER WE GUESSED TOO BIG? Then look ONLY at numbers LEFT of this position, and repeat the process
    IS THE NUMBER WE GUESSED TOO SMALL? Then look ONLY at numbers RIGHT of this position, and repeat the process

Recursive binary search
Simpler version of itself?
• Search the lower half: “first to mid-1 elements”, or
• Search the upper half: “mid+1 to last elements”
Base cases?
• Target found at the position identified by “mid”
• No target found: “last < first”

Heaps
A heap is a complete binary tree (Rule 1)
The key value of each node is >= the key values of all nodes in all of its sub-trees (Rule 2)
• The root has the largest key value (max-heap)
• Each sub-tree is a heap as well

Question: differences between BSTs & heaps?
• BSTs are not necessarily complete binary trees; heaps are complete binary trees
• In a BST, the root key value is greater than the key values of the nodes in its left sub-tree and less than the key values in its right sub-tree
• In heaps, the root key value is the largest in the tree, and key values in the right sub-tree are not necessarily greater than key values in the left sub-tree

Basic Heap Operations
Two basic operations are performed on a heap:
• Insert: insert a node into a heap
• Delete: remove the root node from the heap

Heap operations
Insert a new node with key value 11
[Figure: heap with keys 24, 9, 12, 10, 7 and the new key 11]
Need to repair the structure in order to maintain the heap property: ReheapUp

Heap operations
Delete the root node
[Figure: heap with keys 24, 9, 12, 10, 7 before and after deleting the root]
Need to repair the structure in order to maintain the heap property: ReheapDown

Basic Heap Algorithms
To implement the insert and delete operations, we need two basic building-block algorithms:
• ReheapUp
• ReheapDown

ReheapUp
Imagine we have a complete binary tree with N elements.
Assume that the first N-1 elements satisfy the order property of heaps, but the last element does not.
The ReheapUp operation repairs the structure so that it is a heap by “floating” the last element up the tree until that element is in its correct location in the tree. ReheapDown Imagine we have a complete binary tree that satisfies the heap order property except in the root position. This situation occurs when the root is deleted from the tree, leaving two disjoint heaps. To combine the two disjoint heaps, we move the data in the last tree node to the root position. “SWAP” the last leaf to the root position Obviously, this action destroys the tree’s heap properties! ReheapDown To restore the heap property we need an operation that will “sink” the root down until it is in a position where the heap order property is satisfied. We call this operation ReheapDown. Heap Data Structure As heap is a complete binary tree, it is most often implemented in an array Hence, the relationship between a node and its children is fixed and can be calculated. Heap Data Structure For a node located at index i, its children are found at: Left child: 2 i +1 Right child: 2 i +2 0 1 3 2 4 5 The parent of a node at index i is found at: (i -1)/2 Heap Data Structure Given the index for a left child, j, its right sibling, if any, is found at j+1. Given the index for a right child, k, its left sibling, which must exist, is found at k-1. Build Heap Build a heap from an array of elements that are in random order Divide the array into two parts: the left being heap & the right being data to be inserted into the heap. Initially, the heap only contains the first element. Start from the second element, calling ReheapUp for each array element to be inserted into the heap. 
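The index arithmetic and the Build Heap procedure above can be sketched as follows. This is a minimal illustration assuming a max-heap stored in a 0-based Python list; the function names reheap_up, build_heap and is_max_heap are hypothetical, and the input is the list used in the sorting slides.

```python
def reheap_up(heap, index):
    """Float heap[index] up until its parent is at least as large (max-heap)."""
    while index > 0:
        parent = (index - 1) // 2  # parent of node i is at (i - 1) / 2, rounded down
        if heap[parent] >= heap[index]:
            break  # heap order property holds, so stop
        heap[parent], heap[index] = heap[index], heap[parent]
        index = parent


def build_heap(items):
    """Build a max-heap by inserting elements one at a time, as on the Build Heap slide."""
    heap = list(items)
    # heap[0] alone is already a heap; start from the second element.
    for i in range(1, len(heap)):
        reheap_up(heap, i)
    return heap


def is_max_heap(heap):
    """Check Rule 2: every node is <= its parent."""
    return all(heap[(i - 1) // 2] >= heap[i] for i in range(1, len(heap)))


heap = build_heap([3, 9, 1, 32, 11, 50, 7])
print(heap)               # [50, 11, 32, 3, 9, 1, 7]
print(is_max_heap(heap))  # True
```

Because the array is filled left to right with no gaps, the tree is always complete (Rule 1), so only the order property needs repairing.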
Insert Heap
Once a heap is built, a new data item (new node) can be inserted into the heap as long as there is room in the array
To insert a new node into a heap:
• Locate the first empty leaf in the array, immediately after the last node
• Place the new node at the first empty leaf
• ReheapUp the new node

Delete Heap
When deleting a node from a heap, the most common & meaningful logic is to delete the root.
The heap is thus left without a root. To re-establish the heap:
• Move the data in the last heap node to the root (swap last leaf to root position)
• ReheapDown the root

Heap Applications
Three common applications of heaps:
• Selection algorithms: select the k’th largest number in an unsorted list of numbers
• Sorting: sort a list of numbers in descending (max-heap) / ascending (min-heap) order
• Priority queues

Heap Applications: Selection
• Create a heap from the list of unsorted numbers
• Delete k-1 elements from the heap (place each deleted element at the end of the heap and reduce the heap size by 1)
• The k’th element is now at the top (the root)
Example: select the 4th largest number from a list of 10 numbers

Heap Application: Heap sort
• Build a heap from the list of elements to be sorted
• Delete the root of the heap, place the deleted element at the end of the array that stores the heap & reduce the heap size by 1
• Once all items have been deleted, you will have a sorted array

Heap Sort
Heap sort is an improved version of selection sort in which the largest element (the root) is selected & exchanged with the last element of the unsorted list (or heap)
Complexity: n log2 n

Recursive problems
Generally we can solve any problem recursively provided:
1. We can specify some specific solutions explicitly (base or stopping cases)
2. The remaining solutions can be defined in terms of these simpler solutions
3. Each recursive call in the definition brings the result closer to the explicitly defined solutions

Solving problems using recursion
1.
Express the problem as a simpler version of itself
2. Determine the base case(s)
3. Determine the recursive steps

General form of recursion
Algorithm: recursion
Purpose: this illustrates the general form of a recursive algorithm
Post: the solution has been found recursively
BEGIN recursion1
    IF the base case is reached THEN
        solve the problem
    ELSE
        split the problem into a simpler case using recursion
    ENDIF
END recursion1

ASIDE … intro to complexity
One ‘pass’ through a bubble sort requires about 1 ‘calculation’ for each element, … so that is “n” calculations
A complete bubble sort requires one ‘pass’ for each element, … so that is about “n x n” = “n^2” calculations
NB: “n x (n-1)” is about the same as “n^2”
A single traversal down a binary tree is of order “log n” calculations, and that is what one Binary Search costs
Something like a Binary Tree Sort requires one traversal for each element, … so that is about “n x log n” calculations

ASIDE … intro to complexity (logs are base 10)

n       n^2         n^3             log n   n log n   n!
1       1           1               0       0         1
2       4           8               0.301   0.602     2
3       9           27              0.477   1.431     6
4       16          64              0.602   2.408     24
5       25          125             0.699   3.495     120
6       36          216             0.778   4.668     720
7       49          343             0.845   5.916     5,040
8       64          512             0.903   7.224     40,320
9       81          729             0.954   8.588     362,880
10      100         1,000           1       10        3,628,800
100     10,000      1,000,000       2       200       9.33x10^157
1000    1,000,000   1,000,000,000   3       3,000     *huge*!

Greedy Algorithms
These algorithms work by taking what seems to be the best decision at each step
No backtracking is done: you can’t change your mind (once a choice is made we are stuck with it)
• Easy to design
• Easy to implement
• Efficient (when they work)

Greedy Algorithms
Example 1: making change
Given we have $2, $1, 50c, 20c, 10c, 5c and 1c coins, what is the best (fewest coins) way to pay any given amount?
The greedy approach is to pay as much as possible using the largest coin value possible, repeatedly, until the amount is paid.
E.g. to pay $17.97 we pay 8 x $2 coins (=$16; can’t pay 9 of these, or it’s too much!),
  1 x $1 coin (= $17)
  1 x 50c coin (= $17.50)
  2 x 20c coins (= $17.90)
  1 x 5c coin (= $17.95) &
  2 x 1c coins (= $17.97), 15 coins in total
- This is the optimal solution (although this is harder to prove than you might think)
- Note that this algorithm will not work with an arbitrary set of coin values. Adding a 12c coin would result in 15c being made from 1 x 12c and 3 x 1c (4 coins) instead of 1 x 10c and 1 x 5c (2 coins)

Greedy Algorithms
General Characteristics
- We start with a set of candidates which have not yet been considered for the solution
- As we proceed, we construct two further sets:
  - Candidates that have been considered and selected
  - Candidates that have been considered and rejected
- At each step, we check to see if we have reached a solution
- At each step, we also check to see if a solution can be reached at all
- At each step, we select the best acceptable candidate from the unconsidered set and move it into the selected set
- We also move any unacceptable candidates into the rejected set

Backtracking
The Eight Queens problem
- We wish to place 8 queens on a chessboard in such a way that no queen threatens another. In other words, no two queens may be in the same row, column or diagonal
- Let us assume we have a function solution(x) which returns True when x is a solution and False when it is not

Branch & Bound
- This time, we try to find some guidelines or hints to help us search the tree more efficiently
- If we can say that a certain branch CANNOT give us a good answer, then we can just “chop it off” and look elsewhere!

Branch and bound
The assignment problem
- A set of n agents are assigned n tasks
- Each agent performs exactly one task
- If agent i is assigned task j then a cost c_ij is associated with this combination
- The problem is to minimise the total cost C
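The assignment problem above can be explored with branch and bound. The Python sketch below is one possible approach, not necessarily the method used in the course: it keeps partial assignments in a priority queue ordered by a lower bound (the cost so far plus the cheapest task of every agent still waiting, ignoring clashes), so branches that cannot beat a cheaper candidate are never expanded. The function name `assign_tasks` and the 4 x 4 cost matrix are made up for illustration.

```python
import heapq


def assign_tasks(cost):
    """Branch and bound for the assignment problem (illustrative sketch).

    cost[i][j] is the cost c_ij of giving task j to agent i.
    Returns (total_cost, tasks), where tasks[i] is the task given to agent i.
    """
    n = len(cost)
    # Optimistic estimate: every waiting agent gets its cheapest task,
    # even if two agents would pick the same one.
    cheapest = [min(row) for row in cost]
    # remaining[i] = sum of cheapest[i:], a lower bound for agents i..n-1.
    remaining = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        remaining[i] = remaining[i + 1] + cheapest[i]

    # Each entry: (lower bound, cost so far, tasks chosen for agents 0..k-1).
    frontier = [(remaining[0], 0, ())]
    while frontier:
        bound, so_far, chosen = heapq.heappop(frontier)
        agent = len(chosen)
        if agent == n:
            # Every other entry has a bound at least this large, so this
            # complete assignment is optimal.
            return so_far, list(chosen)
        for task in range(n):
            if task in chosen:
                continue  # each task can be given to only one agent
            new_cost = so_far + cost[agent][task]
            new_bound = new_cost + remaining[agent + 1]
            heapq.heappush(frontier, (new_bound, new_cost, chosen + (task,)))
    return None  # defensive: the loop always returns for a square matrix


# Hypothetical cost matrix: rows are agents, columns are tasks.
example_cost = [
    [9, 2, 7, 8],
    [6, 4, 3, 7],
    [5, 8, 1, 8],
    [7, 6, 9, 4],
]
best_cost, best_tasks = assign_tasks(example_cost)
print(best_cost, best_tasks)  # prints: 13 [1, 0, 2, 3]
```

With this made-up matrix the cheapest assignment costs 13. The bound used here is deliberately simple; a tighter bound would prune more branches at the price of more work per node.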
Backtracking is a kind of DEPTH-FIRST travel through a tree
- It will find AN answer
- If you keep doing it, it will find ALL answers eventually

Branch & Bound is a kind of BREADTH-FIRST travel through a tree
- It will find the BEST (optimal) answer
- But, because it chops whole branches off, it usually won’t find ALL possible answers

Heuristic strategy
- Backtracking and branch-and-bound are “blind” in the sense that they do not look ahead beyond a local neighborhood when expanding the state-space tree
- Heuristic strategy explores further
- It looks at the difference between the current state and the goal state for pruning the state-space tree

Heuristic strategy
- A promising node in the state-space tree is defined as the one having the minimum value of a cost function
- The cost function for a node v is defined as
  Cost(v) = g(v) + h(v)
  where g(v) is the cost from the start state (root node) to v and h(v) is the estimated cost from v to the goal
- As the path from v to the goal has not been found yet, the best that can be done is to use a heuristic value for h(v)

Strings
A linear (typically very long) sequence of characters, e.g.
- A name, address, telephone number
- A word or text file or a book
- 0101010100000111100 (e.g. computer graphics)
- CGTAAACTGCTTTAATCAAACGC (a DNA sequence)
Two common types of strings
- Text string: characters are from an alphabet (e.g. a subset of the ASCII or Unicode character set)
- Binary string: a simple sequence of 0s and 1s

Strings – some definitions
- Length of a string: the number of characters in the string
- Empty string: a string with zero characters, i.e. length = 0
- A string is often represented as an array, e.g. S[0…n-1] is a string S of length n
- Substring: a continuous part of a string, e.g. S[10…19]

String Matching
- Given a text string of length n, and a pattern which is also a string, of length m (m <= n), find occurrences of the pattern within the text
- Here, we use “text” to refer to all types of strings
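The string definitions above map directly onto Python string operations. The sketch below is an illustration only: the helper name `occurrences` is hypothetical and the short sample strings are chosen for the example. It reports every position s with 0 <= s <= n-m at which the substring of the text starting at s equals the pattern.

```python
def occurrences(text, pattern):
    """Return every position s where pattern occurs in text."""
    n, m = len(text), len(pattern)
    positions = []
    # Valid starting positions are 0 <= s <= n - m.
    for s in range(n - m + 1):
        # text[s:s + m] is the substring T[s...s+m-1]
        # (Python slices exclude the end index).
        if text[s:s + m] == pattern:
            positions.append(s)
    return positions


T = "abcabaabcaba"   # a text string of length n = 12
P = "abaa"           # a pattern of length m = 4
empty = ""           # the empty string has length 0

print(len(T), len(P), len(empty))  # lengths of the three strings
print(T[3:7])                      # the substring T[3...6]
print(occurrences(T, P))           # positions where P occurs in T
```

For these sample strings the only occurrence starts at position 3.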
Example:
text T = “In one sense, text strings are quite different objects from binary strings, since they are made up of characters from a large alphabet. In another sense, though, the two types of strings are equivalent, since each text character can be coded as a binary strings”
pattern P = “strings”

String Matching
- Let T[0…n-1] be the text (size n) & P[0…m-1] be the pattern (size m)
- To find an occurrence of P in T is to find a position s such that
  T[s…s+m-1] = P[0…m-1], 0 <= s <= n-m

  text T      a b c a b a a b c a b a    (n = 12)
  pattern P         a b a a              (m = 4, s = 3)

Brute-Force String Matching
1. Align the pattern P (size m) against the first m characters of the text T
2. Match the corresponding pairs of characters from left to right
   - If all m pairs of characters match, then an occurrence of P has been found in T & the algorithm can stop
   - If a mismatching pair is encountered, then the pattern P is shifted one position to the right & Step 2 is repeated

Brute-Force String Matching
Complexity
- Two nested loops
  - Outer loop: indexing through all possible starting indices of P in T; executed at most (n-m+1) times
  - Inner loop: indexing to each character in P; executed at most m times
- Worst case: O((n-m+1)m) -> O(nm); O(n^2) if m = n/2
- Average case: we usually do not need to compare all m characters before shifting the pattern (increasing the outer loop index s), so ~ O(n+m)
- Works even with a potentially unbounded alphabet

Brute-Force String Matching
Example:
T = “abacaabaccabacabaabb” (n = 20)
P = “abacab” (m = 6)
The Brute-Force algorithm performs 27 comparisons. If we make some clever observations along the way, we can reduce this number greatly! …

Horspool’s algorithm
- Assume the alphabet is of fixed, finite size.
  e.g. just 26 possible letters to choose from

Basic concept
- The comparison starts at the end of P & moves backward to the front of P
- Maximize the “shift” of P without risking the possibility of missing an occurrence of P
- By looking at the character in T, T[i], that aligns with the last character of P, the shift can be determined according to what is in P

Horspool’s algorithm
Calculate shift values
- For a particular character c in T, its shift depends on if & how it appears in the pattern P
- For a fixed, finite size of alphabet, we can precompute the shifts for all possible characters in the alphabet and store the shift values in a lookup table
- The table will be indexed by all possible characters that can be encountered in a text

Horspool’s algorithm
Calculate shift values, t(c)
- Shift by the pattern’s length, m, if c is not among the first m-1 characters of the pattern
- Otherwise: shift by the distance from the rightmost c among the first m-1 characters of the pattern to its last character

P = “leader”     P = “recorder”
t(a) = 3         t(c) = 5
t(d) = 2         t(d) = 2
t(e) = 1         t(e) = 1
t(l) = 5         t(o) = 4
t(r) = 6         t(r) = 3

index                 0  1  2  3  4  5  6  7
Pattern (m=8)         R  E  C  O  R  D  E  R
Shift value (t(c))          5  4  3  2  1
Shift value = m-1-(index of the rightmost occurrence of the letter within the first m-1 characters of P)

Horspool’s algorithm
1. For a given pattern P, of length m, and the alphabet used in both the pattern and the text T, construct the shift table, Table
2. Align P against the beginning of T
3. Repeat the following until either a match of P is found or P reaches beyond the last character of T
   1) Start with the last character in P
   2) Compare the corresponding characters in P and T until either all m characters in P are matched or a mismatching pair is encountered
      a. If P is fully matched, stop
      b. If a mismatching pair is encountered, shift the pattern by Table(c) characters to the right along the text, where c is the character in T that is aligned with the last character of the pattern P

How much to jump?
- Move the pattern to its next place
- Check the pattern against the text, starting from the right
- Continue till you find an error …
- Look up the “wrong” text letter in the jump table
- “Jump” the pattern that far … LESS the number of characters along in the pattern where you found the error
- Say the error letter was “S” and the table says S = “jump 10”, BUT you found the “S error” 4 letters ‘into’ the pattern; THEN you jump 10 - 4 = 6 spaces

Jump table for the pattern “rithm” (m = 5):
t(h) = 1, t(t) = 2, t(i) = 3, t(r) = 4; t(c) = 5 for every other character (a, b, m, x, y, …)

  text      … b h m
  pattern   r i t h m

- The error letter is a “b” … the table says “for b, jump = 5”
- BUT the error was found 2 places ‘back’ in the pattern, so really only jump 5 - 2 = 3 places
- NB: this reduced jump applies when the table is looked up with the text letter at the point of the error (the bad-character rule of the Boyer-Moore algorithm). In step 3b above, the lookup uses the text character aligned with the last character of the pattern, and the table value is used as it is

Find a Hamiltonian circuit starting from a vertex (say “a”) in the following graph using backtracking
[Figure: graph with vertices a, b, c, d, e, f]
Hamiltonian circuit: a path that visits all the graph’s vertices exactly once before returning to the starting vertex

Complexity
- P
- NP
- NP-complete
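Returning to Horspool’s algorithm from earlier in this section, its steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: `shift_table` and `horspool_search` are hypothetical names, the table is stored as a dictionary so that any character missing from it falls back to the shift m, and the shift always uses the text character aligned with the last pattern character (the Table(c) rule), not the reduced jump from the “How much to jump?” example.

```python
def shift_table(pattern):
    """Build t(c) for every character among the first m-1 pattern characters."""
    m = len(pattern)
    table = {}
    for index in range(m - 1):
        # A later (more rightward) occurrence overwrites an earlier one,
        # so the table ends up holding the rightmost occurrence.
        table[pattern[index]] = m - 1 - index
    return table


def horspool_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0  # assumption: the empty pattern matches at position 0
    table = shift_table(pattern)
    i = m - 1  # text index aligned with the last pattern character
    while i < n:
        k = 0  # number of characters matched so far, right to left
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1  # all m characters matched
        # Characters not in the table get the full shift m.
        i += table.get(text[i], m)
    return -1


print(shift_table("leader"))
print(horspool_search("abacaabaccabacabaabb", "abacab"))
```

For “leader” the table holds l = 5, e = 1, a = 3 and d = 2, matching the shift values listed earlier, while r and every other character fall back to m = 6. The search on the brute-force example text finds the pattern at position 10.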