Summary – DSA, KE 2008, Year 1
Helpful:
Animations of all search algorithms / ADTs
Author Notes:
General
In essence, the ADTs given in DSA should not be viewed in isolation, case by case.
To reach a solution it is vital to understand that ADTs can be combined, and often more complex
ADTs are just several smaller ADTs combined. An ADT is nothing more than an abstract description
of how an object is supposed to behave, NOT a description of how it should be implemented.
(Think of it as Java abstract classes or interfaces; for example, a Java Vector is an
implementation of the Sequence ADT & Vector ADT.)
Table of Contents
HELPFUL
CHAPTER 1 - INTRODUCTION
  1.1 – PSEUDO CODE
  1.2 – ASYMPTOTIC NOTATION
  1.3 – QUICK MATH REVIEW
CHAPTER 2 – BASIC DATA STRUCTURES
  2.1 – STACKS AND QUEUES
  2.2 – VECTORS, LISTS AND SEQUENCES
  2.5 – DICTIONARIES AND HASH TABLES
CHAPTER 3 – SEARCH TREES AND SKIP LISTS
  3.1 – ORDERED DICTIONARIES AND BINARY SEARCH TREES
  3.2 – AVL TREES
  3.4 – SPLAY TREES
CHAPTER 4
  4.1 MERGE-SORT
  4.2 THE SET ABSTRACT DATA TYPE
  4.3 QUICK SORT
  4.4 A LOWER BOUND ON COMPARISON-BASED SORTING
  4.5 BUCKET-SORT AND RADIX-SORT
  4.6 COMPARISON OF SORTING ALGORITHMS
CHAPTER 5: FUNDAMENTAL TECHNIQUES
  5.2 DIVIDE-AND-CONQUER
  5.3 DYNAMIC PROGRAMMING
CHAPTER 6: GRAPHS
  6.1 THE GRAPH ADT
  6.2 DATA STRUCTURES FOR GRAPHS
  6.3 GRAPH TRAVERSAL
  6.4 DIRECTED GRAPHS
CHAPTER 9: TEXT PROCESSING
  9.1 STRINGS AND PATTERN MATCHING ALGORITHMS
  9.2 TRIES
CHAPTER 12: COMPUTATIONAL GEOMETRY
  12.1 RANGE TREES
  12.3 QUADTREES AND K-D TREES
Chapter 1 - Introduction
1.1 – Pseudo Code
Pseudo Code: Code written for a human reader, not a computer.
Structure:
- Expressions: Assignment: ← ; Comparators: <, >, =, ≤, ≥, ≠
- Method Declaration: Algorithm <name>(param1, param2, …)
- Decision Structures: if (condition) then [true-action] else [false-action]
- While-loops: while (condition) do [action]
- Repeat-loops: repeat [action] until (condition)
- For-loops: for (variable-increment-definition) do [action]
- Array-Indexing: A[i], ith cell of array A
- Method Calls: object.method(args)
- Method Returns: return (value)
Average-case running time is approximated here as (worst-case + best-case running time) / 2.
In analysis we always count the worst-case running time.
1.2 – Asymptotic Notation
Ways to define the running time of an algorithm:
- Big-Oh: "less than or equal to"
  f(n) is O(g(n)) if there are constants c > 0 and n₀ ≥ 1 such that f(n) ≤ c·g(n) for all n ≥ n₀.
  Say: "f(n) is order g(n)", "f(n) is big-Oh of g(n)", "f(n) is O(g(n))"
- Big-Omega: "greater than or equal to"
  f(n) is Ω(g(n)) if there are constants c > 0 and n₀ ≥ 1 such that f(n) ≥ c·g(n) for all n ≥ n₀.
  Say: "f(n) is big-Omega of g(n)", "f(n) is Ω(g(n))"
- Big-Theta: "equal"
  f(n) is Θ(g(n)) if there are constants c′, c″ > 0 and n₀ ≥ 1 such that c′·g(n) ≤ f(n) ≤ c″·g(n) for all n ≥ n₀.
  Say: "f(n) is big-Theta of g(n)", "f(n) is Θ(g(n))"
Difference between Big-Oh and Little-Oh:
Big-Oh:
∃ – there exists a constant c > 0 (and an n₀) such that f(n) ≤ c·g(n) for all n ≥ n₀.
Little-Oh:
∀ – for every constant c > 0 there is an n₀ such that f(n) ≤ c·g(n) for all n ≥ n₀.
Functions by Growth Rate:
Functions by Growth Rate:
log n
log² n
√n
n
n log n
n²
n³
2ⁿ
1.3 - Quick Math Review
Log rules:
1. log_b(a) = c if and only if a = b^c
2. log_b(ac) = log_b(a) + log_b(c)
3. log_b(a/c) = log_b(a) − log_b(c)
4. log_b(a^c) = c · log_b(a)
5. log_b(a) = log_c(a) / log_c(b)
6. b^(log_c(a)) = a^(log_c(b))
7. (b^a)^c = b^(a·c)
8. (b^a) · (b^c) = b^(a+c)
9. b^a / b^c = b^(a−c)
⌈x⌉ = smallest integer greater than or equal to x
⌊x⌋ = largest integer less than or equal to x
Justification techniques:
- Counterexample
- Contrapositive
- Contradiction
- Induction
- Loop invariant
Chapter 2 – Basic Data Structures
2.1 – Stacks and Queues
2.1.1 – Stack
Container of objects that are inserted and removed according to last-in first out (LIFO)
ADT:
push(o): Insert object o at top of stack
pop(): Remove and return the last object inserted into stack. Error if stack is empty
size(): Return number of objects in stack
isEmpty(): Return Boolean indicating if stack is empty
top(): Return the last object, without removing it. Error if stack is empty
Additional Information
An array-based Stack uses a variable t to keep track of the index of the top object
(t = −1 when the stack is empty, so the size is t + 1).
Pseudo Code - Stack Array:
Algorithm push(o):
  if size() = N then
    indicate that a stack-full error has occurred
  t ← t + 1
  S[t] ← o
Algorithm pop():
  if isEmpty() then
    indicate that a stack-empty error has occurred
  e ← S[t]
  S[t] ← null
  t ← t − 1
  return e
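For concreteness, here is a minimal Java sketch of the array-based stack above. The class name ArrayStack, the generics and the exception choices are my own assumptions, not from the course text:

import java.util.EmptyStackException;

public class ArrayStack<E> {
    private E[] S;          // element array
    private int t = -1;     // index of top element; -1 when empty

    @SuppressWarnings("unchecked")
    public ArrayStack(int capacity) { S = (E[]) new Object[capacity]; }

    public int size() { return t + 1; }
    public boolean isEmpty() { return t < 0; }

    public void push(E o) {
        if (size() == S.length) throw new IllegalStateException("stack full");
        S[++t] = o;                       // t <- t+1; S[t] <- o
    }

    public E pop() {
        if (isEmpty()) throw new EmptyStackException();
        E e = S[t];
        S[t--] = null;                    // clear the slot, t <- t-1
        return e;
    }

    public E top() {
        if (isEmpty()) throw new EmptyStackException();
        return S[t];
    }
}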
2.1.2 - Queue
Container of objects that are inserted and removed according to first-in first-out (FIFO)
Objects enter Queue at the rear, are removed from front
ADT:
enqueue(o): Insert object o at the rear of queue
dequeue(): Remove and return the object inserted at the front. Error if queue is empty
size(): Return number of objects in queue
isEmpty(): Return Boolean indicating if queue is empty
front(): Return the front object, without removing it. Error if queue is empty
Additional Information
Queue uses 2 variables, f & r, to keep track of the cell storing the front object and the first free
cell, respectively.
N is the number of cells within the array containing all the objects (size of the array for holding
objects). The indices wrap around the end of the array, so they are computed mod N.
Queue is empty if f = r.
Pseudo Code - Queue Array:
Algorithm enqueue(o):
  if size() = N − 1 then            // one cell stays free, so f = r means "empty"
    throw a QueueFullException
  Q[r] ← o
  r ← (r + 1) mod N
Algorithm dequeue():
  if isEmpty() then
    throw a QueueEmptyException
  temp ← Q[f]
  Q[f] ← null
  f ← (f + 1) mod N
  return temp
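A matching Java sketch of the circular-array queue (as in the pseudocode, one cell is kept free so that f = r unambiguously means the queue is empty; class and exception names are my own):

public class ArrayQueue<E> {
    private E[] Q;
    private int f = 0, r = 0;   // front index and first free cell

    @SuppressWarnings("unchecked")
    public ArrayQueue(int N) { Q = (E[]) new Object[N]; }

    public int size() { return (Q.length - f + r) % Q.length; }
    public boolean isEmpty() { return f == r; }

    public void enqueue(E o) {
        if (size() == Q.length - 1) throw new IllegalStateException("queue full");
        Q[r] = o;
        r = (r + 1) % Q.length;     // wrap around the end of the array
    }

    public E dequeue() {
        if (isEmpty()) throw new IllegalStateException("queue empty");
        E temp = Q[f];
        Q[f] = null;
        f = (f + 1) % Q.length;
        return temp;
    }

    public E front() {
        if (isEmpty()) throw new IllegalStateException("queue empty");
        return Q[f];
    }
}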
2.2 – Vectors, Lists and Sequences
2.2.1 – Vectors
Linear sequence that supports access to its elements according to rank.
ADT:
elemAtRank(r): Return object at rank r. Error if r < 0 or r > n−1
replaceAtRank(r, e): Replace object at rank r with e. Error if r < 0 or r > n−1
insertAtRank(r, e): Insert e into Vector at rank r. Error if r < 0 or r > n
removeAtRank(r): Remove object at rank r. Error if r < 0 or r > n−1
size(): Return number of objects in vector
isEmpty(): Return Boolean indicating if vector is empty
Additional Information
Vector contains n elements. [0] = first element, [n-1] = last element.
Running times
Method                Time
size()                O(1)
isEmpty()             O(1)
elemAtRank(r)         O(1)
replaceAtRank(r, e)   O(1)
insertAtRank(r, e)    O(n)
removeAtRank(r)       O(n)
Pseudo Code - Vector Array:
Algorithm insertAtRank(r, e):
  for i = n − 1, n − 2, … , r do
    A[i + 1] ← A[i]          // make room for the new element
  A[r] ← e
  n ← n + 1
Algorithm removeAtRank(r):
  temp ← A[r]
  for i = r, r + 1, … , n − 2 do
    A[i] ← A[i + 1]          // fill in for the removed element
  n ← n − 1
  return temp
2.2.2 – Lists
Linear sequence that supports access to its elements by means of nodes.
A node is a container which keeps a reference to its element and to the nodes before and after it.
From now on, a node will be called a Position.
ADT:
first(): Return the position of the first element of List. Error if List is empty.
last(): Return the position of the last element of List. Error if List is empty.
isFirst(p): Return Boolean indicating if p is the first position within list.
isLast(p): Return Boolean indicating if p is the last position within list.
before(p): Return the position before position p. Error if p is first.
after(p): Return the position after position p. Error if p is last.
replaceElement(p, e): Replace element at p with e. Return element previously at position p.
swapElements(p, q): Swap elements: move element at p to q, and element at q to p.
insertFirst(e): Insert e into List as first element (not replace).
insertLast(e): Insert e into List as last element (not replace).
insertBefore(p, e): Insert e before position p into List.
insertAfter(p, e): Insert e after position p into List.
remove(p): Remove element at position p from List.
Position ADT:
A Position is always defined relatively, i.e. "after" or "before" another position. Each position
contains the object we want to store at that position.
element(): Return object contained within position
Linked List Implementation
A Linked List is a direct implementation of the List ADT. We also need to extend the definition of a
Position. In a singly linked list, a position stores a reference to the position coming after it,
next(). In a doubly linked list, both the before AND after references are stored, prev() & next().
// Note: you can also implement this with instance variables instead of methods, e.g.:
position.next ← position.prev
instead of
position.next() ← position.prev()
To simplify matters, special positions called sentinel positions (header and trailer) are stored at
the beginning and the end of the List.
Pseudo Code – Linked List:
Algorithm insertAfter(p, e):
  Create a new node v        // the position for the element e
  v.element ← e
  v.prev ← p                 // link to predecessor
  v.next ← p.next            // link v to successor
  (p.next).prev ← v          // link p's old successor to v
  p.next ← v                 // link p to its new successor, v
  return v
Algorithm remove(p):
  t ← p.element              // temp variable for element
  (p.prev).next ← p.next     // link out p
  (p.next).prev ← p.prev
  p.prev ← null              // invalidate position p
  p.next ← null
  return t
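A minimal Java sketch of the doubly linked list with header/trailer sentinels, mirroring the two algorithms above (the class and field names are my own assumptions):

public class DoublyLinkedList<E> {
    private static class Node<E> {
        E element;
        Node<E> prev, next;
    }

    private final Node<E> header = new Node<>();    // sentinel before the first element
    private final Node<E> trailer = new Node<>();   // sentinel after the last element

    public DoublyLinkedList() {
        header.next = trailer;
        trailer.prev = header;
    }

    // insertAfter(p, e) from the pseudocode above
    public Node<E> insertAfter(Node<E> p, E e) {
        Node<E> v = new Node<>();      // the position for the element e
        v.element = e;
        v.prev = p;                    // link to predecessor
        v.next = p.next;               // link v to successor
        p.next.prev = v;               // link p's old successor to v
        p.next = v;                    // link p to its new successor, v
        return v;
    }

    // remove(p) from the pseudocode above
    public E remove(Node<E> p) {
        E t = p.element;
        p.prev.next = p.next;          // link out p
        p.next.prev = p.prev;
        p.prev = null;                 // invalidate position p
        p.next = null;
        return t;
    }
}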
2.2.3 Sequences
An ADT that supports both the Vector and List ADT operations.
ADT (in addition to the Vector and List methods):
atRank(r): Return the position of the element with rank r.
rankOf(p): Return the rank of the element at position p.
Additional Information
Can be implemented either with an Array or with a Doubly Linked List.
An Array implementation takes O(N) space, a DLList takes O(n) space.
(The Array is statically sized at the start and always takes N space; a DLList grows with the
number of elements n.)
Running times
Operation                        Array   List
size, isEmpty                    O(1)    O(1)
atRank, rankOf, elemAtRank       O(1)    O(n)
first, last, before, after       O(1)    O(1)
replaceElement, swapElements     O(1)    O(1)
replaceAtRank                    O(1)    O(n)
insertAtRank, removeAtRank       O(n)    O(n)
insertFirst, insertLast          O(1)    O(1)
insertAfter, insertBefore        O(n)    O(1)
remove                           O(n)    O(1)
Iterator ADT
An Iterator is an object that can go through a collection of elements, one element at a time.
An Iterator consists of a Sequence and a current Position (it extends the Position ADT).
hasNext(): Test whether there are elements left in the Iterator.
nextObject(): Return the next element in the Iterator and advance past it.
2.3 Trees
2.3.1 Tree ADT
A Tree is an ADT that stores elements hierarchically. Elements are stored in a parent-child
relationship. Every element has zero or more children and one parent (except the root, which is
the initial element and has no parent).
Children with the same parent are siblings. Nodes are called external, or leaves, when they have
no children; internal when they have one or more.
The subtree of a tree at v is the tree consisting of all descendants of v, with v as the root.
An ancestor is the parent of a node, or the parent's parent, etc.
Descendant, the same, but with children: if v is a descendant of p, then p is an ancestor of v.
An ordered tree is a tree that has a linear ordering for the children of each node (you know which
child is first, second, third).
A binary tree is an ordered tree where every node has at most two children: a left child and a
right child, which in turn root a left subtree and a right subtree.
ADT:
// Accessor Methods
root(): Return the root of the tree.
parent(v): Return the parent of node v. Error if v is the root.
children(v): Return an Iterator containing all children of node v.
// Query Methods
isInternal(v): Test whether node v is internal.
isExternal(v): Test whether node v is external.
isRoot(v): Test whether node v is the root.
// Generic Methods
size(): Return the number of nodes.
elements(): Return an Iterator of all elements stored in the nodes.
positions(): Return an Iterator containing all nodes.
swapElements(v, w): Swap the elements stored at nodes v and w.
replaceElement(v, e): Replace the element at node v with e, and return the old element.
Additional Information
The depth of a node is the number of its ancestors, excluding the node itself.
The height of a tree is the maximum depth of an external node.
Alternatively: the height of node v is 0 if v is external, else 1 + the maximum height of a child
of v. The height of the tree is height(T, root).
Algorithm depth(T, v):
  if T.isRoot(v) then
    return 0
  else
    return 1 + depth(T, T.parent(v))
Runs in O(1 + d_v), d_v: depth of node v in tree T
Algorithm height(T, v):
  if T.isExternal(v) then
    return 0
  else
    h ← 0
    for each w ∊ T.children(v) do
      h ← max(h, height(T, w))
    return 1 + h
Runs in O(n), n: number of nodes in the subtree of v. Called on the root it visits every node of
T, i.e. a complete tree traversal.
Running times
Method                                      Time
root(), parent(v)                           O(1)
isInternal(v), isExternal(v), isRoot(v)     O(1)
children(v)                                 O(c_v), c_v: number of children of v
swapElements(v, w), replaceElement(v, e)    O(1)
elements(), positions()                     O(n), n: nodes in tree
Tree Traversal
Link to tree traversal applet – a very good representation of all traversal methods
Preorder traversal
Traverses from the starting node, visiting every node as soon as it is reached.
Gives a linear order of the nodes in which children come after their parent.
Algorithm preorder(T, v):
  Perform "visit" action for node v    // whatever you want + mark node as "visited"
  for each child w of v do
    preorder(T, w)                     // recursively traverse subtree at w
Runs in O(n), same as height(T, v)
Postorder traversal
Visits a node after it has traversed every descendant of that node.
Used if you need the information of all children before you can compute the value of a parent.
For example: sizes of files in a directory.
Algorithm postorder(T, v):
  for each child w of v do
    postorder(T, w)                    // recursively traverse subtree at w
  Perform "visit" action for node v    // whatever you want + mark node as "visited"
Runs in O(n), same as height(T, v)
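Both traversals as Java methods over a simple general-tree node (the TreeNode class and the use of a result list as the "visit" action are my own assumptions):

import java.util.ArrayList;
import java.util.List;

public class TreeNode<E> {
    E element;
    List<TreeNode<E>> children = new ArrayList<>();

    // Preorder: visit the node first, then recursively traverse each subtree.
    static <E> void preorder(TreeNode<E> v, List<E> out) {
        out.add(v.element);                  // the "visit" action
        for (TreeNode<E> w : v.children)
            preorder(w, out);
    }

    // Postorder: traverse every subtree first, then visit the node.
    static <E> void postorder(TreeNode<E> v, List<E> out) {
        for (TreeNode<E> w : v.children)
            postorder(w, out);
        out.add(v.element);                  // the "visit" action
    }
}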
2.3.3 Binary Trees
A proper binary tree is an ordered tree in which every internal node has exactly two children.
ADT:
leftChild(v): Return left child of v. Error if v is external.
rightChild(v): Return right child of v. Error if v is external.
sibling(v): Return sibling node of v. Error if v is the root.
If the tree is an improper binary tree (not every internal node has 2 children), extra errors may
occur; for example, a node may not have a sibling.
Additional Information
Nodes with the same depth d are at the same level.
For a proper binary tree of height h with n nodes:
- Number of external nodes is at least h + 1 and at most 2^h
- Number of internal nodes is at least h and at most 2^h − 1
- Total number of nodes is at least 2h + 1 and at most 2^(h+1) − 1
- Height is at least log(n + 1) − 1 and at most (n − 1)/2
Inorder traversal
Can be seen as going through the tree "from left to right": first left subtree, then parent, then
right subtree.
Algorithm inorder(T, v):
  if v is an internal node then
    inorder(T, T.leftChild(v))              // go through left subtree
  perform the "visit" action for node v     // mark node "visited"
  if v is an internal node then
    inorder(T, T.rightChild(v))             // go through right subtree
Binary tree adapted Preorder and Postorder:
Algorithm binaryPreorder(T, v):
  perform the "visit" action for node v     // mark node "visited"
  if v is an internal node then
    binaryPreorder(T, T.leftChild(v))       // go through left subtree
    binaryPreorder(T, T.rightChild(v))      // go through right subtree
Algorithm binaryPostorder(T, v):
  if v is an internal node then
    binaryPostorder(T, T.leftChild(v))      // go through left subtree
    binaryPostorder(T, T.rightChild(v))     // go through right subtree
  perform the "visit" action for node v     // mark node "visited"
Euler tour traversal
A uniform way of traversing a tree; it encounters every node three times: from the left, from
below, and from the right.
A preorder traversal is an Euler tour where the "visit" action is performed when a node is
encountered from the left.
An inorder traversal is an Euler tour where the "visit" action is performed when a node is
encountered from below.
A postorder traversal is an Euler tour where the "visit" action is performed when a node is
encountered from the right.
Algorithm eulerTour(T, v):
  Perform left "visit" action
  if v is an internal node then
    eulerTour(T, T.leftChild(v))     // traverse left subtree
  Perform below "visit" action
  if v is an internal node then
    eulerTour(T, T.rightChild(v))    // traverse right subtree
  Perform right "visit" action
Runs in O(n) time, n: number of nodes in tree T
2.3.4 Data Structures for Representing Trees
Vector-Based Binary Tree structure
Based on the premise that every node gets a number; this is known as level numbering.
p(v) is the function that returns the number of node v. The Vector has size N = p_M + 1,
p_M being the maximum value of p(v) (+1 because the numbering starts at 1, not 0).
- if v is the root: p(v) = 1
- if v is the left child of u: p(v) = 2p(u)
- if v is the right child of u: p(v) = 2p(u) + 1
Additional Information
Running times
Method                              Time
elements, positions                 O(n), n: nodes in tree
swapElements, replaceElement        O(1)
root, parent, children              O(1)
leftChild, rightChild, sibling      O(1)
isInternal, isExternal, isRoot      O(1)
Linked Structure for Binary Tree
A tree in which every node is represented by a position which contains a reference to its element,
and to the positions of the left child, right child, and parent.
If the node is the root, the parent reference is null. If the node is external, the child
references are null.
Size is O(n), because there is a position for every node in the tree.
Additional information
Running times
Method                              Time
size, isEmpty                       O(1)
elements, positions                 O(n), n: nodes in tree
swapElements, replaceElement        O(1)
root, parent                        O(1)
children(v)                         O(c_v), c_v: children of node v
isInternal, isExternal, isRoot      O(1)
2.4 Priority Queue and Heaps
2.4.1 Priority Queue ADT
A container of elements, each of which gets a comparable key the moment the element is inserted
into the container.
Keys must follow these comparison rules, i.e. they must follow a total order relation:
- Reflexive property: k ≤ k
- Antisymmetric property: if k1 ≤ k2 and k2 ≤ k1, then k1 = k2
- Transitive property: if k1 ≤ k2 and k2 ≤ k3, then k1 ≤ k3
ADT:
insertItem(k, e): Insert an element e with key k into Priority Queue.
removeMin(): Return and remove element with the smallest key within PQ. Error: empty PQ
minElement(): Return element with the smallest key within PQ. Error: empty PQ
minKey(): Return the smallest key within PQ. Error: empty PQ
Comparator
An object that specifies the way keys are compared, i.e. an object that compares keys.
ADT:
isLess(a, b): True if and only if a is less than b.
isLessOrEqualTo(a, b): True if and only if a is less than or equal to b.
isEqualTo(a, b): True if and only if a and b are equal.
isGreater(a, b): True if and only if a is greater than b.
isGreaterOrEqualTo(a, b): True if and only if a is greater than or equal to b.
isComparable(a): True if and only if a can be compared.
2.4.2 PQ-Sort, Selection-Sort and Insertion-Sort
A sorting problem is a problem in which a container C with n elements needs to be sorted in
increasing (or at least non-decreasing, if there are ties) order. All elements should be
comparable by a total order relation.
PQ-Sort, Selection-Sort & Insertion-Sort
A very simple algorithm which takes an unsorted list and sorts it using a Priority Queue. Its
output is a sorted list.
1. All elements are placed in an empty Priority Queue, giving a key to each element.
2. All elements are extracted in non-decreasing order using removeMin(), putting them back in C.
If this is implemented using an unsorted sequence, phase 1 takes O(n) and phase 2 takes O(n²).
This is also known as Selection-Sort, because selection, and thus ordering, is done in the second
phase.
If this is implemented using a sorted sequence, phase 1 takes O(n²) and phase 2 takes O(n).
This is also known as Insertion-Sort, because insertion in sorted order is done in the first phase.
The difference is that Selection-Sort always takes Ω(n²), while Insertion-Sort in the best case
takes O(n) (if the list is in reverse-sorted order).
Algorithm PQ-Sort(C, P):
  Input: n-element sequence C, and PQ P that compares elements using a total order relation
  Output: sequence C sorted by the total order relation
  while C is not empty do        // Phase 1
    e ← C.removeFirst()          // remove element from C
    P.insertItem(e, e)           // key is the element itself
  while P is not empty do        // Phase 2
    e ← P.removeMin()            // remove smallest from P
    C.insertLast(e)              // add element at end of C
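A compact Java sketch of PQ-Sort using the standard library's java.util.PriorityQueue. Note that that class is heap-based, so this concrete run behaves like the heap-sort of section 2.4.4 rather than selection- or insertion-sort; the two-phase structure is the same:

import java.util.List;
import java.util.PriorityQueue;

public class PQSort {
    // Sorts the list in place: phase 1 inserts everything into the PQ
    // (each element is its own key), phase 2 removes the minimum repeatedly.
    public static <E extends Comparable<E>> void pqSort(List<E> C) {
        PriorityQueue<E> P = new PriorityQueue<>();
        for (E e : C)                       // Phase 1
            P.offer(e);
        for (int i = 0; i < C.size(); i++)  // Phase 2
            C.set(i, P.poll());             // remove smallest from P, put back in C
    }
}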
2.4.3 Heap Data Structure
A PQ data structure that is efficient for both insertion and removal (cf. insertion- &
selection-sort). It does this by storing all elements and keys at the internal nodes of a binary
tree. The last node of the tree is the right-most, deepest node of T.
Heap-Order Property: the key stored at v is ≥ the key stored at v's parent.
The minimum key is thus always at the root.
Complete Binary Tree: for efficiency reasons, we want the lowest height possible:
- every level but the last must have the maximum number of nodes (level i holds 2^i nodes), and
- all internal nodes must be to the left of any external nodes, i.e. internal nodes are visited
  before external nodes in an inorder traversal.
Additional Information
A heap PQ implementation consists of the following:
- Heap: a complete binary tree, implemented using a Vector.
- Last: a reference to the last node of T.
- Comp: a comparator that defines a total order relation for the keys. It maintains the minimum
  key at the root.
A heap storing n keys has height h = ⌈log(n + 1)⌉.
The number of nodes within a heap of height h is at least 2^(h−1) and at most 2^h − 1.
In the vector representation the first key is at index 1 and the last key at index n; the first
empty external node is at index n + 1.
Usually the insertion position, the position at which a new node is added, is index n + 1.
After insertion, the new node becomes the last node of the tree.
Up-Heap Bubbling (after insertion)
Restores the Heap-Order Property. It checks if the parent of the new node has a higher key, and if
so, swaps places with the parent. It continues to do so until it is either at the root or the
parent has a lower (or equal) key. This process is called Up-Heap Bubbling. Because at maximum it
needs to climb to the root, it takes at most height-of-the-tree steps, thus O(log n) running time,
where n is the number of keys in the heap.
(Figure: not a correct binary tree, but correct up-heap bubbling.)
Down-Heap Bubbling (after removal)
When removing a node using removeMin(), the last node in the tree is taken and set at the root.
We then need to restore the Heap-Order Property using Down-Heap Bubbling.
It checks if there exists a child of v that has a smaller key than v, and if so, swaps places with
it. If both children have smaller keys, the child with the smallest key is swapped with v. It
continues swapping until no child of v has a smaller key.
Because at maximum it needs to descend to the bottom of the tree, it takes at most
height-of-the-tree steps, thus O(log n) running time, where n is the number of keys in the heap.
(Figure: not a correct binary tree, but correct down-heap bubbling.)
Running Times
Method                 Time
size, isEmpty          O(1)
minElement, minKey     O(1)
insertItem             O(log n)
removeMin              O(log n)
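A minimal Java sketch of a vector-based heap with up-heap and down-heap bubbling, using the level numbering of section 2.3.4 (node i has children 2i and 2i+1; the class and method names are my own, and removeMin assumes a non-empty heap):

import java.util.ArrayList;

public class VectorHeap<K extends Comparable<K>> {
    // heap.get(0) is unused so that the root sits at index 1
    private final ArrayList<K> heap = new ArrayList<>();
    { heap.add(null); }

    private int size() { return heap.size() - 1; }
    private void swap(int i, int j) {
        K t = heap.get(i); heap.set(i, heap.get(j)); heap.set(j, t);
    }

    public void insertItem(K k) {
        heap.add(k);                          // new last node, index n
        int i = heap.size() - 1;
        while (i > 1 && heap.get(i / 2).compareTo(heap.get(i)) > 0) {
            swap(i, i / 2);                   // up-heap bubbling toward the root
            i /= 2;
        }
    }

    public K removeMin() {                    // assumes the heap is non-empty
        K min = heap.get(1);
        heap.set(1, heap.get(size()));        // move last node to the root
        heap.remove(size());
        int i = 1;
        while (2 * i <= size()) {             // down-heap bubbling
            int c = 2 * i;                    // pick the smaller child of i
            if (c + 1 <= size() && heap.get(c + 1).compareTo(heap.get(c)) < 0) c++;
            if (heap.get(i).compareTo(heap.get(c)) <= 0) break;
            swap(i, c);
            i = c;
        }
        return min;
    }
}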
2.4.4 Heap-Sort
If you implement the PQ sorting scheme with a heap, you get an algorithm known as heap-sort,
with the following theorem: heap-sort sorts a sequence of n elements in O(n log n) time.
Heap Sort animation
Heap-Sort In Place
An algorithm is said to run in place if it only uses a constant amount of memory in addition to
the memory required for the objects themselves.
This requires that the sequence to be sorted is implemented as an array. We then use the array
itself to store the heap, instead of using an external heap. The outline is as follows:
1. Logically divide the array into a portion in the front that contains the growing heap, and the
   rest, which contains the elements of the array that have not yet been dealt with.
   o Initially the heap part is empty and the not-yet-dealt-with part of the array is the entire
     array.
   o At each insertion we remove the left-most entry from the array part and insert it into the
     heap, growing the heap to include the memory previously used by the newly inserted element.
     (In the figure, the blue line dividing the two parts moves down.)
   o At the end the heap uses all the space. We make the optimization discussed before that we
     only store the internal nodes of the heap, and we do not waste the first (index 0) component
     of the array used to store the heap.
2. Do the insertions as in a normal heap-sort, but change the comparison so that a maximum element
   is at the root (i.e., a parent is no smaller than a child).
3. Now do the removals from the heap, moving the blue line back up.
   o The elements removed come out in order from big to small.
   o This is perfect, since we store them starting at the right of the array: that is the portion
     of the array that is made available by the shrinking heap.
Bottom-Up Heap Construction
Heap construction runs in O(n log n) time if the n objects are added one by one using
insertItem(). If all elements are given in advance, bottom-up construction can build the heap in
O(n) time. This construction builds a complete binary tree with height log(n + 1).
It is called bottom-up heap construction because the algorithm begins with the external nodes and
works its way up the tree.
Algorithm BottomUpHeap(S):
  Input: a sequence S storing n keys
  Output: a heap storing the keys in S
  if S is empty then
    return an empty heap                 // consisting of a single external node
  Remove the first key, k, from S
  Split S into S1 and S2, each of size (n − 1)/2
  T1 ← BottomUpHeap(S1)
  T2 ← BottomUpHeap(S2)
  Create binary tree T with root r storing k, left subtree T1 and right subtree T2
  Perform down-heap bubbling from root r of T    // restore heap order
  return T
Bottom and Top construction Test
Locator ADT
In our current setup (vector-based heap implementation) we have:
- A binary tree represented as a vector: a list of cells, each associated with a number, which
  contain an element. You call the cell number to retrieve the element.
- Every element is associated with a comparable key, so that it may be sorted according to a total
  order relation using the key as comparison. If the element itself can be compared, it can be its
  own key (e.g. elements which are numbers).
The problem with our current implementation is that the element and the key do not know which
position/cell they are in.
To overcome this limitation, we introduce another ADT, the Locator ADT.
The purpose of this ADT is to link key, element and location (cell or position) together.
The locator "attaches" itself to the element (and therefore the key), and is constantly updated
with a reference to the element's cell/position whenever the element changes cell/position.
ADT:
element(): Return the element associated with this locator.
key(): Return the key associated with this locator.
Locator-Based PQ Methods:
Logically, we can then extend the methods of the PQ to make use of this functionality.
Priority Queue ADT update:
min(): Return the locator of the element with the smallest key.
insert(k, e): Insert a new item with element e and key k into PQ, and return a locator
referencing the new item.
remove(l): Remove from PQ the item with locator l.
replaceElement(l, e): Replace the element in locator l with e and return the previous element.
replaceKey(l, k): Replace the key in locator l with k and return the previous key.
Additional Information:
Running times
Operation                             Unsorted Sequence   Sorted Sequence   Heap
size, isEmpty, key, replaceElement    O(1)                O(1)              O(1)
minElement, min, minKey               O(n)                O(1)              O(1)
insertItem, insert                    O(1)                O(n)              O(log n)
removeMin                             O(n)                O(1)              O(log n)
remove                                O(1)                O(1)              O(log n)
replaceKey                            O(1)                O(n)              O(log n)
2.5 Dictionaries and Hash Tables
2.5.1 The Unordered Dictionary ADT
A dictionary is an ADT which stores elements and keys in pairs, in objects called items.
In general, dictionaries are allowed to store multiple elements under one key.
ADT:
findElement(k): Return the element associated with key k; return the special NO_SUCH_KEY element
if no such element exists.
insertItem(k, e): Insert an item with key k and element e into the dictionary.
removeElement(k): Remove the item with key k from the dictionary and return its element; return
the NO_SUCH_KEY element if no such element exists.
Additional Information:
The special element NO_SUCH_KEY is called a sentinel.
An implementation of a dictionary with an unsorted sequence is often called a log file, or audit
trail. It is used to store small amounts of information which are unlikely to change over time.
This implementation is often called an unordered sequence implementation. Space usage is Θ(n).
Chapter 3 – Search Trees and Skip Lists
3.1 – Ordered Dictionaries and Binary Search Trees
Dictionary: Searchable collection of key-element items. For example, an address book.
Operations are as follows (as defined in section 2.5.1):
findElement(k): If the dictionary has an item with key k, return its element; else return the
special element NO_SUCH_KEY.
insertItem(k, o): Insert item (k, o) into the dictionary.
removeElement(k): If the dictionary has an item with key k, remove it from the dictionary and
return its element; else return the special element NO_SUCH_KEY.
New operations are:
closestKeyBefore(k): Return the key of the item with the largest key less than or equal to k.
closestElemBefore(k): Return the element of the item with the largest key less than or equal to k.
closestKeyAfter(k): Return the key of the item with the smallest key greater than or equal to k.
closestElemAfter(k): Return the element of the item with the smallest key greater than or equal
to k.
Each of these methods returns the special NO_SUCH_KEY object if no such item is present in the
dictionary.
3.1.1 Sorted Tables
A lookup table is an ordered dictionary implemented with a sorted sequence: the items of the
dictionary are stored in an array-based sequence, sorted by key.
It is one way of implementing a dictionary.
Performance:
findElement      O(log n) (using binary search)
insertItem       O(n) (shifts)
removeElement    O(n) (shifts)
3.1.2 Binary Search Tree
As the lookup table is array-based and sorted, we can use binary search as the searching
algorithm. Below is the pseudo-code of binary search:
Algorithm BinarySearch(S, k, low, high):
  if low > high then
    return NO_SUCH_KEY
  else
    mid ← ⌊(low + high) / 2⌋
    if k = key(mid) then
      return elem(mid)
    else if k < key(mid) then
      return BinarySearch(S, k, low, mid − 1)
    else
      return BinarySearch(S, k, mid + 1, high)
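The same algorithm in Java, over a sorted int array, with −1 playing the role of NO_SUCH_KEY (the method name is my own):

// Recursive binary search over a sorted array of keys; returns the index
// of k, or -1 if k is absent. Initial call: binarySearch(S, k, 0, S.length - 1).
public static int binarySearch(int[] S, int k, int low, int high) {
    if (low > high)
        return -1;                       // NO_SUCH_KEY
    int mid = (low + high) / 2;          // integer division = floor
    if (k == S[mid])
        return mid;
    else if (k < S[mid])
        return binarySearch(S, k, low, mid - 1);
    else
        return binarySearch(S, k, mid + 1, high);
}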
Comparison of Log File and Lookup Table, when implementing an ordered dictionary (n denotes the
number of items in the dictionary at the time a method is executed):
Method              Log File   Lookup Table
findElement         O(n)       O(log n)
insertItem          O(1)       O(n)
removeElement       O(n)       O(n)
closestKeyBefore    O(n)       O(log n)
3.1.3 Searching in a Binary Search Tree
T = tree, k = search key, v = node
To search for a key k:
Algorithm findElement(k, v):
  if T.isExternal(v) then
    return NO_SUCH_KEY
  if k < key(v) then
    return findElement(k, T.leftChild(v))
  else if k = key(v) then
    return element(v)
  else  // k > key(v)
    return findElement(k, T.rightChild(v))
3.1.4 Insertion in a Binary Search Tree
To perform operation insertItem(k, o):
1) Search for key k.
2) If k is not already in the tree, let w be the leaf reached by the search.
3) Insert k at node w and expand w into an internal node.
3.1.5 Removal in a Binary Search Tree
To perform operation removeElement(k):
1) Search for key k.
2) If k is in the tree, let v be the node storing k.
3) If v has a leaf child w, remove v and w.
4) If both children of v are internal, find the internal node w that follows v in an inorder
traversal, copy key(w) into v, and remove w together with its left child z (this child is a leaf).
3.1.6 Performance in a Binary Search Tree
A binary search tree of height h storing n key-element items uses O(n) space and executes the
dictionary ADT operations with the following running times
(h = height of tree, n = number of items, s = size of the iterators returned):
Method                                      Time
size, isEmpty                               O(1)
findElement, insertItem, removeElement      O(h)
findAllElements, removeAllElements          O(h + s)
3.2 – AVL Trees
An AVL Tree is a self-balancing binary search tree. The reason for this tree is that we want to achieve
logarithmic time for all the fundamental dictionary operations.
AVL Trees follow the height-balance property: for every internal node v of T, the heights of the
children of v can differ by at most 1.
A subtree of an AVL tree is an AVL tree itself.
The height of an AVL tree T storing n items is O(log n).
3.2.1 Update Operations
Insertion
Procedure is in principle the same as the insertItem operator in a binary tree.
However, after the insertion, the tree may become unbalanced. Hence we need to apply Trinode
Restructuring (Explained below).
Removal
Procedure is in principle the same as the removeElement operator in a binary tree.
However, after the removal, the tree may become unbalanced. Hence we need to apply Trinode
Restructuring (Explained below).
Trinode Restructuring
Algorithm trinodeRestructuring:
1) Let (a, b, c) be a left-to-right (inorder) listing of the nodes x, y and z, and let
   (T0, T1, T2, T3) be a left-to-right (inorder) listing of the four subtrees of x, y and z not
   rooted at x, y or z.
2) Replace the subtree rooted at z with a new subtree rooted at b.
3) Let a be the left child of b, and let T0 and T1 be the left and right subtrees of a.
4) Let c be the right child of b, and let T2 and T3 be the left and right subtrees of c.
You might want to look at Figure 3.14 (page 154) for an example.
3.2.2 Performance
Method                                                                   Time
single restructure (using linked-structure binary tree)                  O(1)
find (descend the height of the tree, no restructures needed)            O(log n)
insert (initial find + 1 restructure)                                    O(log n)
remove (initial find + restructuring up the tree, maintaining heights)   O(log n)
3.4 – Splay Trees
A splay tree is a self-balancing binary search tree with the additional property that recently
accessed elements are quick to access again. It performs basic operations such as insertion,
look-up and removal in O(log n) amortized time. For many non-uniform sequences of operations,
splay trees perform better than other search trees, even when the specific pattern of the
sequence is unknown. It is conceptually different from AVL trees, as it does not have any explicit
rules to enforce its balance.
Two things to remember:
- The tree might get more unbalanced.
- Splaying costs O(h), where h is the height of the tree, which is still O(n) worst-case
  (O(h) rotations, each of which is O(1)).
3.4.1 Splaying
Each particular step depends on three factors:
- Whether x is the left or right child of its parent node, p,
- Whether p is the root or not, and if not
- Whether p is the left or right child of its parent, g (the grandparent of x).
Zig Step:
This step is done when p is the root. The tree is rotated on the edge between x and p. Zig steps exist
to deal with the parity issue and will be done only as the last step in a splay operation and only when
x has odd depth at the beginning of the operation.
Zig-zig Step:
This step is done when p is not the root and x and p are either both right children or are both left
children. The picture below shows the case where x and p are both left children. The tree is rotated
on the edge joining p with its parent g, then rotated on the edge joining x with p.
Zig-zag Step:
This step is done when p is not the root and x is a right child and p is a left child or vice versa. The
tree is rotated on the edge between x and p, then rotated on the edge between x and its new parent
g.
3.4.2 Amortized Analysis of Splaying
Amortization is worst-case analysis over all possible series of operations. The "amortized
running time" of an operation is the average worst-case running time of the operations in the
series. Amortization gives "average case" analysis without using probabilities.
It is done in an accounting way, by assigning "cyber euros" to operations. The main conclusions
are listed below; for an in-depth proof, see lecture slides 5a.
- Cost of insertion and deletion is also O(log n).
- Cost of a series of m operations on a splay tree is O(m log n).
- Thus, the amortized cost of any splay operation is O(log n).
- When items are accessed often, the amortized cost can decrease to O(1) (Theorem 3.11).
Chapter 4
4.1 Merge-Sort
4.1.1 Divide-and-Conquer
Merge-sort is based on divide-and-conquer.
The 3 steps of divide-and-conquer:
Divide: if the number of objects is above a threshold, divide the input (if n = 0 or 1, return
immediately).
Recur: recursively solve the subproblems.
Conquer: "merge" the sub-solutions into a solution to the original problem.
Ceiling: ⌈x⌉ (smallest integer ≥ x)
Floor: ⌊x⌋ (largest integer ≤ x)
Theorem: The merge-sort tree associated with an execution of merge-sort on a sequence of
size n has height ⌈log n⌉.
Merge
Two sorted sequences, S1 and S2, are merged by iteratively removing a smallest element from one of
the two and adding it to the end of the output sequence, S, until one of the two sequences is
empty, at which point we copy the remainder of the other sequence to the output sequence.
(fig p. 222)
Running time
Running time per level = O(n) (the divide part and the conquer part are linear).
Running time per level × number of levels = total running time:
O(n) * O(log n) = O(n log n)
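A minimal Java sketch of merge-sort on an int array (Arrays.copyOfRange keeps it short at the cost of extra copying; the names are my own):

import java.util.Arrays;

public class MergeSort {
    public static int[] mergeSort(int[] S) {
        if (S.length <= 1) return S;                            // 0 or 1 elements: done
        int mid = S.length / 2;                                 // divide
        int[] S1 = mergeSort(Arrays.copyOfRange(S, 0, mid));    // recur
        int[] S2 = mergeSort(Arrays.copyOfRange(S, mid, S.length));
        return merge(S1, S2);                                   // conquer
    }

    // Merge two sorted sequences by repeatedly taking the smaller front element.
    private static int[] merge(int[] S1, int[] S2) {
        int[] S = new int[S1.length + S2.length];
        int i = 0, j = 0, k = 0;
        while (i < S1.length && j < S2.length)
            S[k++] = (S1[i] <= S2[j]) ? S1[i++] : S2[j++];
        while (i < S1.length) S[k++] = S1[i++];     // copy remainder of S1
        while (j < S2.length) S[k++] = S2[j++];     // copy remainder of S2
        return S;
    }
}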
4.1.2 Merge-Sort and Recurrence Equations
Another way to find the running time of the merge-sort algorithm, you can find it at page
224 (I can't write it shorter).
4.2 The Set Abstract Data Type
Here we introduce the ADT set. A set is a container of distinct objects. That is, there are no
duplicate elements in a set, and there is no order.
Sets and some of their uses
First we recall:
Union: A∪B = {x : x ∈ A or x ∈ B}
Intersection: A∩B = {x : x ∈ A and x ∈ B}
Subtraction: A−B = {x : x ∈ A and x ∉ B}
These operations are used if you, for example, enter 2 query words: then the intersection has to
be computed.
ADT:
union(B): Replace A with (←) A∪B.
intersection(B): Replace A with (←) A∩B.
subtract(B): Replace A with (←) A−B.
4.2.1 A Simple Set Implementation
A generic version of the merge algorithm takes two sorted sequences representing the input sets,
and constructs a sequence representing the output set, be it the union, intersection, or
subtraction of the input sets.
The generic algorithm iteratively examines and compares the current elements a and b of the input
sequences (A and B) and finds out whether a < b, a = b or a > b. Which elements go to the output
depends on the operation; for the union:
a < b: a goes to the output sequence, and A advances to its next element
a = b: a goes to the output sequence, and both A and B advance to their next elements
a > b: b goes to the output sequence, and B advances to its next element
Running Times
At each iteration:
- Compare 2 items of the two input sets (A and B): O(1)
- Possibly copy an element to the output sequence: O(1)
- Advance to the next element
=> O(n_A + n_B) = O(n)
Theorem: The set ADT can be implemented with an ordered sequence and a generic merge scheme that
supports the operations union, intersection and subtraction in O(n) time, where n denotes the sum
of the sizes of the sets involved.
4.3 Quick Sort
Three steps:
- Divide: if S has more than 1 element, take a specific element x of S (in practice we take the
  last element), which we call the pivot. Make three subsequences:
  - L, all elements < x,
  - E, all elements = x,
  - G, all elements > x.
- Recur: recursively sort L and G.
- Conquer: merge L, E and G back together.
Like merge-sort, we can draw a binary tree of the recursion. But unlike merge-sort, the tree
height can be linear (worst case). This happens when the sequence is already sorted
(x will then always be the biggest number).
Running time
- At each level, all elements have to be compared: O(n)
- The height of the tree is n in the worst case: O(n)
=> O(n) * O(n) = O(n²)
In the best case we get a merge-sort-like tree. This means that L and G are (almost) equal in
size, which results in a tree height of log n, which in turn results in O(n log n).
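An in-place Java sketch. Note that it uses the common two-way partition (elements < pivot to the left, the rest to the right) rather than the three subsequences L, E, G described above; the behaviour and bounds are the same:

public class QuickSort {
    // In-place quick-sort of S[low..high], pivoting on the last element.
    // Initial call: quickSort(S, 0, S.length - 1).
    public static void quickSort(int[] S, int low, int high) {
        if (low >= high) return;                    // 0 or 1 elements: done
        int x = S[high];                            // pivot
        int i = low;                                // boundary of the "< pivot" zone
        for (int j = low; j < high; j++)
            if (S[j] < x) { int t = S[i]; S[i] = S[j]; S[j] = t; i++; }
        int t = S[i]; S[i] = S[high]; S[high] = t;  // put the pivot in its place
        quickSort(S, low, i - 1);                   // recur on the "L" part
        quickSort(S, i + 1, high);                  // recur on the "G" part
    }
}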
4.3.1 Randomized Quick-Sort
Instead of always taking the last element of the sequence, we pick a random element as the pivot.
By probability theory, the expected pivot is close to the median of the whole sequence.
This means that the expected height of the tree is O(log n), which again results in O(n log n)
expected running time.
4.4 A Lower Bound on Comparison-Based Sorting
Theorem: the running time of any comparison-based algorithm for sorting an n-element sequence is
Ω(n log n) in the worst case.
The running time of a comparison-based algorithm must be at least the height of its decision
tree. This tree must have at least n! leaves (one for each possible ordering of the input), so
its height is at least log(n!), which is Ω(n log n).
4.5 Bucket-Sort and Radix-Sort
These algorithms are faster than O(n log n) but they require special assumptions about the
input sequence to be sorted. Even so, such scenarios often arise in practice.
In this section we consider the problem of sorting a sequence of items, each a key-element
pair.
4.5.1 Bucket-Sort
The special assumption is that each element has an integer key in the range [0, N−1].
So we have a sequence S with integer keys in [0, N−1].
Now we create a second array, say B (the buckets), which has size N.
We then move all the elements from S into B, placing each element at B[key].
Finally we take the elements one by one, in order, from B and place them back into S.
(Walking through B in key order is necessary because not every bucket need be occupied.)
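A Java sketch of stable bucket-sort under the stated assumption that every key lies in [0, N−1] (the Item class and method name are my own):

import java.util.ArrayList;
import java.util.List;

public class BucketSort {
    static class Item {
        int key; Object element;
        Item(int key, Object element) { this.key = key; this.element = element; }
    }

    // Stable: items with equal keys keep their original relative order.
    public static void bucketSort(List<Item> S, int N) {
        List<List<Item>> B = new ArrayList<>();
        for (int i = 0; i < N; i++)
            B.add(new ArrayList<>());       // one (initially empty) bucket per key
        for (Item item : S)
            B.get(item.key).add(item);      // place each item in bucket B[key]
        S.clear();
        for (List<Item> bucket : B)         // walk the buckets in key order;
            S.addAll(bucket);               // empty buckets are simply skipped
    }
}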
Stable sorting
Suppose you have 2 items with the same key; they will have a specific order in the original array.
Stable sorting means that they keep that same order after sorting
(and after each subsequent sorting: elements do not move around if you sort the same sequence
twice).
4.5.2 Radix-Sort
Radix-sort is used for items with 2 (or more) keys.
Example: S = ((3,3), (1,5), (2,5), …)
Radix-sort is actually just bucket-sort, done once per key (twice for two keys).
For pairs, the total order relation (k1, l1) < (k2, l2) is defined by:
- k1 < k2, or
- k1 = k2 and l1 < l2
The order of the passes is important: to get a lexicographically* ordered list you must
bucket-sort by the second key first and then by the first key; the other way around gives a wrong
order (examples at page 243). *) lexicographical = dictionary order
Running time
O(d(n + N)), d = number of keys (digits)
4.6 Comparison of Sorting Algorithms
- Insertion-Sort: if implemented well, running time of O(n + k) (k = number of inversions).
  Good for small sequences (less than 50); also quite effective for almost-ordered sequences.
  But the O(n²) worst case makes it a poor choice in general.
- Merge-Sort: running time O(n log n) in the worst case (optimal for comparison-based
  algorithms). Good for large sequences, because you can store parts in different places (if main
  memory is too small).
- Quick-Sort: a good choice if the sequence fits in main memory, but the O(n²) worst-case running
  time makes it less attractive in real-time applications, where we must make guarantees on the
  time needed.
- Heap-Sort: if your memory is big enough and you need to finish on time, heap-sort is an
  excellent choice. It has a running time of O(n log n) and it can easily be made to execute in
  place.
- Bucket-Sort or Radix-Sort: an excellent choice where applicable, for it runs in O(d(n + N)),
  where [0, N−1] is the range of the keys (and d = 1 for bucket-sort). If d(n + N) is
  significantly below n log n, these algorithms run faster than even quick-sort or heap-sort.
Chapter 5: Fundamental Techniques
5.2 Divide-and-Conquer
Divide-and-conquer is a technique that solves a problem by dividing it into smaller subproblems,
solving each subproblem, and merging the solutions into one solution.
5.2.1 Divide-and-Conquer Recurrence Equations
With a recurrence equation we determine the running time of an algorithm as a function of the
input size n. The problem is that in a recurrence equation the original function T still appears
on the right-hand side, and we want an expression that depends only on n. We call this the
closed-form equation. There are some general ways of solving such an equation for
divide-and-conquer algorithms (a worked example follows after this list):
- Iterative substitution: substitute the function T into itself a couple of times and hope to see
  a pattern, so it can be translated into a closed-form equation.
- Recursion tree: almost the same as iterative substitution; the only difference is that the
  recursion tree is more visual, while iterative substitution is more algebraic. In this method
  you draw a tree, with a node for each substitution. In addition, every node has an overhead,
  which corresponds to the running time of merging the results of all children of the node.
- Guess-and-test: make a guess of what the closed form could be and then try to justify that
  guess by induction (an example at page 266 of the book).
- Master method: this method contains ready-made rules for common cases. It cannot be summarized;
  if you want to study this method, go to page 268 of the book.
The next two subsections are applications of the master method. They cannot be summarized; to
understand them you have to read the whole text.
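As a brief worked example of the iterative-substitution method, take the merge-sort recurrence
T(n) = 2T(n/2) + n (assuming n is a power of 2 and T(1) = 1):
T(n) = 2T(n/2) + n
     = 2(2T(n/4) + n/2) + n = 4T(n/4) + 2n
     = 8T(n/8) + 3n
     = … = 2^i · T(n/2^i) + i·n
After i = log n substitutions this becomes n·T(1) + n log n, so the closed form is
T(n) = O(n log n).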
5.3 Dynamic Programming
The book states that dynamic programming cannot be explained in a few sentences, and gives an
example, which is shortly explained here.
5.3.1 Matrix Chain-Product
The matrix chain-product problem is to parenthesize the product of a chain of matrices in such a
way that the number of scalar multiplications is minimized. One way to do this is just to try
every different parenthesization and count the number of multiplications. Of course we want to
improve on this, and we start by defining subproblems: for example, you can first find out, for
every pair of adjacent matrices, how many multiplications are needed. Another observation is that
every subproblem has an optimal solution; this is called the subproblem optimality condition. We
can't solve the problem with plain divide-and-conquer, because the subproblems share
subsubproblems. This is why we use dynamic programming instead of divide-and-conquer.
5.3.2 The General Technique
Dynamic programming is most of the time used for optimization problems. Often the number of ways
of solving such a problem is exponential, so brute force isn't possible. When we apply dynamic
programming, three components have to be taken into account:
- Simple subproblems: all subproblems have the same structure and there is a simple way to define
  them.
- Subproblem optimality: the subproblems have to be optimized in order to optimize the global
  problem; the global solution shouldn't contain any suboptimal subproblems.
- Subproblem overlap: optimal solutions to unrelated subproblems can contain subproblems in
  common. Such overlap improves the efficiency of a dynamic-programming algorithm that stores
  solutions to subproblems.
This subsection contains another example of dynamic programming: the knapsack problem.
Chapter 6: Graphs
6.1 The Graph ADT
A graph is a way to represent connections between objects. The objects are stored in the vertices
(nodes) and the connections are represented by edges (arcs). Edges are either directed or
undirected: directed edges can only be traveled one way, undirected edges both ways. A graph with
only directed edges is called a directed graph, with only undirected edges an undirected graph,
and with both a mixed graph. The two vertices at the ends of an edge are called its end vertices
(or endpoints). If an edge is directed, its start point is called the origin and its endpoint the
destination. If two vertices are endpoints of the same edge, they are adjacent. If a vertex is an
endpoint of an edge, it is incident to that edge. An edge whose origin is vertex v is called an
outgoing edge of v; an edge whose destination is v is an incoming edge of v. The degree of a
vertex is the number of incident edges; in-degree and out-degree are the numbers of incoming and
outgoing edges.
The edges of a graph form a collection, not a set: when there is more than 1 edge with the same
vertices as endpoints, these edges are parallel (or multiple). A self-loop is an edge whose two
endpoints are the same vertex. Graphs without the last two properties are said to be simple. When
you travel from one vertex to another, the visited edges and vertices form a path. A cycle is a
path that has the same start- and endpoint. A path or cycle is simple if every vertex in it is
distinct. If all edges in a path or cycle are directed, it is called a directed path/cycle. A
subgraph is a graph whose vertices and edges are subsets of those of another graph. A spanning
subgraph uses all vertices of the other graph. When there is a path between any two vertices, the
graph is connected. If a graph isn't connected, its connected components are the maximal
connected subgraphs. A forest is a graph without cycles. A tree is a connected forest. A spanning
tree is a spanning subgraph that is a tree.
6.1.1 Graph Methods
Notation: Graph G; Vertices v, w; Edge e; Object o
General methods:
- numVertices(): Return the number of vertices of G.
- numEdges(): Return the number of edges of G.
- vertices(): Return an enumeration of the vertices of G.
- edges(): Return an enumeration of the edges of G.
- aVertex(): Return a vertex of G.
- directedEdges(): Return an enumeration of all directed edges in G.
- undirectedEdges(): Return an enumeration of all undirected edges in G.
- incidentEdges(v): Return an enumeration of all edges incident on v.
- inIncidentEdges(v): Return an enumeration of all the incoming edges to v.
- outIncidentEdges(v): Return an enumeration of all the outgoing edges from v.
- opposite(v, e): Return an endpoint of e distinct from v.
- degree(v): Return the degree of v.
- inDegree(v): Return the in-degree of v.
- outDegree(v): Return the out-degree of v.
- adjacentVertices(v): Return an enumeration of the vertices adjacent to v.
- inAdjacentVertices(v): Return an enumeration of the vertices adjacent to v along incoming edges.
- outAdjacentVertices(v): Return an enumeration of the vertices adjacent to v along outgoing edges.
- areAdjacent(v, w): Return whether vertices v and w are adjacent.
- endVertices(e): Return an array of size 2 storing the end vertices of e.
- origin(e): Return the end vertex from which e leaves.
- destination(e): Return the end vertex at which e arrives.
- isDirected(e): Return true iff e is directed.
Update Methods:
- makeUndirected(e): Set e to be an undirected edge.
- reverseDirection(e): Switch the origin and destination vertices of e.
- setDirectionFrom(e, v): Set the direction of e away from v, one of its end vertices.
- setDirectionTo(e, v): Set the direction of e toward v, one of its end vertices.
- insertEdge(v, w, o): Insert and return an undirected edge between v and w, storing o at this position.
- insertDirectedEdge(v, w, o): Insert and return a directed edge between v and w, storing o at this position.
- insertVertex(o): Insert and return a new (isolated) vertex storing o at this position.
- removeEdge(e): Remove edge e.
6.2 Data Structures for Graphs
Three most popular ways to realize a graph ADT.
6.2.1 The edge List Structure
Two different kinds of objects:
- Vertex objects:
  o A reference to the object stored
  o Counters for the number of incident undirected edges, incoming and outgoing directed edges
  o A reference to the position of the vertex object in container V
- Edge objects:
  o A reference to the object stored
  o A Boolean indicating whether it is directed or undirected
  o References to the vertex objects in V for its endpoints (or origin and destination)
  o A reference to the position of the edge object in container E
The edge list is a very simple implementation, but not very efficient: it looks at the edge-vertex
relation only from the point of view of the edges.
6.2.2 The Adjacency List Structure
- Vertex objects:
  o All variables mentioned for the edge list
  o An incidence container, which stores references to the edges incident on the vertex
- Edge objects:
  o All variables mentioned for the edge list
  o A reference to the positions of the edge object in the incidence containers of its endpoints
Also a relatively simple implementation. More efficient than the edge list, because it looks at
the structure from the point of view of both the edges and the vertices.
6.2.3 The Adjacency Matrix Structure
Extends the edge list structure with a matrix (2-dimensional array), which allows us to determine
adjacencies between pairs of vertices in constant time, but uses more space.
- Vertex objects:
  o All variables mentioned for the edge list
  o A distinct integer, called the index
- Edge objects:
  o All variables mentioned for the edge list
- A 2-dimensional array A such that A[i,j], where i and j are the indices of two vertices, stores
  the edge between them, if it exists; if there is no edge, A[i,j] is null. If the edge is
  undirected, it is stored at A[j,i] too.
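As a rough illustration of the adjacency list structure, a stripped-down Java sketch (undirected edges only, and only a few of the methods of section 6.1.1; all names are my own assumptions):

import java.util.ArrayList;
import java.util.List;

public class AdjacencyListGraph {
    static class Vertex {
        String label;
        List<Edge> incidence = new ArrayList<>();   // incidence container
        Vertex(String label) { this.label = label; }
    }
    static class Edge {
        Vertex u, v;                                // endpoints
        Edge(Vertex u, Vertex v) { this.u = u; this.v = v; }
    }

    List<Vertex> V = new ArrayList<>();             // vertex container
    List<Edge> E = new ArrayList<>();               // edge container

    Vertex insertVertex(String label) {
        Vertex v = new Vertex(label);
        V.add(v);
        return v;
    }

    Edge insertEdge(Vertex u, Vertex v) {           // undirected edge
        Edge e = new Edge(u, v);
        E.add(e);
        u.incidence.add(e);                         // each endpoint keeps a
        v.incidence.add(e);                         // reference to the edge
        return e;
    }

    Vertex opposite(Vertex v, Edge e) { return e.u == v ? e.v : e.u; }
    int degree(Vertex v) { return v.incidence.size(); }
}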
6.3 Graph traversal
6.3.1 Depth-First search
Done with backtracking. Edges that lead to an unvisited vertex are called tree edges (or
discovery edges); edges that lead to an already visited vertex are called back edges (or cross
edges). The tree edges form a spanning tree, called the DFS tree.
(BFS is better at finding shortest paths.)
6.3.2 Biconnected components
A separation edge (or vertex) is an edge (vertex) whose removal disconnects the graph. When there
are two disjoint paths between any pair of vertices, the graph is biconnected. A biconnected
component of a graph G is one of the following:
- A maximal biconnected subgraph (adding any other part of G would make it no longer biconnected).
- A single edge of G consisting of a separation edge and its endpoints.
6.3.3 Breadth-First Search
BFS subdivides the vertices into levels. BFS is better at solving difficult connectivity problems.
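Java sketches of both traversals over a plain adjacency-list representation (vertices numbered 0..n−1; this representation and the names are my own assumptions, not the book's structures):

import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class Traversals {
    // DFS: recursively follow unvisited neighbours (tree/discovery edges).
    static void dfs(List<List<Integer>> adj, int v, boolean[] visited) {
        visited[v] = true;
        for (int w : adj.get(v))
            if (!visited[w])            // (v, w) is a tree edge;
                dfs(adj, w, visited);   // otherwise it would be a back edge
    }

    // BFS: visit vertices level by level; dist[w] = number of edges from s.
    static int[] bfs(List<List<Integer>> adj, int s) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, -1);          // -1 marks "not yet reached"
        Deque<Integer> queue = new ArrayDeque<>();
        dist[s] = 0;
        queue.add(s);
        while (!queue.isEmpty()) {
            int v = queue.remove();
            for (int w : adj.get(v))
                if (dist[w] == -1) { dist[w] = dist[v] + 1; queue.add(w); }
        }
        return dist;
    }
}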
6.4 Directed Graph
A digraph is another word for directed graph. Reachability describes which vertices can be
reached from which others: a vertex w is reachable from a vertex v if there is a directed path from
v to w. If any two vertices are mutually reachable, the graph is strongly connected. A directed
path in a digraph that starts and ends at the same vertex is a directed cycle. If a digraph doesn't
have any directed cycle, it is acyclic. The transitive closure of a digraph G is the digraph G* such
that the vertices of G* are the same as the vertices of G, and G* has an edge (u,v) whenever G
has a directed path from u to v. That is, we define G* by starting with the digraph G and adding
an extra edge (u,v) for each u and v such that v is reachable from u.
6.4.1 Traversing a Digraph
We distinguish 3 kinds of edges:
- Back edges: connect a vertex to an ancestor in the DFS tree.
- Forward edges: connect a vertex to a descendant in the DFS tree (these do not occur in BFS).
- Cross edges: connect a vertex to a vertex that is neither its ancestor nor its descendant.
6.4.2 Transitive closure
An algorithm for finding the transitive closure can be obtained using dynamic programming (this
is the Floyd-Warshall algorithm). The problem is divided into smaller subproblems: for every pair
of vertices (u,v), check whether there is an intermediate vertex w with an edge from u to w and an
edge from w to v; if so, add the edge (u,v) if it is not there yet.
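Here a hedged Java sketch of this idea on a boolean adjacency matrix (our own formulation, not
the book's pseudo-code; it runs in O(n^3) time):

    // Floyd-Warshall transitive closure sketch on a boolean adjacency matrix.
    class TransitiveClosure {
        static boolean[][] closure(boolean[][] adj) {
            int n = adj.length;
            boolean[][] reach = new boolean[n][n];
            for (int i = 0; i < n; i++)
                reach[i] = adj[i].clone();
            for (int k = 0; k < n; k++)         // allow k as an intermediate vertex
                for (int i = 0; i < n; i++)
                    for (int j = 0; j < n; j++)
                        if (reach[i][k] && reach[k][j])
                            reach[i][j] = true; // a path i -> k -> j exists
            return reach;
        }
    }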
6.4.3 DFS and Garbage collection
Once in a while the JVM checks if there is enough space left in the memory heap. If there isn't, the
garbage collector starts reclaiming the space used by dead objects. This can be done with a
mark-sweep algorithm: the memory heap is viewed as a digraph (the objects are vertices and the
references are edges) and DFS is used to find and mark the objects that are still live (mark phase).
After that, the objects that are not marked are deleted (sweep phase).
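A tiny sketch of the mark phase as a DFS over object references (a toy model only, not the JVM's
actual collector; the class name HeapObject is our own):

    import java.util.ArrayList;
    import java.util.List;

    // Toy mark phase: DFS from a root object, marking everything reachable.
    class HeapObject {
        boolean marked;
        List<HeapObject> references = new ArrayList<>(); // outgoing edges

        static void mark(HeapObject root) {
            if (root.marked) return;   // already visited
            root.marked = true;
            for (HeapObject o : root.references)
                mark(o);
        }
        // sweep phase (not shown): reclaim every object with marked == false
    }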
6.4.4 Directed Acyclic Graphs
A topological ordering is a numbering v1, ..., vn of the vertices such that for every edge (vi, vj)
we have i < j. A digraph has a topological ordering if and only if it is acyclic.
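A compact topological sort sketch via DFS finishing times (our own minimal version; vertices are
0..n-1 and adj[v] lists the targets of v's outgoing edges):

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Topological sort sketch: a vertex is prepended after all its successors.
    class TopologicalSort {
        static Deque<Integer> sort(int[][] adj) {
            int n = adj.length;
            boolean[] visited = new boolean[n];
            Deque<Integer> order = new ArrayDeque<>();
            for (int v = 0; v < n; v++)
                if (!visited[v]) dfs(v, adj, visited, order);
            return order; // front-to-back is a topological ordering (if acyclic)
        }

        static void dfs(int v, int[][] adj, boolean[] visited, Deque<Integer> order) {
            visited[v] = true;
            for (int w : adj[v])
                if (!visited[w]) dfs(w, adj, visited, order);
            order.addFirst(v); // v comes before everything reachable from it
        }
    }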
Chapter 9: Text Processing
9.1 Strings and Pattern Matching Algorithms
9.1.1 String Operations
A substring is a contiguous part of a string. A proper substring is a substring that is not equal to
the string itself. An empty string is called a null string. A substring that starts at the beginning of
the string is called a prefix; one that ends at the end of the string is called a suffix.
9.1.2 Brute Force Pattern Matching
The brute-force pattern matching algorithm simply tests all possible placements of the pattern. It
is very simple and runs with a double loop, so it takes O(nm) time (with n the text length and m
the pattern length).
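Here a small Java version of the brute-force matcher (our own formulation):

    // Brute-force pattern matching: try every placement of pattern in text.
    // Returns the index of the first match, or -1; O(n*m) time.
    class BruteForceMatch {
        static int match(String text, String pattern) {
            int n = text.length(), m = pattern.length();
            for (int i = 0; i <= n - m; i++) {       // every possible placement
                int j = 0;
                while (j < m && text.charAt(i + j) == pattern.charAt(j))
                    j++;
                if (j == m) return i;                // whole pattern matched
            }
            return -1;
        }
    }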
9.1.3 The Boyer-Moore Algorithm
If we want to improve the brute-force algorithm, we can do so with two time-saving heuristics (a
sketch combining both follows below):
- Looking-Glass Heuristic: comparisons begin at the back of the pattern and move toward the
  front.
- Character-Jump Heuristic: if a mismatch occurs at a text character c, the pattern is shifted until
  the last occurrence of c in the pattern lines up with that character. If c is not in the pattern, the
  pattern is shifted completely past c.
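A simplified Boyer-Moore sketch using only these two heuristics (our own compact version,
assuming plain ASCII input; last[c] is the index of the last occurrence of c in the pattern, or -1):

    // Simplified Boyer-Moore: looking-glass + character-jump heuristics.
    class BoyerMooreMatch {
        static int match(String text, String pattern) {
            int n = text.length(), m = pattern.length();
            int[] last = new int[128];               // assumes ASCII characters
            java.util.Arrays.fill(last, -1);
            for (int k = 0; k < m; k++)
                last[pattern.charAt(k)] = k;
            int i = m - 1, j = m - 1;                // compare back to front
            while (i < n) {
                if (text.charAt(i) == pattern.charAt(j)) {
                    if (j == 0) return i;            // full match found
                    i--; j--;
                } else {                             // character-jump on mismatch
                    int l = last[text.charAt(i)];
                    i += m - Math.min(j, 1 + l);
                    j = m - 1;
                }
            }
            return -1;
        }
    }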
9.1.4 The Knuth-Morris-Pratt Algorithm (KMP)
The KMP algorithm works with a failure function. The main idea is that this function pre-examines
the pattern: on a mismatch, it tells the algorithm how far the pattern can shift without
re-examining text characters. How this function exactly works isn't clearly explained in the text;
the sketch below shows one common formulation.
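A KMP sketch in Java (our own compact version). Here fail[j] is the length of the longest proper
prefix of the pattern that is also a suffix of pattern[0..j]:

    // KMP pattern matching with precomputed failure function; O(n + m) time.
    class KMPMatch {
        static int[] failure(String p) {
            int m = p.length();
            int[] fail = new int[m];                 // fail[0] = 0
            int j = 1, k = 0;
            while (j < m) {
                if (p.charAt(j) == p.charAt(k)) {
                    fail[j] = k + 1; j++; k++;
                } else if (k > 0) {
                    k = fail[k - 1];                 // fall back to a shorter prefix
                } else {
                    fail[j] = 0; j++;
                }
            }
            return fail;
        }

        static int match(String text, String p) {
            int[] fail = failure(p);
            int i = 0, j = 0;                        // positions in text and pattern
            while (i < text.length()) {
                if (text.charAt(i) == p.charAt(j)) {
                    if (j == p.length() - 1) return i - j;  // match found
                    i++; j++;
                } else if (j > 0) {
                    j = fail[j - 1];                 // shift pattern, keep i
                } else {
                    i++;
                }
            }
            return -1;
        }
    }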
9.2 Tries
A trie is a tree-based data structure for pattern matching and prefix matching. The main idea is
that for a given pattern P, the tree is searched for a string with prefix P.
9.2.1 Standard Tries
A standard trie has the following properties:
- Each node, except the root, contains a character.
- The ordering of the children is determined by a canonical ordering of the alphabet.
- Each external node marks the last letter of a word: the path from the root to that external node
  spells out the string.
(Figure: a standard trie for the strings bear, bell, bid, bull, buy, sell, stock and stop.)
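A minimal standard-trie sketch in Java (our own variation: words are marked with a flag instead of
separate external nodes; a TreeMap keeps the children in canonical alphabetical order):

    import java.util.TreeMap;

    // Standard trie sketch with insertion and prefix matching.
    class TrieNode {
        TreeMap<Character, TrieNode> children = new TreeMap<>();
        boolean isWord;                        // marks the last letter of a word

        void insert(String word) {
            TrieNode node = this;
            for (char c : word.toCharArray())
                node = node.children.computeIfAbsent(c, k -> new TrieNode());
            node.isWord = true;
        }

        boolean containsPrefix(String p) {     // prefix matching: walk the path
            TrieNode node = this;
            for (char c : p.toCharArray()) {
                node = node.children.get(c);
                if (node == null) return false;
            }
            return true;
        }
    }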
9.2.2 Compressed Tries
A compressed trie is only advantageous if an auxiliary structure is used. The words are then
stored in an array of strings S[0], ..., S[s-1], and the nodes store references into these strings
instead of the characters themselves: each node holds a triple (i, j, k), where the first number
selects the array entry S[i] it refers to and the second and third numbers give the start and end of
the substring S[i][j..k] in that array.
(Figure: a compressed trie with such an auxiliary index for the strings S[0] = "see",
S[1] = "bear", S[2] = "sell", S[3] = "stock", S[4] = "bull", S[5] = "buy", S[6] = "bid",
S[7] = "hear", S[8] = "bell", S[9] = "stop"; for example the node (1,2,3) stands for the
substring "ar" of "bear".)
9.2.3 Suffix Tries
A suffix trie represents all suffixes of a string. For example, the suffix trie of X = "minimize"
(indices 0..7) has edge labels such as e, i, mi, mize, nimize and ze.
(Figure: the suffix trie of "minimize" and its compact representation, in which every node is
labeled with a pair (i, j) standing for the substring X[i..j], e.g. (7,7) = "e", (4,7) = "mize"
and (2,7) = "nimize".)
9.2.4 Search engines
A Web crawler is a program that gathers the information from web pages. Search engines make it
possible to retrieve that information. An inverted file stores all information of the search engine in
a dictionary. The information is stored in pairs: a key word together with references to the web
pages containing this word. Key words are called index terms and the references to the web
pages are called occurrence lists. Of course, a basic task of search engines is also to rank the
results, but it is still a major challenge for companies to do this fast and accurately.
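A toy inverted-file sketch in Java (our own naming; pages are simply identified by an int id):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Inverted file: a dictionary from index terms to occurrence lists.
    class InvertedFile {
        Map<String, List<Integer>> index = new HashMap<>();

        void addPage(int pageId, String[] words) {
            for (String w : words)
                index.computeIfAbsent(w, k -> new ArrayList<>()).add(pageId);
        }

        List<Integer> occurrences(String term) {   // pages containing the term
            return index.getOrDefault(term, List.of());
        }
    }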
Chapter 12: Computational Geometry
12.1 Range Trees
A range-search query retrieves all points in a multi-dimensional collection whose coordinates fall
within given ranges. To keep it simple we talk about 2-dimensional range-search queries. They
have a method findAllInRange(x1, x2, y1, y2), which returns all the elements whose coordinates
lie in those ranges. This is called the reporting version of the query; there is also a counting
version, which only counts the number of such elements.
12.1.1 One-Dimensional Range Searching
This is done, as explained above, with the findAllInRange method, only with a single range
(k1, k2) given in the method. Then recur through the range tree. There are 3 possibilities at a
node v:
- key(v) < k1: recur to the right child of v.
- k1 <= key(v) <= k2: report the element and recur to both children.
- key(v) > k2: recur to the left child of v.
In the search we recognize 3 kinds of nodes (a sketch of the recursion follows below):
- Boundary nodes: nodes that belong to the search paths P1 (for k1) and P2 (for k2), but whose
  elements do not necessarily belong to the interval.
- Inside nodes: all nodes whose elements lie inside the interval.
- Outside nodes: nodes that hang off the search paths as a left child of a node on P1 or a right
  child of a node on P2; their elements lie outside the interval.
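A minimal Java sketch of the 1D range search on a binary search tree, following the three cases
above (our own version, with plain int keys):

    import java.util.List;

    // 1D range search on a BST: report all keys in [k1, k2].
    class RangeSearch {
        static class Node {
            int key;
            Node left, right;
            Node(int key) { this.key = key; }
        }

        static void findAllInRange(Node v, int k1, int k2, List<Integer> out) {
            if (v == null) return;
            if (v.key < k1) {
                findAllInRange(v.right, k1, k2, out); // left subtree is too small
            } else if (v.key > k2) {
                findAllInRange(v.left, k1, k2, out);  // right subtree is too large
            } else {                                   // k1 <= key <= k2
                out.add(v.key);                        // report and recur both ways
                findAllInRange(v.left, k1, k2, out);
                findAllInRange(v.right, k1, k2, out);
            }
        }
    }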
12.1.2 Two-dimensional Range Searching
A two-dimensional range tree consists of a primary structure, which is a tree ordered on the
x-coordinates, and auxiliary structures. Every node of the primary structure stores:
- An item, which consists of coordinates and an element.
- A one-dimensional tree that holds the same items (those of the node's subtree), but uses the
  y-coordinates as keys.
12.3 Quadtrees and k-D Trees
12.3.1 QuadTrees
A main application of quadtrees is storing a set of points in a picture or image. Dividing the
square (into four equal quadrants) is called a split. A quadtree is defined by recursively doing
splits.
12.3.2 k-D Trees
The difference between a k-D tree and a quadtree is that in a k-D tree a split operation is done
with a single line perpendicular to one of the axes, while in a quadtree more than one line is
drawn in a single split. There are two kinds of k-D trees: region-based and point-based.
Region-based k-D trees are the variant closest to quadtrees, while point-based k-D trees perform
splits based on the distribution of the points.
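A point-based k-D tree insertion sketch for 2D points (our own minimal version; the split axis
alternates between x and y on each level of the tree):

    // Point-based 2-D tree sketch: alternating split axes per level.
    class KDTree {
        static class Node {
            double x, y;
            Node left, right;
            Node(double x, double y) { this.x = x; this.y = y; }
        }

        Node root;

        void insert(double x, double y) {
            root = insert(root, x, y, 0);
        }

        private Node insert(Node v, double x, double y, int depth) {
            if (v == null) return new Node(x, y);
            boolean splitOnX = depth % 2 == 0;       // alternate the split axis
            double key     = splitOnX ? x   : y;
            double nodeKey = splitOnX ? v.x : v.y;
            if (key < nodeKey) v.left  = insert(v.left,  x, y, depth + 1);
            else               v.right = insert(v.right, x, y, depth + 1);
            return v;
        }
    }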