Summary – DSA, KE 2008, Year 1
Helpful:
Animations of all search algorithms / ADTs
Author Notes:
General
In essence, the ADTs given in DSA should not be viewed in isolation, case by case.
To reach a solution it is vital to understand that ADTs can be combined, and often more complex
ADTs are just several smaller ADTs combined. An ADT is nothing more than an abstract description
of how an object is supposed to behave, NOT a description of how it should be implemented.
(Think of it as Java abstract classes or interfaces; for example, a Java Vector is an
implementation of the Sequence ADT & Vector ADT.)
Table of Contents
HELPFUL
CHAPTER 1 - INTRODUCTION
  1.1 – PSEUDO CODE
  1.2 – ASYMPTOTIC NOTATION
  1.3 – QUICK MATH REVIEW
CHAPTER 2 – BASIC DATA STRUCTURES
  2.1 – STACKS AND QUEUES
  2.2 – VECTORS, LISTS AND SEQUENCES
  2.5 – DICTIONARIES AND HASH TABLES
CHAPTER 3 – SEARCH TREES AND SKIP LISTS
  3.1 – ORDERED DICTIONARIES AND BINARY SEARCH TREES
  3.2 – AVL TREES
  3.4 – SPLAY TREES
CHAPTER 4
  4.1 MERGE-SORT
  4.2 THE SET ABSTRACT DATA TYPE
  4.3 QUICK SORT
  4.4 A LOWER BOUND ON COMPARISON-BASED SORTING
  4.5 BUCKET-SORT AND RADIX-SORT
  4.6 COMPARISON OF SORTING ALGORITHMS
CHAPTER 5: FUNDAMENTAL TECHNIQUES
  5.2 DIVIDE-AND-CONQUER
  5.3 DYNAMIC PROGRAMMING
CHAPTER 6: GRAPHS
  6.1 THE GRAPH ADT
  6.2 DATA STRUCTURES FOR GRAPHS
  6.3 GRAPH TRAVERSAL
  6.4 DIRECTED GRAPHS
CHAPTER 9: TEXT PROCESSING
  9.1 STRINGS AND PATTERN MATCHING ALGORITHMS
  9.2 TRIES
CHAPTER 12: COMPUTATIONAL GEOMETRY
  12.1 RANGE TREES
  12.3 QUADTREES AND K-D TREES
Chapter 1 - Introduction
1.1 – Pseudo Code
Pseudo Code: Code written for a human reader, not a computer.
Structure:
- Expressions: Assignment: ← ; Comparators: <, >, =, ≤, ≥, ≠
- Method Declaration: Algorithm <name>(param1, param2, …)
- Decision Structures: if (condition) then [true-action] else [false-action]
- While-loops: while (condition) do [action]
- Repeat-loops: repeat [action] until (condition)
- For-loops: for (variable-increment-definition) do [action]
- Array-Indexing: A[i], ith cell of array A
- Method Calls: object.method(args)
- Method Returns: return (value)
Average-case running time is approximated here as (worst-case + best-case running time) / 2.
In analysis we always count the worst-case running time.
1.2 – Asymptotic Notation
Ways to define the running time of an algorithm:
- Big-Oh: "less than or equal to"
  f(n) is O(g(n)) if there are constants c > 0 and n₀ ≥ 1 such that f(n) ≤ c·g(n) for all n ≥ n₀.
  Say: "f(n) is order g(n)", "f(n) is big-Oh of g(n)", "f(n) is O(g(n))"
- Big-Omega: "greater than or equal to"
  f(n) is Ω(g(n)) if there are constants c > 0 and n₀ ≥ 1 such that f(n) ≥ c·g(n) for all n ≥ n₀.
  Say: "f(n) is big-Omega of g(n)", "f(n) is Ω(g(n))"
- Big-Theta: "equal"
  f(n) is Θ(g(n)) if there are constants c′, c″ > 0 and n₀ ≥ 1 such that c′·g(n) ≤ f(n) ≤ c″·g(n) for all n ≥ n₀.
  Say: "f(n) is big-Theta of g(n)", "f(n) is Θ(g(n))"
Difference between Big-Oh and Little-Oh:
Big-Oh:
∃ – there exists a constant c > 0 (and an n₀) such that f(n) ≤ c·g(n) for all n ≥ n₀.
Little-Oh:
∀ – for every constant c > 0 there is an n₀ such that f(n) ≤ c·g(n) for all n ≥ n₀.
Functions by Growth Rate:
Functions by Growth Rate:
log n
log² n
√n
n
n log n
n²
n³
2ⁿ
1.3 - Quick Math Review
Log rules:
1. log_b(a) = c if and only if a = b^c
2. log_b(ac) = log_b(a) + log_b(c)
3. log_b(a/c) = log_b(a) − log_b(c)
4. log_b(a^c) = c · log_b(a)
5. log_b(a) = log_c(a) / log_c(b)
6. b^(log_c(a)) = a^(log_c(b))
7. (b^a)^c = b^(a·c)
8. (b^a) · (b^c) = b^(a+c)
9. b^a / b^c = b^(a−c)
⌈x⌉ = smallest integer greater than or equal to x
⌊x⌋ = largest integer less than or equal to x
Justification techniques:
- Counterexample
- Contrapositive
- Contradiction
- Induction
- Loop invariant
Chapter 2 – Basic Data Structures
2.1 – Stacks and Queues
2.1.1 – Stack
Container of objects that are inserted and removed according to last-in first out (LIFO)
ADT:
push(o): Insert object o at top of stack
pop(): Remove and return the last object inserted into stack. Error if stack is empty
size(): Return number of objects in stack
isEmpty(): Return Boolean indicating if stack is empty
top(): Return the last object, without removing it. Error if stack is empty
Additional Information
An array-based Stack uses a variable t to keep track of the index of the top object
(t = −1 when the stack is empty, so the size is t + 1).
Pseudo Code - Stack Array:
Algorithm push(o):
  if size() = N then
    indicate that a stack-full error has occurred
  t ← t + 1
  S[t] ← o
Algorithm pop():
  if isEmpty() then
    indicate that a stack-empty error has occurred
  e ← S[t]
  S[t] ← null
  t ← t − 1
  return e
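For concreteness, here is a minimal Java sketch of the array-based stack above. The class name ArrayStack, the generics and the exception choices are my own assumptions, not from the course text:

import java.util.EmptyStackException;

public class ArrayStack<E> {
    private E[] S;          // element array
    private int t = -1;     // index of top element; -1 when empty

    @SuppressWarnings("unchecked")
    public ArrayStack(int capacity) { S = (E[]) new Object[capacity]; }

    public int size() { return t + 1; }
    public boolean isEmpty() { return t < 0; }

    public void push(E o) {
        if (size() == S.length) throw new IllegalStateException("stack full");
        S[++t] = o;                       // t <- t+1; S[t] <- o
    }

    public E pop() {
        if (isEmpty()) throw new EmptyStackException();
        E e = S[t];
        S[t--] = null;                    // clear the slot, t <- t-1
        return e;
    }

    public E top() {
        if (isEmpty()) throw new EmptyStackException();
        return S[t];
    }
}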
2.1.2 - Queue
Container of objects that are inserted and removed according to first-in first-out (FIFO)
Objects enter Queue at the rear, are removed from front
ADT:
enqueue(o): Insert object o at the rear of queue
dequeue(): Remove and return the object inserted at the front. Error if queue is empty
size(): Return number of objects in queue
isEmpty(): Return Boolean indicating if queue is empty
front(): Return the front object, without removing it. Error if queue is empty
Additional Information
Queue uses 2 variables, f & r, to keep track of the cell storing the front object and the first free
cell, respectively.
N is the number of cells within the array containing all the objects (size of the array for holding
objects). The indices wrap around the end of the array, so they are computed mod N.
Queue is empty if f = r.
Pseudo Code - Queue Array:
Algorithm enqueue(o):
  if size() = N − 1 then            // one cell stays free, so f = r means "empty"
    throw a QueueFullException
  Q[r] ← o
  r ← (r + 1) mod N
Algorithm dequeue():
  if isEmpty() then
    throw a QueueEmptyException
  temp ← Q[f]
  Q[f] ← null
  f ← (f + 1) mod N
  return temp
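A matching Java sketch of the circular-array queue (as in the pseudocode, one cell is kept free so that f = r unambiguously means the queue is empty; class and exception names are my own):

public class ArrayQueue<E> {
    private E[] Q;
    private int f = 0, r = 0;   // front index and first free cell

    @SuppressWarnings("unchecked")
    public ArrayQueue(int N) { Q = (E[]) new Object[N]; }

    public int size() { return (Q.length - f + r) % Q.length; }
    public boolean isEmpty() { return f == r; }

    public void enqueue(E o) {
        if (size() == Q.length - 1) throw new IllegalStateException("queue full");
        Q[r] = o;
        r = (r + 1) % Q.length;     // wrap around the end of the array
    }

    public E dequeue() {
        if (isEmpty()) throw new IllegalStateException("queue empty");
        E temp = Q[f];
        Q[f] = null;
        f = (f + 1) % Q.length;
        return temp;
    }

    public E front() {
        if (isEmpty()) throw new IllegalStateException("queue empty");
        return Q[f];
    }
}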
2.2 – Vectors, Lists and Sequences
2.2.1 – Vectors
Linear sequence that supports access to its elements according to rank.
ADT:
elemAtRank(r): Return object at rank r. Error if r < 0 or r > n−1
replaceAtRank(r, e): Replace object at rank r with e. Error if r < 0 or r > n−1
insertAtRank(r, e): Insert e into Vector at rank r. Error if r < 0 or r > n
removeAtRank(r): Remove object at rank r. Error if r < 0 or r > n−1
size(): Return number of objects in vector
isEmpty(): Return Boolean indicating if vector is empty
Additional Information
Vector contains n elements. [0] = first element, [n-1] = last element.
Running times
Method                Time
size()                O(1)
isEmpty()             O(1)
elemAtRank(r)         O(1)
replaceAtRank(r, e)   O(1)
insertAtRank(r, e)    O(n)
removeAtRank(r)       O(n)
Pseudo Code - Vector Array:
Algorithm insertAtRank(r, e):
  for i = n − 1, n − 2, … , r do
    A[i + 1] ← A[i]          // make room for the new element
  A[r] ← e
  n ← n + 1
Algorithm removeAtRank(r):
  temp ← A[r]
  for i = r, r + 1, … , n − 2 do
    A[i] ← A[i + 1]          // fill in for the removed element
  n ← n − 1
  return temp
2.2.2 – Lists
Linear sequence that supports access to its elements by means of nodes.
A node is a container which keeps a reference to its element and to the nodes before and after it.
From now on, a node will be called a Position.
ADT:
first(): Return the position of the first element of List. Error if List is empty.
last(): Return the position of the last element of List. Error if List is empty.
isFirst(p): Return Boolean indicating if p is the first position within list.
isLast(p): Return Boolean indicating if p is the last position within list.
before(p): Return the position before position p. Error if p is first.
after(p): Return the position after position p. Error if p is last.
replaceElement(p, e): Replace element at p with e. Return element previously at position p.
swapElements(p, q): Swap elements: move element at p to q, and element at q to p.
insertFirst(e): Insert e into List as first element (not replace).
insertLast(e): Insert e into List as last element (not replace).
insertBefore(p, e): Insert e before position p into List.
insertAfter(p, e): Insert e after position p into List.
remove(p): Remove element at position p from List.
Position ADT:
A Position is always defined relatively, i.e. "after" or "before" another position. Each position
contains the object we want to store at that position.
element(): Return object contained within position
Linked List Implementation
A Linked List is a direct implementation of the List ADT. We also need to extend the definition of a
Position. In a singly linked list, a position stores a reference to the position coming after it,
next(). In a doubly linked list, both the before AND after references are stored, prev() & next().
// Note: you can also implement this with instance variables instead of methods, e.g.:
position.next ← position.prev
instead of
position.next() ← position.prev()
To simplify matters, special positions called sentinel positions (header and trailer) are stored at
the beginning and the end of the List.
Pseudo Code – Linked List:
Algorithm insertAfter(p, e):
  Create a new node v        // the position for the element e
  v.element ← e
  v.prev ← p                 // link to predecessor
  v.next ← p.next            // link v to successor
  (p.next).prev ← v          // link p's old successor to v
  p.next ← v                 // link p to its new successor, v
  return v
Algorithm remove(p):
  t ← p.element              // temp variable for element
  (p.prev).next ← p.next     // link out p
  (p.next).prev ← p.prev
  p.prev ← null              // invalidate position p
  p.next ← null
  return t
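A minimal Java sketch of the doubly linked list with header/trailer sentinels, mirroring the two algorithms above (the class and field names are my own assumptions):

public class DoublyLinkedList<E> {
    private static class Node<E> {
        E element;
        Node<E> prev, next;
    }

    private final Node<E> header = new Node<>();    // sentinel before the first element
    private final Node<E> trailer = new Node<>();   // sentinel after the last element

    public DoublyLinkedList() {
        header.next = trailer;
        trailer.prev = header;
    }

    // insertAfter(p, e) from the pseudocode above
    public Node<E> insertAfter(Node<E> p, E e) {
        Node<E> v = new Node<>();      // the position for the element e
        v.element = e;
        v.prev = p;                    // link to predecessor
        v.next = p.next;               // link v to successor
        p.next.prev = v;               // link p's old successor to v
        p.next = v;                    // link p to its new successor, v
        return v;
    }

    // remove(p) from the pseudocode above
    public E remove(Node<E> p) {
        E t = p.element;
        p.prev.next = p.next;          // link out p
        p.next.prev = p.prev;
        p.prev = null;                 // invalidate position p
        p.next = null;
        return t;
    }
}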
2.2.3 Sequences
An ADT that supports both the Vector and List ADT operations.
ADT (in addition to the Vector and List methods):
atRank(r): Return the position of the element with rank r.
rankOf(p): Return the rank of the element at position p.
Additional Information
Can be implemented either with an Array or with a Doubly Linked List.
An Array implementation takes O(N) space, a DLList takes O(n) space.
(The Array is statically sized at the start and always takes N space; a DLList grows with the
number of elements n.)
Running times
Operation                        Array   List
size, isEmpty                    O(1)    O(1)
atRank, rankOf, elemAtRank       O(1)    O(n)
first, last, before, after       O(1)    O(1)
replaceElement, swapElements     O(1)    O(1)
replaceAtRank                    O(1)    O(n)
insertAtRank, removeAtRank       O(n)    O(n)
insertFirst, insertLast          O(1)    O(1)
insertAfter, insertBefore        O(n)    O(1)
remove                           O(n)    O(1)
Iterator ADT
An Iterator is an object that can go through a collection of elements, one element at a time.
An Iterator consists of a Sequence and a current Position (it extends the Position ADT).
hasNext(): Test whether there are elements left in the Iterator.
nextObject(): Return the next element in the Iterator and advance past it.
2.3 Trees
2.3.1 Tree ADT
A Tree is an ADT that stores elements hierarchically. Elements are stored in a parent-child
relationship. Every element has zero or more children and one parent (except the root, which is
the initial element and has no parent).
Children with the same parent are siblings. Nodes are called external, or leaves, when they have
no children; internal when they have one or more.
The subtree of a tree at v is the tree consisting of all descendants of v, with v as the root.
An ancestor is the parent of a node, or the parent's parent, etc.
Descendant, the same, but with children: if v is a descendant of p, then p is an ancestor of v.
An ordered tree is a tree that has a linear ordering for the children of each node (you know which
child is first, second, third).
A binary tree is an ordered tree where every node has at most two children: a left child and a
right child, which in turn root a left subtree and a right subtree.
ADT:
// Accessor Methods
root(): Return the root of the tree.
parent(v): Return the parent of node v. Error if v is the root.
children(v): Return an Iterator containing all children of node v.
// Query Methods
isInternal(v): Test whether node v is internal.
isExternal(v): Test whether node v is external.
isRoot(v): Test whether node v is the root.
// Generic Methods
size(): Return the number of nodes.
elements(): Return an Iterator of all elements stored in the nodes.
positions(): Return an Iterator containing all nodes.
swapElements(v, w): Swap the elements stored at nodes v and w.
replaceElement(v, e): Replace the element at node v with e, and return the old element.
Additional Information
The depth of a node is the number of its ancestors, excluding the node itself.
The height of a tree is the maximum depth of an external node.
Alternatively: the height of node v is 0 if v is external, else 1 + the maximum height of a child
of v. The height of the tree is height(T, root).
Algorithm depth(T, v):
  if T.isRoot(v) then
    return 0
  else
    return 1 + depth(T, T.parent(v))
Runs in O(1 + d_v), d_v: depth of node v in tree T
Algorithm height(T, v):
  if T.isExternal(v) then
    return 0
  else
    h ← 0
    for each w ∊ T.children(v) do
      h ← max(h, height(T, w))
    return 1 + h
Runs in O(n), n: number of nodes in the subtree of v. Called on the root it visits every node of
T, i.e. a complete tree traversal.
Running times
Method                                      Time
root(), parent(v)                           O(1)
isInternal(v), isExternal(v), isRoot(v)     O(1)
children(v)                                 O(c_v), c_v: number of children of v
swapElements(v, w), replaceElement(v, e)    O(1)
elements(), positions()                     O(n), n: nodes in tree
Tree Traversal
Link to tree traversal applet – a very good representation of all traversal methods
Preorder traversal
Traverses from the starting node, visiting every node as soon as it is reached.
Gives a linear order of the nodes in which children come after their parent.
Algorithm preorder(T, v):
  Perform "visit" action for node v    // whatever you want + mark node as "visited"
  for each child w of v do
    preorder(T, w)                     // recursively traverse subtree at w
Runs in O(n), same as height(T, v)
Postorder traversal
Visits a node after it has traversed every descendant of that node.
Used if you need the information of all children before you can compute the value of a parent.
For example: sizes of files in a directory.
Algorithm postorder(T, v):
  for each child w of v do
    postorder(T, w)                    // recursively traverse subtree at w
  Perform "visit" action for node v    // whatever you want + mark node as "visited"
Runs in O(n), same as height(T, v)
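Both traversals as Java methods over a simple general-tree node (the TreeNode class and the use of a result list as the "visit" action are my own assumptions):

import java.util.ArrayList;
import java.util.List;

public class TreeNode<E> {
    E element;
    List<TreeNode<E>> children = new ArrayList<>();

    // Preorder: visit the node first, then recursively traverse each subtree.
    static <E> void preorder(TreeNode<E> v, List<E> out) {
        out.add(v.element);                  // the "visit" action
        for (TreeNode<E> w : v.children)
            preorder(w, out);
    }

    // Postorder: traverse every subtree first, then visit the node.
    static <E> void postorder(TreeNode<E> v, List<E> out) {
        for (TreeNode<E> w : v.children)
            postorder(w, out);
        out.add(v.element);                  // the "visit" action
    }
}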
2.3.3 Binary Trees
A proper binary tree is an ordered tree in which every internal node has exactly two children.
ADT:
leftChild(v): Return left child of v. Error if v is external.
rightChild(v): Return right child of v. Error if v is external.
sibling(v): Return sibling node of v. Error if v is the root.
If the tree is an improper binary tree (not every internal node has 2 children), extra errors may
occur; for example, a node may not have a sibling.
Additional Information
Nodes with the same depth d are at the same level.
For a proper binary tree of height h with n nodes:
- Number of external nodes is at least h + 1 and at most 2^h
- Number of internal nodes is at least h and at most 2^h − 1
- Total number of nodes is at least 2h + 1 and at most 2^(h+1) − 1
- Height is at least log(n + 1) − 1 and at most (n − 1)/2
Inorder traversal
Can be seen as going through the tree "from left to right": first left subtree, then parent, then
right subtree.
Algorithm inorder(T, v):
  if v is an internal node then
    inorder(T, T.leftChild(v))              // go through left subtree
  perform the "visit" action for node v     // mark node "visited"
  if v is an internal node then
    inorder(T, T.rightChild(v))             // go through right subtree
Binary tree adapted Preorder and Postorder:
Algorithm binaryPreorder(T, v):
  perform the "visit" action for node v     // mark node "visited"
  if v is an internal node then
    binaryPreorder(T, T.leftChild(v))       // go through left subtree
    binaryPreorder(T, T.rightChild(v))      // go through right subtree
Algorithm binaryPostorder(T, v):
  if v is an internal node then
    binaryPostorder(T, T.leftChild(v))      // go through left subtree
    binaryPostorder(T, T.rightChild(v))     // go through right subtree
  perform the "visit" action for node v     // mark node "visited"
Euler tour traversal
A uniform way of traversing a tree; it encounters every node three times: from the left, from
below, and from the right.
A preorder traversal is an Euler tour where the "visit" action is performed when a node is
encountered from the left.
An inorder traversal is an Euler tour where the "visit" action is performed when a node is
encountered from below.
A postorder traversal is an Euler tour where the "visit" action is performed when a node is
encountered from the right.
Algorithm eulerTour(T, v):
  Perform left "visit" action
  if v is an internal node then
    eulerTour(T, T.leftChild(v))     // traverse left subtree
  Perform below "visit" action
  if v is an internal node then
    eulerTour(T, T.rightChild(v))    // traverse right subtree
  Perform right "visit" action
Runs in O(n) time, n: number of nodes in tree T
2.3.4 Data Structures for Representing Trees
Vector-Based Binary Tree structure
Based on the premise that every node gets a number; this is known as level numbering.
p(v) is the function that returns the number of node v. The Vector has size N = p_M + 1,
p_M being the maximum value of p(v) (+1 because the numbering starts at 1, not 0).
- if v is the root: p(v) = 1
- if v is the left child of u: p(v) = 2p(u)
- if v is the right child of u: p(v) = 2p(u) + 1
Additional Information
Running times
Method                              Time
elements, positions                 O(n), n: nodes in tree
swapElements, replaceElement        O(1)
root, parent, children              O(1)
leftChild, rightChild, sibling      O(1)
isInternal, isExternal, isRoot      O(1)
Linked Structure for Binary Tree
A tree in which every node is represented by a position which contains a reference to its element,
and to the positions of the left child, right child, and parent.
If the node is the root, the parent reference is null. If the node is external, the child
references are null.
Size is O(n), because there is a position for every node in the tree.
Additional information
Running times
Method                              Time
size, isEmpty                       O(1)
elements, positions                 O(n), n: nodes in tree
swapElements, replaceElement        O(1)
root, parent                        O(1)
children(v)                         O(c_v), c_v: children of node v
isInternal, isExternal, isRoot      O(1)
2.4 Priority Queue and Heaps
2.4.1 Priority Queue ADT
A container of elements, each of which gets a comparable key the moment the element is inserted
into the container.
Keys must follow these comparison rules, i.e. they must follow a total order relation:
- Reflexive property: k ≤ k
- Antisymmetric property: if k1 ≤ k2 and k2 ≤ k1, then k1 = k2
- Transitive property: if k1 ≤ k2 and k2 ≤ k3, then k1 ≤ k3
ADT:
insertItem(k, e): Insert an element e with key k into Priority Queue.
removeMin(): Return and remove element with the smallest key within PQ. Error: empty PQ
minElement(): Return element with the smallest key within PQ. Error: empty PQ
minKey(): Return the smallest key within PQ. Error: empty PQ
Comparator
An object that specifies the way keys are compared, i.e. an object that compares keys.
ADT:
isLess(a, b): True if and only if a is less than b.
isLessOrEqualTo(a, b): True if and only if a is less than or equal to b.
isEqualTo(a, b): True if and only if a and b are equal.
isGreater(a, b): True if and only if a is greater than b.
isGreaterOrEqualTo(a, b): True if and only if a is greater than or equal to b.
isComparable(a): True if and only if a can be compared.
2.4.2 PQ-Sort, Selection-Sort and Insertion-Sort
A sorting problem is a problem in which a container C with n elements needs to be sorted in
increasing (or at least non-decreasing, if there are ties) order. All elements should be
comparable by a total order relation.
PQ-Sort, Selection-Sort & Insertion-Sort
A very simple algorithm which takes an unsorted list and sorts it using a Priority Queue. Its
output is a sorted list.
1. All elements are placed in an empty Priority Queue, giving a key to each element.
2. All elements are extracted in non-decreasing order using removeMin(), putting them back in C.
If this is implemented using an unsorted sequence, phase 1 takes O(n) and phase 2 takes O(n²).
This is also known as Selection-Sort, because selection, and thus ordering, is done in the second
phase.
If this is implemented using a sorted sequence, phase 1 takes O(n²) and phase 2 takes O(n).
This is also known as Insertion-Sort, because insertion in sorted order is done in the first phase.
The difference is that Selection-Sort always takes Ω(n²), while Insertion-Sort in the best case
takes O(n) (if the list is in reverse-sorted order).
Algorithm PQ-Sort(C, P):
  Input: n-element sequence C, and PQ P that compares elements using a total order relation
  Output: sequence C sorted by the total order relation
  while C is not empty do        // Phase 1
    e ← C.removeFirst()          // remove element from C
    P.insertItem(e, e)           // key is the element itself
  while P is not empty do        // Phase 2
    e ← P.removeMin()            // remove smallest from P
    C.insertLast(e)              // add element at end of C
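A compact Java sketch of PQ-Sort using the standard library's java.util.PriorityQueue. Note that that class is heap-based, so this concrete run behaves like the heap-sort of section 2.4.4 rather than selection- or insertion-sort; the two-phase structure is the same:

import java.util.List;
import java.util.PriorityQueue;

public class PQSort {
    // Sorts the list in place: phase 1 inserts everything into the PQ
    // (each element is its own key), phase 2 removes the minimum repeatedly.
    public static <E extends Comparable<E>> void pqSort(List<E> C) {
        PriorityQueue<E> P = new PriorityQueue<>();
        for (E e : C)                       // Phase 1
            P.offer(e);
        for (int i = 0; i < C.size(); i++)  // Phase 2
            C.set(i, P.poll());             // remove smallest from P, put back in C
    }
}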
2.4.3 Heap Data Structure
A PQ data structure that is efficient for both insertion and removal (cf. insertion- &
selection-sort). It does this by storing all elements and keys at the internal nodes of a binary
tree. The last node of the tree is the right-most, deepest node of T.
Heap-Order Property: the key stored at v is ≥ the key stored at v's parent.
The minimum key is thus always at the root.
Complete Binary Tree: for efficiency reasons, we want the lowest height possible:
- every level but the last must have the maximum number of nodes (level i holds 2^i nodes), and
- all internal nodes must be to the left of any external nodes, i.e. internal nodes are visited
  before external nodes in an inorder traversal.
Additional Information
A heap PQ implementation consists of the following:
- Heap: a complete binary tree, implemented using a Vector.
- Last: a reference to the last node of T.
- Comp: a comparator that defines a total order relation for the keys. It maintains the minimum
  key at the root.
A heap storing n keys has height h = ⌈log(n + 1)⌉.
The number of nodes within a heap of height h is at least 2^(h−1) and at most 2^h − 1.
In the vector representation the first key is at index 1 and the last key at index n; the first
empty external node is at index n + 1.
Usually the insertion position, the position at which a new node is added, is index n + 1.
After insertion, the new node becomes the last node of the tree.
Up-Heap Bubbling (after insertion)
Restores the Heap-Order Property. It checks if the parent of the new node has a higher key, and if
so, swaps places with the parent. It continues to do so until it is either at the root or the
parent has a lower (or equal) key. This process is called Up-Heap Bubbling. Because at maximum it
needs to climb to the root, it takes at most height-of-the-tree steps, thus O(log n) running time,
where n is the number of keys in the heap.
(Figure: not a correct binary tree, but correct up-heap bubbling.)
Down-Heap Bubbling (after removal)
When removing a node using removeMin(), the last node in the tree is taken and set at the root.
We then need to restore the Heap-Order Property using Down-Heap Bubbling.
It checks if there exists a child of v that has a smaller key than v, and if so, swaps places with
it. If both children have smaller keys, the child with the smallest key is swapped with v. It
continues swapping until no child of v has a smaller key.
Because at maximum it needs to descend to the bottom of the tree, it takes at most
height-of-the-tree steps, thus O(log n) running time, where n is the number of keys in the heap.
(Figure: not a correct binary tree, but correct down-heap bubbling.)
Running Times
Method                 Time
size, isEmpty          O(1)
minElement, minKey     O(1)
insertItem             O(log n)
removeMin              O(log n)
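A minimal Java sketch of a vector-based heap with up-heap and down-heap bubbling, using the level numbering of section 2.3.4 (node i has children 2i and 2i+1; the class and method names are my own, and removeMin assumes a non-empty heap):

import java.util.ArrayList;

public class VectorHeap<K extends Comparable<K>> {
    // heap.get(0) is unused so that the root sits at index 1
    private final ArrayList<K> heap = new ArrayList<>();
    { heap.add(null); }

    private int size() { return heap.size() - 1; }
    private void swap(int i, int j) {
        K t = heap.get(i); heap.set(i, heap.get(j)); heap.set(j, t);
    }

    public void insertItem(K k) {
        heap.add(k);                          // new last node, index n
        int i = heap.size() - 1;
        while (i > 1 && heap.get(i / 2).compareTo(heap.get(i)) > 0) {
            swap(i, i / 2);                   // up-heap bubbling toward the root
            i /= 2;
        }
    }

    public K removeMin() {                    // assumes the heap is non-empty
        K min = heap.get(1);
        heap.set(1, heap.get(size()));        // move last node to the root
        heap.remove(size());
        int i = 1;
        while (2 * i <= size()) {             // down-heap bubbling
            int c = 2 * i;                    // pick the smaller child of i
            if (c + 1 <= size() && heap.get(c + 1).compareTo(heap.get(c)) < 0) c++;
            if (heap.get(i).compareTo(heap.get(c)) <= 0) break;
            swap(i, c);
            i = c;
        }
        return min;
    }
}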
2.4.4 Heap-Sort
If you implement the PQ sorting scheme with a heap, you get an algorithm known as heap-sort,
with the following theorem: heap-sort sorts a sequence of n elements in O(n log n) time.
Heap Sort animation
Heap-Sort In Place
An algorithm is said to run in place if it only uses a constant amount of memory in addition to
the memory required for the objects themselves.
This requires that the sequence to be sorted is implemented as an array. We then use the array
itself to store the heap, instead of using an external heap. The outline is as follows:
1. Logically divide the array into a portion in the front that contains the growing heap, and the
   rest, which contains the elements of the array that have not yet been dealt with.
   o Initially the heap part is empty and the not-yet-dealt-with part of the array is the entire
     array.
   o At each insertion we remove the left-most entry from the array part and insert it into the
     heap, growing the heap to include the memory previously used by the newly inserted element.
     (In the figure, the blue line dividing the two parts moves down.)
   o At the end the heap uses all the space. We make the optimization discussed before that we
     only store the internal nodes of the heap, and we do not waste the first (index 0) component
     of the array used to store the heap.
2. Do the insertions as in a normal heap-sort, but change the comparison so that a maximum element
   is at the root (i.e., a parent is no smaller than a child).
3. Now do the removals from the heap, moving the blue line back up.
   o The elements removed come out in order from big to small.
   o This is perfect, since we store them starting at the right of the array: that is the portion
     of the array that is made available by the shrinking heap.
Bottom-Up Heap Construction
Heap construction runs in O(n log n) time if the n objects are added one by one using
insertItem(). If all elements are given in advance, bottom-up construction can build the heap in
O(n) time. This construction builds a complete binary tree with height log(n + 1).
It is called bottom-up heap construction because the algorithm begins with the external nodes and
works its way up the tree.
Algorithm BottomUpHeap(S):
  Input: a sequence S storing n keys
  Output: a heap storing the keys in S
  if S is empty then
    return an empty heap                 // consisting of a single external node
  Remove the first key, k, from S
  Split S into S1 and S2, each of size (n − 1)/2
  T1 ← BottomUpHeap(S1)
  T2 ← BottomUpHeap(S2)
  Create binary tree T with root r storing k, left subtree T1 and right subtree T2
  Perform down-heap bubbling from root r of T    // restore heap order
  return T
Bottom and Top construction Test
Locator ADT
In our current setup (vector-based heap implementation) we have:
- A binary tree represented as a vector: a list of cells, each associated with a number, which
  contain an element. You call the cell number to retrieve the element.
- Every element is associated with a comparable key, so that it may be sorted according to a total
  order relation using the key as comparison. If the element itself can be compared, it can be its
  own key (e.g. elements which are numbers).
The problem with our current implementation is that the element and the key do not know which
position/cell they are in.
To overcome this limitation, we introduce another ADT, the Locator ADT.
The purpose of this ADT is to link key, element and location (cell or position) together.
The locator "attaches" itself to the element (and therefore the key), and is constantly updated
with a reference to the element's cell/position whenever the element changes cell/position.
ADT:
element(): Return the element associated with this locator.
key(): Return the key associated with this locator.
Locator-Based PQ Methods:
Logically, we can then extend the methods of the PQ to make use of this functionality.
Priority Queue ADT update:
min(): Return the locator of the element with the smallest key.
insert(k, e): Insert a new item with element e and key k into PQ, and return a locator
referencing the new item.
remove(l): Remove from PQ the item with locator l.
replaceElement(l, e): Replace the element in locator l with e and return the previous element.
replaceKey(l, k): Replace the key in locator l with k and return the previous key.
Additional Information:
Running times
Operation                             Unsorted Sequence   Sorted Sequence   Heap
size, isEmpty, key, replaceElement    O(1)                O(1)              O(1)
minElement, min, minKey               O(n)                O(1)              O(1)
insertItem, insert                    O(1)                O(n)              O(log n)
removeMin                             O(n)                O(1)              O(log n)
remove                                O(1)                O(1)              O(log n)
replaceKey                            O(1)                O(n)              O(log n)
2.5 Dictionaries and Hash Tables
2.5.1 The Unordered Dictionary ADT
A dictionary is an ADT which stores elements and keys in pairs, in objects called items.
In general, dictionaries are allowed to store multiple elements under one key.
ADT:
findElement(k): Return the element associated with key k; return the special NO_SUCH_KEY element
if no such element exists.
insertItem(k, e): Insert an item with key k and element e into the dictionary.
removeElement(k): Remove the item with key k from the dictionary and return its element; return
the NO_SUCH_KEY element if no such element exists.
Additional Information:
The special element NO_SUCH_KEY is called a sentinel.
An implementation of a dictionary with an unsorted sequence is often called a log file, or audit
trail. It is used to store small amounts of information which are unlikely to change over time.
This implementation is often called an unordered sequence implementation. Space usage is Θ(n).
Chapter 3 – Search Trees and Skip Lists
3.1 – Ordered Dictionaries and Binary Search Trees
Dictionary: Searchable collection of key-element items. For example, an address book.
Operations are as follows (as defined in section 2.5.1):
findElement(k): If the dictionary has an item with key k, return its element; else return the
special element NO_SUCH_KEY.
insertItem(k, o): Insert item (k, o) into the dictionary.
removeElement(k): If the dictionary has an item with key k, remove it from the dictionary and
return its element; else return the special element NO_SUCH_KEY.
New operations are:
closestKeyBefore(k): Return the key of the item with the largest key less than or equal to k.
closestElemBefore(k): Return the element of the item with the largest key less than or equal to k.
closestKeyAfter(k): Return the key of the item with the smallest key greater than or equal to k.
closestElemAfter(k): Return the element of the item with the smallest key greater than or equal
to k.
Each of these methods returns the special NO_SUCH_KEY object if no such item is present in the
dictionary.
3.1.1 Sorted Tables
A lookup table is an ordered dictionary implemented with a sorted sequence: the items of the
dictionary are stored in an array-based sequence, sorted by key.
It is one way of implementing a dictionary.
Performance:
findElement      O(log n) (using binary search)
insertItem       O(n) (shifts)
removeElement    O(n) (shifts)
3.1.2 Binary Search Tree
As the lookup table is array-based and sorted, we can use binary search as the searching
algorithm. Below is the pseudo-code of binary search:
Algorithm BinarySearch(S, k, low, high):
  if low > high then
    return NO_SUCH_KEY
  else
    mid ← ⌊(low + high) / 2⌋
    if k = key(mid) then
      return elem(mid)
    else if k < key(mid) then
      return BinarySearch(S, k, low, mid − 1)
    else
      return BinarySearch(S, k, mid + 1, high)
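The same algorithm in Java, over a sorted int array, with −1 playing the role of NO_SUCH_KEY (the method name is my own):

// Recursive binary search over a sorted array of keys; returns the index
// of k, or -1 if k is absent. Initial call: binarySearch(S, k, 0, S.length - 1).
public static int binarySearch(int[] S, int k, int low, int high) {
    if (low > high)
        return -1;                       // NO_SUCH_KEY
    int mid = (low + high) / 2;          // integer division = floor
    if (k == S[mid])
        return mid;
    else if (k < S[mid])
        return binarySearch(S, k, low, mid - 1);
    else
        return binarySearch(S, k, mid + 1, high);
}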
Comparison of Log File and Lookup Table, when implementing an ordered dictionary (n denotes the
number of items in the dictionary at the time a method is executed):
Method              Log File   Lookup Table
findElement         O(n)       O(log n)
insertItem          O(1)       O(n)
removeElement       O(n)       O(n)
closestKeyBefore    O(n)       O(log n)
3.1.3 Searching in a Binary Search Tree
T = tree, k = search key, v = node
To search for a key k:
Algorithm findElement(k, v):
  if T.isExternal(v) then
    return NO_SUCH_KEY
  if k < key(v) then
    return findElement(k, T.leftChild(v))
  else if k = key(v) then
    return element(v)
  else  // k > key(v)
    return findElement(k, T.rightChild(v))
3.1.4 Insertion in a Binary Search Tree
To perform operation insertItem(k, o):
1) Search for key k.
2) If k is not already in the tree, let w be the leaf reached by the search.
3) Insert k at node w and expand w into an internal node.
3.1.5 Removal in a Binary Search Tree
To perform operation removeElement(k):
1) Search for key k.
2) If k is in the tree, let v be the node storing k.
3) If v has a leaf child w, remove v and w.
4) If both children of v are internal, find the internal node w that follows v in an inorder
traversal, copy key(w) into v, and remove w together with its left child z (this child is a leaf).
3.1.6 Performance in a Binary Search Tree
A binary search tree of height h storing n key-element items uses O(n) space and executes the
dictionary ADT operations with the following running times
(h = height of tree, n = number of items, s = size of the iterators returned):
Method                                      Time
size, isEmpty                               O(1)
findElement, insertItem, removeElement      O(h)
findAllElements, removeAllElements          O(h + s)
3.2 – AVL Trees
An AVL Tree is a self-balancing binary search tree. The reason for this tree is that we want to achieve
logarithmic time for all the fundamental dictionary operations.
AVL Trees follow the height-balance property: for every internal node v of T, the heights of the
children of v can differ by at most 1.
A subtree of an AVL tree is an AVL tree itself.
The height of an AVL tree T storing n items is O(log n).
3.2.1 Update Operations
Insertion
Procedure is in principle the same as the insertItem operator in a binary tree.
However, after the insertion, the tree may become unbalanced. Hence we need to apply Trinode
Restructuring (Explained below).
Removal
Procedure is in principle the same as the removeElement operator in a binary tree.
However, after the removal, the tree may become unbalanced. Hence we need to apply Trinode
Restructuring (Explained below).
Trinode Restructuring
Algorithm trinodeRestructuring:
1) Let (a, b, c) be a left-to-right (inorder) listing of the nodes x, y and z, and let
   (T0, T1, T2, T3) be a left-to-right (inorder) listing of the four subtrees of x, y and z not
   rooted at x, y or z.
2) Replace the subtree rooted at z with a new subtree rooted at b.
3) Let a be the left child of b, and let T0 and T1 be the left and right subtrees of a.
4) Let c be the right child of b, and let T2 and T3 be the left and right subtrees of c.
You might want to look at Figure 3.14 (page 154) for an example.
3.2.2 Performance
Method                                                                   Time
single restructure (using linked-structure binary tree)                  O(1)
find (descend the height of the tree, no restructures needed)            O(log n)
insert (initial find + 1 restructure)                                    O(log n)
remove (initial find + restructuring up the tree, maintaining heights)   O(log n)
3.4 – Splay Trees
A splay tree is a self-balancing binary search tree with the additional property that recently
accessed elements are quick to access again. It performs basic operations such as insertion,
look-up and removal in O(log n) amortized time. For many non-uniform sequences of operations,
splay trees perform better than other search trees, even when the specific pattern of the
sequence is unknown. It is conceptually different from AVL trees, as it does not have any explicit
rules to enforce its balance.
Two things to remember:
- The tree might get more unbalanced.
- Splaying costs O(h), where h is the height of the tree, which is still O(n) worst-case
  (O(h) rotations, each of which is O(1)).
3.4.1 Splaying
Each particular step depends on three factors:
- Whether x is the left or right child of its parent node, p,
- Whether p is the root or not, and if not
- Whether p is the left or right child of its parent, g (the grandparent of x).
Zig Step:
This step is done when p is the root. The tree is rotated on the edge between x and p. Zig steps exist
to deal with the parity issue and will be done only as the last step in a splay operation and only when
x has odd depth at the beginning of the operation.
Zig-zig Step:
This step is done when p is not the root and x and p are either both right children or are both left
children. The picture below shows the case where x and p are both left children. The tree is rotated
on the edge joining p with its parent g, then rotated on the edge joining x with p.
Zig-zag Step:
This step is done when p is not the root and x is a right child and p is a left child or vice versa. The
tree is rotated on the edge between x and p, then rotated on the edge between x and its new parent
g.
3.4.2 Amortized Analysis of Splaying
Amortization is worst-case analysis over all possible series of operations. The "amortized
running time" of an operation is the average worst-case running time of the operations in the
series. Amortization gives "average case" analysis without using probabilities.
It is done in an accounting way, by assigning "cyber euros" to operations. The main conclusions
are listed below; for an in-depth proof, see lecture slides 5a.
- Cost of insertion and deletion is also O(log n).
- Cost of a series of m operations on a splay tree is O(m log n).
- Thus, the amortized cost of any splay operation is O(log n).
- When items are accessed often, the amortized cost can decrease to O(1) (Theorem 3.11).
Chapter 4
4.1 Merge-Sort
4.1.1 Divide-and-Conquer
Merge-sort is based on divide-and-conquer.
The 3 steps of divide-and-conquer:
Divide: if the number of objects is above a threshold, divide the input (if n = 0 or 1, return
immediately).
Recur: recursively solve the subproblems.
Conquer: "merge" the sub-solutions into a solution to the original problem.
Ceiling: ⌈x⌉ (smallest integer ≥ x)
Floor: ⌊x⌋ (largest integer ≤ x)
Theorem: The merge-sort tree associated with an execution of merge-sort on a sequence of
size n has height ⌈log n⌉.
Merge
Two sorted sequences, S1 and S2, are merged by iteratively removing a smallest element from one of
the two and adding it to the end of the output sequence, S, until one of the two sequences is
empty, at which point we copy the remainder of the other sequence to the output sequence.
(fig p. 222)
Running time
Running time per level = O(n) (the divide part and the conquer part are linear).
Running time per level × number of levels = total running time:
O(n) * O(log n) = O(n log n)
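A minimal Java sketch of merge-sort on an int array (Arrays.copyOfRange keeps it short at the cost of extra copying; the names are my own):

import java.util.Arrays;

public class MergeSort {
    public static int[] mergeSort(int[] S) {
        if (S.length <= 1) return S;                            // 0 or 1 elements: done
        int mid = S.length / 2;                                 // divide
        int[] S1 = mergeSort(Arrays.copyOfRange(S, 0, mid));    // recur
        int[] S2 = mergeSort(Arrays.copyOfRange(S, mid, S.length));
        return merge(S1, S2);                                   // conquer
    }

    // Merge two sorted sequences by repeatedly taking the smaller front element.
    private static int[] merge(int[] S1, int[] S2) {
        int[] S = new int[S1.length + S2.length];
        int i = 0, j = 0, k = 0;
        while (i < S1.length && j < S2.length)
            S[k++] = (S1[i] <= S2[j]) ? S1[i++] : S2[j++];
        while (i < S1.length) S[k++] = S1[i++];     // copy remainder of S1
        while (j < S2.length) S[k++] = S2[j++];     // copy remainder of S2
        return S;
    }
}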
4.1.2 Merge-Sort and Recurrence Equations
Another way to find the running time of the merge-sort algorithm, you can find it at page
224 (I can't write it shorter).
4.2 The Set Abstract Data Type
Here we introduce the ADT set. A set is a container of distinct objects. That is, there are no
duplicate elements in a set, and there is no order.
Sets and some of their uses
First we recall:
Union: A∪B = {x : x ∈ A or x ∈ B}
Intersection: A∩B = {x : x ∈ A and x ∈ B}
Subtraction: A−B = {x : x ∈ A and x ∉ B}
These operations are used if you, for example, enter 2 query words: then the intersection has to
be computed.
ADT:
union(B): Replace A with (←) A∪B.
intersection(B): Replace A with (←) A∩B.
subtract(B): Replace A with (←) A−B.
4.2.1 A Simple Set Implementation
A generic version of the merge algorithm takes two sorted sequences representing the input sets,
and constructs a sequence representing the output set, be it the union, intersection, or
subtraction of the input sets.
The generic algorithm iteratively examines and compares the current elements a and b of the input
sequences (A and B) and finds out whether a < b, a = b or a > b. Which elements go to the output
depends on the operation; for the union:
a < b: a goes to the output sequence, and A advances to its next element
a = b: a goes to the output sequence, and both A and B advance to their next elements
a > b: b goes to the output sequence, and B advances to its next element
Running Times
At each iteration:
- Compare 2 items of the two input sets (A and B): O(1)
- Possibly copy an element to the output sequence: O(1)
- Advance to the next element
=> O(n_A + n_B) = O(n)
Theorem: The set ADT can be implemented with an ordered sequence and a generic merge scheme that
supports the operations union, intersection and subtraction in O(n) time, where n denotes the sum
of the sizes of the sets involved.
4.3 Quick Sort
Three steps:
- Divide: if S has more than 1 element, take a specific element x of S (in practice we take the
  last element), which we call the pivot. Make three subsequences:
  - L, all elements < x,
  - E, all elements = x,
  - G, all elements > x.
- Recur: recursively sort L and G.
- Conquer: merge L, E and G back together.
Like merge-sort, we can draw a binary tree of the recursion. But unlike merge-sort, the tree
height can be linear (worst case). This happens when the sequence is already sorted
(x will then always be the biggest number).
Running time
- At each level, all elements have to be compared: O(n)
- The height of the tree is n in the worst case: O(n)
=> O(n) * O(n) = O(n²)
In the best case we get a merge-sort-like tree. This means that L and G are (almost) equal in
size, which results in a tree height of log n, which in turn results in O(n log n).
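An in-place Java sketch. Note that it uses the common two-way partition (elements < pivot to the left, the rest to the right) rather than the three subsequences L, E, G described above; the behaviour and bounds are the same:

public class QuickSort {
    // In-place quick-sort of S[low..high], pivoting on the last element.
    // Initial call: quickSort(S, 0, S.length - 1).
    public static void quickSort(int[] S, int low, int high) {
        if (low >= high) return;                    // 0 or 1 elements: done
        int x = S[high];                            // pivot
        int i = low;                                // boundary of the "< pivot" zone
        for (int j = low; j < high; j++)
            if (S[j] < x) { int t = S[i]; S[i] = S[j]; S[j] = t; i++; }
        int t = S[i]; S[i] = S[high]; S[high] = t;  // put the pivot in its place
        quickSort(S, low, i - 1);                   // recur on the "L" part
        quickSort(S, i + 1, high);                  // recur on the "G" part
    }
}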
4.3.1 Randomized Quick-Sort
Instead of always taking the last element of the sequence, we pick a random element as the pivot.
By probability theory, the expected pivot is close to the median of the whole sequence.
This means that the expected height of the tree is O(log n), which again results in O(n log n)
expected running time.
4.4 A Lower Bound on Comparison-Based Sorting
Theorem: the running time of any comparison-based algorithm for sorting an n-element sequence is
Ω(n log n) in the worst case.
The running time of a comparison-based algorithm must be at least the height of its decision
tree. This tree must have at least n! leaves (one for each possible ordering of the input), so
its height is at least log(n!), which is Ω(n log n).
4.5 Bucket-Sort and Radix-Sort
These algorithms are faster than O(n log n) but they require special assumptions about the
input sequence to be sorted. Even so, such scenarios often arise in practice.
In this section we consider the problem of sorting a sequence of items, each a key-element
pair.
4.5.1 Bucket-Sort
The special assumption is that each element has an integer key in the range [0, N−1].
So we have a sequence S with integer keys in [0, N−1].
Now we create a second array, say B (the buckets), which has size N.
We then move all the elements from S into B, placing each element at B[key].
Finally we take the elements one by one, in order, from B and place them back into S.
(Walking through B in key order is necessary because not every bucket need be occupied.)
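A Java sketch of stable bucket-sort under the stated assumption that every key lies in [0, N−1] (the Item class and method name are my own):

import java.util.ArrayList;
import java.util.List;

public class BucketSort {
    static class Item {
        int key; Object element;
        Item(int key, Object element) { this.key = key; this.element = element; }
    }

    // Stable: items with equal keys keep their original relative order.
    public static void bucketSort(List<Item> S, int N) {
        List<List<Item>> B = new ArrayList<>();
        for (int i = 0; i < N; i++)
            B.add(new ArrayList<>());       // one (initially empty) bucket per key
        for (Item item : S)
            B.get(item.key).add(item);      // place each item in bucket B[key]
        S.clear();
        for (List<Item> bucket : B)         // walk the buckets in key order;
            S.addAll(bucket);               // empty buckets are simply skipped
    }
}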
Stable sorting
Suppose you have 2 items with the same key; they will have a specific order in the original array.
Stable sorting means that they keep that same order after sorting
(and after each subsequent sorting: elements do not move around if you sort the same sequence
twice).
4.5.2 Radix-Sort
Radix-sort is used for items with 2 (or more) keys.
Example: S = ((3,3), (1,5), (2,5), …)
Radix-sort is actually just bucket-sort, done once per key (twice for two keys).
For pairs, the total order relation (k1, l1) < (k2, l2) is defined by:
- k1 < k2, or
- k1 = k2 and l1 < l2
The order of the passes is important: to get a lexicographically* ordered list you must
bucket-sort by the second key first and then by the first key; the other way around gives a wrong
order (examples at page 243). *) lexicographical = dictionary order
Running time
O(d(n + N)), d = number of keys (digits)
4.6 Comparison of Sorting Algorithms
- Insertion-Sort: if implemented well, running time of O(n + k) (k = number of inversions).
  Good for small sequences (less than 50); also quite effective for almost-ordered sequences.
  But the O(n²) worst case makes it a poor choice in general.
- Merge-Sort: running time O(n log n) in the worst case (optimal for comparison-based
  algorithms). Good for large sequences, because you can store parts in different places (if main
  memory is too small).
- Quick-Sort: a good choice if the sequence fits in main memory, but the O(n²) worst-case running
  time makes it less attractive in real-time applications, where we must make guarantees on the
  time needed.
- Heap-Sort: if your memory is big enough and you need to finish on time, heap-sort is an
  excellent choice. It has a running time of O(n log n) and it can easily be made to execute in
  place.
- Bucket-Sort or Radix-Sort: an excellent choice where applicable, for it runs in O(d(n + N)),
  where [0, N−1] is the range of the keys (and d = 1 for bucket-sort). If d(n + N) is
  significantly below n log n, these algorithms run faster than even quick-sort or heap-sort.
Chapter 5: Fundamental Techniques
5.2 Divide-and-Conquer
Divide-and-conquer is a technique that solves a problem by dividing it into smaller subproblems,
solving each subproblem, and merging the solutions into one solution.
5.2.1 Divide-and-Conquer Recurrence Equations
With a recurrence equation we determine the running time of an algorithm as a function of the
input size n. The problem is that in a recurrence equation the original function T still appears
on the right-hand side, and we want an expression that depends only on n. We call this the
closed-form equation. There are some general ways of solving such an equation for
divide-and-conquer algorithms (a worked example follows after this list):
- Iterative substitution: substitute the function T into itself a couple of times and hope to see
  a pattern, so it can be translated into a closed-form equation.
- Recursion tree: almost the same as iterative substitution; the only difference is that the
  recursion tree is more visual, while iterative substitution is more algebraic. In this method
  you draw a tree, with a node for each substitution. In addition, every node has an overhead,
  which corresponds to the running time of merging the results of all children of the node.
- Guess-and-test: make a guess of what the closed form could be and then try to justify that
  guess by induction (an example at page 266 of the book).
- Master method: this method contains ready-made rules for common cases. It cannot be summarized;
  if you want to study this method, go to page 268 of the book.
The next two subsections are applications of the master method. They cannot be summarized; to
understand them you have to read the whole text.
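As a brief worked example of the iterative-substitution method, take the merge-sort recurrence
T(n) = 2T(n/2) + n (assuming n is a power of 2 and T(1) = 1):
T(n) = 2T(n/2) + n
     = 2(2T(n/4) + n/2) + n = 4T(n/4) + 2n
     = 8T(n/8) + 3n
     = … = 2^i · T(n/2^i) + i·n
After i = log n substitutions this becomes n·T(1) + n log n, so the closed form is
T(n) = O(n log n).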
5.3 Dynamic Programming
The book states that dynamic programming cannot be explained in a few sentences, and gives an
example, which is shortly explained here.
5.3.1 Matrix Chain-Product
The matrix chain-product problem is to parenthesize the product of a chain of matrices in such a
way that the number of scalar multiplications is minimized. One way to do this is just to try
every different parenthesization and count the number of multiplications. Of course we want to
improve on this, and we start by defining subproblems: for example, you can first find out, for
every pair of adjacent matrices, how many multiplications are needed. Another observation is that
every subproblem has an optimal solution; this is called the subproblem optimality condition. We
can't solve the problem with plain divide-and-conquer, because the subproblems share
subsubproblems. This is why we use dynamic programming instead of divide-and-conquer.
5.3.2 The General Technique
Dynamic programming is most of the time used for optimization problems. Often the number of ways
of solving such a problem is exponential, so brute force isn't possible. When we apply dynamic
programming, three components have to be taken into account:
- Simple subproblems: all subproblems have the same structure and there is a simple way to define
  them.
- Subproblem optimality: the subproblems have to be optimized in order to optimize the global
  problem; the global solution shouldn't contain any suboptimal subproblems.
- Subproblem overlap: optimal solutions to unrelated subproblems can contain subproblems in
  common. Such overlap improves the efficiency of a dynamic-programming algorithm that stores
  solutions to subproblems.
This subsection contains another example of dynamic programming: the knapsack problem.
Chapter 6: Graphs
6.1 The Graph ADT
A graph is a way to represent connections between objects. The objects are stored in the vertices
(nodes) and the connections are represented by edges (arcs). Edges are either directed or
undirected: directed edges can only be traveled one way, undirected edges both ways. A graph with
only directed edges is called a directed graph, with only undirected edges an undirected graph,
and with both a mixed graph. The two vertices at the ends of an edge are called its end vertices
(or endpoints). If an edge is directed, its start point is called the origin and its endpoint the
destination. If two vertices are endpoints of the same edge, they are adjacent. If a vertex is an
endpoint of an edge, it is incident to that edge. An edge whose origin is vertex v is called an
outgoing edge of v; an edge whose destination is v is an incoming edge of v. The degree of a
vertex is the number of incident edges; in-degree and out-degree are the numbers of incoming and
outgoing edges.
The edges of a graph form a collection, not a set: when there is more than 1 edge with the same
vertices as endpoints, these edges are parallel (or multiple). A self-loop is an edge whose two
endpoints are the same vertex. Graphs without the last two properties are said to be simple. When
you travel from one vertex to another, the visited edges and vertices form a path. A cycle is a
path that has the same start- and endpoint. A path or cycle is simple if every vertex in it is
distinct. If all edges in a path or cycle are directed, it is called a directed path/cycle. A
subgraph is a graph whose vertices and edges are subsets of those of another graph. A spanning
subgraph uses all vertices of the other graph. When there is a path between any two vertices, the
graph is connected. If a graph isn't connected, its connected components are the maximal
connected subgraphs. A forest is a graph without cycles. A tree is a connected forest. A spanning
tree is a spanning subgraph that is a tree.
6.1.1 Graph Methods
Notation: Graph G; Vertices v, w; Edge e; Object o
General methods:
- numVertices(): Return the number of vertices of G.
- numEdges(): Return the number of edges of G.
- vertices(): Return an enumeration of the vertices of G.
- edges(): Return an enumeration of the edges of G.
- aVertex(): Return a vertex of G.
- directedEdges(): Return an enumeration of all directed edges in G.
- undirectedEdges(): Return an enumeration of all undirected edges in G.
- incidentEdges(v): Return an enumeration of all edges incident on v.
- inIncidentEdges(v): Return an enumeration of all the incoming edges to v.
- outIncidentEdges(v): Return an enumeration of all the outgoing edges from v.
- opposite(v, e): Return an endpoint of e distinct from v.
- degree(v): Return the degree of v.
- inDegree(v): Return the in-degree of v.
- outDegree(v): Return the out-degree of v.
- adjacentVertices(v): Return an enumeration of the vertices adjacent to v.
- inAdjacentVertices(v): Return an enumeration of the vertices adjacent to v along incoming edges.
- outAdjacentVertices(v): Return an enumeration of the vertices adjacent to v along outgoing edges.
- areAdjacent(v, w): Return whether vertices v and w are adjacent.
- endVertices(e): Return an array of size 2 storing the end vertices of e.
- origin(e): Return the end vertex from which e leaves.
- destination(e): Return the end vertex at which e arrives.
- isDirected(e): Return true iff e is directed.
Update Methods:
- makeUndirected(e): Set e to be an undirected edge.
- reverseDirection(e): Switch the origin and destination vertices of e.
- setDirectionFrom(e, v): Set the direction of e away from v, one of its end vertices.
- setDirectionTo(e, v): Set the direction of e toward v, one of its end vertices.
- insertEdge(v, w, o): Insert and return an undirected edge between v and w, storing o at this position.
- insertDirectedEdge(v, w, o): Insert and return a directed edge between v and w, storing o at this position.
- insertVertex(o): Insert and return a new (isolated) vertex storing o at this position.
- removeEdge(e): Remove edge e.
6.2 Data Structures for Graphs
Three most popular ways to realize a graph ADT.
6.2.1 The edge List Structure
Two different kinds of objects:
- Vertex objects:
  o A reference to the object stored
  o Counters for the number of incident undirected edges, incoming and outgoing directed edges
  o A reference to the position of the vertex object in container V
- Edge objects:
  o A reference to the object stored
  o A Boolean indicating whether it is directed or undirected
  o References to the vertex objects in V for its endpoints (or origin and destination)
  o A reference to the position of the edge object in container E
The edge list is a very simple implementation, but not very efficient: it looks at the edge-vertex
relation only from the point of view of the edges.
6.2.2 The Adjacency List Structure
- Vertex objects:
  o All variables mentioned for the edge list
  o An incidence container, which stores references to the edges incident on the vertex
- Edge objects:
  o All variables mentioned for the edge list
  o A reference to the positions of the edge object in the incidence containers of its endpoints
Also a relatively simple implementation. More efficient than the edge list, because it looks at
the structure from the point of view of both the edges and the vertices.
6.2.3 The Adjacency Matrix Structure
Extends the edge list structure with a matrix (2-dimensional array), which allows us to determine
adjacencies between pairs of vertices in constant time, but uses more space.
- Vertex objects:
  o All variables mentioned for the edge list
  o A distinct integer, called the index
- Edge objects:
  o All variables mentioned for the edge list
- A 2-dimensional array A such that A[i,j], where i and j are the indices of two vertices, stores
  the edge between them, if it exists; if there is no edge, A[i,j] is null. If the edge is
  undirected, it is stored at A[j,i] too.
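As a rough illustration of the adjacency list structure, a stripped-down Java sketch (undirected edges only, and only a few of the methods of section 6.1.1; all names are my own assumptions):

import java.util.ArrayList;
import java.util.List;

public class AdjacencyListGraph {
    static class Vertex {
        String label;
        List<Edge> incidence = new ArrayList<>();   // incidence container
        Vertex(String label) { this.label = label; }
    }
    static class Edge {
        Vertex u, v;                                // endpoints
        Edge(Vertex u, Vertex v) { this.u = u; this.v = v; }
    }

    List<Vertex> V = new ArrayList<>();             // vertex container
    List<Edge> E = new ArrayList<>();               // edge container

    Vertex insertVertex(String label) {
        Vertex v = new Vertex(label);
        V.add(v);
        return v;
    }

    Edge insertEdge(Vertex u, Vertex v) {           // undirected edge
        Edge e = new Edge(u, v);
        E.add(e);
        u.incidence.add(e);                         // each endpoint keeps a
        v.incidence.add(e);                         // reference to the edge
        return e;
    }

    Vertex opposite(Vertex v, Edge e) { return e.u == v ? e.v : e.u; }
    int degree(Vertex v) { return v.incidence.size(); }
}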
6.3 Graph traversal
6.3.1 Depth-First search
Done with backtracking. Edges that lead to an unvisited vertex are called tree edges (or
discovery edges); edges that lead to an already visited vertex are called back edges (or cross
edges). The tree edges form a spanning tree, called the DFS tree.
(BFS is better at finding shortest paths.)
6.3.2 Biconnected components
A separation edge (or vertex) is an edge (vertex) whose removal disconnects the graph. When there
are two disjoint paths between any pair of vertices, the graph is biconnected. A biconnected
component of a graph G is one of the following:
- A maximal biconnected subgraph (adding any other part of G would make it no longer biconnected).
- A single edge of G consisting of a separation edge and its endpoints.
6.3.3 Breadth-First Search
BFS subdivides the vertices into levels. BFS is better at solving difficult connectivity problems.
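Java sketches of both traversals over a plain adjacency-list representation (vertices numbered 0..n−1; this representation and the names are my own assumptions, not the book's structures):

import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class Traversals {
    // DFS: recursively follow unvisited neighbours (tree/discovery edges).
    static void dfs(List<List<Integer>> adj, int v, boolean[] visited) {
        visited[v] = true;
        for (int w : adj.get(v))
            if (!visited[w])            // (v, w) is a tree edge;
                dfs(adj, w, visited);   // otherwise it would be a back edge
    }

    // BFS: visit vertices level by level; dist[w] = number of edges from s.
    static int[] bfs(List<List<Integer>> adj, int s) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, -1);          // -1 marks "not yet reached"
        Deque<Integer> queue = new ArrayDeque<>();
        dist[s] = 0;
        queue.add(s);
        while (!queue.isEmpty()) {
            int v = queue.remove();
            for (int w : adj.get(v))
                if (dist[w] == -1) { dist[w] = dist[v] + 1; queue.add(w); }
        }
        return dist;
    }
}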
6.4 Directed Graph
A digraph is another word for directed graph. Reachability describes which vertices can be
reached from which others: a vertex w is reachable from a vertex v if there is a directed path from
v to w. If any two vertices are mutually reachable, the graph is strongly connected. A directed
path in a digraph that starts and ends at the same vertex is a directed cycle. If a digraph doesn't
have any directed cycle, it is acyclic. The transitive closure of a digraph G is the digraph G* such
that the vertices of G* are the same as the vertices of G, and G* has an edge (u,v) whenever G
has a directed path from u to v. That is, we define G* by starting with the digraph G and adding
an extra edge (u,v) for each u and v such that v is reachable from u.
6.4.1 Traversing a Digraph
We distinguish 3 kinds of edges:
- Back edges: connect a vertex to an ancestor in the DFS tree.
- Forward edges: connect a vertex to a descendant in the DFS tree (these do not occur in BFS).
- Cross edges: connect a vertex to a vertex that is neither its ancestor nor its descendant.
6.4.2 Transitive closure
An algorithm for finding the transitive closure can be obtained using dynamic programming (this
is the Floyd-Warshall algorithm). The problem is divided into smaller subproblems: for every pair
of vertices (u,v), check whether there is an intermediate vertex w with an edge from u to w and an
edge from w to v; if so, add the edge (u,v) if it is not there yet.
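Here a hedged Java sketch of this idea on a boolean adjacency matrix (our own formulation, not
the book's pseudo-code; it runs in O(n^3) time):

    // Floyd-Warshall transitive closure sketch on a boolean adjacency matrix.
    class TransitiveClosure {
        static boolean[][] closure(boolean[][] adj) {
            int n = adj.length;
            boolean[][] reach = new boolean[n][n];
            for (int i = 0; i < n; i++)
                reach[i] = adj[i].clone();
            for (int k = 0; k < n; k++)         // allow k as an intermediate vertex
                for (int i = 0; i < n; i++)
                    for (int j = 0; j < n; j++)
                        if (reach[i][k] && reach[k][j])
                            reach[i][j] = true; // a path i -> k -> j exists
            return reach;
        }
    }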
6.4.3 DFS and Garbage collection
Once in a while the JVM checks if there is enough space left in the memory heap. If there isn't, the
garbage collector starts reclaiming the space used by dead objects. This can be done with a
mark-sweep algorithm: the memory heap is viewed as a digraph (the objects are vertices and the
references are edges) and DFS is used to find and mark the objects that are still live (mark phase).
After that, the objects that are not marked are deleted (sweep phase).
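A tiny sketch of the mark phase as a DFS over object references (a toy model only, not the JVM's
actual collector; the class name HeapObject is our own):

    import java.util.ArrayList;
    import java.util.List;

    // Toy mark phase: DFS from a root object, marking everything reachable.
    class HeapObject {
        boolean marked;
        List<HeapObject> references = new ArrayList<>(); // outgoing edges

        static void mark(HeapObject root) {
            if (root.marked) return;   // already visited
            root.marked = true;
            for (HeapObject o : root.references)
                mark(o);
        }
        // sweep phase (not shown): reclaim every object with marked == false
    }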
6.4.4 Directed Acyclic Graphs
A topological ordering is a numbering v1, ..., vn of the vertices such that for every edge (vi, vj)
we have i < j. A digraph has a topological ordering if and only if it is acyclic.
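A compact topological sort sketch via DFS finishing times (our own minimal version; vertices are
0..n-1 and adj[v] lists the targets of v's outgoing edges):

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Topological sort sketch: a vertex is prepended after all its successors.
    class TopologicalSort {
        static Deque<Integer> sort(int[][] adj) {
            int n = adj.length;
            boolean[] visited = new boolean[n];
            Deque<Integer> order = new ArrayDeque<>();
            for (int v = 0; v < n; v++)
                if (!visited[v]) dfs(v, adj, visited, order);
            return order; // front-to-back is a topological ordering (if acyclic)
        }

        static void dfs(int v, int[][] adj, boolean[] visited, Deque<Integer> order) {
            visited[v] = true;
            for (int w : adj[v])
                if (!visited[w]) dfs(w, adj, visited, order);
            order.addFirst(v); // v comes before everything reachable from it
        }
    }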
Chapter 9: Text Processing
9.1 Strings and Pattern Matching Algorithms
9.1.1 String Operations
A substring is a contiguous part of a string. A proper substring is a substring that is not equal to
the string itself. An empty string is called a null string. A substring that starts at the beginning of
the string is called a prefix; one that ends at the end of the string is called a suffix.
9.1.2 Brute Force Pattern Matching
The brute-force pattern matching algorithm simply tests all possible placements of the pattern. It
is very simple and runs with a double loop, so it takes O(nm) time (with n the text length and m
the pattern length).
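Here a small Java version of the brute-force matcher (our own formulation):

    // Brute-force pattern matching: try every placement of pattern in text.
    // Returns the index of the first match, or -1; O(n*m) time.
    class BruteForceMatch {
        static int match(String text, String pattern) {
            int n = text.length(), m = pattern.length();
            for (int i = 0; i <= n - m; i++) {       // every possible placement
                int j = 0;
                while (j < m && text.charAt(i + j) == pattern.charAt(j))
                    j++;
                if (j == m) return i;                // whole pattern matched
            }
            return -1;
        }
    }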
9.1.3 The Boyer-Moore Algorithm
If we want to improve the brute-force algorithm, we can do so with two time-saving heuristics (a
sketch combining both follows below):
- Looking-Glass Heuristic: comparisons begin at the back of the pattern and move toward the
  front.
- Character-Jump Heuristic: if a mismatch occurs at a text character c, the pattern is shifted until
  the last occurrence of c in the pattern lines up with that character. If c is not in the pattern, the
  pattern is shifted completely past c.
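A simplified Boyer-Moore sketch using only these two heuristics (our own compact version,
assuming plain ASCII input; last[c] is the index of the last occurrence of c in the pattern, or -1):

    // Simplified Boyer-Moore: looking-glass + character-jump heuristics.
    class BoyerMooreMatch {
        static int match(String text, String pattern) {
            int n = text.length(), m = pattern.length();
            int[] last = new int[128];               // assumes ASCII characters
            java.util.Arrays.fill(last, -1);
            for (int k = 0; k < m; k++)
                last[pattern.charAt(k)] = k;
            int i = m - 1, j = m - 1;                // compare back to front
            while (i < n) {
                if (text.charAt(i) == pattern.charAt(j)) {
                    if (j == 0) return i;            // full match found
                    i--; j--;
                } else {                             // character-jump on mismatch
                    int l = last[text.charAt(i)];
                    i += m - Math.min(j, 1 + l);
                    j = m - 1;
                }
            }
            return -1;
        }
    }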
9.1.4 The Knuth-Morris-Pratt Algorithm (KMP)
The KMP algorithm works with a failure function. The main idea is that this function pre-examines
the pattern: on a mismatch, it tells the algorithm how far the pattern can shift without
re-examining text characters. How this function exactly works isn't clearly explained in the text;
the sketch below shows one common formulation.
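A KMP sketch in Java (our own compact version). Here fail[j] is the length of the longest proper
prefix of the pattern that is also a suffix of pattern[0..j]:

    // KMP pattern matching with precomputed failure function; O(n + m) time.
    class KMPMatch {
        static int[] failure(String p) {
            int m = p.length();
            int[] fail = new int[m];                 // fail[0] = 0
            int j = 1, k = 0;
            while (j < m) {
                if (p.charAt(j) == p.charAt(k)) {
                    fail[j] = k + 1; j++; k++;
                } else if (k > 0) {
                    k = fail[k - 1];                 // fall back to a shorter prefix
                } else {
                    fail[j] = 0; j++;
                }
            }
            return fail;
        }

        static int match(String text, String p) {
            int[] fail = failure(p);
            int i = 0, j = 0;                        // positions in text and pattern
            while (i < text.length()) {
                if (text.charAt(i) == p.charAt(j)) {
                    if (j == p.length() - 1) return i - j;  // match found
                    i++; j++;
                } else if (j > 0) {
                    j = fail[j - 1];                 // shift pattern, keep i
                } else {
                    i++;
                }
            }
            return -1;
        }
    }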
9.2 Tries
A trie is a tree-based data structure for pattern matching and prefix matching. The main idea is
that for a given pattern P, the tree is searched for a string with prefix P.
9.2.1 Standard Tries
A standard trie has the following properties:
- Each node, except the root, contains a character.
- The ordering of the children is determined by a canonical ordering of the alphabet.
- Each external node marks the last letter of a word: the path from the root to that external node
  spells out the string.
(Figure: a standard trie for the strings bear, bell, bid, bull, buy, sell, stock and stop.)
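A minimal standard-trie sketch in Java (our own variation: words are marked with a flag instead of
separate external nodes; a TreeMap keeps the children in canonical alphabetical order):

    import java.util.TreeMap;

    // Standard trie sketch with insertion and prefix matching.
    class TrieNode {
        TreeMap<Character, TrieNode> children = new TreeMap<>();
        boolean isWord;                        // marks the last letter of a word

        void insert(String word) {
            TrieNode node = this;
            for (char c : word.toCharArray())
                node = node.children.computeIfAbsent(c, k -> new TrieNode());
            node.isWord = true;
        }

        boolean containsPrefix(String p) {     // prefix matching: walk the path
            TrieNode node = this;
            for (char c : p.toCharArray()) {
                node = node.children.get(c);
                if (node == null) return false;
            }
            return true;
        }
    }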
9.2.2 Compressed Tries
A compressed trie is only advantageous if an auxiliary structure is used. The words are then
stored in an array of strings S[0], ..., S[s-1], and the nodes store references into these strings
instead of the characters themselves: each node holds a triple (i, j, k), where the first number
selects the array entry S[i] it refers to and the second and third numbers give the start and end of
the substring S[i][j..k] in that array.
(Figure: a compressed trie with such an auxiliary index for the strings S[0] = "see",
S[1] = "bear", S[2] = "sell", S[3] = "stock", S[4] = "bull", S[5] = "buy", S[6] = "bid",
S[7] = "hear", S[8] = "bell", S[9] = "stop"; for example the node (1,2,3) stands for the
substring "ar" of "bear".)
9.2.3 Suffix Tries
A suffix trie represents all suffixes of a string. For example, the suffix trie of X = "minimize"
(indices 0..7) has edge labels such as e, i, mi, mize, nimize and ze.
(Figure: the suffix trie of "minimize" and its compact representation, in which every node is
labeled with a pair (i, j) standing for the substring X[i..j], e.g. (7,7) = "e", (4,7) = "mize"
and (2,7) = "nimize".)
9.2.4 Search engines
A Web crawler is a program that gathers the information from web pages. Search engines make it
possible to retrieve that information. An inverted file stores all information of the search engine in
a dictionary. The information is stored in pairs: a key word together with references to the web
pages containing this word. Key words are called index terms and the references to the web
pages are called occurrence lists. Of course, a basic task of search engines is also to rank the
results, but it is still a major challenge for companies to do this fast and accurately.
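A toy inverted-file sketch in Java (our own naming; pages are simply identified by an int id):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Inverted file: a dictionary from index terms to occurrence lists.
    class InvertedFile {
        Map<String, List<Integer>> index = new HashMap<>();

        void addPage(int pageId, String[] words) {
            for (String w : words)
                index.computeIfAbsent(w, k -> new ArrayList<>()).add(pageId);
        }

        List<Integer> occurrences(String term) {   // pages containing the term
            return index.getOrDefault(term, List.of());
        }
    }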
Chapter 12: Computational Geometry
12.1 Range Trees
A range-search query retrieves all points in a multi-dimensional collection whose coordinates fall
within given ranges. To keep it simple we talk about 2-dimensional range-search queries. They
have a method findAllInRange(x1, x2, y1, y2), which returns all the elements whose coordinates
lie in those ranges. This is called the reporting version of the query; there is also a counting
version, which only counts the number of such elements.
12.1.1 One-Dimensional Range Searching
This is done, as explained above, with the findAllInRange method, only with a single range
(k1, k2) given in the method. Then recur through the range tree. There are 3 possibilities at a
node v:
- key(v) < k1: recur to the right child of v.
- k1 <= key(v) <= k2: report the element and recur to both children.
- key(v) > k2: recur to the left child of v.
In the search we recognize 3 kinds of nodes (a sketch of the recursion follows below):
- Boundary nodes: nodes that belong to the search paths P1 (for k1) and P2 (for k2), but whose
  elements do not necessarily belong to the interval.
- Inside nodes: all nodes whose elements lie inside the interval.
- Outside nodes: nodes that hang off the search paths as a left child of a node on P1 or a right
  child of a node on P2; their elements lie outside the interval.
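A minimal Java sketch of the 1D range search on a binary search tree, following the three cases
above (our own version, with plain int keys):

    import java.util.List;

    // 1D range search on a BST: report all keys in [k1, k2].
    class RangeSearch {
        static class Node {
            int key;
            Node left, right;
            Node(int key) { this.key = key; }
        }

        static void findAllInRange(Node v, int k1, int k2, List<Integer> out) {
            if (v == null) return;
            if (v.key < k1) {
                findAllInRange(v.right, k1, k2, out); // left subtree is too small
            } else if (v.key > k2) {
                findAllInRange(v.left, k1, k2, out);  // right subtree is too large
            } else {                                   // k1 <= key <= k2
                out.add(v.key);                        // report and recur both ways
                findAllInRange(v.left, k1, k2, out);
                findAllInRange(v.right, k1, k2, out);
            }
        }
    }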
12.1.2 Two-dimensional Range Searching
A two-dimensional range tree consists of a primary structure, which is a tree ordered on the
x-coordinates, and auxiliary structures. Every node of the primary structure stores:
- An item, which consists of coordinates and an element.
- A one-dimensional tree that holds the same items (those of the node's subtree), but uses the
  y-coordinates as keys.
12.3 Quadtrees and k-D Trees
12.3.1 QuadTrees
A main application of quadtrees is storing a set of points in a picture or image. Dividing the
square (into four equal quadrants) is called a split. A quadtree is defined by recursively doing
splits.
12.3.2 k-D Trees
The difference between a k-D tree and a quadtree is that in a k-D tree a split operation is done
with a single line perpendicular to one of the axes, while in a quadtree more than one line is
drawn in a single split. There are two kinds of k-D trees: region-based and point-based.
Region-based k-D trees are the variant closest to quadtrees, while point-based k-D trees perform
splits based on the distribution of the points.
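A point-based k-D tree insertion sketch for 2D points (our own minimal version; the split axis
alternates between x and y on each level of the tree):

    // Point-based 2-D tree sketch: alternating split axes per level.
    class KDTree {
        static class Node {
            double x, y;
            Node left, right;
            Node(double x, double y) { this.x = x; this.y = y; }
        }

        Node root;

        void insert(double x, double y) {
            root = insert(root, x, y, 0);
        }

        private Node insert(Node v, double x, double y, int depth) {
            if (v == null) return new Node(x, y);
            boolean splitOnX = depth % 2 == 0;       // alternate the split axis
            double key     = splitOnX ? x   : y;
            double nodeKey = splitOnX ? v.x : v.y;
            if (key < nodeKey) v.left  = insert(v.left,  x, y, depth + 1);
            else               v.right = insert(v.right, x, y, depth + 1);
            return v;
        }
    }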