Recurrence Relations

Data Types and Data Structures
• Data Types
– Containers
– Dictionaries
– Priority Queue
• Data Structures
– Hash Tables
– Binary Search Trees
Data types & structures
There are numerous options for data structures for
many commonly used abstract data types.
Changing data structures should not change the
correctness of a program, but it can have a dramatic
effect on the speed.
Choosing a Data Structure
It is important to choose the proper data structure when
you first design an algorithm.
There are many data structures that can handle common
operations: insertion, deletion, sorting, searching, finding
the maximum or minimum, predecessor or successor, etc.
Different data structures take different amounts of time
for each of these operations.
• Building an algorithm around a properly chosen data
structure leads to both a clean algorithm and good performance.
• Using an incorrect data structure can be disastrous, but you
don’t always need the best structure.
• Sorting is at the heart of many good algorithms.
• Common algorithm design paradigms include divide-and-conquer,
randomization, incremental construction, and
dynamic programming.
Fundamental Data Types
An abstract data type is a collection of well-defined
operations that can be performed on a particular
Different data structures make different tradeoffs that
make certain operations (say, insertion ) faster at the cost
of others (say, searching.) Often there will be other
considerations that will make one structure more
desirable over others.
Containers
• Hold data for later retrieval
• Operations:
– Insert(item)
– Retrieve(); typically removing item from container
• Simple data structures for implementing containers
– Stack: LIFO
– Queue: FIFO
– Table: retrieve by index
• Implementation
– Linked list or array
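The two simplest containers can be sketched as follows. This is a minimal illustration, assuming a Python list as the array behind the stack and collections.deque behind the queue; the class names Stack and Queue and the method names insert and retrieve are chosen here to mirror the operations listed above.

```python
from collections import deque


class Stack:
    """LIFO container backed by a Python list (a dynamic array)."""

    def __init__(self):
        self._items = []

    def insert(self, item):
        self._items.append(item)

    def retrieve(self):
        # Removes and returns the most recently inserted item.
        return self._items.pop()


class Queue:
    """FIFO container backed by collections.deque."""

    def __init__(self):
        self._items = deque()

    def insert(self, item):
        self._items.append(item)

    def retrieve(self):
        # Removes and returns the oldest item.
        return self._items.popleft()
```

Both insert and retrieve run in constant time here (amortized for the list-backed stack).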
Dictionaries
• Dictionaries are a form of container that permits access to data items
by content (key).
• Operations:
– Insert(key)
– delete(pointer to item)
– search(key)
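• Linked list implementation (no sorting)
– Insert: O(1)
– Delete: O(1) given a pointer to the item (doubly linked list)
– Search: O(n)
• Sorted array implementation
– Insert: O(n)
– Delete: O(n)
– Search: O(log n) using binary search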
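The tradeoff between the two implementations can be seen in a small sketch. It assumes an unsorted Python list stands in for the linked list (so delete takes a position rather than a pointer) and uses the standard bisect module for the sorted array; the class names are hypothetical.

```python
import bisect


class UnsortedListDict:
    """Unsorted sequence: cheap insert, linear search."""

    def __init__(self):
        self._keys = []

    def insert(self, key):
        self._keys.append(key)              # O(1) amortized

    def search(self, key):
        for i, k in enumerate(self._keys):  # O(n) scan
            if k == key:
                return i
        return None

    def delete(self, index):
        # Order does not matter, so overwrite with the last key and pop: O(1).
        self._keys[index] = self._keys[-1]
        self._keys.pop()


class SortedArrayDict:
    """Sorted array: logarithmic search, linear insert and delete."""

    def __init__(self):
        self._keys = []

    def insert(self, key):
        bisect.insort(self._keys, key)      # O(n): later keys shift right

    def search(self, key):
        i = bisect.bisect_left(self._keys, key)   # O(log n) binary search
        if i < len(self._keys) and self._keys[i] == key:
            return i
        return None

    def delete(self, index):
        del self._keys[index]               # O(n): later keys shift left
```

Which one is preferable depends on the mix of operations: many searches favor the sorted array, many insertions favor the unsorted sequence.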
Priority Queues
Insert(x) : Given an item x, insert it into the
priority queue.
Find-Maximum( ) : Return the item with the
maximal priority.
Delete-Maximum( ) : Remove the item from the
queue whose key is maximum.
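One possible implementation of these three operations uses a binary heap. The sketch below assumes numeric priorities and relies on Python's heapq module, which provides a min-heap; priorities are negated to obtain max-first behavior. The class name MaxPriorityQueue is illustrative.

```python
import heapq


class MaxPriorityQueue:
    """Max-priority queue on top of heapq (a min-heap) by negating keys."""

    def __init__(self):
        self._heap = []

    def insert(self, x):
        heapq.heappush(self._heap, -x)      # O(log n)

    def find_maximum(self):
        return -self._heap[0]               # O(1): the root of the heap

    def delete_maximum(self):
        return -heapq.heappop(self._heap)   # O(log n)
```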
Data Structures
• Ways to implement data types
Linked lists
Arrays with auxiliary data
Hash table
Binary search tree
Others, of course
Hash Tables
• Maintain an array to hold your items
• “Hash” the key to determine the index the
specific item should be stored at
• Good hash functions
• Methods for dealing with collisions
– Chaining
• Universal hash functions
– Open addressing
Direct-address hash table
• Assumptions
– Universe of keys is small (size m)
– Set of keys can be mapped to {0, 1, …, m-1}
– No elements have the same key
• Use an array of size m
– Array contents can be pointer to element
– Array can directly store element
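Under these assumptions a direct-address table is just an array indexed by key. A minimal sketch, assuming keys are integers in {0, 1, …, m-1} and that None marks an empty slot (so None cannot itself be stored); the class name is hypothetical.

```python
class DirectAddressTable:
    """Array of size m indexed directly by key; every operation is O(1)."""

    def __init__(self, m):
        self._slots = [None] * m    # None marks an empty slot

    def insert(self, key, element):
        self._slots[key] = element

    def search(self, key):
        return self._slots[key]

    def delete(self, key):
        self._slots[key] = None
```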
Hash Functions
• Problem with direct-addressed tables
– Universe of possible keys U is too large
– Set of keys used K may be much smaller
• Hash function
– Use an array of size Θ(m)
– Use function h(k) = x to determine slot x
– h: U → {0, 1, …, m-1}
• Collision
– When h(k1) = h(k2)
Good Hash Functions
• Each key is equally likely to hash to any of the m
slots independently of where any other key has
hashed to
• Difficult to achieve as this requires knowledge of
distribution of keys
• Good characteristics
– Must be able to evaluate quickly
– May want keys that are “close” to map to slots that are
far apart
Hashing by Height
Collisions unavoidable
Even if we have a good function, we will still have collisions.
[Figure: table slots labeled Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec]
Chaining
• Create a linked list to store all elements that
map to same table slot
• Running time
– Insert(T,k): how long? what assumptions?
– Search(T,k): how long?
– Delete(T,x): pointer to element x, how long,
what assumptions?
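A small sketch of chaining helps when thinking about these questions. It assumes Python's built-in hash reduced modulo m as the hash function and a Python list per slot in place of a linked list; the class name ChainedHashTable is hypothetical. Insert is constant time only under the assumption that the key is not already present (no scan of the chain), search is proportional to the chain length, and delete here scans the chain because there is no pointer to the element (with a doubly linked list and a pointer it would be constant time).

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list of (key, value)."""

    def __init__(self, m=8):
        self._m = m
        self._table = [[] for _ in range(m)]

    def _slot(self, key):
        return hash(key) % self._m

    def insert(self, key, value):
        # Assumes key is not already in the table, so no scan is needed.
        self._table[self._slot(key)].append((key, value))

    def search(self, key):
        for k, v in self._table[self._slot(key)]:   # walk one chain
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self._table[self._slot(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False
```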
Search time
• Notation
– n items
– m slots
– load factor α = n/m
• Worst-case search time?
– What is worst case?
• Expected search time
– Simple uniform hashing: each element is equally likely
to hash to any of the m slots, independent of where any
other element has hashed to.
– Expected search time?
Universal hashing
• In the worst-case, for any hash function, the keys
may be exactly the worst-case for your function
• Avoid this by choosing the hash function
randomly independent of the keys to be hashed
• Key distinction from probabilistic analysis
– Universal hash function will work well with high
probability on EVERY input instance but may perform
poorly with low probability on EVERY input instance
– Probabilistic analysis of static hash function h says h
will work well on most input instances every time but
may perform poorly on some input instances every time
Definition and analysis
• Let H be a finite collection of hash functions that map U
into {0, …, m-1}
• This collection is universal if for each pair of distinct keys
k and q in U, the number of hash functions h in H for
which h(k) = h(q) is at most |H|/m.
• If we choose our hash function randomly from H, this
implies that there is at most a 1/m chance that h(k) = h(q).
• This leads to the expected length of a chain being n/m
– Note we assume chaining and not open addressing in analysis
An example of universal hash
• Choose prime p larger than all possible keys
• Let Z_p = {0, …, p-1} and Z_p* = {1, …, p-1}
– Clearly p > m. Why?
• h_{a,b} for any a in Z_p* and b in Z_p
– h_{a,b}(k) = ((ak + b) mod p) mod m
• H_{p,m} = {h_{a,b} | a in Z_p* and b in Z_p}
– This family has a total of p(p-1) hash functions
• This family of hash functions is universal
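The family can be sketched directly from its definition. The function names make_universal_hash and collision_count are hypothetical; the second one counts, by brute force over every (a, b), how many functions in the family make two given keys collide, so the universal bound |H|/m = p(p-1)/m can be checked for small p and m.

```python
import random


def make_universal_hash(p, m, rng=random):
    """Pick h_{a,b}(k) = ((a*k + b) mod p) mod m at random from the family."""
    a = rng.randrange(1, p)     # a in Z_p* = {1, ..., p-1}
    b = rng.randrange(0, p)     # b in Z_p  = {0, ..., p-1}

    def h(k):
        return ((a * k + b) % p) % m

    return h


def collision_count(p, m, k, q):
    """Number of functions in the family for which keys k and q collide."""
    return sum(
        1
        for a in range(1, p)
        for b in range(p)
        if ((a * k + b) % p) % m == ((a * q + b) % p) % m
    )
```

For example, with p = 17 and m = 6, collision_count(17, 6, k, q) can be compared against 17 * 16 / 6 for every pair of distinct keys below 17.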
Open addressing
• Store all elements in the table
• Probe the hash table in event of a collision
• Key idea: probe sequence is NOT the same
for each element, depends on initial key
• h: U × {0, 1, …, m-1} → {0, 1, …, m-1}
• Permutation requirement
– h(k,0), h(k,1), …, h(k,m-1) is a permutation of
(0, …, m-1)
• Insert, search straightforward
• Why can we not simply mark a slot as empty when its key is deleted?
– If keys need to be deleted, open addressing may
not be the right choice
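The difficulty with deletion can be reproduced in a few lines. This sketch assumes non-negative integer keys, linear probing with h'(k) = k mod m, and None as the empty marker; the class LinearProbingTable and the method naive_delete are hypothetical names used only for illustration.

```python
class LinearProbingTable:
    """Open addressing with linear probing; None marks an empty slot."""

    def __init__(self, m):
        self._m = m
        self._slots = [None] * m

    def _probe(self, key):
        start = key % self._m               # h'(k), assumed here
        for i in range(self._m):
            yield (start + i) % self._m

    def insert(self, key):
        for j in self._probe(key):
            if self._slots[j] is None:
                self._slots[j] = key
                return j
        raise OverflowError("hash table is full")

    def search(self, key):
        for j in self._probe(key):
            if self._slots[j] is None:
                return None                 # an empty slot ends the search
            if self._slots[j] == key:
                return j
        return None

    def naive_delete(self, key):
        # Marking the slot empty cuts the probe sequence of every key
        # that was inserted after probing past this slot.
        j = self.search(key)
        if j is not None:
            self._slots[j] = None
```

With m = 7, keys 3 and 10 both hash to slot 3, so 10 lands in slot 4. After naive_delete(3), a search for 10 stops at the now-empty slot 3 and reports a miss even though 10 is still stored. A common workaround is a special "deleted" marker that search skips over and insert may reuse.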
Probing schemes
• uniform hashing: each of m! permutations equally likely
– not typically achieved
• linear probing: h(k,i) = (h’(k) + i) mod m
– Clustering effect
– Only m possible probe sequences are considered
• quadratic probing: h(k,i) = (h’(k) + c·i + d·i^2) mod m
– constraints on c, d, m
– better than linear probing as clustering effect is not as bad
– Only m possible probe sequences are considered, and keys that map to
same position do have identical probe sequences
• double hashing: h(k,i) = (h(k) + i·q(k)) mod m
– q(k) must be relatively prime to m
– m^2 probe sequences considered
– Much closer to uniform hashing
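The three practical schemes can be compared by listing their probe sequences. The sketch below takes the already-computed hash values as inputs (h1 for h’(k) or h(k), h2 for q(k)). For quadratic probing it assumes one specific choice that is known to satisfy the permutation requirement: c = d = 1/2 with m a power of two. The function names are hypothetical.

```python
from math import gcd


def linear_probe(h1, m):
    """Probe sequence (h1 + i) mod m for i = 0, ..., m-1."""
    return [(h1 + i) % m for i in range(m)]


def quadratic_probe(h1, m):
    """Probe sequence (h1 + i/2 + i^2/2) mod m; assumes m is a power of 2."""
    if m & (m - 1) != 0:
        raise ValueError("this choice of c and d needs m to be a power of 2")
    return [(h1 + (i * i + i) // 2) % m for i in range(m)]


def double_hash_probe(h1, h2, m):
    """Probe sequence (h1 + i*h2) mod m; h2 must be relatively prime to m."""
    if gcd(h2, m) != 1:
        raise ValueError("h2 must be relatively prime to m")
    return [(h1 + i * h2) % m for i in range(m)]
```

Each function returns all m slots exactly once when its constraint holds, which is the permutation requirement stated earlier.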
Search time
• Preliminaries
– n elements, m slots, α = n/m with n <= m
– Assumption of uniform hashing
• Expected search time on a miss
– Given that h(k,i) is non-empty, what is the probability that h(k,i+1)
is empty?
– What is expected search time then?
• Expected insertion time is essentially the same. Why?
• Expected search time on a hit
– If the entry was the (i+1)st element added, the expected search time is at most
1/(1 - i/m) = m/(m-i)
– Average this over all n elements (i = 0, …, n-1) and you get (1/α)(H_m - H_{m-n})
– This can be bounded by (1/α) ln(1/(1-α))
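To get a feel for these quantities, the formulas can be evaluated numerically. The sketch assumes uniform hashing, uses the standard bound 1/(1-α) for the expected number of probes on a miss, and evaluates both the harmonic-number expression and its logarithmic bound for a hit. Function names are hypothetical.

```python
import math


def expected_probes_miss(alpha):
    """Upper bound 1/(1 - alpha) on expected probes in an unsuccessful search."""
    return 1.0 / (1.0 - alpha)


def harmonic(k):
    """k-th harmonic number H_k = 1 + 1/2 + ... + 1/k (H_0 = 0)."""
    return sum(1.0 / j for j in range(1, k + 1))


def expected_probes_hit(n, m):
    """(1/alpha) * (H_m - H_{m-n}) with alpha = n/m."""
    alpha = n / m
    return (harmonic(m) - harmonic(m - n)) / alpha


def expected_probes_hit_bound(alpha):
    """(1/alpha) * ln(1/(1 - alpha))."""
    return (1.0 / alpha) * math.log(1.0 / (1.0 - alpha))
```

For a half-full table (n = 50, m = 100) the miss bound is 2 probes, while the hit estimate stays below 2 ln 2, roughly 1.39 probes.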
Binary search trees
• Supports search, min, max, predecessor, successor,
insert, delete, and list all efficiently
• Thus can be used for more than just dictionary
• Basic tree property
– For any node x
• left subtree has nodes <= x
• right subtree has nodes >= x
Binary Trees
Example Search Trees
• Search procedure?
– search time?
• Minimum node in tree rooted at node x?
– search time?
• Maximum node in tree rooted at node x?
– search time?
• Listing all nodes in sorted order?
– time to list?
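The four procedures asked about above can be sketched as follows, assuming a plain node with key, left, and right fields (the Node class and function names are illustrative). Search, minimum, and maximum each follow a single root-to-leaf path, so they take time proportional to the height of the tree; the in-order listing visits every node once.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def search(x, key):
    """Return the node holding key in the subtree rooted at x, or None."""
    while x is not None and x.key != key:
        x = x.left if key < x.key else x.right
    return x


def minimum(x):
    """Leftmost node of the subtree rooted at x."""
    while x.left is not None:
        x = x.left
    return x


def maximum(x):
    """Rightmost node of the subtree rooted at x."""
    while x.right is not None:
        x = x.right
    return x


def inorder(x):
    """Yield all keys of the subtree rooted at x in sorted order."""
    if x is not None:
        yield from inorder(x.left)
        yield x.key
        yield from inorder(x.right)
```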
Successor and Predecessor
Successor: Find the minimal entry in the right sub-tree, if
there is a right sub-tree. Otherwise find the first ancestor
v such that the entry is in v’s left sub-tree.
Predecessor: Find the maximal entry in the left sub-tree,
if there is a left sub-tree. Otherwise find the first ancestor
v such that the entry is in v’s right sub-tree.
In either case, if the search passes the root without finding
such an ancestor, no predecessor/successor exists.
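Both rules translate into short loops once each node stores a pointer to its parent. The sketch below assumes such a parent field; the Node class and the insert helper (used only to build a tree with parent links) are illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.parent = None


def insert(root, key):
    """Insert key; return (root, new_node) so callers can keep node handles."""
    node = Node(key)
    parent, cur = None, root
    while cur is not None:
        parent = cur
        cur = cur.left if key < cur.key else cur.right
    node.parent = parent
    if parent is None:
        return node, node
    if key < parent.key:
        parent.left = node
    else:
        parent.right = node
    return root, node


def successor(x):
    if x.right is not None:             # minimal entry of the right sub-tree
        x = x.right
        while x.left is not None:
            x = x.left
        return x
    v = x.parent                        # climb while we come from the right
    while v is not None and x is v.right:
        x, v = v, v.parent
    return v                            # None: climbed past the root


def predecessor(x):
    if x.left is not None:              # maximal entry of the left sub-tree
        x = x.left
        while x.right is not None:
            x = x.right
        return x
    v = x.parent                        # climb while we come from the left
    while v is not None and x is v.left:
        x, v = v, v.parent
    return v
```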
Simple Insertion and Deletion
Insertion: Traverse the tree as you would when searching.
When the required branch does not exist, attach the new
entry at that location.
Deletion: Three possible cases exist:
a) Entry is a leaf : Just delete it.
b) Entry has one child : Remove entry replacing it with
its child.
c) Entry has two children : Replace entry with
successor. Successor has at most one child (why?); use
step a or b on it.
Simple binary search trees
• What is the expected height of a binary
search tree?
• Difficult to compute if we allow both
insertions and deletions
• With only insertions, in random order, the analysis of Section 12.4
shows that the expected height is O(log n)
Tree-Balancing Algorithms
• Red-Black Trees
• Splay Trees
• Others
– AVL Trees
– 2-3 Trees and 2-3-4 Trees
Manipulating Search Trees
Red-Black Trees
• All nodes in the tree are either red or black.
• Every null-child is included and colored black.
• All red nodes must have two black children.
• Every path from the root to a leaf must have the same
number of black nodes.
How balanced a tree will this produce? How hard
will it be to maintain?
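The coloring rules listed above can be checked mechanically. The sketch below is only a validator, not an insertion or deletion routine: it assumes nodes carry a color field, treats every missing child (None) as a black leaf, and does not check the search-tree ordering of keys. All names are illustrative.

```python
RED, BLACK = "red", "black"


class Node:
    def __init__(self, key, color, left=None, right=None):
        self.key = key
        self.color = color
        self.left = left
        self.right = right


def black_height(node):
    """Black nodes on any path down to a leaf; ValueError if a rule fails."""
    if node is None:
        return 1                        # null children count as black
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                raise ValueError("red node with a red child")
    left = black_height(node.left)
    right = black_height(node.right)
    if left != right:
        raise ValueError("paths with different numbers of black nodes")
    return left + (1 if node.color == BLACK else 0)


def is_red_black(root):
    try:
        black_height(root)
        return True
    except ValueError:
        return False
```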
Example Red-Black Tree
Splay trees
• No adjustment is done in a splay tree when nodes
are inserted or removed.
• All rotations occur within the Search function: the element being
searched for is rotated to the root of the tree.
• Individual operations may take O(n) time
• However, it can be shown that any sequence of m
operations including n insertions starting with an
empty tree takes O(m log n) time
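The rotate-to-root step can be sketched recursively. This is one common way to write it, assuming nodes without parent pointers and distinct keys: splay moves the node holding the key (or the last node on the search path if the key is absent) to the root using double rotations, and search returns the new root together with a flag saying whether the key was found. The plain insert helper follows the description above and does no restructuring. All names are illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def rotate_right(x):
    y = x.left
    x.left, y.right = y.right, x
    return y


def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y


def splay(root, key):
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:         # left-left: rotate the parent first
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:       # left-right: rotate the child first
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)
    if root.right is None:
        return root
    if key > root.right.key:            # right-right
        root.right.right = splay(root.right.right, key)
        root = rotate_left(root)
    elif key < root.right.key:          # right-left
        root.right.left = splay(root.right.left, key)
        if root.right.left is not None:
            root.right = rotate_right(root.right)
    return root if root.right is None else rotate_left(root)


def search(root, key):
    """Return (new_root, found); the accessed node becomes the root."""
    root = splay(root, key)
    return root, (root is not None and root.key == key)


def insert(root, key):
    """Plain binary-search-tree insertion, no restructuring."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root
```

Inserting 1 through 5 in order produces a right-leaning chain; a single search for 5 then brings 5 to the root and roughly halves the depth of the remaining nodes.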
Splay trees
• Dynamic optimality conjecture: splay trees are as
asymptotically fast on any sequence of operations as any
other type of search tree with rotations.
• What does this mean?
– Worst case sequence of splay tree operations takes amortized O(log
n) time per operation
– Some sequences of operations take less.
• Accessing the same ten items over and over again
– Splay tree should then take less on these sequences as well.
• One special case that has been proven:
– search in order from the smallest key to the largest key, the total
time for all n operations is O(n).
Splay Tree Example
Specialized Data Structures
Geometric shapes