Recurrence Relations

Data Types and Data Structures
• Data Types
– Containers
– Dictionaries
– Priority Queue
• Data Structures
– Hash Tables
– Binary Search Trees
Data types & structures
There are numerous options for data structures for
many commonly used abstract data types:
Containers
Dictionaries
Priority Queues
Changing data structures should not change the
correctness of a program, but it can have a dramatic
effect on the speed.
Choosing a Data Structure
It is important to choose the proper data structure when
you first design an algorithm.
There are many data structures that can handle common
operations: insertion, deletion, sorting, searching, finding
the maximum or minimum, predecessor or successor, etc.
Different data structures each take different amounts of time for
the different operations.
Guidelines...
• Building an algorithm around a properly chosen data
structure leads to both a clean algorithm and good
performance
• Using an incorrect data structure can be disastrous, but you
don’t always need the best structure.
• Sorting is at the heart of many good algorithms.
• Common algorithm design paradigms include divide-and-conquer,
randomization, incremental construction, and dynamic programming.
Fundamental Data Types
An abstract data type is a collection of well-defined
operations that can be performed on a particular
structure.
Different data structures make different tradeoffs that
make certain operations (say, insertion) faster at the cost
of others (say, searching). Often there will be other
considerations that will make one structure more
desirable over others.
Containers
• Hold data for later retrieval
• Operations:
– Insert(item)
– Retrieve(); typically removing item from container
• Simple data structures for implementing containers
– Stack: LIFO
– Queue: FIFO
– Table: retrieve by index
• Implementation
– Linked list or array
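The three container disciplines can be sketched with Python built-ins. This is only an illustration: the variable names are made up here, and a Python list and `collections.deque` stand in for the array and linked-list implementations named on the slide.

```python
from collections import deque

# Stack (LIFO): a Python list pushes and pops at the same end.
stack = []
for item in (1, 2, 3):
    stack.append(item)        # Insert(item)
last_in = stack.pop()         # Retrieve(): removes the most recently inserted item

# Queue (FIFO): deque supports O(1) appends and pops at both ends.
queue = deque()
for item in (1, 2, 3):
    queue.append(item)        # Insert(item)
first_in = queue.popleft()    # Retrieve(): removes the oldest item

# Table: retrieve by index (here without removing the item).
table = ["a", "b", "c"]
second = table[1]
```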
Dictionaries
• Dictionaries are a form of container that permits access to data items
by content (key).
• Operations:
– Insert(key)
– delete(pointer to item)
– search(key)
• Linked list implementation (no sorting)
– Insert:
– Delete:
– Search
• Sorted array implementation
– Insert:
– Delete:
– Search:
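As one way to think about the sorted array case, here is a minimal sketch using the standard `bisect` module. The class name `SortedArrayDict` is hypothetical, and delete takes an index in place of the slide's "pointer to item".

```python
import bisect


class SortedArrayDict:
    """Keys kept in a sorted Python list (an illustrative sketch)."""

    def __init__(self):
        self.keys = []

    def insert(self, key):
        # Binary search finds the position quickly, but shifting the
        # later elements to make room is O(n) in the worst case.
        bisect.insort(self.keys, key)

    def search(self, key):
        # Binary search: O(log n). Returns the index of key, or None.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None

    def delete(self, index):
        # Closing the gap shifts later elements: O(n) in the worst case.
        del self.keys[index]


d = SortedArrayDict()
for k in (30, 10, 20):
    d.insert(k)
pos = d.search(20)
d.delete(pos)
```

An unsorted linked list makes the opposite tradeoff: insertion at the head is constant time, while search has to scan the list.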
Priority Queues
Insert(x) : Given an item x, insert it into the
priority Queue.
Find-Maximum( ) : Return the item with the
maximal priority.
Delete-Maximum( ) : Remove the item from the
queue whose key is maximum.
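A minimal sketch of these three operations, assuming a binary heap as the underlying structure (the slides do not prescribe one). Python's `heapq` is a min-heap, so keys are negated; the class name is hypothetical and numeric priorities are assumed.

```python
import heapq


class MaxPriorityQueue:
    def __init__(self):
        self._heap = []  # stores negated keys because heapq is a min-heap

    def insert(self, x):
        heapq.heappush(self._heap, -x)

    def find_maximum(self):
        # The smallest negated key corresponds to the largest key.
        return -self._heap[0]

    def delete_maximum(self):
        return -heapq.heappop(self._heap)


pq = MaxPriorityQueue()
for priority in (4, 9, 2):
    pq.insert(priority)
top = pq.find_maximum()
removed = pq.delete_maximum()
next_top = pq.find_maximum()
```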
Data Structures
• Ways to implement data types
– Linked lists
– Arrays with auxiliary data
– Hash table
– Binary search tree
– Others, of course
Hash Tables
• Maintain an array to hold your items
• “Hash” the key to determine the index the
specific item should be stored at
• Good hash functions
• Methods for dealing with collisions
– Chaining
• Universal hash functions
– Open addressing
Direct-address hash table
• Assumptions
– Universe of keys is small (size m)
– Set of keys can be mapped to {0, 1, …, m-1}
– No elements have the same key
• Use an array of size m
– Array contents can be pointer to element
– Array can directly store element
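Under the assumptions above, a direct-address table is little more than an array indexed by key. The sketch below stores the element directly in the slot; the class and method names are hypothetical.

```python
class DirectAddressTable:
    def __init__(self, m):
        # One slot per possible key in {0, 1, ..., m-1}.
        self.slots = [None] * m

    def insert(self, key, element):
        self.slots[key] = element

    def search(self, key):
        # Returns None when no element has this key.
        return self.slots[key]

    def delete(self, key):
        self.slots[key] = None


t = DirectAddressTable(10)
t.insert(3, "three")
found = t.search(3)
missing = t.search(4)
t.delete(3)
```

Every operation is a single array access, which is why the approach only pays off when the universe of keys is small.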
Hash Functions
• Problem with direct-addressed tables
– Universe of possible keys U is too large
– Set of keys used K may be much smaller
• Hash function
– Use an array of size Θ(m)
– Use function h(k) = x to determine slot x
– h: U → {0, 1, …, m-1}
• Collision
– When h(k1) = h(k2)
Good Hash Functions
• Each key is equally likely to hash to any of the m
slots independently of where any other key has
hashed to
• Difficult to achieve as this requires knowledge of
distribution of keys
• Good characteristics
– Must be able to evaluate quickly
– May want keys that are “close” to map to slots that are
far apart
Hashing by Height
[Figure: slots labeled by height, 1’ through 9’]
Collisions unavoidable
Even if we have a good function, we will still have
collisions:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Chaining
• Create a linked list to store all elements that
map to same table slot
• Running time
– Insert(T,k): how long? what assumptions?
– Search(T,k): how long?
– Delete(T,x): pointer to element x, how long,
what assumptions?
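A minimal sketch of chaining, with two simplifications that are assumptions of this example: each chain is a Python list standing in for the linked list, and delete takes a key where the slide passes a pointer to the element. Python's built-in `hash` reduced mod m plays the role of h.

```python
class ChainedHashTable:
    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]  # one chain per slot

    def _h(self, key):
        return hash(key) % self.m

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:              # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def search(self, key):
        # Cost is proportional to the length of one chain.
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        idx = self._h(key)
        self.slots[idx] = [(k, v) for k, v in self.slots[idx] if k != key]


t = ChainedHashTable(m=8)
t.insert(1, "one")
t.insert(9, "nine")   # 9 % 8 == 1, so this collides with key 1
chain_len = len(t.slots[1])
t.delete(1)
```

Because this insert checks for duplicates, it scans the chain. If the key is known to be absent, prepending to a linked list avoids the scan.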
Search time
• Notation
– n items
– m slots
– load factor α = n/m
• Worst-case search time?
– What is worst case?
• Expected search time
– Simple uniform hashing: each element is equally likely
to hash to any of the m slots, independent of where any
other element has hashed to.
– Expected search time?
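For reference, the standard textbook answers for chaining can be stated as follows. These are results quoted without derivation. The symbol α is the load factor n/m, the worst case is the one where all n keys hash to the same slot, and the Θ(1) term accounts for computing the hash.

```latex
T_{\text{worst}}(n) = \Theta(n),
\qquad
\mathbb{E}\left[T_{\text{search}}\right] = \Theta(1 + \alpha),
\quad \alpha = \frac{n}{m}
```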
Universal hashing
• In the worst case, for any fixed hash function, the keys
may be exactly the worst-case input for that function
• Avoid this by choosing the hash function
randomly, independently of the keys to be hashed
• Key distinction from probabilistic analysis
– Universal hash function will work well with high
probability on EVERY input instance but may perform
poorly with low probability on EVERY input instance
– Probabilistic analysis of static hash function h says h
will work well on most input instances every time but
may perform poorly on some input instances every time
Definition and analysis
• Let H be a finite collection of hash functions that map U
into {0, …, m-1}
• This collection is universal if for each pair of distinct keys
k and q in U, the number of hash functions h in H for
which h(k) = h(q) is at most |H|/m.
• If we choose our hash function randomly from H, this
implies that there is at most a 1/m chance that h(k) = h(q).
• This leads to the expected length of a chain being n/m
– Note we assume chaining and not open addressing in analysis
An example of universal hash functions
• Choose prime p larger than all possible keys
• Let Z_p = {0, …, p-1} and Z_p* = {1, …, p-1}
– Clearly p > m. Why?
• h_{a,b} for any a in Z_p* and b in Z_p
– h_{a,b}(k) = ((ak+b) mod p) mod m
• H_{p,m} = {h_{a,b} | a in Z_p* and b in Z_p}
– This family has a total of p(p-1) hash functions
• This family of hash functions is universal
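The family above can be sketched in a few lines. The function names and the choice p = 17, m = 6 are assumptions made for illustration, and keys are assumed to be integers in {0, …, p-1}. The second function brute-forces the universality condition for one pair of keys by counting the (a, b) choices that make them collide.

```python
import random


def make_universal_hash(p, m, rng=random):
    """Pick h(k) = ((a*k + b) mod p) mod m at random from the family."""
    a = rng.randrange(1, p)   # a in {1, ..., p-1}
    b = rng.randrange(0, p)   # b in {0, ..., p-1}

    def h(k):
        return ((a * k + b) % p) % m

    return h


def count_collisions(k, q, p, m):
    """Number of (a, b) pairs in the family for which keys k and q collide."""
    return sum(
        1
        for a in range(1, p)
        for b in range(p)
        if ((a * k + b) % p) % m == ((a * q + b) % p) % m
    )


p, m = 17, 6
h = make_universal_hash(p, m, random.Random(0))
slots = [h(k) for k in range(p)]
collisions = count_collisions(8, 3, p, m)
family_size = p * (p - 1)
```

For a universal family, `collisions` should not exceed `family_size / m` for any pair of distinct keys.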
Open addressing
• Store all elements in the table
• Probe the hash table in event of a collision
• Key idea: probe sequence is NOT the same
for each element, depends on initial key
• h: U × {0, 1, …, m-1} → {0, 1, …, m-1}
• Permutation requirement
– h(k,0), h(k,1), …, h(k,m-1) is a permutation of
(0, …, m-1)
Operations
• Insert, search straightforward
• Why can we not simply mark a slot as
deleted?
– If keys need to be deleted, open addressing may
not be the right choice
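To see why deletion is awkward, consider this sketch of a linear-probing table. The class name, the `EMPTY` and `DELETED` sentinels, and the use of `key % m` as the hash are all assumptions of the example, and keys are assumed to be non-negative integers. Resetting a deleted slot to empty would cut the probe sequence of any key stored past it, so a separate marker is used.

```python
EMPTY, DELETED = object(), object()   # sentinel markers (hypothetical names)


class LinearProbingTable:
    def __init__(self, m):
        self.m = m
        self.slots = [EMPTY] * m

    def _probe(self, key):
        start = key % self.m          # stand-in hash function
        for i in range(self.m):
            yield (start + i) % self.m

    def insert(self, key):
        for j in self._probe(key):
            if self.slots[j] is EMPTY or self.slots[j] is DELETED:
                self.slots[j] = key
                return j
        raise OverflowError("hash table is full")

    def search(self, key):
        for j in self._probe(key):
            if self.slots[j] is EMPTY:    # a truly empty slot ends the search
                return None
            if self.slots[j] == key:      # DELETED never equals a key
                return j
        return None

    def delete(self, key):
        j = self.search(key)
        if j is not None:
            self.slots[j] = DELETED       # keep the probe chain intact


t = LinearProbingTable(8)
t.insert(1)
t.insert(9)            # collides with 1, lands in slot 2
t.delete(1)
still_found = t.search(9)
```

The markers keep search correct, but they accumulate, so search time is no longer governed by the load factor alone.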
Probing schemes
• uniform hashing: each of m! permutations equally likely
– not typically achieved
• linear probing: h(k,i) = (h’(k) + i) mod m
– Clustering effect
– Only m possible probe sequences are considered
• quadratic probing: h(k,i) = (h’(k) + ci + di²) mod m
– constraints on c, d, m
– better than linear probing as clustering effect is not as bad
– Only m possible probe sequences are considered, and keys that map to
same position do have identical probe sequences
• double hashing: h(k,i) = (h(k) + i·q(k)) mod m
– q(k) must be relatively prime to m
– m² probe sequences considered
– Much closer to uniform hashing
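The probe sequences can be compared side by side. In this sketch the function names are made up, `h1` and `h2` stand for already-computed hash values, and the quadratic variant fixes c = d = 1/2 with m a power of two, which is one known choice that satisfies the permutation requirement.

```python
def linear_probe(h1, m):
    # h(k, i) = (h'(k) + i) mod m
    return [(h1 + i) % m for i in range(m)]


def quadratic_probe(h1, m):
    # h(k, i) = (h'(k) + i/2 + i^2/2) mod m, i.e. c = d = 1/2.
    # i*(i+1) is always even, so integer division is exact.
    return [(h1 + i * (i + 1) // 2) % m for i in range(m)]


def double_hash_probe(h1, h2, m):
    # h(k, i) = (h1 + i*h2) mod m; h2 must be relatively prime to m.
    return [(h1 + i * h2) % m for i in range(m)]


m = 8
lin = linear_probe(3, m)
quad = quadratic_probe(3, m)
dbl = double_hash_probe(3, 5, m)   # 5 is odd, hence relatively prime to 8
```

With these inputs `lin` starts 3, 4, 5, `quad` starts 3, 4, 6, and `dbl` starts 3, 0, 5. Each sequence visits every slot exactly once.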
Search time
• Preliminaries
– n elements, m slots, α = n/m with n <= m
– Assumption of uniform hashing
• Expected search time on a miss
– Given that h(k,i) is non-empty, what is the probability that h(k,i+1)
is empty?
– What is expected search time then?
• Expected insertion time is essentially the same. Why?
• Expected search time on a hit
– If the entry was the (i+1)st element added, expected search time is at most
1/(1 – i/m) = m/(m-i)
– Average this over all n elements and you get (1/α)(H_m – H_{m-n}), where
α = n/m and H_k is the kth harmonic number
– This can be bounded by (1/α) ln(1/(1-α))
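The averaging step can be written out as follows. This is a sketch under the uniform hashing assumption, with α = n/m and H_k the kth harmonic number. The final inequality bounds the sum by an integral of 1/x.

```latex
\frac{1}{n}\sum_{i=0}^{n-1}\frac{m}{m-i}
  = \frac{m}{n}\sum_{k=m-n+1}^{m}\frac{1}{k}
  = \frac{1}{\alpha}\left(H_m - H_{m-n}\right)
  \le \frac{1}{\alpha}\int_{m-n}^{m}\frac{dx}{x}
  = \frac{1}{\alpha}\ln\frac{m}{m-n}
  = \frac{1}{\alpha}\ln\frac{1}{1-\alpha}
```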
Binary search trees
• Supports search, min, max, predecessor, successor,
insert, delete, and list all efficiently
• Thus can be used for more than just dictionary
applications
• Basic tree property
– For any node x
• left subtree has nodes <= x
• right subtree has nodes >= x
Binary Trees
Example Search Trees
Operations
• Search procedure?
– search time?
• Minimum node in tree rooted at node x?
– search time?
• Maximum node in tree rooted at node x?
– search time?
• Listing all nodes in sorted order?
– time to list?
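One possible shape for these procedures is sketched below. The `Node` class and function names are assumptions of this example. Search, minimum, and maximum each follow a single root-to-leaf path, so they take time proportional to the height of the tree, while the in-order listing visits every node once.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def search(x, key):
    # Walk down from x, going left or right by comparison.
    while x is not None and x.key != key:
        x = x.left if key < x.key else x.right
    return x


def minimum(x):
    while x.left is not None:     # leftmost node of the subtree
        x = x.left
    return x


def maximum(x):
    while x.right is not None:    # rightmost node of the subtree
        x = x.right
    return x


def inorder(x, out=None):
    # Left subtree, node, right subtree: yields keys in sorted order.
    if out is None:
        out = []
    if x is not None:
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)
    return out


root = Node(8, Node(3, Node(1), Node(6)), Node(10, None, Node(14)))
keys = inorder(root)
```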
Successor and Predecessor
Successor: Find the minimal entry in the right sub-tree, if
there is a right sub-tree. Otherwise find the first ancestor
v such that the entry is in v’s left sub-tree.
Predecessor: Find the maximal entry in the left sub-tree,
if there is a left sub-tree. Otherwise find the first ancestor
v such that the entry is in v’s right sub-tree.
In either case, if the climb passes the root without finding such an
ancestor, no predecessor/successor exists.
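The two rules translate fairly directly into code if each node keeps a parent pointer. That pointer, the `Node` class, and the `attach` helper are assumptions of this sketch.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def attach(parent, child, side):
    """Hypothetical helper: link child under parent on 'left' or 'right'."""
    setattr(parent, side, child)
    child.parent = parent
    return child


def successor(x):
    if x.right is not None:
        x = x.right                   # minimal entry of the right sub-tree
        while x.left is not None:
            x = x.left
        return x
    v = x.parent                      # climb until we leave a left sub-tree
    while v is not None and x is v.right:
        x, v = v, v.parent
    return v                          # None means no successor exists


def predecessor(x):
    if x.left is not None:
        x = x.left                    # maximal entry of the left sub-tree
        while x.right is not None:
            x = x.right
        return x
    v = x.parent                      # climb until we leave a right sub-tree
    while v is not None and x is v.left:
        x, v = v, v.parent
    return v


root = Node(8)
n3 = attach(root, Node(3), "left")
n10 = attach(root, Node(10), "right")
n1 = attach(n3, Node(1), "left")
n6 = attach(n3, Node(6), "right")
```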
Simple Insertion and Deletion
Insertion: Traverse the tree as you would when searching.
When the required branch does not exist, attach the new
entry at that location.
Deletion: Three possible cases exist:
a) Entry is a leaf : Just delete it.
b) Entry has one child : Remove entry replacing it with
child.
c) Entry has two children : Replace entry with
successor. Successor has at most one child (why?); use
step a or b on it.
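A recursive sketch of both operations, assuming distinct keys and nodes without parent pointers. The names are illustrative, and each function returns the (possibly new) root of the subtree it was given.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    if root is None:
        return Node(key)              # the required branch does not exist
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def delete(root, key):
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        # Cases a and b: zero or one child, so splice the node out.
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Case c: copy the successor's key, then delete the successor,
        # which has no left child and so falls under case a or b.
        succ = root.right
        while succ.left is not None:
            succ = succ.left
        root.key = succ.key
        root.right = delete(root.right, succ.key)
    return root


def inorder(root):
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)


root = None
for k in (8, 3, 10, 1, 6, 14):
    root = insert(root, k)
root = delete(root, 1)     # leaf
root = delete(root, 10)    # one child
root = delete(root, 8)     # two children
remaining = inorder(root)
```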
Simple binary search trees
• What is the expected height of a binary
search tree?
• Difficult to compute if we allow both
insertions and deletions
• With only insertions of keys in random order, the analysis of
section 12.4 shows that expected height is O(log n)
Tree-Balancing Algorithms
• Red-Black Trees
• Splay Trees
• Others
– AVL Trees
– 2-3 Trees and 2-3-4 Trees
Manipulating Search Trees
Red-Black Trees
• All nodes in the tree are either red or black.
• Every null-child is included and colored black.
• All red nodes must have two black children.
• Every path from the root to a leaf must have the same
number of black nodes.
How balanced a tree will this produce? How hard
will it be to maintain?
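The properties listed above can be checked mechanically. The sketch below is a validator only, not an insertion or rebalancing routine. The names `RBNode` and `black_height` are assumptions, and a missing child (`None`) plays the role of the black null-child.

```python
RED, BLACK = "red", "black"


class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key = key
        self.color = color
        self.left = left
        self.right = right


def black_height(node):
    """Return the black-node count shared by all paths from node down to a
    null leaf (the leaf included), or None if a listed property is violated."""
    if node is None:
        return 1                      # null children count as black
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                return None           # a red node needs two black children
    left = black_height(node.left)
    right = black_height(node.right)
    if left is None or right is None or left != right:
        return None                   # paths disagree on black count
    return left + (1 if node.color == BLACK else 0)


valid = RBNode(2, BLACK, RBNode(1, RED), RBNode(3, RED))
unbalanced = RBNode(2, BLACK, RBNode(1, BLACK), None)
red_red = RBNode(3, BLACK, RBNode(2, RED, RBNode(1, RED)), None)
```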
Example Red-Black Tree
Splay trees
• No adjustment is done in a splay tree when nodes
are inserted or removed.
• All rotations occur within the Search function: the element being
searched for is rotated to the root of the tree.
• Individual operations may take O(n) time
• However, it can be shown that any sequence of m
operations including n insertions starting with an
empty tree takes O(m log n) time
Splay trees
• Dynamic optimality conjecture: splay trees are asymptotically
as fast on any sequence of operations as any
other type of search tree with rotations.
• What does this mean?
– Worst case sequence of splay tree operations takes amortized O(log
n) time per operation
– Some sequences of operations take less.
• Accessing the same ten items over and over again
– Splay tree should then take less on these sequences as well.
• One special case that has been proven:
– search in order from the smallest key to the largest key, the total
time for all n operations is O(n).
Splay Tree Example
Specialized Data Structures
• Strings
• Geometric shapes
• Graphs
• Sets
• Schedules