DS_And_Algos_IITD_03

 Data Structures and Algorithms
Dr. Amit Kumar and Dr. Amitabha Bagchi
March 13, 2008
IIT Delhi
Teaser
Given two (2,4) trees such that each key in the first tree is at most any key in the second tree, how do you merge them into a single (2,4) tree in O(log n + log m) time? Here n and m are the numbers of nodes in the two trees.
(2,4) Trees

Properties:




Each node has at most 4 children
All external nodes have same depth
Height h of (2,4) tree is O(log n).
Search, insert, and delete in O(log n) time.
[Figure: a (2,4) tree with root 12, internal nodes (5 10) and (15), and bottom-level nodes (3 4), (6 8), (11), (13 14), (17)]
Beyond (2,4) Trees
What do we know about (2,4)Trees?

Balanced

O(log n) search time

Different node structures

Question: Can we get the (2,4) tree advantages in a binary tree format?

Welcome to the world of Red-Black Trees!
Red-Black Tree
A red-black tree is a binary search tree with the following properties:

edges are colored red or black

no two consecutive red edges on any root-leaf path
same number of black edges on any root-leaf path (black height)

edges connecting leaves are black
(2,4) Tree Evolution

Note how (2,4) trees relate to red-black trees

A red-black tree can be viewed as a representation of a (2,4) tree that breaks nodes with 3 or 4 children into two levels of nodes.
Red-Black Tree Properties
Notation:
N is # of internal nodes
L is # leaves (= N + 1)
H is height
B is black height (the height if red nodes are not counted.)
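With this notation, the usual height bound can be sketched as follows. The derivation relies on two facts not spelled out on the slide: contracting the red edges yields a (2,4) tree of height B whose internal nodes have between 2 and 4 children, and no root-leaf path has two consecutive red edges (with the last edge black), so at most half of its edges are red.

```latex
\begin{align*}
2^{B} \le L = N + 1 \le 4^{B}
  &\;\Longrightarrow\; \tfrac{1}{2}\log_2(N+1) \le B \le \log_2(N+1),\\
B \le H \le 2B
  &\;\Longrightarrow\; H \le 2\log_2(N+1).
\end{align*}
```

So the height of a red-black tree is O(log N), which is what makes search take O(log N) time.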
Insertion into Red-Black Trees
1) Perform a standard search to find the leaf where the key should be added
2) Replace the leaf with an internal node with the new key
3) Color the incoming edge of the new node red
4) Add two new leaves, and color their incoming edges black
5) If the parent had an incoming red edge, we now have two consecutive red edges! We must reorganize the tree to remove that violation. What must be done depends on the sibling of the parent.
Insertion - Plain and Simple
Restructuring
We call this a “rotation”
 No further work necessary
 Inorder remains unchanged
 Black depth is preserved for all leaves
 No more consecutive red edges!
 Corrects “malformed” 4-node in the associated (2,4) tree
More Rotations
[Figure: the remaining rotation cases for a node n, its parent p, and its grandparent g]
Promotion




We call this a “recoloring”
The black depth remains unchanged for all the descendants of g
This process will continue upward beyond g if necessary: rename g as n and repeat.
Splits 5-node of the associated (2,4) tree
Summary of Insertion

If two consecutive red edges are present, we do either

a restructuring (with a simple or double rotation) and stop, or
a recoloring and continue
A restructuring takes constant time and is performed at most once. It reorganizes an off-balance section of the tree.
Recolorings may continue up the tree and are executed O(log N) times.

The time complexity of an insertion is O(log N).
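The insertion procedure can be sketched in Python as below. This is a minimal illustration rather than the lecture's code: it assumes each node stores the color of its incoming edge, uses None for the leaves, sends duplicate keys to the right, and always recolors the root black at the end. The names Node, RedBlackTree, inorder and black_height are chosen here for illustration.

```python
RED, BLACK = "red", "black"

class Node:
    def __init__(self, key, parent=None):
        self.key, self.parent = key, parent
        self.left = self.right = None  # None stands for a (black) leaf
        self.color = RED               # color of the edge coming into this node

class RedBlackTree:
    def __init__(self):
        self.root = None

    def _rotate(self, x):
        """Move x above its parent p; the inorder sequence is unchanged."""
        p, g = x.parent, x.parent.parent
        if x is p.left:
            p.left, x.right = x.right, p
            if p.left is not None:
                p.left.parent = p
        else:
            p.right, x.left = x.left, p
            if p.right is not None:
                p.right.parent = p
        p.parent, x.parent = x, g
        if g is None:
            self.root = x
        elif g.left is p:
            g.left = x
        else:
            g.right = x

    def insert(self, key):
        # Steps 1-4: ordinary BST descent, then attach a new red node.
        parent, cur = None, self.root
        while cur is not None:
            parent, cur = cur, (cur.left if key < cur.key else cur.right)
        n = Node(key, parent)
        if parent is None:
            self.root = n
        elif key < parent.key:
            parent.left = n
        else:
            parent.right = n
        # Step 5: repair two consecutive red edges.
        while n.parent is not None and n.parent.color == RED:
            p, g = n.parent, n.parent.parent
            s = g.right if p is g.left else g.left  # sibling of the parent
            if s is not None and s.color == RED:
                # Recoloring (promotion): rename g as n and continue upward.
                p.color = s.color = BLACK
                g.color = RED
                n = g
            else:
                # Restructuring: a simple or double rotation, then stop.
                if (n is p.left) != (p is g.left):
                    self._rotate(n)   # first half of a double rotation
                    p = n
                self._rotate(p)
                p.color, g.color = BLACK, RED
                break
        self.root.color = BLACK

def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)

def black_height(node):
    """Return the black height, asserting both red-black properties on the way."""
    if node is None:
        return 0
    for child in (node.left, node.right):
        assert not (node.color == RED and child is not None and child.color == RED)
    lh, rh = black_height(node.left), black_height(node.right)
    assert lh == rh
    return lh + (node.color == BLACK)
```

Inserting the letters of "REDSOX" followed by "CUBS" and calling black_height on the root is one way to check that neither property is violated after any step.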

An Example

Start by inserting “REDSOX” into an empty tree
Now, let’s insert “C U B S”...
Example
Example
What should we do?
[Figures: the red-black tree during the remaining insertions; a double red edge is repaired by a rotation]
Setting Up Deletion



As with binary search trees, we can always delete a node that has at least one external child
If the key to be deleted is stored at a node that has no external children, we move there the key of its inorder predecessor (or successor), and delete that node instead
Example: to delete key 7, we move key 5 to node u, and delete node v
Deletion Algorithm

Remove v, where w is a leaf child of v; the sibling u of w takes the place of v.
If v was red or u is red, color u black. Else, color u double black.
[Figure: v with children u and w, before and after the removal]
While a double black edge exists, perform one of the following actions.
How to Eliminate the Double Black Edge





The intuitive idea is to perform a “color compensation”
Find a red edge nearby, and change the pair (red, double black) into (black, black)
As for insertion, we have two cases:
 restructuring, and
 recoloring (demotion, inverse of promotion)
Restructuring resolves the problem locally, while recoloring may propagate it two levels up
Slightly more complicated than insertion, since two restructurings may occur (instead of just one)
Case 1: black sibling with a red child

If sibling is black and one of its children is red, perform a restructuring
[Figure: restructuring of the parent p, the sibling s, and its red child z; v is the other child of p]
(2,4) Tree Interpretation
[Figure: the restructuring viewed in the associated (2,4) tree on keys 10, 20, 30, 40]
Case 2: black sibling with black children

If sibling and its children are black, perform a recoloring

If parent becomes double black, continue upward
[Figure: recoloring of the parent p, the sibling s, and the node v]
(2,4) Tree Interpretation
[Figure: the recoloring viewed in the associated (2,4) tree on keys 10, 20, 30, 40]
Case 3: red sibling

If sibling is red, perform an adjustment

Now the sibling is black and one of the previous cases applies
If the next case is recoloring, there is no propagation upward (parent is now red)

[Figure: adjustment of the parent p, the red sibling s, and the node v]
How About an Example?
Remove 9
[Figure: red-black tree on keys 2, 4, 5, 6, 7, 8, 9, before and after removing 9]
Example

What do we know?
 Sibling is black with black children

What do we do?
 Recoloring
[Figure: the tree on keys 2, 4, 5, 6, 7, 8, before and after the recoloring]
Example

Delete 8
no double black
[Figure: the tree before and after deleting 8]
Example
Delete 7
 Restructuring
[Figure: the tree before deleting 7, after deleting 7, and after the restructuring]
Example
Summary of Red-Black Trees

An insertion or deletion may cause a local perturbation (two consecutive red edges, or a double-black edge)

The perturbation is either
 resolved locally (restructuring), or
 propagated to a higher level in the tree by recoloring (promotion or demotion)
O(1) time for a restructuring or recoloring
At most one restructuring per insertion, and at most two restructurings per deletion
O(log N) recolorings
Total time: O(log N)




Tries
Data structure for dictionary operations on a set of strings.
Standard Tries

The standard trie for a set of strings S is an ordered tree such that:



each node but the root is labeled with a character
the children of a node are alphabetically ordered
the paths from the external nodes to the root yield the strings of S
Example: standard trie for the set of strings
S = { bear, bell, bid, bull, buy, sell, stock, stop }
Standard Tries
•A standard trie uses O(n) space.
•Operations (find, insert, remove) take time O(dm) each, where:
-n = total size of the strings in S,
-m = size of the string parameter of the operation,
-d = alphabet size.
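A standard trie with insert and find can be sketched as follows. This is an illustrative sketch under two assumptions that are not on the slide: children are kept in a dictionary (so each step costs expected O(1) instead of the O(d) scan of an ordered child list), and an is_word flag marks where a string ends, which also handles the case where one string is a prefix of another. The class names are chosen here.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # True if a string of S ends at this node

class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for w in words:
            self.insert(w)

    def insert(self, word):
        """Walk down from the root, creating missing nodes: O(m) steps."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def find(self, word):
        """Trace the path spelled by word; succeed only if it ends a string of S."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

S = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]
trie = Trie(S)
print(trie.find("bell"), trie.find("be"))
```

Here "bell" is found, while "be" is only a prefix of strings in S and is therefore reported as absent.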
Applications of Tries
A standard trie supports the following operations on a preprocessed text in time O(m), where m = |X|
-word matching: find the first occurrence of word X in the text
-prefix matching: find the first occurrence of the longest prefix of word X in the text


Each operation is performed by tracing a path in the trie starting at the root
Compressed Tries


Trie with nodes of degree at least 2
Obtained from standard trie by compressing chains of redundant nodes
[Figure: a standard trie and the corresponding compressed trie]
Compact Storage of Compressed Tries

A compressed trie can be stored in space O(s), where s = |S|, by using O(1) space index ranges at the nodes
Insertion and Deletion
Suffix Tries
A suffix trie is a compressed trie for all the suffixes of a text
Example:

Compact representation:
Properties of Suffix Tries

The suffix trie for a text X of size n from an alphabet of size d
-stores all the n suffixes of X (of total length n(n+1)/2) in O(n) space
-supports arbitrary pattern matching and prefix matching queries in O(dm) time, where m is the length of the pattern
-can be constructed in O(dn) time
Tries and Web Search Engines

The index of a search engine (collection of all searchable words) is stored into a compressed trie

Each leaf of the trie is associated with a word and has a list of pages (URLs) containing that word, called occurrence list
The trie is kept in internal memory
The occurrence lists are kept in external memory and are ranked by relevance
Boolean queries for sets of words (e.g., Java and coffee) correspond to set operations (e.g., intersection) on the occurrence lists
Additional information retrieval techniques are used, such as

stopword elimination (e.g., ignore “the” “a” “is”)

stemming (e.g., identify “add” “adding” “added”)

link analysis (recognize authoritative pages)




Tries and Internet Routers








Computers on the internet (hosts) are identified by a unique 32-bit IP (internet protocol) address, usually written in “dotted-quad-decimal” notation
E.g., 10.20.25.70
Use nslookup on Unix to find out IP addresses
An organization uses a subset of IP addresses with the same prefix, e.g., IITD uses 10.20.*.*.
Data is sent to a host by fragmenting it into packets. Each packet carries the IP address of its destination.
The internet can be viewed as a graph whose nodes are routers and whose edges are communication links.
A router forwards packets to its neighbors using IP prefix matching rules. E.g., a packet with IP prefix 10.20 should be forwarded to the IIT gateway router.
Routers use tries on the alphabet {0,1} to do prefix matching.
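Prefix matching on a binary trie can be sketched as below. This is a toy illustration, not how a real router is implemented: the class RouteTrie, the helper ip_to_bits, and the next-hop names "iitd-gateway" and "lab-router" are hypothetical, and each trie node is a plain dictionary with optional keys "0", "1" and "hop".

```python
def ip_to_bits(ip):
    """Convert dotted-quad notation to a 32-character string of 0s and 1s."""
    return "".join(f"{int(part):08b}" for part in ip.split("."))

class RouteTrie:
    def __init__(self):
        self.root = {}  # node: dict with optional keys "0", "1", "hop"

    def add(self, prefix_ip, prefix_len, next_hop):
        """Store next_hop at the node reached by the first prefix_len bits."""
        node = self.root
        for bit in ip_to_bits(prefix_ip)[:prefix_len]:
            node = node.setdefault(bit, {})
        node["hop"] = next_hop

    def lookup(self, ip):
        """Return the next hop of the longest stored prefix of ip, or None."""
        node, best = self.root, None
        for bit in ip_to_bits(ip):
            if "hop" in node:
                best = node["hop"]  # remember the longest match seen so far
            node = node.get(bit)
            if node is None:
                return best
        return node.get("hop", best)

routes = RouteTrie()
routes.add("10.20.0.0", 16, "iitd-gateway")   # hypothetical rule for 10.20.*.*
routes.add("10.20.25.0", 24, "lab-router")    # hypothetical, more specific rule
print(routes.lookup("10.20.25.70"))
```

A lookup traces at most 32 edges from the root, so its cost depends on the address length and not on the number of stored prefixes.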
Priority Queues

A priority queue is an ADT (abstract data type) for maintaining a set S of elements, each with an associated value called key

A PQ supports the following operations

Insert(S,x) insert element x in set S (S←S∪{x})

Maximum(S) returns the element of S with the largest key

Extract-Max(S) returns and removes the element of S with the largest key
Priority Queues


Applications:
 Job scheduling on shared computing resources (Unix)
 Event simulation
 As a building block for other algorithms
A heap can be used to implement a PQ
Heaps

Can be viewed as a nearly complete binary tree



All levels, except possibly the lowest one, are completely filled
The key in the root is greater than or equal to the keys of its children, and the left and right subtrees are again binary heaps
Height?
Heaps


Binary heap data structure A
 Can be implemented as an array
Two attributes
 length[A]
 heap-size[A]
Heaps
Parent(i)
return ⌊i/2⌋
Left(i)
return 2i
Right(i)
return 2i+1
Heap property:
A[Parent(i)] ≥ A[i]
Index: 1 2 3 4 5 6 7 8 9 10
Key: 16 15 10 8 7 9 3 2 4 1
Level: 3 2 1 0
Heaps


Notice the implicit tree links; children of node i are 2i and 2i+1
Why is this useful?
 In a binary representation, a multiplication/division by two is a left/right shift
 Adding 1 after the shift just sets the lowest bit
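The index arithmetic can be written with shifts directly. A small sketch, assuming the 1-based indexing used on the slides (the function names mirror Parent, Left and Right):

```python
def parent(i):
    return i >> 1        # floor(i / 2): right shift by one bit

def left(i):
    return i << 1        # 2i: left shift by one bit

def right(i):
    return (i << 1) | 1  # 2i + 1: the shift leaves the lowest bit 0, so OR sets it

# Node 5 has parent 2 and children 10 and 11.
print(parent(5), left(5), right(5))
```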
Heapify




i is index into the array A
Binary trees rooted at Left(i) and Right(i) are heaps
But, A[i] might be smaller than its children, thus violating the heap property
The method Heapify makes A a heap once more by moving A[i] down the heap until the heap property is satisfied again
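Heapify can be sketched iteratively as below. The sketch assumes a Python list used as a 1-based array (A[0] is an unused placeholder) and passes heap_size explicitly instead of storing it as an attribute of A.

```python
def heapify(A, i, heap_size):
    """Move A[i] down until the heap property holds below position i.

    Assumes the subtrees rooted at 2i and 2i+1 are already heaps.
    """
    while True:
        l, r, largest = 2 * i, 2 * i + 1, i
        if l <= heap_size and A[l] > A[largest]:
            largest = l
        if r <= heap_size and A[r] > A[largest]:
            largest = r
        if largest == i:
            return          # A[i] is at least as large as both children
        A[i], A[largest] = A[largest], A[i]
        i = largest         # continue one level further down

A = [None, 5, 15, 10, 8, 7, 9, 3, 2, 4, 1]  # only the root violates the property
heapify(A, 1, 10)
print(A[1:])  # [15, 8, 10, 5, 7, 9, 3, 2, 4, 1]
```

The key 5 is swapped with 15 and then with 8, and stops above 2 and 4, so the loop runs once per level it descends.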
Heapify Example
Heapify: Running Time

Running time on a node of height h: O(h)
Extract Max

Removal of max takes constant time on top of Heapify: Θ(lg n)
Insertion

Insertion of a new element


enlarge the PQ and propagate the new element from last place “up” the PQ
tree is of height lg n, running time: Θ(lg n)
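Extract-Max and Insert can be sketched together as below. As an assumption of this sketch, the heap is a Python list used as a 1-based array (A[0] unused) whose length plays the role of heap-size, and extract_max expects a non-empty heap.

```python
def heapify(A, i, heap_size):
    """Move A[i] down until both children are no larger."""
    while True:
        l, r, largest = 2 * i, 2 * i + 1, i
        if l <= heap_size and A[l] > A[largest]:
            largest = l
        if r <= heap_size and A[r] > A[largest]:
            largest = r
        if largest == i:
            return
        A[i], A[largest] = A[largest], A[i]
        i = largest

def extract_max(A):
    """Remove and return the largest key: move the last key to the root, then Heapify."""
    top = A[1]
    last = A.pop()
    if len(A) > 1:
        A[1] = last
        heapify(A, 1, len(A) - 1)
    return top

def insert(A, key):
    """Append key in the last place and propagate it up while it beats its parent."""
    A.append(key)
    i = len(A) - 1
    while i > 1 and A[i // 2] < A[i]:
        A[i], A[i // 2] = A[i // 2], A[i]
        i //= 2

A = [None, 16, 15, 10, 8, 7, 9, 3, 2, 4, 1]
print(extract_max(A))  # 16
insert(A, 12)
print(A[1:])           # [15, 12, 10, 4, 8, 9, 3, 2, 1, 7]
```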
Insertion (Example)
Teaser
Given a heap as an array, how do you find the k-th largest element in O(k log k) time?
Graphs – Definition

A graph G = (V,E) is composed of:

V: set of vertices

E ⊆ V × V: set of edges connecting the vertices

An edge e = (u,v) is a pair of vertices

(u,v) is ordered, if G is a directed graph
Applications



Electronic circuits, pipeline networks
Transportation and communication networks
Modeling any sort of relationships (between components, people, processes, concepts)
(Undirected) Graph Terminology


adjacent vertices: connected by an edge (neighbors)
degree (of a vertex): # of adjacent vertices
∑_{v∈V} deg(v) = 2(# of edges)
Since adjacent vertices each count the adjoining edge, it will be counted twice

path: sequence of vertices v1, v2, ..., vk such that consecutive vertices vi and vi+1 are adjacent
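The degree-sum identity can be checked on a small example. The edge list below is made up for illustration, and the sketch assumes a simple graph (no self-loops or parallel edges), so that the number of adjacent vertices equals the number of incident edges.

```python
def degrees(edges):
    """Count, for every vertex, the number of edges incident on it."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1  # the edge is counted once at u ...
        deg[v] = deg.get(v, 0) + 1  # ... and once at v
    return deg

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
deg = degrees(edges)
print(sum(deg.values()), 2 * len(edges))  # 8 8
```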
Graph Terminology 
simple path: no repeated vertices
Graph Terminology 
cycle: simple path, except that the last vertex is the same as the first vertex

connected graph: any two vertices are connected by some path
Graph Terminology 

subgraph: subset of vertices and edges forming a graph
connected component: maximal connected subgraph. E.g., the graph below has 3 connected components
Graph Terminology 

(free) tree - connected graph without cycles
forest - collection of trees
Data Structures for Graphs

How can we represent a graph?

To start with, we can store the vertices and the edges in two containers, and we store with each edge object references to its start and end vertices
Edge List

The edge list
 Easy to implement
 Finding the edges incident on a given vertex is inefficient since it requires examining the entire edge sequence
Adjacency List


The Adjacency list of a vertex v: a sequence of vertices adjacent to v
Represent the graph by the adjacency lists of all its vertices
Space = Θ(n + ∑ deg(v)) = Θ( n + m)
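Building the adjacency lists from an edge list can be sketched as below, assuming an undirected graph whose vertices are numbered 0 to n-1 (the slides number vertices from 1; the shift is only for convenience in Python).

```python
def adjacency_list(n, edges):
    """Return adj, where adj[v] lists the vertices adjacent to v."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)  # each undirected edge appears in two lists,
        adj[v].append(u)  # which is where the sum of degrees = 2m comes from
    return adj

adj = adjacency_list(4, [(0, 1), (0, 2), (2, 3)])
print(adj)  # [[1, 2], [0], [0, 3], [2]]
```

The n list headers plus the 2m list entries give the Θ(n + m) space bound.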
Adjacency Matrix




Matrix M with entries for all pairs of vertices
M[i,j] = true – there is an edge (i,j)
M[i,j] = false – there is no edge (i,j)
Space = O(n^2)
Graph Searching Algorithms




Systematic search of every edge and vertex of the graph
Graph G = (V,E) is either directed or undirected
Basic Question: given two vertices u and v, find a path from u to v.
Applications
 Compilers
 Graphics
 Maze-solving
 Mapping
 Networks: routing, searching, clustering, etc.
Graph Searching Algorithms
Traverse (v) {
    visit v;
    for each neighbour u of v
        Traverse (u);
}
What is wrong? Need to remember if a vertex has been visited or not!
Graph Searching Algorithms
Visited[] : initialized to FALSE.
Traverse (v) {
    visit v; visited[v] = TRUE;
    for each neighbour u of v
        if (visited[u] == FALSE) Traverse (u);
}
Called DEPTH FIRST SEARCH (DFS)
Examples
DFS
What if the graph has many connected components?
Visited[] : initialized to FALSE.
For v = 1, …, N do
    if (visited[v] == FALSE) Traverse (v);

Traverse (v) {
    visit v; visited[v] = TRUE;
    for each neighbour u of v
        if (visited[u] == FALSE) Traverse (u);
}
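The loop over all start vertices can be sketched in Python as below, here collecting the vertices of each connected component. It assumes an undirected graph given as adjacency lists with vertices numbered from 0; the recursive form mirrors Traverse and would hit Python's recursion limit on very deep graphs.

```python
def connected_components(adj):
    """Return the connected components, each as a list of vertices in DFS order."""
    n = len(adj)
    visited = [False] * n
    components = []

    def traverse(v, comp):
        visited[v] = True
        comp.append(v)  # "visit v"
        for u in adj[v]:
            if not visited[u]:
                traverse(u, comp)

    for v in range(n):
        if not visited[v]:  # v starts a component not seen so far
            comp = []
            traverse(v, comp)
            components.append(comp)
    return components

adj = [[1], [0, 2], [1], [4], [3], []]
print(connected_components(adj))  # [[0, 1, 2], [3, 4], [5]]
```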
Running time: O(n + m)
Depth-First Search

A depth-first search (DFS) in an undirected graph G is like wandering in a labyrinth with a string and a can of paint
 We start at vertex s, tying the end of our string to the point and painting s “visited (discovered)”. Next we label s as our current vertex called u
 Now, we travel along an arbitrary edge (u,v).
 If edge (u,v) leads us to an already visited vertex v we return to u
 If vertex v is unvisited, we unroll our string, move to v, paint v “visited”, set v as our current vertex, and repeat the previous steps
Depth-First Search
Eventually, we will get to a point where all incident edges on u lead to visited vertices
We then backtrack by rolling our string back up to a previously visited vertex v. Then v becomes our current vertex and we repeat the previous steps
Then, if all incident edges on v lead to visited vertices, we backtrack as we did before. We continue to backtrack along the path we have traveled, finding and exploring unexplored edges, and repeating the procedure
Depth-First Search
Using DFS, we can
• check if a graph is connected.
• find the connected components in a graph
• check if there is a path from u to v.
How do we find a path from u to v?
When we recursively call Traverse(u), remember who is responsible for calling Traverse(u).
DFS
Visited[] : initialized to FALSE.
Traverse (v) {
    visit v; visited[v] = TRUE;
    for each neighbour u of v
        if (visited[u] == FALSE) {
            p[u] = v;
            Traverse (u);
        }
}
Depth-First Search
 To find a path from u to v, start a DFS from u, then follow the parent pointers back from v:
x = v; while (x != u) x = p[x];
 If we create a new graph where we add an edge between u and p[u] for every vertex u, what do we get? The DFS spanning tree!
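Recovering a path from the parent pointers can be sketched as below. The function name dfs_path is chosen here, vertices are assumed to be numbered from 0, and None is returned when v is not reachable from u. The path found is a path in the DFS tree, not necessarily a shortest one.

```python
def dfs_path(adj, u, v):
    """Return a path from u to v as a list of vertices, or None if there is none."""
    visited = [False] * len(adj)
    p = [None] * len(adj)  # p[y] = vertex whose Traverse call discovered y

    def traverse(x):
        visited[x] = True
        for y in adj[x]:
            if not visited[y]:
                p[y] = x
                traverse(y)

    traverse(u)
    if not visited[v]:
        return None
    path = [v]
    while path[-1] != u:  # x = v; while (x != u) x = p[x];
        path.append(p[path[-1]])
    return path[::-1]

adj = [[1, 2], [0, 3], [0], [1], []]
print(dfs_path(adj, 2, 3))  # [2, 0, 1, 3]
```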
Teaser
Let T be a DFS tree of G. Show that if (u,v) is an edge in the graph G which is not in T, then either u is an ancestor of v or v is an ancestor of u.
Breadth First Search



BFS in an undirected graph G is like wandering in a labyrinth with a string.
The starting vertex s is assigned a distance 0.
In the first round, the string is unrolled the length of one edge, and all of the vertices that are only one edge away from the anchor are visited (discovered), and assigned distances of 1
Breadth-First Search


In the second round, all the new vertices that can be reached by unrolling the string 2 edges are visited and assigned a distance of 2
This continues until every vertex has been assigned a distance
The distance of any vertex v corresponds to the length of the shortest path (in terms of edges) from s to v
BFS Example
BFS
Visited[] : initialized to FALSE.
s : starting vertex.
Q : queue, initially contains just s.
While (Q is not empty) {
    x = Q.dequeue(); visited[x] = TRUE;
    for each neighbor y of x
        if (visited[y] == FALSE) Q.insert(y);
}
What is missing?
BFS
Visited[] : initialized to FALSE.
s : starting vertex.
Q : queue, initially contains just s.
inQueue[v] : is v in the queue? Initially TRUE only for s.
While (Q is not empty) {
    x = Q.dequeue(); visited[x] = TRUE;
    for each neighbor y of x
        if (visited[y] == FALSE AND inQueue[y] == FALSE) {
            Q.insert(y); inQueue[y] = TRUE; p[y] = x;
        }
}
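BFS with distances and parent pointers can be sketched as below. As a simplification made in this sketch, a single array dist replaces both visited[] and inQueue[]: a vertex has a distance exactly when it has been put in the queue. Vertices are assumed to be numbered from 0.

```python
from collections import deque

def bfs(adj, s):
    """Return (dist, p): edge distances from s and BFS-tree parents (None if unreachable)."""
    dist = [None] * len(adj)
    p = [None] * len(adj)
    dist[s] = 0
    q = deque([s])
    while q:
        x = q.popleft()
        for y in adj[x]:
            if dist[y] is None:  # y is neither visited nor in the queue
                dist[y] = dist[x] + 1
                p[y] = x
                q.append(y)
    return dist, p

adj = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3], []]
dist, p = bfs(adj, 0)
print(dist)  # [0, 1, 1, 2, 3, None]
print(p)     # [None, 0, 0, 1, 3, None]
```

Each vertex enters the queue at most once and each adjacency list is scanned once, which gives O(n + m) time.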
BFS Tree Properties
No edge in the graph can cross more than 1 level. WHY?
BFS Tree Properties
BFS tree is a shortest path tree.
BFS Properties

Given a graph G = (V,E), BFS discovers all vertices reachable from a source vertex s

It computes the shortest distance to all reachable vertices

It computes a breadth-first tree that contains all such reachable vertices
For any vertex v reachable from s, the path in the breadth-first tree from s to v corresponds to a shortest path in G
Directed Graphs
Edges are ordered pairs.
Can use adjacency matrix or adjacency list representation.
Directed Graphs Terminology
Outdegree, indegree of a vertex.
What is the sum of the indegrees of all the vertices?
Directed paths, directed cycles.
What about connectivity?
A directed graph is strongly connected if, given any two vertices u and v, there is a path from u to v.
DFS on directed graphs.
DFS (G,v)
    visited[v] ← true
    pre[v] ← clock; clock++
    for each out-neighbor u of v
        if not visited[u] then DFS(G,u)
    post[v] ← clock; clock++
vertex Iv := [pre[v],post[v]]
What properties do these intervals have?
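The pre/post numbering can be sketched in Python as below. The outer loop over all start vertices and the 0-based vertex numbering are assumptions of this sketch, and the example graph is made up. One property that can be observed on the output is that any two intervals are either disjoint or nested.

```python
def dfs_intervals(adj):
    """Return (pre, post) clock values of a DFS over all vertices of a digraph."""
    n = len(adj)
    visited = [False] * n
    pre, post = [None] * n, [None] * n
    clock = 0

    def dfs(v):
        nonlocal clock
        visited[v] = True
        pre[v] = clock
        clock += 1
        for u in adj[v]:  # out-neighbors of v
            if not visited[u]:
                dfs(u)
        post[v] = clock
        clock += 1

    for v in range(n):
        if not visited[v]:
            dfs(v)
    return pre, post

# Hypothetical digraph: 0->1, 0->2, 1->3, 2->3, 3->0
pre, post = dfs_intervals([[1, 2], [3], [3], [0]])
print(pre, post)  # [0, 1, 5, 2] [7, 4, 6, 3]
```

In this run the interval of vertex 3, [2,3], lies inside that of vertex 1, [1,4], because 3 was discovered while 1 was still being explored.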
Example
[Figure: DFS on a directed graph with vertices A, B, C, D, showing a back edge, a forward edge, and a cross edge]
Applications
How do we find a cycle in a directed graph?
How do we check if a graph is strongly connected?
Topological Sorting

For each edge (u,v), vertex u is visited before vertex v
[Figure: “A typical student day”, a dag of daily tasks (wake up, eat, work, play, sleep, ...) numbered 1 to 11 in a topological order]
Topological Sorting

Topological sorting may not be unique
[Figure: a dag on vertices A, B, C, D with two valid topological orderings: A B C D or A C B D]
Topological Sorting

Labels are increasing along a directed path

A digraph has a topological sorting if and only if it is acyclic (i.e., a dag)
[Figure: a dag on vertices A, B, C, D, E labeled 1 to 5 in topological order]
Topological Sorting

Can you use DFS for topological sorting?
[Figure: a dag on vertices A, B, C, D, E labeled 1 to 5 in topological order]
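One possible answer is sketched below: run a DFS and output the vertices in decreasing order of finishing time. The function name, the three-state marking (which also detects directed cycles), and the diamond-shaped example dag are assumptions of this sketch, not taken from the slides.

```python
def topological_order(adj):
    """Return the vertices of a dag in a topological order; raise ValueError on a cycle."""
    n = len(adj)
    state = [0] * n  # 0 = unvisited, 1 = on the current DFS path, 2 = finished
    finished = []

    def dfs(v):
        state[v] = 1
        for u in adj[v]:
            if state[u] == 1:  # back edge: u is an ancestor of v
                raise ValueError("graph has a directed cycle")
            if state[u] == 0:
                dfs(u)
        state[v] = 2
        finished.append(v)  # v finishes after everything reachable from it

    for v in range(n):
        if state[v] == 0:
            dfs(v)
    return finished[::-1]

# Hypothetical dag with A=0, B=1, C=2, D=3 and edges A->B, A->C, B->D, C->D
print(topological_order([[1, 2], [3], [3], []]))  # [0, 2, 1, 3], i.e. A C B D
```

For every edge (u,v) of a dag, v finishes before u, so reversing the finishing order places u before v.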