Scapegoat tree

Outline
• Scapegoat Trees ( O(log n) amortized time)
• 2-4 Trees ( O(log n) worst case time)
• Red Black Trees ( O(log n) worst case time)
Review Skiplists and Treaps
• So far, we have seen treaps and skiplists
• Randomized structures
• Insert/delete/search in O(log n) expected
time
• Expectation depends on random choices made
by the data structure
• Coin tosses made by a skiplist
• Random priorities assigned by a treap
Scapegoat trees
• Deterministic data structure
• Lazy data structure
• Only does work when search paths get too long
• Search in O(log n) worst-case time
• Insert/delete in O(log n) amortized time
• Starting with an empty scapegoat tree, a
sequence of m insertions and deletions takes
O(m log n) time
Scapegoat philosophy
• We follow a simple strategy:
• If the tree is not optimal, rebuild it.
• Is this a good binary search tree?
[Figure: a binary search tree containing the 17 keys 0-16, drawn on 5 levels]
• It has 17 nodes and 5 levels
• Any binary tree with 17 nodes has at least 5 levels (a binary tree with 4 levels has at most 2^4 - 1 = 15 nodes)
• This is an "optimal" binary search tree.
Scapegoat philosophy
• Rebuilding the tree costs O(n) time
• We cannot do it too often if we want to keep the O(log n) amortized time bound
• How do we know when we need to rebuild the tree?
• Scapegoat trees keep two counters:
1. n: the number of items in the tree (size)
2. q: an overestimate of n
• We maintain the following two invariants:
1. q/2 ≤ n ≤ q
2. No node has depth greater than log_{3/2} q
Search and Delete
• How can we perform a search in a Scapegoat
tree?
• How can we delete a value x from a Scapegoat
tree?
1. run the standard deletion algorithm for
binary search trees.
2. decrement n
3. if n < q/2 then
• rebuild the entire tree and set q = n
(a code sketch of these steps follows below)
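A minimal sketch of these deletion steps in Java, under assumptions the slides leave implicit: integer keys, a plain BST node class, standard BST helpers (findNode, spliceOut), and a rebuild(u) routine that rebuilds the subtree rooted at u into a perfectly balanced one. All names here are illustrative, not taken from the slides.

class Node {
    int key;
    Node left, right, parent;
    Node(int key) { this.key = key; }
}

class ScapegoatSketch {
    Node root;
    int n;   // number of items currently in the tree
    int q;   // overestimate of n (see the invariants above)

    // Deletion, following steps 1-3 on the slide.
    boolean remove(int x) {
        Node u = findNode(x);        // assumed: standard BST search
        if (u == null) return false;
        spliceOut(u);                // step 1: standard BST deletion (assumed helper)
        n--;                         // step 2: decrement n
        if (2 * n < q) {             // step 3: n < q/2
            rebuild(root);           // rebuild the entire tree ...
            q = n;                   // ... and reset q = n
        }
        return true;
    }

    // Assumed helpers: ordinary BST code plus a linear-time balanced rebuild.
    Node findNode(int x) { throw new UnsupportedOperationException(); }
    void spliceOut(Node u) { throw new UnsupportedOperationException(); }
    void rebuild(Node u) { throw new UnsupportedOperationException(); }
}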
Insert
• How can we insert a value x into a Scapegoat
tree?
• To insert the value x into a ScapegoatTree:
1. Create a node u and insert it in the normal way
2. Increment n and q
3. If the depth of u is greater than log_{3/2} q, then:
• Walk up from u toward the root until reaching a node w with size(w) > (2/3) size(w.parent)
• Rebuild the subtree rooted at w.parent
(a code sketch of these steps follows below)
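A matching sketch of insertion, written as extra methods for the same illustrative ScapegoatSketch class used in the deletion sketch. Two more helpers are assumed: addLeaf(u), a standard BST insertion that returns the depth of the new leaf (or -1 if the key is already present), and size(w), the number of nodes in w's subtree.

// Insertion, following steps 1-3 on the slide.
boolean add(int x) {
    Node u = new Node(x);
    int d = addLeaf(u);                           // step 1: insert as a leaf, get its depth
    if (d < 0) return false;                      // x was already present
    n++; q++;                                     // step 2: increment n and q
    if (d > log32(q)) {                           // step 3: the new leaf is too deep
        Node w = u;                               // walk up until size(w) > (2/3) size(w.parent)
        while (3 * size(w) <= 2 * size(w.parent))
            w = w.parent;                         // the lemma a few slides later guarantees
        rebuild(w.parent);                        // this stops; rebuild at w.parent
    }
    return true;
}

double log32(int x) { return Math.log(x) / Math.log(1.5); }  // log base 3/2

// Assumed helpers, as described above.
int addLeaf(Node u) { throw new UnsupportedOperationException(); }
int size(Node w) { throw new UnsupportedOperationException(); }

Writing the 2/3 test as 3*size(w) <= 2*size(w.parent) keeps the comparison in integer arithmetic.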
Inserting into a Scapegoat tree (easy case)
[Figure: a scapegoat tree with n = q = 10 into which the new node u = 3.5 is inserted as a leaf at depth 4]
1. Create a node u and insert it in the normal way
2. Increment n and q
3. depth(u) = 4 ≤ log_{3/2} q = 5.913
Inserting into a Scapegoat tree (bad case)
[Figure: an unbalanced scapegoat tree with n = q = 11 after the new node u = 3.5 ends up at depth d(u) = 6 > log_{3/2} q = 5.913]
• Walk up from u, testing size(w) > (2/3) size(w.parent) at each node w
• At w = u: size(w) = 1 ≤ (2/3)·2 = 1.33, so keep climbing
Inserting into a Scapegoat tree (bad case)
[Figure: the same tree, one level higher]
• At the next node w: size(w) = 2 ≤ (2/3)·3 = 2, so keep climbing
Inserting into a Scapegoat tree (bad case)
[Figure: the same tree, one level higher]
• At the next node w: size(w) = 3 ≤ (2/3)·6 = 4, so keep climbing
Inserting into a Scapegoat tree (bad case)
[Figure: the same tree, one level higher]
• At the next node w: size(w) = 6 > (2/3)·7 = 4.67, so the subtree rooted at w.parent (the scapegoat) gets rebuilt
Inserting into a Scapegoat tree (bad case)
[Figure: the same tree with n = q = 11 after the subtree rooted at the scapegoat has been rebuilt]
• How can we be sure that a scapegoat node always exists?
Why is there always a scapegoat?
• Lemma: if d(u) > log_{3/2} q then there exists a scapegoat node
• Proof by contradiction:
• Assume (for contradiction) that we don't find a scapegoat node
• Then size(w) ≤ (2/3) size(w.parent) for every node w on the path to u
• So the node at depth i on this path has size at most n(2/3)^i
• But d > log_{3/2} q ≥ log_{3/2} n, so
size(u) ≤ n(2/3)^d < n(2/3)^{log_{3/2} n} = n · (1/n) = 1
• Contradiction (since size(u) = 1), so there must be a scapegoat node
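The middle step uses the following identity, spelled out here in LaTeX for clarity (2/3 is the reciprocal of 3/2):

\[
\left(\tfrac{2}{3}\right)^{\log_{3/2} n}
  = \left(\tfrac{3}{2}\right)^{-\log_{3/2} n}
  = n^{-1},
\qquad\text{so}\qquad
n\left(\tfrac{2}{3}\right)^{d}
  < n\left(\tfrac{2}{3}\right)^{\log_{3/2} n}
  = 1 .
\]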
Summary
• So far, we know that insert and delete maintain the invariants:
• the depth of any node is at most log_{3/2} q
• q ≤ 2n
• So the depth of any node is at most log_{3/2}(2n) ≤ 2 + log_{3/2} n
• So, we can search in a scapegoat tree in O(log n) time
• Some issues still to resolve
• How do we keep track of size(w) for each node w?
• How much time is spent rebuilding nodes during
deletion and insertion?
Keeping track of the size
• There are two possible solutions:
• Solution 1: Each node keeps an extra counter
for its size
• During insertion, each node on the path to u
gets its counter incremented
• During deletion, each node on the path to u
gets its counter decremented
• We calculate sizes bottom-up during a
rebuild
• Solution 2: Each node doesn't keep an extra
counter for its size
(Not) keeping track of the size
• We only need size(w) while looking for a scapegoat
• Knowing size(w), we can compute size(w.parent) by traversing the subtree rooted at sibling(w)
• So, in O(size(v)) time, we know all the sizes on the path up to the scapegoat node v
• But we do O(size(v)) work when we rebuild v anyway, so this doesn't add anything to the cost of rebuilding
[Figure: the unbalanced example tree from the insertion slides]
(a code sketch of this walk follows below)
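A sketch of that walk in Java, again as methods for the illustrative ScapegoatSketch class from the earlier sketches. Each sibling subtree is traversed once, and those subtrees are disjoint and contained in the subtree that gets rebuilt, so the extra work is absorbed by the O(size(v)) cost of the rebuild.

// Size of the subtree rooted at v, by a plain traversal: O(size(v)) time.
int subtreeSize(Node v) {
    if (v == null) return 0;
    return 1 + subtreeSize(v.left) + subtreeSize(v.right);
}

// Walk up from the new leaf u, computing sizes on the fly, and return the node
// whose subtree should be rebuilt (w.parent in the slide's notation).
// Assumes the depth test has already failed, so a scapegoat exists.
Node findRebuildRoot(Node u) {
    Node w = u;
    int sw = 1;                                   // size(w) for w = u
    while (true) {
        Node p = w.parent;
        Node sib = (p.left == w) ? p.right : p.left;
        int sp = sw + subtreeSize(sib) + 1;       // size(w.parent)
        if (3 * sw > 2 * sp) return p;            // size(w) > (2/3) size(w.parent)
        w = p;
        sw = sp;                                  // reuse the computed size one level up
    }
}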
Analysis of deletion
• When deleting, if n < q/2, then we rebuild the
whole tree
• This takes O(n) time
• If n < q/2 then, since q was equal to n after the last rebuild and only grows with insertions, we have done at least q - n > n/2 deletions
• The amortized (average) cost of rebuilding (due
to deletions) is O(1) per deletion
Analysis of insertion
• If no rebuild is necessary, the cost of the insertion is O(log n)
• After rebuilding the subtree of a node v, both of its children have the same size (to within one node)
• So if the subtree rooted at v has size n, at least n/3 insertions into it were needed since the previous rebuild before v could become a scapegoat
• The rebuild costs at most O(n log n) operations (a careful rebuild needs only O(n)), charged to those insertions
• Thus the cost of an insertion is O(log n) amortized time (see the accounting sketch below)
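One way to make this accounting explicit, as a rough sketch (it charges each rebuild only to the insertions that landed in the rebuilt subtree since its last rebuild, and n here denotes the size of that subtree):

\[
\text{amortized cost per insertion}
  \;\le\;
  \underbrace{O(\log n)}_{\text{insertion path}}
  +
  \underbrace{\frac{O(n \log n)}{\,n/3\,}}_{\text{rebuild cost shared by at least } n/3 \text{ insertions}}
  \;=\; O(\log n).
\]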
Scapegoat trees summary
• Theorem:
− The cost to search in a scapegoat tree is
O(log n) in the worst-case.
− The cost of insertion and deletion in a
scapegoat tree are O(log n) amortized time per
operation.
• Scapegoat trees often work even better than
expected
• If we get lucky, then no rebuilding is required
Review: Maintaining
Sorted Sets
• We have seen the following data structures for
implementing a SortedSet
− Skiplists: find(x)/add(x)/remove(x) in
O(log n) expected time per operation
− Treaps: find(x)/add(x)/remove(x) in
O(log n) expected time per operation
− Scapegoat trees: find(x) in
O(log n) worst-case time per operation,
add(x)/remove(x) in
O(log n) amortized time per operation
Review: Maintaining
Sorted Sets
• No data structures course would be complete
without covering
− 2-4 trees: find(x)/add(x)/remove(x) in
O(log n) worst-case time per operation
− Red-black trees: find(x)/add(x)/remove(x) in
O(log n) worst-case time per operation
The height of 2-4 Trees
• A 2-4 tree is a tree in which
• Each internal node has 2, 3, or 4 children
• All the leaves are at the same level
The height of 2-4 Trees
• Lemma: A 2-4 tree of height h ≥ 0 has at least 2^h leaves
• Proof: The number of nodes at level i is at least 2^i
[Figure: successive levels contain ≥ 2^0 = 1, ≥ 2^1 = 2, ≥ 2^2 = 4, ≥ 2^3 = 8 nodes]
• Corollary: A 2-4 tree with n > 0 leaves has height at most log_2 n
• Proof: n ≥ 2^h ⇔ log_2 n ≥ h
Add a leaf to a 2-4 Tree
• To add a leaf w as a child of a node u in a 2-4 tree:
1. Add w as a child of u
2. While u has 5 children do:
1. Split u into two nodes with 2 and 3 children, respectively, and make them children of u.parent
2. Set u = u.parent
3. If the root was split, create a new root with 2 children
• This runs in O(h) = O(log n) time (a code sketch follows below)
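A minimal sketch of these steps in Java. It assumes a bare-bones node type that only tracks parent and children; keys, routing values and the search for the correct parent u are omitted, and all names are illustrative.

import java.util.ArrayList;
import java.util.List;

class Node24 {
    Node24 parent;
    List<Node24> children = new ArrayList<>();
}

class TwoFourSketch {
    Node24 root = new Node24();

    // Add leaf w as a child of node u (steps 1-3 on the slide).
    void addLeaf(Node24 u, Node24 w) {
        u.children.add(w);                             // step 1
        w.parent = u;
        while (u.children.size() == 5) {               // step 2: u overflowed
            Node24 v = new Node24();                   // split: v takes u's last 2 children,
            for (int i = 0; i < 2; i++) {              // u keeps its first 3
                Node24 c = u.children.remove(u.children.size() - 1);
                c.parent = v;
                v.children.add(0, c);
            }
            if (u.parent == null) {                    // step 3: the root was split
                Node24 r = new Node24();               // new root with 2 children
                r.children.add(u);
                r.children.add(v);
                u.parent = r;
                v.parent = r;
                root = r;
                return;
            }
            Node24 p = u.parent;                       // otherwise hang v next to u
            p.children.add(p.children.indexOf(u) + 1, v);
            v.parent = p;
            u = p;                                     // and re-check one level up
        }
    }
}

The split keeps the children in left-to-right order, which is what a keyed implementation would rely on.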
Deleting a leaf from a 2-4 Tree
• To delete a leaf w from a 2-4 tree:
1. Remove w from its parent u
2. While u has 1 child and u != root:
1. If u has a sibling v with 3 or more children, then borrow a child from v
2. Else merge u with its sibling v, remove v from u.parent and set u = u.parent
3. If u == root and u has 1 child, then set root = u.child[0]
• This runs in O(h) = O(log n) time (a code sketch follows below)
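A matching sketch of deletion, reusing the illustrative Node24 / TwoFourSketch classes from the add-leaf sketch above (again with keys omitted).

// Delete leaf w from the tree (steps 1-3 on the slide).
void removeLeaf(Node24 w) {
    Node24 u = w.parent;
    u.children.remove(w);                              // step 1
    while (u.children.size() < 2 && u != root) {       // step 2: u underflowed
        Node24 p = u.parent;
        int i = p.children.indexOf(u);
        Node24 v = (i > 0) ? p.children.get(i - 1)     // an adjacent sibling of u
                           : p.children.get(i + 1);
        if (v.children.size() >= 3) {                  // 2.1: borrow a child from v
            Node24 c = (i > 0)
                ? v.children.remove(v.children.size() - 1)
                : v.children.remove(0);
            c.parent = u;
            if (i > 0) u.children.add(0, c);
            else u.children.add(c);
        } else {                                       // 2.2: merge u with v, remove v
            for (Node24 c : v.children) c.parent = u;
            if (i > 0) u.children.addAll(0, v.children);
            else u.children.addAll(v.children);
            p.children.remove(v);
            u = p;                                     // and re-check one level up
        }
    }
    if (u == root && u.children.size() == 1) {         // step 3: the tree shrinks
        root = u.children.get(0);
        root.parent = null;
    }
}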
2-4 trees can act as search trees
[Figure: a 2-4 tree storing the keys 0-9 in its leaves, with routing values in the internal nodes]
• How?
• All n keys are stored in the leaves
• Internal nodes store 1, 2, or 3 values to direct searches to the correct subtree
• Searches take O(h) = O(log n) time (a code sketch follows below)
• Theorem: A 2-4 tree supports the operations find(x), add(x), and remove(x) in O(log n) time per operation
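A small sketch of find(x) in this representation, assuming one common routing convention: the i-th stored value of an internal node is the largest key in its i-th subtree (other conventions work the same way). The classes and names are illustrative.

class SearchNode24 {
    int[] keys;               // 1-3 routing values (internal node) or 1 key (leaf)
    SearchNode24[] children;  // null for a leaf
}

boolean find(SearchNode24 u, int x) {
    while (u.children != null) {                    // descend from the root to a leaf
        int i = 0;                                  // first subtree whose max is >= x,
        while (i < u.keys.length && x > u.keys[i])  // or the last subtree otherwise
            i++;
        u = u.children[i];
    }
    return u.keys[0] == x;                          // O(h) = O(log n) comparisons in total
}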
Red-Black Trees
• 2-4 trees are nice, but they aren't binary trees
• How can we make them binary?
Red-black trees: a binary version of 2-4 trees
Red-Black Trees
• A red-black tree is a binary search tree in
which each node is colored red or black
1. Each red node has 2 black children
2. The number of black nodes on every root-to-leaf path is the same
• null (external) nodes are considered black
• the root is always black
Red-Black trees and 2-4 trees
• A red-black tree is an encoding of a 2-4 tree as a binary tree
• Red nodes are "virtual nodes" that allow 3 or 4 children per black node
The height of Red-Black Trees
• Red-black tree properties:
1. Each red node has 2 black children
2. The number of black nodes on every root-to-leaf path is the same
• Theorem: A red-black tree with n nodes has height at most 2 log_2(n + 1)
• A red-black tree is an encoding of a 2-4 tree with n + 1 leaves
• The black height is therefore at most log_2(n + 1)
• Red nodes at most double this height
(the bound is restated as an inequality below)
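Restating the slide's argument as a single inequality, where bh denotes the black height (the number of black nodes on any root-to-leaf path):

\[
\text{height}
  \;\le\;
  \underbrace{\mathrm{bh}}_{\text{black nodes on the path}}
  +
  \underbrace{\mathrm{bh}}_{\text{at most one red per black}}
  \;\le\; 2\log_2(n+1).
\]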
Red-Black Trees
• Adding and removing in a red-black tree
simulates adding/deleting in a 2-4 tree
• This results in a lot of cases
• To get fewer cases, we add an extra property:
• If u has a red child then u.left is red
Adding to a red-black tree
• To add a new value to a red-black tree:
1. create a new red node u and insert it as usual (as a leaf)
2. call insertFixup(u) to restore the properties:
1. no red-red edge
2. if u has a red child then u.left is red
• Each iteration of insertFixup(u) moves u up the tree
• It finishes after O(log n) iterations, in O(log n) time
• The individual cases are illustrated on the following slides, with a code sketch after the last one
Insertion cases
• Case 1: The new node N is the root
• We color N black
• All the properties are still satisfied
Insertion cases
• Case 2: The parent P of the new node N is black
• All the properties are still satisfied; nothing to do
Insertion cases
• Case 3: The parent P of the new node N and the uncle U are both red
• The red property is not satisfied
• P and U become black; now the path property is not satisfied
• P's parent G becomes red
• Are all the properties satisfied now?
• The process is repeated recursively, with G in the role of N, until another case (ultimately case 1 at the root) applies
Insertion cases
• Case 4: The parent P of the new node N is red but the uncle U is black; P is the left child of G and N is the left child of P
• Rotate right at G, bringing P up
• Change the colors of P and G
[Figure: after the rotation, P is the subtree root with children N and G, and U stays below G]
Insertion cases
• Case 5: The parent P of the new node N is red but the uncle U is black; P is the left child of G and N is the right child of P
• Rotate left at P, bringing N up; this reduces to case 4 (with P now in the role of the new node)
[Figure: after the rotation, N is G's left child and P is N's left child]
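A hedged sketch of the insertion fixup corresponding to cases 1-5 above, in the usual "uncle-based" form that these cases describe (not the left-leaning variant mentioned a few slides earlier). A node type with left, right, parent and color fields, the RED/BLACK constants, and rotateLeft/rotateRight helpers that also fix parent pointers and the root are assumed and illustrative.

void insertFixup(Node n) {
    while (true) {
        Node p = n.parent;
        if (p == null) { n.color = BLACK; return; }   // case 1: n is the root
        if (p.color == BLACK) return;                 // case 2: nothing to fix
        Node g = p.parent;                            // p is red, so it is not the root
        Node u = (g.left == p) ? g.right : g.left;    // the uncle (null counts as black)
        if (u != null && u.color == RED) {            // case 3: red parent, red uncle
            p.color = BLACK; u.color = BLACK; g.color = RED;
            n = g;                                    // repeat with G in the role of N
            continue;
        }
        if (p == g.left) {                            // uncle is black
            if (n == p.right) {                       // case 5: rotate into case 4
                rotateLeft(p);
                n = p; p = n.parent;
            }
            rotateRight(g);                           // case 4: P moves above G
        } else {                                      // mirror images of cases 4 and 5
            if (n == p.left) {
                rotateRight(p);
                n = p; p = n.parent;
            }
            rotateLeft(g);
        }
        p.color = BLACK; g.color = RED;               // change the colors of P and G
        return;
    }
}

The loop climbs at most one level per iteration, which is where the O(log n) bound stated earlier comes from.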
Removing from a red-black tree
• To remove a value from a red-black tree:
1. remove a node w with 0 or 1 children
2. set u = w.parent and make u blacker
• red becomes black
• black becomes double-black
3. call removeFixup(u) to restore the properties:
1. no double-black nodes
2. if u has a red child then u.left is red
• Each iteration of removeFixup(u) moves u up the tree
• It finishes after O(log n) iterations, in O(log n) time
• The individual cases are illustrated on the following slides, with a code sketch after the last one
Removing simple cases
• If the node N to be removed has two children, we replace its value with that of its successor and remove the successor (as in any binary search tree)
• So we can assume N has at most one child
• If N is red, just remove it
• If N's child is red, color the child black and remove N
• All the properties are still satisfied
Removing complex cases
• Both N and its child are black
• We remove N and replace it by its child
• (from now on we call that child N, and we call its new sibling S)
[Figure: after the replacement, P has children N and S]
Removal cases
• Case 1: N is the new root
• Everything is done; all the properties are satisfied
Removal cases
• Case 2: The sibling S is red
• Rotate left at P, bringing S up, and swap the colors of S and P
• Is the path property satisfied?
• We continue with case 4, 5 or 6 (N now has a black sibling)
[Figure: after the rotation, S is the subtree root and P is its left child, with N and S's former left child Sl below P]
Removal cases
• Case 3: N, P, S and the children of S are all black
• We color S red
• Is the path property satisfied?
• Not for the rest of the tree: the whole subtree of P is now one black node short, so we repeat the checking process recursively with node P
Removal cases
• Case 4: N, S and the children of S are black, but P is red
• We swap the colors of S and P
• Is the path property satisfied?
• Yes, all the properties are satisfied. Why? (paths through N gain a black node at P, while paths through S gain one at P but lose one at S, so their counts are unchanged)
Removal cases
• Case 5: N is the left child of P, S and S's right child are black, but S's left child is red
• We rotate right at S
• We swap the colors of S and its new parent (S's former left child)
• We continue with case 6
[Figure: after the rotation, S's former left child is N's new sibling, with S as its right child]
Removal cases
• Case 6: N is the left child of P, S is black and S's right child is red
• We rotate left at P
• Set the right child of S to black and swap the colors of P and S
• All the properties are satisfied
[Figure: after the rotation, S is the subtree root with children P and its former right child]
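A hedged sketch of the removal fixup corresponding to cases 1-6 above, written only for the situation the slides draw (N is a left child); a full implementation also handles the mirror image and usually represents the external nodes with a black sentinel. The node type, colors and rotation helpers are the same illustrative assumptions as in the insertion sketch; n is the node currently carrying the extra "missing black".

void removeFixup(Node n) {
    while (n.parent != null) {                   // case 1: n has reached the root: done
        Node p = n.parent;
        Node s = p.right;                        // the sibling (n is a left child here)
        if (s.color == RED) {                    // case 2: red sibling
            rotateLeft(p);
            p.color = RED; s.color = BLACK;      // swap the colors of S and P
            s = p.right;                         // new (black) sibling; fall through
        }
        if (isBlack(s.left) && isBlack(s.right)) {
            s.color = RED;                       // cases 3 and 4: recolor S
            if (p.color == RED) {                // case 4: P absorbs the missing black
                p.color = BLACK;
                return;
            }
            n = p;                               // case 3: push the problem up to P
            continue;
        }
        if (isBlack(s.right)) {                  // case 5: S.left red, S.right black
            rotateRight(s);
            s.color = RED;
            s.parent.color = BLACK;              // swap colors of S and its new parent
            s = p.right;                         // and continue with case 6
        }
        rotateLeft(p);                           // case 6: S.right is red
        s.color = p.color;                       // S takes over P's color ...
        p.color = BLACK;                         // ... while P and S's right child
        s.right.color = BLACK;                   //     become black
        return;
    }
}

// External (null) nodes count as black, as on the slides.
boolean isBlack(Node v) { return v == null || v.color == BLACK; }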
Summary
− Key point: there exist data structures (2-4
trees and red-black trees) that support
SortedSet operations in O(log n) worst-case time
per operation
− Implementation difficulty is considerably higher
than Scapegoat trees/skiplists/treaps
− Look more closely at insertFixup(u) and removeFixup(u)
− Amortized analysis shows that they do only O(1)
work on average
Summary
− Theorem: Starting with an empty red-black
tree, any sequence of m add(x)/remove(x)
operations performs only O(m) rotations and color
changes
− This is useful if we want to apply persistence to remember old versions of the tree for later use
Summary
− Skiplists:
find(x)/add(x)/remove(x) in O(log n) expected time
per operation.
− Treaps:
find(x)/add(x)/remove(x) in O(log n) expected time
per operation.
− Scapegoat trees:
find(x) in O(log n) worst-case time per operation,
add(x)/remove(x) in O(log n) amortized time per
operation.
− Red-black trees:
find(x)/add(x)/remove(x) in O(log n) worst-case time
per operation
− All structures, except scapegoat trees, do O(1) amortized (or expected) restructuring per add(x)/remove(x) operation