Augmenting AVL trees

advertisement
Augmenting AVL trees
How we’ve thought about trees so far
Good for determining ancestry
Can be good for quickly finding an element
Other kinds of uses?
• Any thoughts?
• Finding a minimum/maximum…
– (heaps are probably just as good or better)
• Finding an average?
• More complicated things?!!!11one
Enter: idea of
augmenting a tree
Augmenting
• Can quickly compute many global properties
that seem to need knowledge of the whole tree!
• Examples:
– size of any sub-tree
– height of any sub-tree
– averages of keys/values in a sub-tree
– min+max of keys/values in any sub-tree, …
• Can quickly compute any function f(x) so long
as you only need to know f(x.left) and f(x.right)!
Augmenting an AVL tree
• Can augment any kind of tree
• Only balanced trees are guaranteed to be fast
• After augmenting an AVL tree to compute
f(x), we can still do all operations in O(lg n)!
Simple first example
• We are going to do one simple example
A regular AVL tree
• Then, you will help with a harder
one!
already
does this
• Problem: augment an AVL tree so we can do:
– Insert(key): add key in O(lg n)
– Delete(key): remove key in O(lg n)
– Height(node): get height of sub-tree rooted at
node in O(1)
Store some extra data at
How do we do this?
each node… but what?
Can we compute this function quickly?
• Function we want to compute: Height(u) = H(u)
• If someone gives us H(uL) and H(uR),
can we compute H(u)?
u
• What formula should we use?
• If u = null then
– H(u) = 0
• Else
– H(u) = max{H(uL), H(uR)}+1
uL
uR
H(u)=?
H(uL)
H(uR)
Augmenting AVL tree to compute H(u)
• Each node u contains
– key: the key
The usual stuff…
– left, right: child pointers
– h: height of sub-tree rooted at u
• How?
d, 2
0
1
Secret sauce!
d, 2
Insert(d)
Insert(a)
a, 1
0
b, 0
1
e, 0
e, 0
Insert(e)
Insert(b)
Insert(c)
b, 0
a, 0
1
c, 0
c, 0
Algorithm idea:
• From the last slide, we develop an algorithm
Insert(u):
1 BST search for where to put u
2 Insert u into place like in a regular AVL tree
3 Fix balance factors and rotate as you would in
AVL insert, but fix heights at the same time.
(Remember to fix heights all the way to the root.
Don’t stop before reaching the root!)
• (When you rotate, remember to fix heights of all
nodes involved, just like you fix balance factors!)
Harder problem: overpaid employees!
• You have a record for
each employee in your
company with salary and
age.
• We want to quickly find
overpaid employees of
age <= A.
Breaking the problem down
• You must design a data structure D of
employee records to efficiently do:
– Insert(D; x): If x is a pointer to a new employee
record, insert the record pointed to by x into D.
– Delete(D; x): If x is a pointer to an existing
employee record in D, remove that record from D.
– MaxSal(D; A): Return the largest salary, amongst
all employees in D whose age is <= A; if there is no
such employee, then return -1.
• All functions must run in O(lg n)
Part 1
• Give a precise and full description of your data
structure
• Illustrate this data structure by giving an
example of it on some collection of employee
records of your own choice.
Figuring out the data structure
• Iterative process…
hard to get it right the first time!
• What function do we want to compute?
• Maximum salary for employees with age <= A
• What info should we store at each node u?
• Maybe maximum salary for all employees in the
sub-tree rooted at u? Let’s call this value MS(u).
• How can we compute MS(u) quickly (i.e., by
looking at a small number of descendents of u)?
Organizing nodes inside the tree
• How can we turn maximum salary in a sub-tree
into maximum salary for those with age <= A?
• For sure, need to relate age with sub-trees!
– Things are organized into sub-trees by key…
• So, age must be related to key.
– Let age be the key of each node…
– May have 2+ nodes with same key, but that’s okay.
Age
1100
1247
1590
1590
3801
17000
30000
30000
Salary
31000
29000
28000
48000
41000
36000
90000
75000
Answer to part 1
Age=17000,
Age=17000,
salary=36000,
MS=
MS=90000
MS=
Age=30000,
Age=24100,
salary=75000,
MS=
MS=90000
MS=
Age=1590,
Age=1590,
salary=28000,
MS=
MS=
MS=48000
Age=1100,
Age=1100,
salary=31000,
MS=
MS=31000
MS=
Age=1247,
Age=1247,
salary=29000,
MS=
MS=29000
MS=
Age=3801,
Age=3801,
salary=41000,
MS=
MS=48000
MS=
Age=1590,
Age=1590,
salary=48000,
MS=
MS=48000
MS=
Age=30000,
Age=30000,
salary=90000,
MS=
MS=90000
MS=
Part 2
• Explain how to implement Insert(D, x), Delete(D, x) and
MaxSal(D, A) in O(log n) time
What does “fix MS(u)” mean?
What formula do we use?
Insert:
1 Do regular AVL-insert(D, <x.age, x.salary, MS=0>), except:
2 As you move up the tree fixing balance factors, fix MS(u)
for every node on the path to the root
Set each
MS(u)
= max{u.salary,
MS(uR)}
3 After
rotation,
fix each MS(u)MS(u
for each
L),node
that was involved in the rotation
4 Remember to fix heights all the way to the root.
(Don’t stop early, even if done balance factors/rotations!)
Delete:
Same as Insert, but “Do regular AVL-delete[…]”
How do we do MaxSal???
How can we compute MaxSal?
• Write a recursive function!
– Recursion is often very natural for trees
• Big problem: want MaxSal(D, A) =
largest salary for age <= A in tree
• Smaller problem: MaxSal(u, A) =
largest salary for age <= A in sub-tree rooted at u
• Recursive/inductive measure of problem size:
size of sub-tree
• Therefore, smaller problem = smaller sub-tree.
Code for MaxSal
// u is a node, A is an age
u: salary, age, MS
MaxSal(u, A)
if u = null then
uR: salary, age, MS
uL: salary, age, MS
return -1
else if A < u.age then
Calling MaxSal on BOTH subreturn MaxSal(uL, A)
trees can take O(n) time!
else
return max{u.salary, MaxSal(uR,A), MaxSal(uR,A)}
Code for MaxSal
// u is a node, A is an age
u: salary, age, MS
MaxSal(u, A)
if u = null then
uR: salary, age, MS
uL: salary, age, MS
return -1
else if A < u.age then
return MaxSal(uL, A)
else
return max{u.salary, MaxSal(uR,A), MS(uL)}
Why O(lg n) time?
• Insert/Delete: normal AVL operation = O(lg n)
– PLUS: update MS(u) for each u on path to the root
• Length of this path <= tree height, so O(lg n) in an AVL tree
– PLUS: update MS(u) for each node involved in a
rotation
• At most O(lg n) rotations (one per node on the path from
the root <= tree height)
• Each rotation involves a constant number of nodes
• Therefore, constant * O(lg n), which is O(lg n).
• MaxSal
– Constant work + recursive call on ONE child
– Single recursive call means O(tree height) = O(lg n)
Download