
Spreadsheet-Aided Numerical Experimentation:
Analytic Formula for Fibonacci Numbers
Timothy J. Rolfe
Computer Science Department
Eastern Washington University
202 Computer Sciences Building
Cheney, WA 99004-2412
Timothy.Rolfe@ewu.edu
http://penguin.ewu.edu/~trolfe/
Abstract
Spreadsheet representations of recurrences allow numerical experimentation with potential
analytic solutions to those recurrences. This paper uses a very simple recurrence for which the
analytic solution is quite obvious when one examines the values generated by the recurrence, and
then examines another recurrence for which the solution is not obvious.
Easy Case: the Towers of Hanoi Recurrence
The recurrence for the number of disk movements during the solution of the Towers of Hanoi problem is the
following:
Base Case:
Hanoi(0) = 0
NO disks move if there are no disks to move!
Recurrence:
Hanoi(n) = Hanoi(n–1) + 1 + Hanoi(n–1)
(a) Move all but one disk out of the way
(b) Move that disk to its destination
(c) Move the rest of the disks on top of it
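The recurrence can be checked by actually carrying out the moves. The following is a minimal Python sketch, not taken from the paper: the function name `hanoi_moves` and the peg labels "A", "B", "C" are illustrative assumptions. It solves the puzzle recursively and counts every disk movement.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Return the list of (disk, from_peg, to_peg) moves for n disks."""
    if n == 0:
        return []  # base case: no disks, so no moves
    moves = hanoi_moves(n - 1, source, spare, target)   # (a) move all but one disk out of the way
    moves.append((n, source, target))                   # (b) move that disk to its destination
    moves += hanoi_moves(n - 1, spare, target, source)  # (c) move the rest on top of it
    return moves

# Number of disk movements for n = 0 .. 5
counts = [len(hanoi_moves(n)) for n in range(6)]
print(counts)  # [0, 1, 3, 7, 15, 31]
```

Each recursive call mirrors one term of Hanoi(n) = Hanoi(n–1) + 1 + Hanoi(n–1), so the length of the move list is the quantity the recurrence describes.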
If spreadsheet column A contains the values for n, and column B contains the values for Hanoi(n), then the
set-up is straightforward. After the first row containing numbers (row 2, allowing row 1 for column labels), the
subsequent rows can contain formulas for recurrences: column A will have the successor recurrence (that is,
successor(n) = successor(n–1) + 1), while column B will have the Hanoi recurrence (Hanoi(n) = 2*Hanoi(n–1) + 1).
I will use row 3 as the specimen row and give the spreadsheet formulas in A3 and B3:
A3: =1+A2
B3: =1+B2*2
Spreadsheets typically provide an easy means to propagate values and formulas downward. Once we propagate
row 3 through rows 4 through 7, we see the following:
n    Hanoi(n)
0    0
1    1
2    3
3    7
4    15
5    31
Someone who has been working in the computer field for very long will immediately see that column B contains
values one less than the powers of 2, suggesting this solution: Hanoi(n) = 2^n – 1. We can add a new column to the
spreadsheet, and write in our analytic solution so that it references only values in column A, specifically (in this
case) the value in the same row of column A.
C2: =2^A2-1
Propagating that formula from row 2 down through row 7, we get the following:
n    Hanoi(n)    Formula
0    0           0
1    1           1
2    3           3
3    7           7
4    15          15
5    31          31
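The same three columns can be reproduced outside a spreadsheet. The following is a rough Python sketch; the list names `col_a`, `col_b`, and `col_c` are assumptions chosen to echo the spreadsheet columns, and each step is annotated with the cell formula it imitates.

```python
rows = 6                      # n = 0 .. 5, i.e. spreadsheet rows 2 .. 7

col_a = [0]                   # A2: first value of n
col_b = [0]                   # B2: Hanoi(0)
for _ in range(rows - 1):
    col_a.append(1 + col_a[-1])      # like =1+A2 propagated downward
    col_b.append(1 + col_b[-1] * 2)  # like =1+B2*2 propagated downward

col_c = [2 ** n - 1 for n in col_a]  # like =2^A2-1 in every row

for n, hanoi, formula in zip(col_a, col_b, col_c):
    print(n, hanoi, formula)
```

Column B depends on the row above it, while column C depends only on column A in the same row. That is the difference between the recurrence and the candidate analytic solution.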
For a mathematician, of course, this only proves the formula for the range of n=0 to n=5. The final step is the
inductive analytical proof that the formula is indeed the analytic solution to the recurrence. The Towers of Hanoi
provides one of the standard recurrences used in teaching such inductive proofs.
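For reference, the inductive argument might be written out as below. This is the standard textbook form, sketched here with the closed form for n–1 taken as the induction hypothesis.

```latex
\begin{align*}
\mathrm{Hanoi}(0) &= 2^{0} - 1 = 0 && \text{(base case)} \\
\mathrm{Hanoi}(n) &= 2\,\mathrm{Hanoi}(n-1) + 1 && \text{(recurrence)} \\
                  &= 2\,\bigl(2^{n-1} - 1\bigr) + 1 && \text{(induction hypothesis)} \\
                  &= 2^{n} - 1
\end{align*}
```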
More Difficult Case: The Fibonacci Recurrence
The Fibonacci recurrence is fairly similar to the Hanoi one, except that two previous values are involved, and we
don’t have the “+1” (though a closely related recurrence describing worst-case AVL tree depth even has the “+1”):
Base Cases:
Fib(0) = 0
NO rabbits at the start
Fib(1) = 1
ONE immature breeding pair of rabbits
Recurrence:
Fib(n+1) = Fib(n) + Fib(n–1)
(a) All the rabbits we just had
(b) Rabbits born to the mature breeding pairs
Again, spreadsheet column A contains the values for n, and column B contains the values for Fib(n). We have
two rows with numbers because we have two base cases: rows 2 and 3. Column A still has the successor recurrence,
while column B contains the Fibonacci recurrence.
A4: =1+A3
B4: =B3+B2
The values obtained, though, don’t show a sequence whose solution jumps out at us.
n    Fib(n)
0    0
1    1
2    1
3    2
4    3
5    5
One advantage of working with spreadsheets, though, is that they make it extremely easy to examine data
graphically. If we extend the series up to Fib(20) = 6765, and then plot the result, we see the following:
[Figure: plot of Fib(n) against n for n = 0 to 20; vertical axis from 0 to 7000.]
That certainly looks like exponential explosion! In other words, for larger values of n, the function is dominated
by an exponential part: Approx(n) = k * c^n.
One check is to compare ratios of adjacent values, which should approach c: in (k * c^n) / (k * c^(n–1)), everything
cancels but one c. We use Fib(n) itself as a backwards “estimate” of Approx(n). Thus, for row 4, we have
C4: =B4/B3
We find that the ratios converge to a particular value: “c” is the “golden ratio”, φ = (1 + √5)/2, one of the
solutions of x^2 = x + 1.
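The convergence of the adjacent ratios can also be checked numerically. The following is a small Python sketch, not part of the original spreadsheet: the names `fib`, `ratios`, and `phi`, and the cutoff at n = 40, are arbitrary choices made for illustration.

```python
import math

# Build Fib(0) .. Fib(40) from the recurrence
fib = [0, 1]
while len(fib) <= 40:
    fib.append(fib[-1] + fib[-2])

phi = (1 + math.sqrt(5)) / 2          # positive root of x^2 = x + 1

# ratios[i] corresponds to Fib(n) / Fib(n-1) with n = i + 2 (like =B4/B3)
ratios = [fib[n] / fib[n - 1] for n in range(2, 41)]

for n in (2, 5, 10, 20, 40):
    r = ratios[n - 2]
    print(n, r, r - phi)              # the difference shrinks toward zero
```

The printed differences alternate in sign and shrink quickly, which is the behavior seen in the spreadsheet column.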
The next step is to discover the value of k, rearranging the Approx(n) equation above so as to isolate k:
k = Approx(n) / φ^n, again using Fib(n) itself to stand in for Approx(n). Here φ is the golden ratio, (1 + √5)/2.
D4: =B4/$G$1^A4
Note that cell G1 contains the calculated value of φ.
We achieve convergence this time as well, and playing around a little discover that “k” is “1/√5”, which is not
surprising, considering the occurrence of “√5” in φ itself.
n     Fib(n)    Adjacent Ratio    Estimate k-value
0     0
1     1
2     1         1                 0.381966
3     2         2                 0.472136
4     3         1.5               0.437694
5     5         1.666667          0.45085
6     8         1.6               0.445825
7     13        1.625             0.447744
8     21        1.615385          0.447011
9     34        1.619048          0.447291
10    55        1.617647          0.447184
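The k-value column can be reproduced in a few lines. This is a hedged Python sketch in which `k_est` is a made-up name for the column computed by the D4 formula, and the series is extended to n = 30 only to make the convergence visible.

```python
import math

sqrt5 = math.sqrt(5)
phi = (1 + sqrt5) / 2                 # the value held in cell G1

fib = [0, 1]
for _ in range(2, 31):
    fib.append(fib[-1] + fib[-2])

# k_est[n] plays the role of column D: Fib(n) / phi^n
k_est = [fib[n] / phi ** n for n in range(len(fib))]

for n in (2, 5, 10, 20, 30):
    print(n, k_est[n], k_est[n] - 1 / sqrt5)
```

The estimates settle on 1/√5, approximately 0.447214, matching the trend in the last column of the table.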
Thus, we have obtained the following function, where φ is the golden ratio:
$$\mathrm{Approx}(n) = \frac{\phi^{n}}{\sqrt{5}}$$
We can now generate a column using that approximation.
C7: =$G$1^A7/$G$2
Note that cell G2 contains the calculated value for √5.
We see that it successively overshoots and undershoots the exact value, but by less and less as n increases. It’s
easy to handle an alternating sign, so let’s examine the absolute value of the error:
n     Fib(n)    Approx(n)    | Error(n) |
0     0         0.447214     0.447214
1     1         0.723607     0.276393
2     1         1.17082      0.17082
3     2         1.894427     0.105573
4     3         3.065248     0.065248
5     5         4.959675     0.040325
6     8         8.024922     0.024922
7     13        12.9846      0.015403
8     21        21.00952     0.009519
9     34        33.99412     0.005883
10    55        55.00364     0.003636
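The decay of the error can be examined the same way the growth of Fib(n) was: by taking ratios of adjacent values. The sketch below is an illustration in Python rather than the paper's spreadsheet, and the names `approx`, `signed_error`, `abs_error`, and `decay` are assumptions.

```python
import math

SQRT5 = math.sqrt(5)
PHI = (1 + SQRT5) / 2

fib = [0, 1]
for _ in range(2, 11):
    fib.append(fib[-1] + fib[-2])

approx = [PHI ** n / SQRT5 for n in range(11)]          # Approx(n) column
signed_error = [approx[n] - fib[n] for n in range(11)]  # overshoot (+) or undershoot (-)
abs_error = [abs(e) for e in signed_error]              # | Error(n) | column

# Ratio of adjacent absolute errors: a constant ratio below 1 means exponential decay
decay = [abs_error[n] / abs_error[n - 1] for n in range(1, 11)]

for n in range(1, 11):
    print(n, signed_error[n], decay[n - 1])
```

The ratio column stays close to 1/φ (about 0.618), which is what a term proportional to φ^–n would produce, and the signed error changes sign at every step.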
Again we have what looks like exponential behavior, but in this case a decaying exponential. For that, we can
use exactly the same approach as described above to characterize that exponential. As it turns out, we end up with
exactly the same c and k, except that in this case we have c^–n rather than c^n. We can get the alternating sign from
(–1)^–n. Writing φ for the golden ratio c, this gives:
$$\mathrm{Fib}(n) = \frac{\phi^{n} - (-\phi)^{-n}}{\sqrt{5}}$$
Again, for the mathematician, the spreadsheet can only prove equality for a finite set of values. It is necessary to
do the inductive analytical proof to establish the solution for all values.
In this proof, it will be necessary to use the special properties of φ: φ^–1 = φ – 1, and φ^2 = φ + 1.
Base cases:
Fib(0) = ( φ^0 – (–φ)^–0 ) / √5 = ( 1 – 1 ) / √5 = 0     QED
Fib(1) = ( φ^1 – (–φ)^–1 ) / √5 = ( φ + φ^–1 ) / √5
       = ( φ + φ – 1 ) / √5                              substituting for φ^–1
       = √5 / √5                                         definition of φ
       = 1                                               QED
The inductive proof is left as an exercise for the reader. Beyond the substitutions for φ^2 and φ^–1, it is mostly (as
usual) just a matter of algebra! This is an example of “strong induction”: to prove Fib(n+1) we need to use Fib(n)
and Fib(n–1), not just Fib(n).
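For readers who want to compare their own attempt, one possible outline of the inductive step follows. It is a sketch rather than the paper's derivation. The shorthand ψ = –φ^–1 is introduced here only for brevity, so that (–φ)^–n = ψ^n, and it relies on the fact that ψ satisfies the same equation as φ.

```latex
\begin{align*}
\psi^{2} &= \phi^{-2} = (\phi - 1)^{2} = \phi^{2} - 2\phi + 1 = 2 - \phi
          = 1 - (\phi - 1) = \psi + 1 \\[4pt]
\mathrm{Fib}(n) + \mathrm{Fib}(n-1)
  &= \frac{\phi^{n} - \psi^{n}}{\sqrt{5}} + \frac{\phi^{n-1} - \psi^{n-1}}{\sqrt{5}} \\
  &= \frac{\phi^{n-1}(\phi + 1) - \psi^{n-1}(\psi + 1)}{\sqrt{5}} \\
  &= \frac{\phi^{n-1}\,\phi^{2} - \psi^{n-1}\,\psi^{2}}{\sqrt{5}} \\
  &= \frac{\phi^{n+1} - \psi^{n+1}}{\sqrt{5}} = \mathrm{Fib}(n+1)
\end{align*}
```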
The above formula can be simplified by taking advantage of the other root of the equation x^2 = x + 1. Let
a = (1 – √5)/2. It turns out that using a simplifies the equation appreciably, because a = –1/φ:
$$\mathrm{Fib}(n) = \frac{\phi^{n} - a^{n}}{\sqrt{5}}$$ *
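The two-root form is easy to test against the recurrence. The following Python sketch uses the hypothetical helper names `fib_closed` and `fib_iter`. Floating-point arithmetic limits it to moderate n (the check stops at 40 here), and the closed-form value is rounded to the nearest integer before comparing.

```python
import math

SQRT5 = math.sqrt(5)
PHI = (1 + SQRT5) / 2      # positive root of x^2 = x + 1
A = (1 - SQRT5) / 2        # the other root; equals -1/PHI

def fib_closed(n):
    """Closed form (PHI^n - A^n) / sqrt(5), evaluated in floating point."""
    return (PHI ** n - A ** n) / SQRT5

def fib_iter(n):
    """Fib(n) straight from the recurrence, using exact integers."""
    prev, curr = 0, 1
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev

matches = all(round(fib_closed(n)) == fib_iter(n) for n in range(41))
print(matches)
```

As with the spreadsheet, agreement over a finite range is evidence rather than proof.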
A final side note of possible interest:
$$\mathrm{Fib}(n) = \frac{\phi^{n} - \cos(n\pi)\,\phi^{-n}}{\sqrt{5}}$$
provides an alternative continuous form of the Fibonacci function even for negative n. For n < 0, it generates
the same numeric values as Fib(n) for n > 0, but with alternating signs, consistent with a backwards recurrence
Fib(n–1) = Fib(n+1) – Fib(n). That is, for n < 0, the value of Fib is dominated by the term cos(nπ) φ^–n.
* Derived analytically in Gilles Brassard and Paul Bratley, Fundamentals of Algorithmics (1996), pp. 120-21.
[Figure: “Continuous Fib.” and “Discrete Fib.” plotted for n from –4 to 4; vertical axis from –3 to 3.]
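The continuous form can be evaluated directly. The following Python sketch uses `fib_cont` as an illustrative name and tabulates the integer points from –4 to 4, then samples one non-integer argument to show that the function is defined between them.

```python
import math

SQRT5 = math.sqrt(5)
PHI = (1 + SQRT5) / 2

def fib_cont(x):
    """Continuous form: (PHI^x - cos(pi*x) * PHI^(-x)) / sqrt(5)."""
    return (PHI ** x - math.cos(math.pi * x) * PHI ** (-x)) / SQRT5

# Integer arguments reproduce the discrete values, including the
# alternating signs for negative n
values = [round(fib_cont(n)) for n in range(-4, 5)]
print(values)  # [-3, 2, -1, 1, 0, 1, 1, 2, 3]

# A non-integer argument also gives a real value
print(fib_cont(2.5))
```

The negative-index values agree with running the recurrence backwards: Fib(–1) = Fib(1) – Fib(0) = 1, Fib(–2) = Fib(0) – Fib(–1) = –1, and so on.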
A Messier Recurrence: Binary Search Analysis
If we want to find the average number of loop iterations to find an array entry using the binary search with early exit,
then we need to find the total of loop iterations to find all array entries for a given array size n, and then divide that
by n. The structure of the problem makes it easier to deal only with complete binary trees. In that case, where we
consider the binary tree root to be at level 1 (rather than 0), we can write the summation (which is, of course, an
implicit recurrence) to find the total for a complete tree of height ht:
$$Q(ht) = \frac{1}{2} \sum_{k=1}^{ht} k \cdot 2^{k}$$
A colleague and good friend of mine, Dr. Brian Carlson (who is, unfortunately, no longer with us), discovered
an analytic solution to this while doing experimentation exactly like that discussed here.
$$Q(ht) = (ht - 1) \cdot 2^{ht} + 1$$
It is discussed in greater detail in an earlier paper in a companion ACM SIG newsletter (which has since ceased
publication — I don’t think it was this paper that did it in!).**
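The same style of numerical experiment applies here. The Python sketch below uses the assumed helper names `q_sum` and `q_closed`. It compares the summation with the closed form for heights 1 through 20 and prints the resulting average iteration count, taking the array size to be the number of nodes in a complete tree, 2^ht – 1.

```python
def q_sum(ht):
    """Total loop iterations by direct summation: (1/2) * sum of k * 2^k."""
    # Every term k * 2^k is even, so the integer division is exact.
    return sum(k * 2 ** k for k in range(1, ht + 1)) // 2

def q_closed(ht):
    """Closed form: (ht - 1) * 2^ht + 1."""
    return (ht - 1) * 2 ** ht + 1

agree = all(q_sum(ht) == q_closed(ht) for ht in range(1, 21))
print(agree)

for ht in (1, 2, 3, 10):
    n = 2 ** ht - 1                      # entries in a complete tree of height ht
    print(ht, n, q_closed(ht), q_closed(ht) / n)
```

For ht = 2, for example, the tree has one node at level 1 and two at level 2, giving a total of 1 + 2 + 2 = 5 iterations, which both functions return.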
** Timothy J. Rolfe, “Analytic Derivation of Comparisons in Binary Search”, SIGNUM Newsletter, Vol. 32, No. 4
(October 1997), pp. 15-19. Text available through http://penguin.ewu.edu/~trolfe/.