Algorithms and Complexity
Zeph Grunschlag
Copyright © Zeph Grunschlag, 2001-2002.
Announcements
- HW 3 due.
- First ½ of HW 4 available. Only the first 6 problems of HW 4 appear on the midterm.
- Midterm 1 is two weeks from today.
- Late HW 4's can't be accepted after Friday, 3/1 (solutions go up early).
Agenda
Section 1.8: Growth of Functions
- Big-O
- Big-Ω (Omega)
- Big-Θ (Theta)
Section 2.1: Algorithms
- Defining Properties
- Pseudocode
Section 2.2: Complexity of Algorithms
Section 1.8
Big-O, Big-Ω, Big-Θ
Big-O notation is a way of comparing functions. It is useful for computing algorithmic complexity, i.e. the amount of time that it takes for a computer program to run.
Notational Issues
The notation is unconventional. Seemingly, an equation is established; however, this is not the intention.
EG: 3x^3 + 5x^2 − 9 = O(x^3)
This doesn't mean that there's a function O(x^3) and that it equals 3x^3 + 5x^2 − 9. Rather, the example is read as:
"3x^3 + 5x^2 − 9 is big-O of x^3"
which actually means:
"3x^3 + 5x^2 − 9 is asymptotically dominated by x^3"
Intuitive Notion of Big-O
Asymptotic notation captures the behavior of functions for large values of x.
EG: The dominant term of 3x^3 + 5x^2 − 9 is x^3.
For small x it is not clear why x^3 dominates more than x^2 or even x; however, as x becomes larger and larger, the other terms become insignificant, and only x^3 remains in the picture:
Intuitive Notion of Big-O
[Four graphs of y = 3x^3 + 5x^2 − 9 against y = x^3, y = x^2, and y = x, over the domains [0,2], [0,5], [0,10], and [0,100]: as the domain grows, the cubic curves pull away and the lower-order curves become insignificant.]
Intuitive Notion of Big-O
In fact, 3x^3 + 5x^2 − 9 is smaller than 5x^3 for large enough values of x:
[Graph: y = 5x^3 lies above y = 3x^3 + 5x^2 − 9, with y = x^2 and y = x far below.]
Big-O. Formal Definition
The intuition motivates the idea that a function f(x) is asymptotically dominated by g(x) if some constant multiple of g(x) is actually bigger than f(x) for all large x. Formally:
DEF: Let f and g be functions with domain R≥0 or N and codomain R. If there are constants C and k such that
  ∀x > k, |f(x)| ≤ C·|g(x)|,
i.e., past k, f is less than or equal to a multiple of g, then we write:
  f(x) = O( g(x) )
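Not part of the original slides: the definition invites a quick numeric sanity check. The Java sketch below (names such as holdsOnSamples are made up for illustration) tests whether a claimed witness pair (C, k) survives |f(x)| ≤ C·|g(x)| on a finite range of sample points. Such a scan can refute a bad witness but can never prove the asymptotic claim, since big-O quantifies over all x > k.

import java.util.function.DoubleUnaryOperator;

public class DominanceCheck {
    // Spot-check a claimed big-O witness (C, k): test |f(x)| <= C*|g(x)| at
    // integer sample points in (k, limit]. A false result refutes the witness;
    // a true result is merely consistent with f(x) = O(g(x)).
    static boolean holdsOnSamples(DoubleUnaryOperator f, DoubleUnaryOperator g,
                                  double C, double k, double limit) {
        for (double x = Math.floor(k) + 1; x <= limit; x++)
            if (Math.abs(f.applyAsDouble(x)) > C * Math.abs(g.applyAsDouble(x)))
                return false;
        return true;
    }

    public static void main(String[] args) {
        // Example: x + 10 = O(x); the witness C = 2, k = 10 survives the samples.
        System.out.println(holdsOnSamples(x -> x + 10, x -> x, 2, 10, 100000));
    }
}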
Common Misunderstanding
It's true that 3x^3 + 5x^2 − 9 = O(x^3), as we'll prove shortly. However, the following are also true:
- 3x^3 + 5x^2 − 9 = O(x^4)
- x^3 = O(3x^3 + 5x^2 − 9)
- sin(x) = O(x^4)
NOTE: In CS, use of big-O typically involves mentioning only the most dominant term.
"The running time is O(x^2.5)"
Mathematically, big-O is more subtle. It's a way of comparing arbitrary pairs of functions.
Big-O. Example
EG: Show that 3x^3 + 5x^2 − 9 = O(x^3).
From the previous graphs it makes sense to let C = 5. Let's find k so that 3x^3 + 5x^2 − 9 ≤ 5x^3 for x > k:
1. Collect terms: 5x^2 ≤ 2x^3 + 9
2. What k will make 5x^2 ≤ x^3 past k?
3. k = 5!
4. So for x > 5, 5x^2 ≤ x^3 ≤ 2x^3 + 9
5. Solution: C = 5, k = 5 (not unique!)
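Not from the slides: the witness just derived survives a numeric spot check (same idea as the sketch after the definition slide; again, samples can only refute a witness, never prove it).

public class WitnessDemo {
    public static void main(String[] args) {
        // Claimed witness from the example: C = 5, k = 5 for
        // f(x) = 3x^3 + 5x^2 - 9 and g(x) = x^3.
        boolean refuted = false;
        for (double x = 6; x <= 100000; x++) {
            double f = 3*x*x*x + 5*x*x - 9, g = x*x*x;
            if (Math.abs(f) > 5 * Math.abs(g)) { refuted = true; break; }
        }
        System.out.println(refuted ? "witness refuted" : "witness holds on all samples");
    }
}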
Big-O. Negative Example
x^4 ≠ O(3x^3 + 5x^2 − 9):
Show that no C, k can exist such that past k, C(3x^3 + 5x^2 − 9) ≥ x^4 is always true. The easiest way is with limits (yes, calculus is good to know):

  lim_{x→∞} x^4 / ( C(3x^3 + 5x^2 − 9) )
    = lim_{x→∞} x / ( C(3 + 5/x − 9/x^3) )
    = lim_{x→∞} x / ( C(3 + 0 − 0) )
    = (1/3C) · lim_{x→∞} x = ∞

Thus no matter what C is, x^4 will always catch up to and eclipse C(3x^3 + 5x^2 − 9). ∎
Big-O and Limits
Knowing how to use limits can help to prove big-O relationships:
LEMMA: If the limit as x → ∞ of the quotient |f(x)/g(x)| exists (i.e., is non-infinite), then f(x) = O( g(x) ).
EG: 3x^3 + 5x^2 − 9 = O(x^3). Compute:

  lim_{x→∞} (3x^3 + 5x^2 − 9) / x^3 = lim_{x→∞} (3 + 5/x − 9/x^3) = 3

…the limit exists, so the big-O relationship is proved.
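A numeric illustration (mine, not the deck's): tabulating the quotient from the lemma shows it settling toward the finite limit 3, which is what licenses the big-O conclusion.

public class RatioProbe {
    public static void main(String[] args) {
        // Tabulate |f(x)/g(x)| for f(x) = 3x^3 + 5x^2 - 9, g(x) = x^3.
        // The quotient approaches the finite limit 3, so f(x) = O(g(x)).
        for (double x = 10; x <= 1e6; x *= 10) {
            double ratio = (3*x*x*x + 5*x*x - 9) / (x*x*x);
            System.out.printf("x = %-9.0f |f/g| = %.6f%n", x, Math.abs(ratio));
        }
    }
}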
Big-Ω and Big-Θ
Big-Ω is just the reverse of big-O. I.e.,
  f(x) = Ω( g(x) )  ⟺  g(x) = O( f(x) )
So big-Ω says that asymptotically f(x) dominates g(x).
Big-Θ says that both functions dominate each other, so they are asymptotically equivalent. I.e.,
  f(x) = Θ( g(x) )  ⟺  f(x) = O( g(x) ) and f(x) = Ω( g(x) )
EG: since 3x^3 + 5x^2 − 9 = O(x^3) and x^3 = O(3x^3 + 5x^2 − 9), we have 3x^3 + 5x^2 − 9 = Θ(x^3).
Synonym for f = Θ(g): "f is of order g"
Useful facts
- Any polynomial is big-Θ of its largest term.
  EG: x^4/100000 + 3x^3 + 5x^2 − 9 = Θ(x^4)
- The sum of two functions is big-O of the biggest.
  EG: x^4 ln(x) + x^5 = O(x^5)
- Non-zero constant factors are irrelevant.
  EG: 17x^4 ln(x) = O(x^4 ln(x))
Big-O, Big-Ω, Big-Θ. Examples
Q: Order the following from smallest to largest asymptotically. Group together all functions which are big-Θ of each other:

  x + sin x,  ln x,  x + √x,  1/x,  13 + 1/x,  13 + x,  e^x,  x^e,  x^x,
  (x + sin x)(x^20 + 102),  x ln x,  x(ln x)^2,  log2 x
Big-O, Big-Ω, Big-Θ. Examples
A:
1. 1/x
2. 13 + 1/x
3. ln x, log2 x (change of base formula)
4. x + sin x, x + √x, 13 + x
5. x ln x
6. x(ln x)^2
7. x^e
8. (x + sin x)(x^20 + 102)
9. e^x
10. x^x
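Not in the deck: a rough way to sanity-check such an ordering is to compare ln f(x) at a single large x, since faster asymptotic growth usually means a much bigger logarithm far out. This is only a heuristic (it cannot certify the Θ-groupings or replace limit arguments), and the code below, covering a few of the entries, is an ad hoc sketch of mine.

public class GrowthOrder {
    public static void main(String[] args) {
        // Compare ln f(x) at x = 1e6 for several entries of the exercise.
        // For the huge entries, ln f is computed symbolically to avoid overflow:
        // ln(e^x) = x, ln(x^x) = x ln x, ln(x^21) = 21 ln x.
        double x = 1e6, lnx = Math.log(x);
        System.out.println("1/x         : " + (-lnx));
        System.out.println("ln x        : " + Math.log(lnx));
        System.out.println("x + sqrt(x) : " + Math.log(x + Math.sqrt(x)));
        System.out.println("x ln x      : " + (lnx + Math.log(lnx)));
        System.out.println("x (ln x)^2  : " + (lnx + 2 * Math.log(lnx)));
        System.out.println("x^e         : " + Math.E * lnx);
        System.out.println("~x^21       : " + 21 * lnx);
        System.out.println("e^x         : " + x);
        System.out.println("x^x         : " + x * lnx);
    }
}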
Incomparable Functions
Given two functions f(x) and g(x), it is not always the case that one asymptotically dominates the other; when neither does, f and g are asymptotically incomparable.
EG: f(x) = |x^2 sin(x)| vs. g(x) = 5x^1.5
Incomparable Functions
[Graph over [0,50], y up to 2500: y = |x^2 sin(x)| oscillates between 0 and the envelope y = x^2, repeatedly crossing y = 5x^1.5.]
[Graph over [0,200], y up to 4×10^4: the oscillations continue, so y = |x^2 sin(x)| keeps crossing y = 5x^1.5 no matter how far out we look.]
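A small sanity check one can run (not from the slides): sampling f and g at peaks and zeroes of the sine factor shows each function repeatedly overtaking the other, which is exactly what incomparability means.

public class Incomparable {
    public static void main(String[] args) {
        // f(x) = |x^2 sin x| vs g(x) = 5 x^1.5.
        // At x = pi/2 + 2k*pi, |sin x| = 1, so f(x) = x^2 eventually beats g(x).
        // At x = k*pi, sin x = 0, so f(x) = 0 while g(x) keeps growing.
        for (int k = 1; k <= 6; k++) {
            double peak = Math.PI / 2 + 2 * k * Math.PI;
            double zero = k * Math.PI;
            System.out.printf("x = %6.2f   f = %8.1f   g = %8.1f%n",
                peak, Math.abs(peak * peak * Math.sin(peak)), 5 * Math.pow(peak, 1.5));
            System.out.printf("x = %6.2f   f = %8.1f   g = %8.1f%n",
                zero, Math.abs(zero * zero * Math.sin(zero)), 5 * Math.pow(zero, 1.5));
        }
    }
}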
Example for Section 1.8
Link to an example proving big-Ω of a sum.
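Since the linked example isn't reproduced in this deck, here is one possible sketch of my own (a reconstruction, not the original) in the spirit of the Useful facts slide: a polynomial is big-Ω of its leading term.

CLAIM: f(x) = x^4/100000 + 3x^3 + 5x^2 − 9 = Ω(x^4).
PROOF SKETCH: For x ≥ 0 we have 3x^3 ≥ 0 and 5x^2 ≥ 0, so
  f(x) ≥ x^4/100000 − 9.
Once x^4/200000 ≥ 9, i.e. x ≥ (1800000)^(1/4) ≈ 36.6, this gives
  f(x) ≥ x^4/100000 − x^4/200000 = x^4/200000.
So C = 1/200000 and k = 37 witness: ∀x > k, |f(x)| ≥ C·|x^4|, hence f(x) = Ω(x^4). ∎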
Section 2.1
Algorithms and Pseudocode
By now, most of you are well adept at understanding, analyzing and creating algorithms. This is what you did in your one (prerequisite) semester of programming!
DEF: An algorithm is a finite set of precise instructions for performing a computation or solving a problem.
Synonyms for algorithm are: program, recipe, procedure, and many others.
Hallmarks of Algorithms
The textbook lays out 7 properties that an algorithm should have to satisfy the notion of "precise instructions" in such a way that no ambiguity arises:
1. Input. Spell out what the algorithm eats.
2. Output. Spell out what sort of stuff it spits out.
3. Determinism (or Definiteness). At each point in the computation, we should be able to tell exactly what happens next.
Hallmarks of Algorithms
4. Correctness. The algorithm should do what it claims to be doing.
5. Finiteness. A finite number of steps for the computation, no matter what the input.
6. Effectiveness. Each step itself should be doable in a finite amount of time.
7. Generality. The algorithm should be valid on all possible inputs.
Q: Which of the conditions guarantee that an algorithm has no infinite loops?
Hallmarks of Algorithms
A: Finiteness & Effectiveness
Pseudocode
Starting out, students often find it confusing to write pseudocode but easy to read pseudocode. Sometimes students are so anti-pseudocode that they'd rather write compiling code in homework and exams. This is a major waste of time!
I strongly suggest you use pseudocode. You don't have to learn the pseudo-Pascal in Appendix 2 of the textbook.
A possible alternative: pseudo-Java…
Pseudo-Java
Start by representing an algorithm as a Java method. It's clear that Java-specific modifiers such as final, static, etc. are not worth worrying about. So consider the method:

int f(int[] a){
    int x = a[0];
    for(int i = 1; i < a.length; i++){
        if(x > a[i])
            x = a[i];
    }
    return x;
}

Q: What does this algorithm do?
Pseudo-Java
A: It finds the minimum of a sequence of numbers.

int f(int[] a){
    int x = a[0];
    for(int i = 1; i < a.length; i++){
        if(x > a[i])
            x = a[i];
    }
    return x;
}

Q: Is there an input that causes Java to throw an exception?
Pseudo-Java
A: Yes, arrays of length 0.
This illustrates one of the dangers of relying on real code. It forces you to worry about language-specific technicalities, exceptions, etc.
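One hypothetical repair (not in the slides) if you did insist on real Java: make the zero-length case explicit rather than letting a[0] blow up.

// Defensive variant of the minimum-finding method: rejects empty input
// explicitly instead of throwing ArrayIndexOutOfBoundsException at a[0].
static int min(int[] a) {
    if (a == null || a.length == 0)
        throw new IllegalArgumentException("need at least one element");
    int x = a[0];
    for (int i = 1; i < a.length; i++) {
        if (x > a[i])
            x = a[i];
    }
    return x;
}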
Pseudo-Java
int f(int[] a){
    int x = a[0];
    for(int i = 1; i < a.length; i++){
        if(x > a[i])
            x = a[i];
    }
    return x;
}
Q: Are there any statements that are too long-winded, or confusing to non-Java people?
Pseudo-Java
A: Yes, several:
1. int f(int[] a)
- Non-Java people may not realize that this says the input should be an array of ints.
- FIX (also helps with some problems below): spell out exactly how the array looks and use subscripts, which also tells us the size of the array:
  "integer f( integer_array (a1, a2, …, an) )"
- The type-declaration notation "integer f()" is a succinct way of specifying the output.
Pseudo-Java
2. a[0]
- Brackets are annoying in index evaluation.
- Why are we starting at 0?
- FIX: Use subscripts "ai" and start with index 1.
Pseudo-Java
3. for(int i=1; i<a.length; i++)
- The counter in a for-loop is obviously an int.
  FIX: change "int i=1" to "i=1".
- It is really annoying to constantly refer to the array length using the period operator.
  FIX: subscripts fixed this; "a.length" becomes "n".
- Can't we just iterate directly over the whole sequence and not worry about "i++"? (That way we don't have to remember peculiar for-loop syntax.)
  FIX: change the whole statement to "for(i = 1 to n)".
Q: Were there any unintended limitations that using Java created?
Pseudo-Java
A: Yes. We restricted the input/output to be of type int. We probably wanted the algorithm to work on arbitrary integers, not just those that can be encapsulated in 32 bits!
Java solution: use class java.math.BigInteger …NOT! Way too gnarly. Just write "integer" instead of int.
The resulting pseudocode (with semicolons also removed):
Pseudo-Java
integer f( integer_array (a1, a2, …, an) ){
    x = a1
    for( i = 2 to n ){
        if( x > ai )
            x = ai
    }
    return x
}
Ill-defined Algorithms. Examples.
Q: What's wrong with each example below?
1. integer power(a, b) {…rest is okay…}
2. integer power(integer a, rational b) {…rest is okay…}
3. boolean f(real x){            // "real" = real number
       for(each digit d in x){
           if(d == 0)
               return true;
       }
       return false;
   }
Ill-defined Algorithms. Examples.
A:
1. The input is ill-defined.
2. Must be incorrect, as the output of a fractional power such as 2^(1/2) is usually not an integer.
3. The number of steps is infinite on certain inputs. EG: 2.121212121212121212…
Ill-defined Algorithms. Examples.
Q: What's wrong with:
String f(integer a){
    if(a == 0){
        return "Zero"
        OR return "The cardinality of {}"
    }
}
Ill-defined Algorithms. Examples.
A: Two problems:
- Non-deterministic: what should we output when reading 0?
- Non-general: what happens to the input 1?
Algorithm for Surjectivity
Give an algorithm that determines whether a function from one finite set to another is onto.
Hint: Assume the form of the sets is as simple as possible, e.g. sets of numbers starting from 1. The algorithm doesn't depend on the type of the elements, since one could always use a "look-up table" to convert between one form and another. This assumption also creates cleaner, clearer pseudocode.
Algorithm for Surjectivity
boolean isOnto( function f: (1, 2, …, n) → (1, 2, …, m) ){
    if( m > n ) return false  // can't be onto
    soFarIsOnto = true
    for( j = 1 to m ){
        soFarIsOnto = false
        for( i = 1 to n ){
            if( f(i) == j )
                soFarIsOnto = true
        }
        if( !soFarIsOnto ) return false  // j was never hit
    }
    return true
}
Improved Algorithm for Surjectivity
boolean isOntoB( function f: (1, 2, …, n) → (1, 2, …, m) ){
    if( m > n ) return false  // can't be onto
    for( j = 1 to m )
        beenHit[ j ] = false  // does f ever output j?
    for( i = 1 to n )
        beenHit[ f(i) ] = true
    for( j = 1 to m )
        if( !beenHit[ j ] )
            return false
    return true
}
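For readers who want to execute the two pseudocode algorithms, here is a direct Java rendering (mine, not the deck's; it assumes f is supplied as an int[] with f[i-1] holding f(i)):

public class Surjectivity {
    // Rendering of isOnto: for each target j, scan all of f looking for a hit.
    static boolean isOnto(int[] f, int m) {
        int n = f.length;
        if (m > n) return false;                  // can't be onto
        for (int j = 1; j <= m; j++) {
            boolean soFarIsOnto = false;
            for (int i = 1; i <= n; i++)
                if (f[i - 1] == j) soFarIsOnto = true;
            if (!soFarIsOnto) return false;       // j is never an output
        }
        return true;
    }

    // Rendering of isOntoB: one pass marks every value that f hits.
    // Assumes every value f[i-1] already lies in {1..m}.
    static boolean isOntoB(int[] f, int m) {
        int n = f.length;
        if (m > n) return false;
        boolean[] beenHit = new boolean[m + 1];   // index 0 unused
        for (int i = 1; i <= n; i++)
            beenHit[f[i - 1]] = true;
        for (int j = 1; j <= m; j++)
            if (!beenHit[j]) return false;
        return true;
    }

    public static void main(String[] args) {
        int[] f = {2, 1, 2};                      // f: {1,2,3} -> {1,2}, onto
        System.out.println(isOnto(f, 2) + " " + isOntoB(f, 2)); // true true
    }
}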
Section 2.2
Algorithmic Complexity
Q: Why is the second algorithm better than the first?
Section 2.2
Algorithmic Complexity
A: Because the second algorithm runs faster. Even under the criterion of code length, algorithm 2 is better. Let's see why:
Running time of 1st algorithm
boolean isOnto( function f: (1, 2, …, n) → (1, 2, …, m) ){
    if( m > n ) return false        // 1 step; if m > n this is the only step
    soFarIsOnto = true              // 1 step (assignment)
    for( j = 1 to m ){              // m loops: 1 increment, plus
        soFarIsOnto = false         //   1 step (assignment)
        for( i = 1 to n ){          //   n loops: 1 increment, plus
            if( f(i) == j )         //     1 step, possibly leading to:
                soFarIsOnto = true  //       1 step (assignment)
        }
        if( !soFarIsOnto )          //   1 step, possibly leading to:
            return false            //     1 step (return)
    }
    return true                     // possibly 1 step
}
Running time of 1st algorithm
WORST-CASE running time (charging every possible step on every pass):

Number of steps = 1 (if m > n), otherwise
  1 + 1 + m·( 1 + 1 + n·(1 + 1 + 1) + 1 + 1 ) + 1
  = 3mn + 4m + 3
Running time of 2nd algorithm
boolean isOntoB( function f: (1, 2, …, n) → (1, 2, …, m) ){
    if( m > n ) return false    // 1 step; if m > n this is the only step
    for( j = 1 to m )           // m loops: 1 increment, plus
        beenHit[ j ] = false    //   1 step (assignment)
    for( i = 1 to n )           // n loops: 1 increment, plus
        beenHit[ f(i) ] = true  //   1 step (assignment)
    for( j = 1 to m )           // m loops: 1 increment, plus
        if( !beenHit[ j ] )     //   1 step, possibly leading to:
            return false        //     1 step (return)
    return true                 // possibly 1 step
}
Running time of 2nd algorithm
WORST-CASE running time (same counting convention):

Number of steps = 1 (if m > n), otherwise
  1 + m·(1 + 1) + n·(1 + 1) + m·(1 + 1 + 1) + 1
  = 5m + 2n + 2
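An empirical cross-check (my addition, not in the slides): instrumenting the dominant operations of each algorithm on the onto input f(i) = i with m = n shows counts growing like n^2 vs. n, in line with the two totals just derived, up to constants.

public class StepCount {
    // Count comparisons "f(i) == j" in algorithm 1 and array touches in
    // algorithm 2, on the identity function with m = n (onto: no early exit).
    public static void main(String[] args) {
        for (int n = 10; n <= 1000; n *= 10) {
            int[] f = new int[n + 1];
            for (int i = 1; i <= n; i++) f[i] = i;

            long c1 = 0;                       // algorithm 1: nested scans
            for (int j = 1; j <= n; j++) {
                boolean hit = false;
                for (int i = 1; i <= n; i++) { c1++; if (f[i] == j) hit = true; }
                if (!hit) break;               // never triggers for onto f
            }

            long c2 = 0;                       // algorithm 2: sequential scans
            boolean[] beenHit = new boolean[n + 1];
            for (int i = 1; i <= n; i++) { c2++; beenHit[f[i]] = true; }
            for (int j = 1; j <= n; j++) { c2++; if (!beenHit[j]) break; }

            System.out.printf("n = %4d   alg1 = %8d (~n^2)   alg2 = %5d (~2n)%n", n, c1, c2);
        }
    }
}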
Comparing Running Times
The first algorithm requires at most 3mn + 4m + 3 steps, while the second algorithm requires at most 5m + 2n + 2 steps. In both cases, for worst-case times we can assume that m = n, as this is the longest-running case (when m > n, each algorithm stops after one step). This reduces the respective running times to 3n^2 + 4n + 3 and 5n + 2n + 2 = 7n + 2.
To tell which algorithm is better, find the most important terms using big-Θ notation:
- 3n^2 + 4n + 3 = Θ(n^2) – quadratic time complexity
- 7n + 2 = Θ(n) – linear time complexity ← WINNER
Q: Any issues with this line of reasoning?
Comparing Running Times. Issues
1. It is inaccurate to summarize the running times 3n^2 + 4n + 3 and 7n + 2 only by their biggest terms. For example, for n = 1 the two algorithms take almost the same number of steps (10 vs. 9).
2. It is inaccurate to count the number of "basic steps" without measuring how long each basic step takes. Maybe the basic steps of the second algorithm are much longer than those of the first algorithm, so that in actuality the first algorithm is faster.
Comparing Running Times. Issues
3. Surely the running time depends on the platform on which it is executed. E.g., C code on a Pentium IV will execute much faster than Java on a Palm Pilot.
4. The running-time calculations counted many operations that may not occur. In fact, a close look reveals that we can be certain the calculations were an over-estimate, since certain conditional statements were mutually exclusive. Perhaps we over-estimated so much that algorithm 1 was actually a linear-time algorithm.
Comparing Running Times. Responses
1. Big-Θ inaccurate: Quadratic time Cn^2 will always take longer than linear time Dn for large enough input, no matter what C and D are; furthermore, it is the large input sizes that give us the real problems, so they are of most concern.
Comparing Running Times. Responses
2. "Basic steps" counting inaccurate: It is true that we have to define what a basic step is. EG: Does multiplying numbers constitute a basic step or not? Depending on the computing platform and the type of problem (e.g. multiplying ints vs. multiplying arbitrary integers), multiplication may take a fixed amount of time, or not. When this is ambiguous, you'll be told explicitly what a basic step is.
Q: What were the basic steps in the previous algorithms?
Comparing Running Times
A: Basic steps:
- Assignment
- Increment
- Comparison
- Negation
- Return
- Random array access
- Function output access
Each may in fact require a different number of bit operations – the actual operations that can be carried out in a single cycle on a processor. However, since each operation is itself O(1), i.e. takes a constant amount of time, asymptotically it is as if each step were in fact 1 time-unit long!
Comparing Running Times. Responses
3. Platform dependence: It turns out there is usually a constant multiple between the various basic operations on one platform and another. Thus big-O erases this difference as well.
4. Running time is too pessimistic: It is definitely true that when m > n the estimates are overkill. Even when m = n there are cases which run much faster than the big-Θ estimate. However, since we can always find inputs which do achieve the big-Θ estimates (e.g. when f is onto), and the worst-case running time is defined in terms of the worst possible inputs, the estimates are valid.
Worst Case vs. Average Case
The time complexity described above is worst-case complexity. This kind of complexity is useful when one needs absolute guarantees for how long a program will run. The worst-case complexity for a given n is computed from the case of size n that takes the longest.
On the other hand, if a method needs to be run repeatedly many times, average-case complexity is most suitable. The average-case complexity is the average complexity over all possible inputs of a given size. Usually computing average-case complexity requires probability theory.
Q: Does one of the two surjectivity algorithms perform better on average than in the worst case?
Worst Case vs. Average Case
A: Yes. The first algorithm performs better on average. This is because surjective functions are actually rather rare, and the algorithm terminates early when a non-hit element is found near the beginning.
With probability theory we will be able to show that when m = n, the first algorithm has O(n) average complexity.
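A Monte Carlo sketch (my addition, not the deck's probability argument): averaging algorithm 1's comparison count over random functions f: {1..n} → {1..n} shows roughly linear growth, consistent with the O(n) average-case claim.

import java.util.Random;

public class AverageCase {
    public static void main(String[] args) {
        // Estimate algorithm 1's average number of comparisons on uniformly
        // random f: {1..n} -> {1..n}. A random f usually misses some small j,
        // so the scan typically stops after a few outer iterations (~e of them),
        // giving roughly e*n comparisons on average rather than n^2.
        Random rng = new Random(42);
        int trials = 1000;
        for (int n = 10; n <= 1000; n *= 10) {
            long total = 0;
            for (int t = 0; t < trials; t++) {
                int[] f = new int[n + 1];
                for (int i = 1; i <= n; i++) f[i] = 1 + rng.nextInt(n);
                for (int j = 1; j <= n; j++) {
                    boolean hit = false;
                    for (int i = 1; i <= n; i++) { total++; if (f[i] == j) hit = true; }
                    if (!hit) break;           // j never hit: f is not onto
                }
            }
            System.out.printf("n = %4d   average comparisons = %.1f%n",
                              n, (double) total / trials);
        }
    }
}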
Big-O
A Grain of Salt
Big-O notation gives a good first guess for deciding which algorithms are faster. In practice, the guess isn't always correct. Consider the time functions n^6 vs. 1000n^5.9. Asymptotically, the second is better. One often catches such examples among purported advances in theoretical computer science publications. The following graph shows the relative performance of the two algorithms:
Big-O
A Grain of Salt
[Graph: running time in days vs. input size n, assuming each operation takes a nanosecond (so the computer runs at 1 GHz). Over the plotted range, T(n) = 1000n^5.9 lies far above T(n) = n^6.]
Big-O
A Grain of Salt
In fact, 1000n^5.9 only catches up to n^6 when 1000n^5.9 = n^6, i.e.:
  1000 = n^0.1, i.e.:
  n = 1000^10 = 10^30 operations
  = 10^30 / 10^9 = 10^21 seconds
  ≈ 10^21 / (3×10^7) ≈ 3×10^13 years
  ≈ 3×10^13 / (2×10^10) ≈ 1500 universe lifetimes!
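The arithmetic above is easy to re-derive mechanically (a small sketch of mine, using ~3.15×10^7 seconds per year and the slide's ~2×10^10-year universe age):

public class Crossover {
    public static void main(String[] args) {
        // 1000 n^5.9 = n^6  <=>  n^0.1 = 1000  <=>  n = 1000^10 = 1e30 ops.
        double ops = Math.pow(1000, 10);
        double seconds = ops / 1e9;            // at one operation per ns (1 GHz)
        double years = seconds / 3.15e7;       // ~3.15e7 seconds per year
        double lifetimes = years / 2e10;       // universe age ~2e10 years
        System.out.printf("n = %.1e ops = %.1e s = %.1e years = %.0f lifetimes%n",
                          ops, seconds, years, lifetimes);
    }
}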