CSC 430: Basics of Algorithm Analysis

What is Runtime Analysis?
• Fundamentally, it is the process of counting the number of “steps” it takes for an algorithm to terminate
• “Step” = primitive computer operation
– Arithmetic
– Read/Write memory
– Make a comparison
– …
• The number of steps needed depends on:
– The fundamental algorithmic structure
– The data structures you use
Other Types of Algorithm Analysis
• Algorithm space analysis
• Approximate-solution analysis
• Parallel algorithm analysis
• A PROBLEM’S run-time analysis
• Complexity class analysis
Runtime Motivates Design
• Easiest algorithm to implement for any problem: brute force.
– Check every possible solution.
– Typically takes 2^N time or worse for inputs of size N.
– Unacceptable in practice.
• Usually, through insight, we can do much
better
• What do we mean by “better”?
Worst-Case Analysis
• Best case running time:
– Rarely used; often superfluous
• Average case running time:
– Obtain bound on running time of algorithm on random
input as a function of input size N.
– Hard (or impossible) to accurately model real instances by
random distributions.
– Acceptable in certain, easily-justified scenarios
• Worst case running time:
– Obtain bound on largest possible running time
– Generally captures efficiency in practice.
– Pessimistic view, but hard to find effective alternative.
Defining Efficiency:
Worst-Case Polynomial-Time
• Def. An algorithm is efficient if its worst-case running time is
polynomial or better.
• Not always true, but works in practice:
– 6.02 × 10^23 × N^20 is technically poly-time
– Breaking through the exponential barrier of brute force typically
exposes some crucial structure of the problem.
– Most poly-time algorithms that people develop have low constants and low exponents (e.g., N, N^2, N^3).
• Exceptions.
– Sometimes the problem size demands greater efficiency than poly-time
– Some poly-time algorithms do have high constants and/or exponents,
and are useless in practice.
– Some exponential-time (or worse) algorithms are widely used because the worst-case instances seem to be rare (e.g., the simplex method, Unix grep).
2.2 Asymptotic Order of Growth
What is Asymptotic Analysis?
• Instead of giving precise runtime, we’ll show
that algorithms are bounded within a constant
factor of common functions
• Three types of bounds
– Upper Bounds (never greater)
– Lower Bounds (never less)
– Tight Bounds (Upper and Lower are same)
Growth Notation
• Upper bounds (Big-Oh):
– T(n) ∈ O(f(n)) if there exist constants c > 0 and n0 ≥ 0 such that for all n ≥ n0 we have T(n) ≤ c · f(n).
• Lower bounds (Omega):
– T(n) ∈ Ω(f(n)) if there exist constants c > 0 and n0 ≥ 0 such that for all n ≥ n0 we have T(n) ≥ c · f(n).
• Tight bounds (Theta):
– T(n) ∈ Θ(f(n)) if T(n) is both O(f(n)) and Ω(f(n)).
• Frequently abused notation: T(n) = O(f(n)).
– Asymmetric: for f(n) = 5n^3 and g(n) = 3n^2, writing f(n) = O(n^3) = g(n) would suggest f(n) = g(n), but f(n) ≠ g(n).
– Better notation: T(n) ∈ O(f(n)).
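To make the Big-Oh definition concrete, here is a small empirical check of T(n) ≤ c · f(n) for T(n) = 5n^3 + 3n^2 and f(n) = n^3; the constants c = 6 and n0 = 3 are one valid choice of ours, not the only one.

def bounded_above(T, f, c, n0, up_to=10000):
    #test the Big-Oh inequality on a finite range only;
    #this is evidence, not a proof covering all n >= n0
    return all(T(n) <= c * f(n) for n in range(n0, up_to))

print(bounded_above(lambda n: 5*n**3 + 3*n**2,
                    lambda n: n**3, 6, 3)) #True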
Use-Cases
• Big-Oh:
– “The algorithm’s runtime is O(N lg N), a potential improvement over prior approaches, which were O(N^2).”
• Omega:
– “Their algorithm’s runtime is Ω(N^2), which is worse than our algorithm’s proven O(N^1.5) runtime.”
• Theta:
– “Our original algorithm was Ω(N lg N) and O(N^2). With modifications, we were able to improve the runtime bounds to Θ(N^1.5).”
[Chart: growth of N^2, N lg N, and N^1.5 for N from 0 to 300]
Properties
• Transitivity.
– If f ∈ O(g) and g ∈ O(h) then f ∈ O(h).
– If f ∈ Ω(g) and g ∈ Ω(h) then f ∈ Ω(h).
– If f ∈ Θ(g) and g ∈ Θ(h) then f ∈ Θ(h).
• Additivity.
– If f ∈ O(h) and g ∈ O(h) then f + g ∈ O(h).
– If f ∈ Ω(h) and g ∈ Ω(h) then f + g ∈ Ω(h).
– If f ∈ Θ(h) and g ∈ O(h) then f + g ∈ Θ(h).
– These generalize to any number of terms.
• “Symmetry”:
– If f ∈ Θ(h) then h ∈ Θ(f).
– If f ∈ O(h) then h ∈ Ω(f).
– If f ∈ Ω(h) then h ∈ O(f).
Asymptotic Bounds for Some Common
Functions
• Polynomials. a0 + a1·n + … + ad·n^d is Θ(n^d) if ad > 0.
• Polynomial time. Running time is O(nd) for some constant d independent
of the input size n.
• Logarithms. O(log_a n) = O(log_b n) for any constants a, b > 1.
– can avoid specifying the base, since log_a n = log_b n / log_b a differs only by a constant factor
• Logarithms. For every x > 0, log n = O(n^x).
– log grows slower than every polynomial
• Exponentials. For every r > 1 and every d > 0, n^d = O(r^n).
– every exponential grows faster than every polynomial
Proving Bounds
• Use Transitivity:
– If an algorithm is at least as
fast as another, then it has
the same upper bound
• Use Additivity:
– f(n) = n^3 + n^2 + n + 3
– f(n) ∈ Θ(n^3)
• Be direct:
– Start with the definition
and provide the parts it
requires
• Take the logarithm of
both functions
• Use limits:
– If lim_{x→∞} f(x)/g(x) = c for a constant 0 < c < ∞, then f(x) ∈ Θ(g(x)).
– If lim_{x→∞} f(x)/g(x) = 0, then f(x) ∈ O(g(x)).
– If lim_{x→∞} f(x)/g(x) = ∞, then f(x) ∈ Ω(g(x)).
• Keep in mind that sometimes functions can’t be bounded. (But I won’t assign you any like that!)
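The limit rule above can also be checked mechanically; below is a quick sketch assuming the SymPy library is available, applied to the additivity example f(n) = n^3 + n^2 + n + 3.

from sympy import symbols, limit, oo

n = symbols('n')
#lim n->oo (n^3 + n^2 + n + 3) / n^3 = 1, a constant c with 0 < c < oo,
#so f(n) is Theta(n^3), matching the additivity result above
print(limit((n**3 + n**2 + n + 3) / n**3, n, oo)) #prints 1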
2.4 A Survey of Common
Running Times
Linear Time: O(n)
• Linear time. Running time is at most a
constant factor times the size of the input.
• Computing the maximum. Compute
maximum of n numbers a1, …, an.
def max(L):
    #one comparison per element: O(n)
    max = L[0]
    for i in L:
        if i > max:
            max = i
    return max
Linear Time: O(n)
#the merge algorithm combines two sorted
#lists into one larger sorted list
def merge(A,B):
    i = j = 0
    M = []
    while i < len(A) and j < len(B):
        if A[i] <= B[j]:
            M.append(A[i])
            i += 1
        else:
            M.append(B[j])
            j += 1
    #one list is exhausted; append the other's remaining tail
    if i == len(A):
        return M + B[j:]
    return M + A[i:]
• Claim. Merging two lists of size n takes O(n) time.
• Pf. After each comparison, the length of output list increases by 1.
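A quick sanity check of merge on sample inputs (our own example):

print(merge([1, 3, 5, 7], [2, 4, 6])) #[1, 2, 3, 4, 5, 6, 7]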
O(n log n) Time
• O(n log n) time. Arises in divide-and-conquer algorithms.
• Sorting. Mergesort and heapsort are sorting algorithms
that perform O(n log n) comparisons.
• Largest empty interval. Given n time-stamps x1, …, xn on which copies of a file arrive at a server, what is the largest interval of time when no copies of the file arrive?
• O(n log n) solution. Sort the time-stamps. Scan the sorted
list in order, identifying the maximum gap between
successive time-stamps.
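A minimal sketch of this solution (the function name and plain-number input format are our own choices):

def largest_empty_interval(stamps):
    s = sorted(stamps) #O(n log n) sort dominates the runtime
    best = 0
    for a, b in zip(s, s[1:]): #O(n) scan of successive time-stamps
        if b - a > best:
            best = b - a
    return best

print(largest_empty_interval([15, 3, 9, 4])) #6: the gap from 9 to 15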
Quadratic Time: O(n^2)
• Quadratic time. Enumerate all pairs of elements.
• Closest pair of points. Given a list of n points in the plane (x1, y1), …, (xn, yn), find the pair that is closest.
def closest_pair(L):
    x1 = L[0]
    x2 = L[1]
    #no need to compute the sqrt for the distance
    min_d = (x1[0] - x2[0])**2 + (x1[1] - x2[1])**2
    for i in range(len(L)):
        for j in range(i+1, len(L)):
            x1, x2 = L[i], L[j]
            d = (x1[0] - x2[0])**2 + (x1[1] - x2[1])**2
            if d < min_d:
                min_d = d
    return min_d
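A usage example with our own sample points (note the return value is a squared distance):

print(closest_pair([(0, 0), (3, 4), (1, 1)])) #2: the pair (0, 0), (1, 1)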
• Remark. Ω(n^2) seems inevitable, but this is just an illusion (see Chapter 5).
Cubic Time: O(n^3)
• Cubic time. Enumerate all triples of elements.
• Set disjointness. Given n sets S1, …, Sn, each of which is a subset of {1, 2, …, n}, is there some pair of them which are disjoint?
def find_disjoint_sets(S):
    #S should be a list of Python sets so that the membership
    #test below is O(1), giving O(n^3) overall
    for i in range(len(S)):
        for j in range(i+1, len(S)):
            for p in S[i]:
                if p in S[j]:
                    break
            else: #no p in S[i] belongs to S[j]
                return S[i], S[j] #return tuple if found
    return None #return nothing if all sets overlap
Polynomial Time: O(n^k)
k is a constant
• Independent set of size k. Given a graph, are there k nodes such that no
two are joined by an edge?
• O(n^k) solution. Enumerate all subsets of k nodes.
def independent_k(V,E,k):
    #subset_k, independent omitted for space (a sketch follows below)
    for sub in subset_k(V,k):
        if independent(sub,E):
            return sub
    return None
– Check whether S is an independent set: O(k^2).
– Number of k-element subsets: (n choose k) = n(n−1)(n−2)⋯(n−k+1) / k(k−1)(k−2)⋯(2)(1) ≤ n^k / k!
– O(k^2 · n^k / k!) = O(n^k).
– poly-time for k = 17, but not practical
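The omitted helpers could be filled in as follows; this is a sketch under our own assumptions that V is a sequence of nodes and E is a set of frozenset({u, v}) edges.

from itertools import combinations

def subset_k(V, k):
    #all C(n, k) subsets of size k
    return combinations(V, k)

def independent(sub, E):
    #check all O(k^2) pairs within the subset for an edge
    return all(frozenset((u, v)) not in E
               for u, v in combinations(sub, 2))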
Exponential Time
• Independent set. Given a graph, what is the maximum size of an independent set?
• O(2^n) solution. Enumerate all subsets.
def max_independent(V,E):
    #grow k until no independent set of size k exists; if none of
    #size k exists, none of any larger size can exist either
    for k in range(1, len(V)+1):
        if not independent_k(V,E,k):
            return k-1
    return len(V)