CSE 830: Design and Analysis of Algorithms

advertisement
CSE 830:
Design and Theory of Algorithms
Dr. Eric Torng
TA: Carl Bussema
Many slides adapted from those used by Dr. Charles Ofria
Outline
• Definitions
– Algorithms
– Problems
• Course Objectives
• Administrative stuff …
• Analysis of Algorithms
What is an Algorithm?
Algorithms are the ideas behind computer programs.
An algorithm is the thing that stays the same whether the
program is in C++ running on a Cray in New York or is in
BASIC running on a Macintosh in Alaska!
To be interesting, an algorithm has to solve a general, specified
problem.
What is a problem?
• Definition
– A mapping/relation between a set of input instances
(domain) and an output set (range)
• Problem Specification
– Specify what a typical input instance is
– Specify what the output should be in terms of the input
instance
• Example: Sorting
– Input: A sequence of N numbers a1…an
– Output: the permutation (reordering) of the input
sequence such that a1  a2  …  an .
Types of Problems
Search: find X in the input satisfying property Y
Structuring: Transform input X to satisfy property Y
Construction: Build X satisfying Y
Optimization: Find the best X satisfying property Y
Decision: Does X satisfy Y?
Adaptive: Maintain property Y over time.
Two desired properties of algorithms
• Correctness
– Always provides correct output when presented
with legal input
• Efficiency
– Computes correct output quickly given input
Correctness
• Example: Traveling Salesperson Problem (TSP)
• Input: A sequence of N cities with the distances dij
between each pair of cities
• Output: a permutation (ordering) of the cities <c1’, …, cn’>
that minimizes the expression
Σj =1 to n-1 dj’,j’+1 + dn’,1’
• Which of the following algorithms is correct?
– Nearest neighbor: Initialize tour to city 1. Extend tour by visiting
nearest unvisited city. Finally return to city 1.
– All tours: Try all possible orderings of the points selecting the
ordering that minimizes the total length:
Efficiency
•
•
•
•
Example: Odd Number Problem
Input: A number n
Output: Yes if n is odd, no if n is even
Which of the following algorithms is most efficient?
– Count up to that number from one and alternate naming each
number as odd or even.
– Factor the number and see if there are any twos in the
factorization.
– Keep a lookup table of all numbers from 0 to the maximum
integer.
– Look at the last bit (or digit) of the number.
Outline
• Definitions
– Algorithms
– Problems
• Course Objectives
• Administrative stuff …
• Analysis of Algorithms
Course Objectives
1.
2.
3.
4.
5.
Details of classic algorithms
Methods for designing algorithms
Validate/verify algorithm correctness
Analyze algorithm efficiency
Prove (or at least indicate) no correct,
efficient algorithm exists for solving a
given problem
6. Writing clear algorithms and proofs
Classic Algorithms
•
•
Lots of wonderful algorithms have already
been developed
I expect you to learn most of this from
reading, though we will reinforce in
lecture
Algorithm design methods
•
•
•
Something of an art form
Cannot be fully automated
We will describe some general techniques
and try to illustrate when each is
appropriate
Algorithm correctness
•
•
Proving an algorithm generates correct
output for all inputs
One technique covered in textbook
–
•
Loop invariants
We will do some of this in the course, but
it is not emphasized as much as other
objectives
Analyzing algorithms
•
•
•
The “process” of determining how much
resources (time, space) are used by a given
algorithm
We want to be able to make quantitative
assessments about the value (goodness) of one
algorithm compared to another
We want to do this WITHOUT implementing
and running an executable version of an
algorithm
Proving hardness results
•
•
•
We believe that no correct and efficient
algorithm exists that solves many
problems such as TSP
We define a formal notion of a problem
being hard
We develop techniques for proving
hardness results
Clear Writing
• Methods for Expressing Algorithms
– Implementations
– Pseudo-code
– English
• Writing clear and understandable proofs
• My main concern is not the specific language used
but the clarity of your algorithm/proof
Outline
• Definitions
– Algorithms
– Problems
• Course Objectives
• Administrative stuff …
• Analysis of Algorithms
Algorithm Analysis Overview
•
•
•
RAM model of computation
Concept of input size
Measuring complexity
–
•
Best-case, average-case, worst-case
Asymptotic analysis
–
Asymptotic notation
The RAM Model
• RAM model represents a “generic”
implementation of the algorithm
• Each “simple” operation (+, -, =, if, call) takes
exactly 1 step.
• Loops and subroutine calls are not simple
operations, but depend upon the size of the
data and the contents of a subroutine. We do
not want “sort” to be a single step operation.
• Each memory access takes exactly 1 step.
Input Size
•
•
•
In general, larger input instances require
more resources to process correctly
We standardize by defining a notion of
size for an input instance
Examples
–
–
What is the size of a sorting input instance?
What is the size of an “Odd number” input
instance?
Algorithm Analysis Overview
•
•
•
RAM model of computation
Concept of input size
Measuring complexity
–
•
Best-case, average-case, worst-case
Asymptotic analysis
–
Asymptotic notation
Measuring Complexity
•
The running time of an algorithm is the function
defined by the number of steps (or amount of
memory) required to solve input instances of
size n
–
–
–
–
–
•
F(1) = 3
F(2) = 5
F(3) = 7
…
F(n) = 2n+1
Problem: Inputs of the same size may require
different numbers of steps to solve
3 different analyses
• The worst case running time of an algorithm is the
function defined by the maximum number of steps taken
on any instance of size n.
• The best case running time of an algorithm is the function
defined by the minimum number of steps taken on any
instance of size n.
• The average-case running time of an algorithm is the
function defined by an average number of steps taken on
any instance of size n.
• Which of these is the best to use?
Average case analysis
•
Drawbacks
–
Based on a probability distribution of input instances
•
•
–
•
The distribution may not be appropriate
Provides little consolation if we have a worst-case input
More complicated to compute than worst case
running time
Worst case running time is often comparable to
average case running time (see next graph)
–
Counterexamples to above point:
•
•
Quicksort
simplex method for linear programming
Best, Worst, and Average Case
Worst case analysis
•
Typically much simpler to compute as we do not
need to “average” performance on many inputs
–
•
•
•
Instead, we need to find and understand an input that
causes worst case performance
Provides guarantee that is independent of any
assumptions about the input
Often reasonably close to average case running
time
The standard analysis performed
Algorithm Analysis Overview
•
•
•
RAM model of computation
Concept of input size
Measuring complexity
–
•
Best-case, average-case, worst-case
Asymptotic analysis
–
Asymptotic notation
Motivation for Asymptotic Analysis
• An exact computation of worst-case running
time can be difficult
– Function may have many terms:
• 4n2 - 3n log n + 17.5 n - 43 n⅔ + 75
• An exact computation of worst-case running
time is unnecessary
– Remember that we are already approximating
running time by using RAM model
Simplifications
• Ignore constants
– 4n2 - 3n log n + 17.5 n - 43 n⅔ + 75 becomes
– n2 – n log n + n - n⅔ + 1
• Asymptotic Efficiency
– n2 – n log n + n - n⅔ + 1 becomes n2
• End Result: Θ(n2)
Why ignore constants?
• RAM model introduces errors in constants
– Do all instructions take equal time?
– Specific implementation (hardware, code
optimizations) can speed up an algorithm by constant
factors
– We want to understand how effective an algorithm is
independent of these factors
• Simplification of analysis
– Much easier to analyze if we focus only on n2 rather
than worrying about 3.7 n2 or 3.9 n2
Asymptotic Analysis
• We focus on the infinite set of large n
ignoring small values of n
• Usually, an algorithm that is asymptotically
more efficient will be the best choice for all
but very small inputs.
0

“Big Oh” Notation
• O(g(n)) =
{f(n) : there exist positive constants c and n0 such
that " n≥n0, 0 ≤ f(n) ≤ c g(n) }
– What are the roles of the two constants?
• n0:
• c:

0
n0
f(n) ≤ c g(n)
Set Notation Comment
• O(g(n)) is a set of functions.
• However, we will use one-way equalities like
n = O(n2)
• This really means that function n belongs to the
set of functions O(n2)
• Incorrect notation: O(n2) = n
• Analogy
– “A dog is an animal” but not “an animal is a dog”
Three Common Sets
f(n) = O(g(n)) means c  g(n) is an Upper Bound on f(n)
f(n) = (g(n)) means c  g(n) is a Lower Bound on f(n)
f(n) = (g(n)) means c1  g(n) is an Upper Bound on f(n)
and c2  g(n) is a Lower Bound on f(n)
These bounds hold for all inputs beyond some threshold n0.
O(g(n))
(g(n))
(g(n))
O(f(n)) and (g(n))
O( f (n)) 
1
100
( g (n)) 
1
25
n2
n
Example Function
f(n) =
2
3n
- 100n + 6
Quick Questions
c
3n2 - 100n + 6 = O(n2)
3n2 - 100n + 6 = O(n3)
3n2 - 100n + 6  O(n)
3n2 - 100n + 6 = (n2)
3n2 - 100n + 6  (n3)
3n2 - 100n + 6 = (n)
3n2 - 100n + 6 = (n2)?
3n2 - 100n + 6 = (n3)?
3n2 - 100n + 6 = (n)?
n0
Common Complexity Functions
Complexity
10
20
30
40
50
60
n
110-5 sec 210-5 sec 310-5 sec 410-5 sec 510-5 sec 610-5 sec
n2
0.0001 sec 0.0004 sec 0.0009 sec 0.016 sec
0.025 sec
0.036 sec
n3
0.001 sec
0.008 sec
0.027 sec
0.064 sec
0.125 sec
0.216 sec
n5
0.1 sec
3.2 sec
24.3 sec
1.7 min
5.2 min
13.0 min
2n
0.001sec
1.0 sec
17.9 min
12.7 days
35.7 years 366 cent
3n
0.59sec
58 min
6.5 years
3855 cent
2108cent 1.31013cent
log2 n
310-6 sec 410-6 sec 510-6 sec 510-6 sec 610-6 sec 610-6 sec
n log2 n 310-5 sec 910-5 sec 0.0001 sec 0.0002 sec 0.0003 sec 0.0004 sec
Example Problems
1. What does it mean if:
f(n)  O(g(n)) and g(n)  O(f(n))
2. Is 2n+1 = O(2n) ?
Is 22n = O(2n) ?
3. Does f(n) = O(f(n)) ?
4. If f(n) = O(g(n)) and g(n) = O(h(n)),
can we say f(n) = O(h(n)) ?
?
Extra Slides
• Slides illustrating TSP algorithms
• Case study with Insertion sort for Best,
Average, Worst case analysis
Possible Algorithm:
Nearest neighbor
Not Correct!
A Correct Algorithm
Try all possible orderings of the points selecting the
ordering that minimizes the total length:
d=
For each of the n! permutations, Pi of the n points,
if cost(Pi) < d then
d = cost(Pi)
Pmin = Pi
return Pmin
Case study: Insertion Sort
Count the number of times each line will be executed:
Num Exec.
for i = 2 to n
key = A[i]
j=i-1
while j > 0 AND A[j] > key
A[j+1] = A[j]
j = j -1
A[j+1] = key
Download