Algorithm Efficiency

There are often many approaches (algorithms) to solve a problem.
How do we choose between them? At the heart of computer program
design are two (sometimes conflicting) goals:

1. To design an algorithm that is easy to understand, code, and debug.
2. To design an algorithm that makes efficient use of the computer's resources.

Goal 1 is the concern of software engineering. Goal 2 is the concern
of data structures and algorithm analysis. When goal 2 is important,
how do we measure an algorithm's cost?
How to Measure Efficiency?

Empirical comparison (run the programs and time them). The results are:

- only valid for that machine,
- only valid for that compiler, and
- only valid for that coding of the algorithm.
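For example, a minimal timing harness might look like the sketch below
(the function sumLoop and the test sizes are illustrative, not from the
original slides):

#include <chrono>
#include <cstdio>

// Function under test: sums 0..n-1 in a simple loop.
long long sumLoop(int n) {
    long long sum = 0;
    for (int i = 0; i < n; i++)
        sum += i;
    return sum;
}

int main() {
    for (int n : {100000, 1000000, 10000000}) {
        auto start = std::chrono::steady_clock::now();
        volatile long long result = sumLoop(n);  // volatile discourages optimizing the call away
        auto stop = std::chrono::steady_clock::now();
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
        (void)result;
        std::printf("n = %8d: %lld us\n", n, (long long)us);
    }
    return 0;
}

Even then, the measured times tell you only about this machine, this
compiler, and this coding.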
Asymptotic Algorithm Analysis

To analyze an algorithm asymptotically, we must identify the critical
resources:

- time (where we will concentrate), and
- space.

We then identify the factors affecting that resource. For most
algorithms, running time depends on the "size" of the input, and is
expressed as T(n) for some function T on input size n.
Examples of Growth Rate
Example 1:

int largest(const int* array, int n) {
    // Assumes n >= 1; starting from array[0] (rather than 0)
    // handles all-negative arrays correctly.
    int currlarge = array[0];
    for (int i = 1; i < n; i++)
        if (array[i] > currlarge)
            currlarge = array[i];
    return currlarge;
}

The loop examines each element once, so the running time grows as c·n.
Example 2:

sum = 0;
for (int i = 1; i <= n; i++)
    for (int j = 1; j <= n; j++)
        sum++;

The inner statement executes n^2 times, so the running time grows as c·n^2.
Growth Rate Graphs
[Figure: growth rates of 10n, 20n, 5n log n, 2n^2, and 2^n plotted for
n up to 50; the exponential 2^n quickly dominates all the others.]
Expanded View
[Figure: the same curves (10n, 20n, 5n log n, 2n^2, 2^n) expanded for
n up to about 15; at small n the curves cross, so the asymptotically
worst function is not always the largest.]
Best, Worst and Average Cases

Not all inputs of a given size take the same time. Consider sequential
search for K in an array of n integers: begin at the first element and
look at each element in turn until K is found.
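A minimal sketch of this search (the name seqSearch is my own):

// Return the position of K in array[0..n-1], or -1 if K is absent.
int seqSearch(const int* array, int n, int K) {
    for (int i = 0; i < n; i++)
        if (array[i] == K)
            return i;   // found after i+1 comparisons
    return -1;          // all n elements examined
}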
- Best case: K is in the first position, so 1 element is examined.
- Worst case: K is in the last position (or absent), so n elements are examined.
- Average case: if K is equally likely to be in any position, about
  (n+1)/2 elements are examined.

While average time seems to be the fairest measure, it may be
difficult to determine.
When is the worst-case time important? For time-critical events
(real-time processing).
Faster Computer or Faster Algorithm?
What happens when we buy a computer 10 times faster? Let n be the size
of input that can be processed in one hour on the old machine (10,000
basic steps), and n' the size that can be processed in one hour on the
new machine (100,000 basic steps).

f(n)        n      n'      change                n'/n
---------   -----  ------  --------------------  ----
10n         1,000  10,000  n' = 10n              10
20n         500    5,000   n' = 10n              10
5n log n    250    1,842   sqrt(10)n < n' < 10n  7.37
2n^2        70     223     n' = sqrt(10)n        3.16
2^n         13     16      n' = n + 3            >1
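To see where the "change" column comes from, note that in one hour the
new machine performs 10 times as many steps, so n' satisfies
f(n') = 10·f(n). For f(n) = 2n^2, solving 2(n')^2 = 10·2n^2 gives
n' = sqrt(10)·n ≈ 3.16n. For f(n) = 2^n, solving 2^(n') = 10·2^n gives
n' = n + log2(10) ≈ n + 3.3: a tenfold faster machine adds only about
3 to the solvable problem size.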
Asymptotic Analysis: Big-oh

Definition: T(n) is in the set O(f(n)) if there exist two positive
constants c and n0 such that |T(n)| <= c|f(n)| for all n > n0.

Usage: the algorithm is in O(n^2) in the [best, average, worst] case.

Meaning: for all data sets big enough (i.e., n > n0), the algorithm
always executes in fewer than c|f(n)| steps in the [best, average,
worst] case.

Big-oh is an upper bound. Example: if T(n) = 3n^2, then T(n) is in
O(n^2). We want the tightest upper bound: T(n) = 3n^2 is also in
O(n^3), but we prefer O(n^2).
Big-oh Examples

Example 1. Finding the value X in an array (average case):
T(n) = c_s·n/2, where c_s is the cost of examining one element.
For all values of n > 1, |c_s·n/2| <= c_s|n|. Therefore, by the
definition, T(n) is in O(n) for n0 = 1 and c = c_s.

Example 2. T(n) = c1·n^2 + c2·n in the average case:
|c1·n^2 + c2·n| <= |c1·n^2 + c2·n^2| = (c1 + c2)|n^2| for all n > 1.
Therefore, T(n) is in O(n^2).

Example 3. T(n) = c. This is in O(1).
Big-Omega

Definition: T(n) is in the set Ω(g(n)) if there exist two positive
constants c and n0 such that |T(n)| >= c|g(n)| for all n > n0.

Meaning: for all data sets big enough (i.e., n > n0), the algorithm
always executes in more than c|g(n)| steps.

It is a LOWER bound, and we want the greatest lower bound.

Example: T(n) = c1·n^2 + c2·n.
|c1·n^2 + c2·n| >= |c1·n^2| for all n > 1, so |T(n)| >= c|n^2| for
c = c1 and n0 = 1. Therefore, T(n) is in Ω(n^2) by the definition.
Theta Notation

When big-Oh and Ω are the same for an algorithm, we indicate this by
using Θ (big-Theta) notation. Definition: an algorithm is said to be
Θ(h(n)) if it is in O(h(n)) and in Ω(h(n)).

Simplifying rules:

- If f(n) is in O(g(n)) and g(n) is in O(h(n)), then f(n) is in O(h(n)).
- If f(n) is in O(k·g(n)) for any constant k > 0, then f(n) is in O(g(n)).
- If f1(n) is in O(g1(n)) and f2(n) is in O(g2(n)), then (f1+f2)(n) is
  in O(max(g1(n), g2(n))).
- If f1(n) is in O(g1(n)) and f2(n) is in O(g2(n)), then f1(n)·f2(n) is
  in O(g1(n)·g2(n)).
Big-oh Rules

If T1(n) = O(f(n)) and T2(n) = O(g(n)), then:

- T1(n) + T2(n) = max(O(f(n)), O(g(n))).
- T1(n) * T2(n) = O(f(n)) * O(g(n)) = O(f(n)·g(n)).

If T(n) is a polynomial of degree k, then T(n) = Θ(n^k).

log^k n = O(n) for any constant k: logarithms grow very slowly.
General Algorithm Analysis Rules

- The running time of a for loop is at most the running time of the
  statements inside the loop times the number of iterations.
- Analyze nested loops inside out, then apply the previous rule.
- Consecutive statements just add (so apply the max rule).
- The running time of an if/else statement is never more than the
  running time of the test plus the larger of the running times of
  the true and false branches.
Running Time of a Program

Example 1: a = b;
This assignment statement takes constant time, so it is Θ(1).

Example 2:

sum = 0;
for (int i = 1; i <= n; i++)
    sum += n;

The loop body takes constant time and executes n times, so this is Θ(n).

Example 3:

sum = 0;
for (int j = 1; j <= n; j++)
    for (int i = 1; i <= j; i++)   // inner loop runs j times
        sum++;
for (int k = 1; k <= n; k++)
    a[k] = k - 1;

The nested loops execute sum++ a total of 1 + 2 + ... + n = n(n+1)/2
times, which is Θ(n^2); the final loop is Θ(n), so the total is Θ(n^2).
More Examples

Example 4:

sum1 = 0;
for (int i = 1; i <= n; i++)
    for (int j = 1; j <= n; j++)   // n iterations for every i
        sum1++;                    // executes n^2 times: Θ(n^2)
sum2 = 0;
for (int i = 1; i <= n; i++)
    for (int j = 1; j <= i; j++)   // i iterations for each i
        sum2++;                    // executes n(n+1)/2 times: also Θ(n^2)

Example 5:

sum1 = 0;
for (int k = 1; k < n; k *= 2)     // about log n passes
    for (int j = 1; j <= n; j++)
        sum1++;                    // executes about n log n times: Θ(n log n)
sum2 = 0;
for (int k = 1; k <= n; k *= 2)    // k = 1, 2, 4, ..., n
    for (int j = 1; j <= k; j++)
        sum2++;                    // executes 1 + 2 + 4 + ... + n ≈ 2n times: Θ(n)
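As a quick sanity check of Example 5, here is a sketch that counts the
iterations directly (the test sizes are arbitrary):

#include <cstdio>

int main() {
    for (int n : {16, 256, 4096}) {
        long long sum1 = 0, sum2 = 0;
        for (int k = 1; k < n; k *= 2)      // about log2(n) passes
            for (int j = 1; j <= n; j++)
                sum1++;                     // ~ n log n in total
        for (int k = 1; k <= n; k *= 2)     // k = 1, 2, 4, ..., n
            for (int j = 1; j <= k; j++)
                sum2++;                     // 1 + 2 + 4 + ... + n ~ 2n in total
        std::printf("n = %5d: sum1 = %lld, sum2 = %lld\n", n, sum1, sum2);
    }
    return 0;
}

For n = 4096 this prints sum1 = 49152 (= 12n) and sum2 = 8191 (≈ 2n),
matching the Θ(n log n) and Θ(n) analyses.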
Binary Search
int binary(int value, int* array, int size) {
    int left = -1;
    int right = size;
    // Invariant: if value is in the array, its position
    // is strictly between left and right.
    while (left + 1 != right) {
        int mid = (left + right) / 2;
        if (value < array[mid]) right = mid;       // search lower half
        else if (value > array[mid]) left = mid;   // search upper half
        else return mid;                           // found it
    }
    return -1;                                     // not found
}
Binary Search Example
Position:  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Key:      11  13  21  26  29  36  40  41  45  51  54  56  65  72  77  83

Now let's search for the value 45.
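Tracing binary() on this array with value = 45: left = -1 and
right = 16, so mid = 7; array[7] = 41 < 45, so left = 7. Then mid = 11;
array[11] = 56 > 45, so right = 11. Then mid = 9; array[9] = 51 > 45,
so right = 9. Then mid = 8; array[8] = 45, and the search returns
position 8 after examining only 4 elements.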
Unsuccessful Search
Position:  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Key:      11  13  21  26  29  36  40  41  45  51  54  56  65  72  77  83

Now let's search for the value 24: mid = 7 (41 > 24, so right = 7),
then mid = 3 (26 > 24, so right = 3), then mid = 1 (13 < 24, so
left = 1), then mid = 2 (21 < 24, so left = 2); now left + 1 == right,
so the search returns -1.

How many elements are examined in the worst case? Each probe halves
the remaining interval, so at most about ⌈log2(n+1)⌉ elements are
examined: 5 for n = 16, and O(log n) in general.
Case Study – Maximum Subsequence

Given a sequence of integers a1, a2, ..., an, find the contiguous
subsequence that gives you the largest sum. Since there is no size
limit on the subsequence, the solution is trivial if the sequence is
all positive (take the whole sequence) or all negative (take the empty
subsequence, whose sum is 0).
Simple Solution
Look at all possible combinations of start and stop positions of the
subsequence:

maxsum = 0;
for (int i = 0; i < n; i++)
    for (int j = i; j < n; j++) {
        thissum = 0;                    // sum of the subsequence a[i..j]
        for (int k = i; k <= j; k++)
            thissum = thissum + a[k];
        if (thissum > maxsum) maxsum = thissum;
    }
Analysis of Simple Solution

- The inner loop is executed j-i+1 times.
- The middle loop changes j from i to n-1. Summing j-i+1 as j goes
  from i to n-1 gives 1 + 2 + ... + (n-i), so for each i the inner
  loop body runs (n-i+1)(n-i)/2 = (n^2 - 2ni + i^2 + n - i)/2 times.
More Analysis

The outer loop changes i from 0 to n-1, so we sum
(n^2 - 2ni + i^2 + n - i)/2 over i:

- n^2 summed n times is n^3.
- The sum of 2ni as i goes from 0 to n-1 is 2n·(n-1)n/2 = n^3 - n^2.
- The sum of i^2 as i goes from 0 to n-1 is (n-1)n(2n-1)/6 = (2n^3 - 3n^2 + n)/6.
- The sum of n as i goes from 0 to n-1 is n^2.
- The sum of i as i goes from 0 to n-1 is (n-1)n/2 = (n^2 - n)/2.

Total: (n^3 - (n^3 - n^2) + (2n^3 - 3n^2 + n)/6 + n^2 - (n^2 - n)/2)/2
= n(n+1)(n+2)/6.

This is O(n^3).
An Improved Algorithm

Start at position i and find the sum of all subsequences that start at
position i. Then repeat for all starting positions:

maxsum = 0;
for (int i = 0; i < n; i++) {
    thissum = 0;
    for (int j = i; j < n; j++) {
        thissum = thissum + a[j];       // running sum of a[i..j]
        if (thissum > maxsum) maxsum = thissum;
    }
}
Analysis of Improved Algorithm

The inner loop goes from i to n-1: when i is 0 it runs n times, when
i is 1 it runs n-1 times, and so on, until i is n-1 and it runs once.
Summing this up backwards, we get 1 + 2 + ... + n = n(n+1)/2
= (n^2 + n)/2 = O(n^2).
Final great algorithm
thissum = 0; maxsum = 0;
for (int j = 0; j < n; j++) {
    thissum = thissum + a[j];
    if (thissum > maxsum)
        maxsum = thissum;
    else if (thissum < 0)
        thissum = 0;        // a negative running sum can never help; restart
}

This is O(n): a single pass over the sequence.
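A self-contained sketch of this scan on a small example (the sample
data is made up):

#include <cstdio>

int main() {
    int a[] = {-2, 4, -1, 5, -3, 2};   // best contiguous sum: 4 + (-1) + 5 = 8
    int n = 6;
    int thissum = 0, maxsum = 0;
    for (int j = 0; j < n; j++) {
        thissum = thissum + a[j];
        if (thissum > maxsum)
            maxsum = thissum;
        else if (thissum < 0)
            thissum = 0;               // restart after a negative running sum
    }
    std::printf("maximum subsequence sum = %d\n", maxsum);   // prints 8
    return 0;
}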
Analyzing Problems

- Upper bound: the upper bound of the best known algorithm to solve
  the problem.
- Lower bound: the lower bound for every possible algorithm to solve
  that problem, even unknown algorithms.

Example: sorting.

- Cost of I/O: Ω(n).
- Bubble sort or insertion sort: O(n^2).
- A better sort (Quicksort, Mergesort, Heapsort): O(n log n).
- We prove in Chapter 8 that sorting is Ω(n log n).
Multiple Parameters

Compute the rank ordering for all C pixel values in a picture of P
pixels. Monitors have a fixed number of colors (256, 16M, 64M), so we
need to count the number of occurrences of each color and determine
the most used and least used colors:

for (int i = 0; i < C; i++)     // initialize the color counts
    count[i] = 0;
for (int i = 0; i < P; i++)     // tally the color of each pixel
    count[value[i]]++;
sort(count);                    // sort the C color counts

If we use P alone as the measure, the time is O(P log P). But which is
bigger, C or P? 600x400 = 240,000 pixels; 1024x1024 = 1M pixels; C may
be 256 or 16M. Since the sort works on C counts, not P pixels, the
more accurate statement is O(P + C log C).
Space Bounds

Space bounds can also be analyzed with asymptotic complexity analysis:
time for the algorithm, space for the data structure.

Space/time tradeoff principle: one can often achieve a reduction in
time by sacrificing space, or vice versa.

- Encoding or packing information: a Boolean flag needs only one bit,
  but a byte is the smallest unit of storage, so pack 8 Booleans into
  1 byte (see the sketch below). This takes more time but less space.
- Table lookup: compute factorials once, use them many times.

Disk-based space/time tradeoff principle: the smaller you can make
your disk storage requirements, the faster your program will run,
because disk is slow.
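As an illustration of the bit-packing idea, a minimal sketch (the
struct name BitFlags is made up):

#include <cstdint>
#include <cstdio>

// Packs 8 Boolean flags into one byte: less space, slightly more time per access.
struct BitFlags {
    std::uint8_t bits = 0;
    void set(int i, bool v) {       // i in 0..7
        if (v) bits |= (std::uint8_t)(1u << i);
        else   bits &= (std::uint8_t)~(1u << i);
    }
    bool get(int i) const { return (bits >> i) & 1u; }
};

int main() {
    BitFlags f;
    f.set(3, true);
    f.set(7, true);
    std::printf("flag 3 = %d, flag 5 = %d, size = %zu byte(s)\n",
                f.get(3), f.get(5), sizeof(f));   // one byte holds all 8 flags
    return 0;
}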