Algorithmic Efficiency

Algorithms
• Step-by-step instructions that tell a computing agent how to solve some problem using only finite resources
• Resources (time/space)
  - Memory
  - CPU cycles
• Types of instructions
  - Sequential
  - Conditional: if statements
  - Iterative: loops

Pseudocode: The Interlingua for Algorithms
• An English-like description of the sequential, conditional, and iterative operations of an algorithm
• No rigid syntax. As with an essay, clarity and organization are key. So is completeness.

Pseudocode Example
Find Largest Number
Input: A list of positive numbers
Output: The largest number in the list
Procedure:
1. Set Largest to zero
2. Set Current-Number to the first in the list
3. While there are more numbers in the list
3.1 if (the Current-Number > Largest) then
3.1.1 Set Largest to the Current-Number
3.2 Set Current-Number to the next one in the list
4. Output Largest
In this procedure, step 3.1 is a conditional operation and step 3 is an iterative operation.
Let’s “play computer” to review this algorithm…
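One way to play computer is to run the procedure and watch Largest change. The following is a minimal Python sketch of the Find Largest Number pseudocode, not a definitive implementation; the function name `find_largest` and the sample list are assumptions made for illustration.

```python
def find_largest(numbers):
    """Return the largest value in a list of positive numbers."""
    largest = 0  # step 1: zero is a safe start only because every input is positive
    for current_number in numbers:       # steps 2, 3 and 3.2: walk the list in order
        if current_number > largest:     # step 3.1: the conditional operation
            largest = current_number     # step 3.1.1
        print(f"current={current_number} largest={largest}")  # trace each pass
    return largest                       # step 4


if __name__ == "__main__":
    print(find_largest([7, 3, 12, 5]))   # hypothetical sample input
```

The trace prints one line per pass through the loop, which is the same bookkeeping you would do by hand.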
Algorithms vary in efficiency
Example: sum the numbers from 1 to n
Algorithm I
1. Set sum to 0
2. Set currNum to 1
3. Repeat until currNum > n
4.    Set sum to sum + currNum
5.    Set currNum to currNum + 1
Efficiency
space = 3 memory cells
time = t(step 1) + t(step 2) + n t(step 4) + n t(step 5)
• space requirement is constant (i.e. independent of n)
• time requirement is linear (i.e. grows linearly with n). This is written “O(n)”

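To make the linear growth concrete, here is a rough Python sketch of Algorithm I that also counts how many times steps 1, 2, 4, and 5 run. The function name `sum_to_n_loop` and the step counter are assumptions added for illustration, and each step is treated as costing one unit of time.

```python
def sum_to_n_loop(n):
    """Sum 1..n by repeated addition; return (sum, number of steps executed)."""
    total = 0        # step 1
    curr_num = 1     # step 2
    steps = 2        # the two set-up steps run exactly once
    while curr_num <= n:      # step 3: repeat until curr_num > n
        total += curr_num     # step 4
        curr_num += 1         # step 5
        steps += 2            # steps 4 and 5 each run once per pass
    return total, steps


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, sum_to_n_loop(n))
```

Under that one-unit-per-step assumption the count is 2 + 2n, a straight line in n.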
To see this graphically...
Algorithm I’s time requirements
[Figure: a straight line, y = mx + b, with time on the vertical axis and n, the size of the problem, on the horizontal axis: time = m n + b]
The exact equation for the line is unknown because we lack precise values for the constants m and b.
But we can say:
time is a linear function of the size of the problem
time = O(n)

Algorithm II for summation
First, consider a specific case: n = 100.
The “key insight”, due to Gauss: the numbers can be grouped into 50 pairs of the form:
1 + 100 = 101
2 + 99 = 101
...
50 + 51 = 101
so sum = 50 x 101
This algorithm requires a single multiplication!
Second, generalize the formula for any (even) n:
sum = (n / 2) (n + 1)
Time requirement is constant.
time = O(1)

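A short Python sketch of Algorithm II, for comparison with the loop. The function name `sum_to_n_formula` is an assumption; the expression is written as n(n + 1) / 2 with integer division, which is the same formula rearranged so that it also gives a whole number when n is odd.

```python
def sum_to_n_formula(n):
    """Sum 1..n with Gauss's pairing formula: a fixed amount of work for any n."""
    return n * (n + 1) // 2  # n * (n + 1) is always even, so // loses nothing


if __name__ == "__main__":
    print(sum_to_n_formula(100))  # the n = 100 case from the slide: 50 x 101
```

The amount of arithmetic does not depend on n, which is what O(1) expresses.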
Sequential Search: A Commonly Used Algorithm
• Suppose you want to find a student in the UT directory.
• It contains EIDs, names, phone numbers, and lots of other information.
• You want a computer application for searching the directory: given an EID, return the student’s phone number.
• You want more, too, but this is a good start…

Sequential Search of a student database
     name          EID      major     credit hrs.
1    John Smith    JS456    physics   36
2    Paula Jones   PJ123    history   125
3    Led Belly     LEB900   music     72
.    .             .        .         .
.    .             .        .         .
.    .             .        .         .
n    Chuck Bin     CB1235   math      89
Algorithm to search the database by EID:
1. ask user to enter EID to search for
2. set i to 1
3. set found to ‘no’
4. while i <= n and found = ‘no’ do
5.    if EID = EID_i then set found to ‘yes’
6.    else increment i by 1
7. if found = ‘no’ then print “no such student”
   else <student found at array index i>

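The search steps above can be sketched in Python roughly as follows. This is an illustration under assumptions: the `students` list holds only the four rows visible in the table (the rows in between are elided on the slide), the function name `sequential_search` is made up, and Python indexes from 0, so each index is one less than the slide's.

```python
# (name, EID, major, credit hours): only the rows shown on the slide
students = [
    ("John Smith", "JS456", "physics", 36),
    ("Paula Jones", "PJ123", "history", 125),
    ("Led Belly", "LEB900", "music", 72),
    ("Chuck Bin", "CB1235", "math", 89),
]


def sequential_search(records, eid):
    """Return the 0-based index of the record with this EID, or None."""
    i = 0                                  # step 2 (0-based here)
    found = False                          # step 3
    while i < len(records) and not found:  # step 4
        if records[i][1] == eid:           # step 5: compare against EID_i
            found = True
        else:
            i += 1                         # step 6
    return i if found else None            # step 7


if __name__ == "__main__":
    index = sequential_search(students, "LEB900")
    print("no such student" if index is None else students[index])
```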
Time requirements for sequential search
• best case (minimum amount of work): EID found in student_1; one loop iteration
• worst case (maximum amount of work): EID found in student_n; n loop iterations
• average case (expected amount of work): EID found in student_(n/2); n/2 loop iterations
Because the amount of work is a constant multiple of n, the time requirement is O(n) in the worst case and the average case.

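The three cases can be checked by counting loop iterations directly. The sketch below assumes the EID is present and that every position is equally likely to be the one searched for; the function names are hypothetical.

```python
def iterations_needed(eids, target):
    """Count the loop iterations sequential search makes before it finds target."""
    count = 0
    for eid in eids:
        count += 1
        if eid == target:
            break
    return count


def average_iterations(n):
    """Average the iteration count over a search for each of n distinct EIDs."""
    eids = list(range(1, n + 1))  # stand-in EIDs 1..n
    return sum(iterations_needed(eids, target) for target in eids) / n


if __name__ == "__main__":
    eids = list(range(1, 11))
    print(iterations_needed(eids, 1))    # best case: first entry
    print(iterations_needed(eids, 10))   # worst case: last entry
    print(average_iterations(10))        # average case
```

The exact average is (n + 1) / 2, which is the slide's n/2 up to a constant and so still O(n).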
O(n) searching is too slow
Consider searching UT’s student database using sequential search on a computer capable of 20,000 integer comparisons per second:
n = 150,000 (students registered during past 10 years)
average case: (150,000 / 2) comparisons x (1 second / 20,000 comparisons) = 3.75 seconds
worst case: 150,000 comparisons x (1 second / 20,000 comparisons) = 7.5 seconds
Bad news for searching the NYC phone book, IRS database, ...

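The same arithmetic as a few lines of Python, using the slide's figures of 20,000 comparisons per second and n = 150,000. The constant and function names are assumptions chosen for readability.

```python
COMPARISONS_PER_SECOND = 20_000   # machine speed given on the slide
N = 150_000                       # students in the database


def seconds_for(comparisons, rate=COMPARISONS_PER_SECOND):
    """Convert a number of comparisons into seconds at the given rate."""
    return comparisons / rate


average_case = seconds_for(N / 2)  # expected work: n/2 comparisons
worst_case = seconds_for(N)        # maximum work: n comparisons

if __name__ == "__main__":
    print(average_case, worst_case)
```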
Searching an ordered list is faster: an example of binary search
     name          student   major     credit hrs.
1    John Smith    24576     physics   36
2    Paula Jones   36794     history   125
3    Led Belly     42356     music     72
.    .             .         .         .
.    .             .         .         .
.    .             .         .         .
n    Chuck Bin     93687     math      89
Note: the student array is sorted in increasing order.
How would you search for 58925? 38453? 46589?
The student array:
 1   24576
 2   36794
 3   38453
 4   41200
 5   43756
 6   45987
 7   47865
 8   49277
 9   51243
10   58925
11   59845
12   60011
13   60367
14   64596
15   86756
16   93687
[Figure: labels Probe 1, Probe 3, and Probe 2, from top to bottom, mark entries of the array]

The binary search algorithm
Assuming that the entries in student are sorted in increasing order:
1. ask user to input studentNum to search for
2. set found to ‘no’
3. while not done searching and found = ‘no’   (What does this mean?)
4.    set middle to the index counter at the middle of the student list
5.    if studentNum = student_middle then set found to ‘yes’
6.    if studentNum < student_middle then chop off the last half of the student list   (How?)
7.    if studentNum > student_middle then chop off the first half of the student list   (How?)
8. if found = ‘no’ then print “no such student”
   else <studentNum found at array index middle>

The binary search algorithm
Assuming that the entries in student are sorted in increasing order:
1. ask user to input studentNum to search for
2. set beginning to 1
3. set end to n
4. set found to ‘no’
5. while beginning <= end and found = ‘no’
6.    set middle to (beginning + end) / 2 {round down to nearest integer}
7.    if studentNum = student_middle then set found to ‘yes’
8.    if studentNum < student_middle then set end to middle - 1
9.    if studentNum > student_middle then set beginning to middle + 1
10. if found = ‘no’ then print “no such student”
    else <studentNum found at array index middle>

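A Python sketch of these steps, run on the 16-entry student array from the earlier example. The function name `binary_search` and the returned list of probed indices are assumptions added so the probes can be inspected; indices are 0-based in Python, so each is one less than the slide's.

```python
student = [24576, 36794, 38453, 41200, 43756, 45987, 47865, 49277,
           51243, 58925, 59845, 60011, 60367, 64596, 86756, 93687]


def binary_search(sorted_list, target):
    """Return (index, probes): the 0-based index or None, plus the indices examined."""
    beginning = 0                    # step 2 (0-based here)
    end = len(sorted_list) - 1       # step 3
    probes = []
    while beginning <= end:          # step 5
        middle = (beginning + end) // 2   # step 6: integer division rounds down
        probes.append(middle)
        if sorted_list[middle] == target:     # step 7
            return middle, probes
        if target < sorted_list[middle]:      # step 8: drop the last half
            end = middle - 1
        else:                                 # step 9: drop the first half
            beginning = middle + 1
    return None, probes              # step 10: no such student


if __name__ == "__main__":
    for number in (58925, 38453, 46589):
        print(number, binary_search(student, number))
```

Searching for 58925 examines 0-based indices 7, 11, and 9, that is, entries 8, 12, and 10 in the slide's numbering.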
Time requirements for binary search
At each iteration of the loop, the algorithm cuts the list (i.e. the list called student) in half.
In the worst case (i.e. when studentNum is not in the list called student), how many times will this happen?
n = 16
1st iteration 16/2 = 8
2nd iteration 8/2 = 4
3rd iteration 4/2 = 2
4th iteration 2/2 = 1
The number of times a number n can be cut in half and not go below 1 is log2 n.
Said another way: log2 n = m is equivalent to 2^m = n
In the average case and the worst case, binary search is O(log2 n)

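The halving argument can be checked with a small loop. This sketch uses integer halving (rounding down), so for values of n that are not powers of two it gives log2 n rounded down; the function name `halvings` is an assumption.

```python
import math


def halvings(n):
    """Count how many times n can be cut in half without going below 1."""
    count = 0
    while n // 2 >= 1:
        n //= 2        # integer halving, rounding down
        count += 1
    return count


if __name__ == "__main__":
    for n in (16, 100, 150_000):
        print(n, halvings(n), math.log2(n))
```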
This is a major improvement
n            sequential search O(n)   binary search O(log2 n)
100          100                      7    (2^7 = 128)
150,000      150,000                  18   (2^18 = 262,144)
20,000,000   20,000,000               25   (2^25 is about 33,000,000)
(number of comparisons needed in the worst case)
In terms of seconds...
sequential search: 150,000 comparisons x (1 second / 20,000 comparisons) = 7.5 seconds
vs.
binary search: 18 comparisons x (1 second / 20,000 comparisons) = .0009 seconds, i.e. about .001 seconds

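The binary search column can be reproduced in Python. The sketch takes the worst case to be floor(log2 n) + 1 comparisons, which agrees with the 7, 18, and 25 in the table, and uses `int.bit_length()` to compute it exactly; the function name is an assumption.

```python
def worst_case_comparisons(n):
    """Worst-case comparisons for binary search on n >= 1 sorted items."""
    # For n >= 1, n.bit_length() equals floor(log2 n) + 1, with no float rounding.
    return n.bit_length()


if __name__ == "__main__":
    for n in (100, 150_000, 20_000_000):
        k = worst_case_comparisons(n)
        print(f"n={n:,}  sequential={n:,}  binary={k}  (2^{k} = {2 ** k:,})")
```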
Sorting a list
First, an algorithm that’s easy to write, but is badly inefficient...
unsorted (positions 1 to 5): 35467, 67854, 46781, 13528, 87341
                 sorted
initially        (empty)
1st iteration    13528
2nd iteration    13528, 35467
3rd iteration    13528, 35467, 46781
etc.

The “Simple Sort” Algorithm
Given a list of positive numbers, unsorted_1, …, unsorted_n, and another list, sorted_1, …, sorted_n, with all values initially set to zero:
1. set i to 1
2. repeat until i > n
3.    set indexForSmallest to the index of the smallest positive value in unsorted
4.    set sorted_i to unsorted_indexForSmallest
5.    set unsorted_indexForSmallest to 0
6.    increment i

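A Python sketch of Simple Sort as described, run on the five numbers from the earlier example. The function name `simple_sort` is an assumption, and the sketch copies its input so the caller's list is not overwritten with zeros; like the pseudocode, it requires every value to be positive.

```python
def simple_sort(unsorted):
    """Sort positive numbers by repeatedly moving the smallest one left in the list."""
    unsorted = list(unsorted)        # work on a copy
    n = len(unsorted)
    sorted_list = [0] * n            # all values initially zero
    for i in range(n):               # steps 1, 2, 6
        index_for_smallest = None
        for j in range(n):           # step 3: scan the whole list every time
            if unsorted[j] > 0 and (
                index_for_smallest is None
                or unsorted[j] < unsorted[index_for_smallest]
            ):
                index_for_smallest = j
        sorted_list[i] = unsorted[index_for_smallest]   # step 4
        unsorted[index_for_smallest] = 0                # step 5: mark as used
    return sorted_list


if __name__ == "__main__":
    print(simple_sort([35467, 67854, 46781, 13528, 87341]))
```

The inner scan always covers all n positions, including entries already zeroed out, which is where the repeated O(n) cost per iteration comes from.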
This algorithm is expensive!
Time requirement
total time = n iterations x time per iteration
time per iteration = time to find smallest value in a list of length n = O(n)
total time = n x O(n) = O(n^2)
Space requirement
total space = space for unsorted + space for sorted = 2n cells = O(n)

Creating Algorithms is the Challenge of Computer Science
It’s not easy; try this one:
The Traveling Salesperson problem:
• A salesperson wants to visit 25 cities while minimizing the total number of miles driven, visiting each city exactly once, and returning home again. Which route is best?

Simplify the Problem to get an intuition about it
[Figure: four cities, A, B, C, and D, connected by roads]
Q: Starting at A, what’s the shortest route that meets the requirements (start at A, visit B, C, and D in some order, then return to A)?
A: Obvious to anyone who looks at the entire map. Not so obvious to an algorithm that “sees”, at any one time, only one city and its roads.
One algorithm to answer the question:
1. generate all possible routes of length 5
2. check each route to determine whether it meets the requirements
How much time does this algorithm require?

All Paths from A of length 5
[Figure: tree of all paths that start at A and pass through five cities; the paths end at either A or D]
Number of paths to generate and check is 2^4 = 16.

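The generate-and-check algorithm can be sketched in Python as below. The road map in `ROADS` is an assumption, since the original figure is not reproduced here: it gives every city exactly two roads, which is what makes the path count come out to 2^4 = 16. The function names are also hypothetical.

```python
# Hypothetical road map: each city has exactly two roads (A-B, A-C, B-D, C-D).
ROADS = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}


def all_paths(start, length):
    """Generate every path of `length` cities that starts at `start` and follows roads."""
    paths = [[start]]
    for _ in range(length - 1):
        # Extend every partial path by each road out of its last city.
        paths = [path + [city] for path in paths for city in ROADS[path[-1]]]
    return paths


def meets_requirement(path):
    """True if the path returns to its start and visits every other city exactly once."""
    others = sorted(city for city in ROADS if city != path[0])
    return path[-1] == path[0] and sorted(path[1:-1]) == others


paths = all_paths("A", 5)
tours = [path for path in paths if meets_requirement(path)]

if __name__ == "__main__":
    print(len(paths), "paths generated")
    for tour in tours:
        print("-".join(tour))
```

On this assumed map only two of the 16 paths qualify, and they are the same loop driven in opposite directions.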
Can you Improve the Algorithm?
• Prune bad routes as soon as possible. What’s a “bad route”?
• Look for good solutions, even if they’re sub-optimal. What’s a “good solution”?

This gets real bad, real fast!
• In general, the algorithm’s time requirement is: (the number of roads in & out of a city)^(number of cities)
• Assuming the number of roads is only 2, the time requirement is 2^(number of cities), given by the powers of 2 table.
• Assuming a computer could evaluate 10 million routes per second, finding the best route for 25 cities would require about 3.4 seconds. No problem!
• However, finding the best route for 64 cities would require about 1.8 trillion seconds, or roughly 58,000 years!

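The growth is easy to check with a few lines of Python, using the slide's rate of 10 million routes per second and 2 roads per city. The names below are assumptions, and a year is taken as 365 days.

```python
ROUTES_PER_SECOND = 10_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def seconds_to_check_all_routes(cities, roads_per_city=2):
    """Time to evaluate roads_per_city ** cities routes at the assumed rate."""
    return roads_per_city ** cities / ROUTES_PER_SECOND


if __name__ == "__main__":
    print(seconds_to_check_all_routes(25))                      # a few seconds
    print(seconds_to_check_all_routes(64) / SECONDS_PER_YEAR)   # tens of thousands of years
```

Going from 25 to 64 cities multiplies the work by 2^39, which is why the time jumps from seconds to millennia.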
Comparing the time requirements
[Figure: work (vertical axis, 0 to 35) plotted against n (horizontal axis, 0 to 15) for log2 n, n, n^2, and 2^n]
                        n
order      10       50           100                  1,000
log2 n     .0003    .0006        .0007                .001
n          .001     .005         .01                  .1
n^2        .01      .25          1                    1.67 min
2^n        .1024    3570 years   4x10^16 centuries    forget it!
Time requirements for algorithms of various orders of magnitude. Time is in seconds, unless stated otherwise.

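The entries in this table can be recomputed with a short Python sketch. The cost of 0.0001 seconds per operation is an assumption inferred from the table itself (for example, 2^10 = 1024 operations taking .1024 seconds); the names are hypothetical.

```python
import math

SECONDS_PER_OPERATION = 1e-4   # assumption inferred from the table entries
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

ORDERS = {
    "log2 n": lambda n: math.log2(n),
    "n": lambda n: n,
    "n^2": lambda n: n ** 2,
    "2^n": lambda n: 2 ** n,
}


def seconds_required(order, n):
    """Seconds needed for an algorithm of the given order on a problem of size n."""
    return ORDERS[order](n) * SECONDS_PER_OPERATION


if __name__ == "__main__":
    for order in ORDERS:
        print(order, [seconds_required(order, n) for n in (10, 50, 100)])
    print(seconds_required("2^n", 50) / SECONDS_PER_YEAR, "years for 2^n at n = 50")
```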
Conclusions
• Algorithms are the key to creating computing solutions.
• The design of an algorithm can make the difference between useful and impractical.
• Algorithms often have a trade-off between space (memory) and time.