Joe Meehean
 Data structures are great at…
• storing items
• providing a simple access interface
 Algorithms…
• operate over data structures
• instructions to do more complicated work
• functionality we may or may not want embedded in the data structure
 Found in the <algorithm> header
 From the simple
• for_each
• find
• swap
 To the complex
• sort
• set_intersection
• random_shuffle
• next_permutation
 Straightforward approach
 Usually based directly on
• the problem statement
• definitions of concepts
 e.g., a^n
• a^n = a * a * … * a (n times)
 Advantages
• applicable to a wide variety of problems
• no limitation on problem size for some important problems
• simple to design
 designing a better algorithm is not always worth it
 if the problem is small or the algorithm will run infrequently
• provides a comparison for more complex algorithms
 Disadvantages
• slow
• may be so slow as to be impossible to complete in a human lifetime
 Problem
• arrange comparable items in a list into sorted order
 Most sorting algorithms involve comparing item values
 We assume items define
• the < operator
• the > operator
• the == operator
 Brute force
 Find the smallest value in vector A and put it in A[0]
 Find the 2nd smallest value and put it in A[1]
 Etc.
 Use a nested loop
 Outer loop index k
• indicates the position to fill
 Inner loop index j
• runs from k+1 to A.length – 1
• indicates the next value to compare to the minimum
 Swap A[k] with A[min]
• A[min] is the minimum value in the range k to A.length – 1
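The loop structure above can be sketched in C++ (a sketch: the function name and the use of std::vector are illustrative, not from the slides):

```cpp
#include <utility>  // std::swap
#include <vector>

// Selection sort: for each position k, find the smallest value in the
// unsorted suffix A[k..A.size()-1] and swap it into A[k].
void selectionSort(std::vector<int>& A){
    for(std::size_t k = 0; k + 1 < A.size(); k++){      // position to fill
        std::size_t min = k;                            // smallest seen so far
        for(std::size_t j = k + 1; j < A.size(); j++){  // next value to compare
            if( A[j] < A[min] ) min = j;
        }
        std::swap(A[k], A[min]);   // A[min] is the min in k..A.size()-1
    }
}
```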
[Figure: selection sort trace on the array 8 5 2 6 9 3 1 4 0 7; min tracks the smallest value found by j in the unsorted suffix, which is then swapped into position k. The final chart plots item value by position in the array.]
 After i outer loop iterations
• A[0] through A[i-1] contain their final values
 Outer loop executes N times
 Inner loop executes a different number of times depending on the outer loop
• 1st outer iteration = N – 1 inner iterations
• 2nd outer = N – 2 inner
• …
• Nth outer = 0 inner
• (N-1) + (N-2) + … + 2 + 1 + 0 = O(N^2)
 Always O(N^2)
 Combinatorial problems
• problems where the answer is a combination of items from a set
 Exhaustive search
• brute force approach to combinatorial problems
• generate all possible combinations
• check each combination to see if it is a possible solution
• then select the best solution
 Have a set of items
• each has a weight: w_i
• each has a monetary value: v_i
 Have a knapsack
• it can only hold a total weight of W
 Fill the knapsack to maximize its value
 Exhaustive search
• try every combination of items
• throw out combinations that are too heavy
• select the combination with the largest value
 Complexity
• dominated by generating all combinations
• each item may be in the knapsack or not
• for N items: 2^N possible combinations
• O(2^N): very, very bad
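The exhaustive search can be sketched by treating each of the 2^N subsets as a bitmask (a sketch; the function name and example data are illustrative):

```cpp
#include <vector>

// Exhaustive 0/1 knapsack: generate every subset of the N items,
// throw out subsets over the weight limit W, and keep the best value.
// O(2^N) subsets, each costing O(N) to evaluate.
int knapsackBruteForce(const std::vector<int>& w,
                       const std::vector<int>& v, int W){
    int n = static_cast<int>(w.size());
    int best = 0;
    for(unsigned mask = 0; mask < (1u << n); mask++){   // one bit per item
        int weight = 0, value = 0;
        for(int i = 0; i < n; i++){
            if( mask & (1u << i) ){ weight += w[i]; value += v[i]; }
        }
        if( weight <= W && value > best ) best = value; // feasible and better
    }
    return best;
}
```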
 Exploit the relationship between the solution to a problem and the solution to a smaller instance
 Reduce the problem to a smaller problem
 Solve the smaller problem
 Use the smaller problem's solution to solve the original problem
 Top-down
• reduce the larger problem into successively smaller problems
• recursive approach
 Bottom-up
• solve the smallest version of the problem
• use it to solve the next larger problem
• incremental approach
3
variations
 Decrease-by-constant
• compute an for positive integer n
• an = an-1 * a
• f(n) = an
• f(n) = f(n-1) * a, if n >0
• f(n) = 1, if n == 0
• recursive definition
33
3
variations
 Decrease-by-constant-factor
• an = (an/2)2
• n/2 is not an integer if n is odd, so…
• an = (an/2)2 , if n is even
• an = (a(n-1)/2)2 * a, if n is odd
• an = 1, if n = 0
• O(logN) number of multiplications
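The decrease-by-constant-factor recurrence translates directly into code (a sketch; the function name is illustrative):

```cpp
// Decrease-by-constant-factor exponentiation:
//   a^n = (a^(n/2))^2         if n is even
//   a^n = (a^((n-1)/2))^2 * a if n is odd
//   a^0 = 1
// Only O(log n) multiplications, vs. O(n) for decrease-by-constant.
long long power(long long a, int n){
    if( n == 0 ) return 1;
    long long half = power(a, n / 2);   // integer division covers both cases
    return (n % 2 == 0) ? half * half : half * half * a;
}
```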
3
variations
 Variable-size-decrease
• lookup in BST
• BST is unbalanced
• at each node going left or right removes a
variable number of nodes
35
 Incremental approach
 Reduce the list size until it is trivial to sort, and solve
 Then increase the list size
 Put the 1st 2 items in the correct order
 Insert the 3rd item in the correct place relative to the first 2
 Insert the 4th item in the correct place relative to the first 3
 etc.
 Nested loop
 Outer loop
• index k from 1 to A.length – 1
• the item to put into its correct place
 Inner loop
• index j from k – 1 down to 0
• items to compare to A[k] to find its correct place
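The nested loop above can be sketched in C++ (a sketch; the function name is illustrative):

```cpp
#include <vector>

// Insertion sort: A[0..k-1] is already in order relative to itself;
// save A[k] in temp, shift larger items one slot right, drop temp in.
void insertionSort(std::vector<int>& A){
    for(std::size_t k = 1; k < A.size(); k++){  // item to place
        int temp = A[k];
        std::size_t j = k;
        while( j > 0 && A[j - 1] > temp ){      // scan from k-1 down to 0
            A[j] = A[j - 1];                    // shift right
            j--;
        }
        A[j] = temp;                            // correct place found
    }
}
```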
[Figure: insertion sort trace on the array 8 5 2 6 9 3 1 4 0 7; temp holds A[k] while j scans left, shifting larger items right until temp's place is found. The final chart plots item value by position in the array.]
 After the i-th iteration of the outer loop
• A[0] through A[i – 1] are in order relative to each other only
 To insert an item, we need to shift some items to the right
 Outer loop executes N times
 Worst case
• occurs when A is in reverse sorted order
• inner loop executes 1 to N – 1 times
• O(N^2)
 Best case
• occurs when A is already sorted
• inner loop never executes
• O(N)
 Divide an instance of a problem into two or more smaller problems
 Solve the smaller problems, if easy
 If not, divide again
 “Top-down” approach
• solve the “top” problem by stopping and going down to solve the smaller problems
 Classic recursion
 Strategy
1. divide a problem into smaller instances
2. conquer the smaller instances
3. combine the smaller solutions
 Do not use when an instance of size n:
• divides into two or more instances of nearly size n
• divides into n instances of size n/c, where c is a constant
 We do not get very much out of dividing the problem
 Sometimes this is unavoidable
• the larger problem is too difficult to solve without dividing
 Fundamental idea
• an array of size one is sorted
• divide an array repeatedly until it is a bunch of sorted arrays of size one
• combining two sorted arrays into a single sorted array can be done in O(N)
 Algorithm
• divide the array into 2 halves
• merge sort each half
 divide, sort, merge
 a list of size one is already sorted
• merge the halves
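The algorithm can be sketched in C++ (a sketch; the function names mergeAux and mergeSort are illustrative, though a mergeAux helper is mentioned later in the slides):

```cpp
#include <vector>

// Merge the sorted runs A[lo..mid) and A[mid..hi) through buffer C,
// then copy the merged run back.  O(N) work per merge.
void merge(std::vector<int>& A, std::vector<int>& C,
           std::size_t lo, std::size_t mid, std::size_t hi){
    std::size_t aidx = lo, bidx = mid, cidx = lo;
    while( aidx < mid && bidx < hi )            // take the smaller front item
        C[cidx++] = (A[aidx] <= A[bidx]) ? A[aidx++] : A[bidx++];
    while( aidx < mid ) C[cidx++] = A[aidx++];  // drain leftovers
    while( bidx < hi )  C[cidx++] = A[bidx++];
    for(std::size_t i = lo; i < hi; i++) A[i] = C[i];  // copy back
}

// Divide, merge sort each half, merge; a run of size one is already sorted.
void mergeAux(std::vector<int>& A, std::vector<int>& C,
              std::size_t lo, std::size_t hi){
    if( hi - lo <= 1 ) return;
    std::size_t mid = lo + (hi - lo) / 2;
    mergeAux(A, C, lo, mid);
    mergeAux(A, C, mid, hi);
    merge(A, C, lo, mid, hi);
}

void mergeSort(std::vector<int>& A){
    std::vector<int> C(A.size());  // temporary buffer: merge sort is not in-place
    mergeAux(A, C, 0, A.size());
}
```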
[Figure: merging A = 0 1 3 4 7 and B = 2 5 6 8 9 into C; aidx, bidx, and cidx advance as the smaller front element is copied into C at each step.]
 The merge works exactly the same if the input lists are just different parts of the same list
 The sorted data must be copied back to the original list after it is merged
[Figure: merge sort of the array 5 2 6 3 1 4 0 7; the array is split in half repeatedly down to single elements, then the sorted halves are merged back together into 0 1 2 3 4 5 6 7. The final chart plots item value by position in the array.]
 Merge sort is not an in-place algorithm
• requires a temporary buffer to hold the partially merged array
 Merge sort is an example of a divide-and-conquer algorithm
• divides the problem into progressively smaller parts
• solves the easy smaller problems
• combines the results
 Calls to mergeAux form a binary tree
• the number in each node is the array size, given N = 8
[Figure: tree of call sizes; 8 at the root, then 4 and 4, then four 2s, then eight 1s; the height is log2 N.]
 The height of the tree is O(logN)
 Work done at each level
• all values are merged at each level (not at each node)
• O(N)
 Total time is O(NlogN)
• always; never any faster or slower
 Several solutions
 Each solution has a value
 We want to find an optimal solution
• one with the best (min/max) value
• there may be several optimal solutions
 Motivation
 Works well for optimization problems
 Difficulty
• for some recursive problems
• recursion may be inefficient
• (rule 4: the compound interest rule)
 Fibonacci
• fib(1) = 1, fib(2) = 1
• fib(N) = fib(N-1) + fib(N-2), for N > 2
• fib(3) = fib(2) + fib(1) = 1 + 1 = 2
int fib(int n){
    if( n <= 2 ) return 1;
    return fib(n - 2) + fib(n - 1);
}
[Figure: call tree for fib(4); fib(4) = fib(3) + fib(2) = 3, where fib(3) = fib(2) + fib(1) = 2]
 Note: fib(2) is called twice
 Redoing work
 Solution: Dynamic Programming
• solve the sub-problems 1st
• store the results in a table
• a “bottom-up” approach vs. divide and conquer’s “top-down”
 Fibonacci with Dynamic Programming
int fib(int n){
    if( n <= 2 ) return 1;
    std::vector<int> fibs(n + 1);  // table of sub-problem results
                                   // (a std::vector; int fibs[n+1] is a
                                   // variable-length array, not standard C++)
    fibs[1] = 1;
    fibs[2] = 1;
    for(int i = 3; i <= n; i++){
        fibs[i] = fibs[i - 2] + fibs[i - 1];
    }
    return fibs[n];
}
 Recognizing
• a recursive problem
• the same sub-problems solved independently
 Developing the algorithm
• establish a recursive property
 allows division into sub-problems
• solve the sub-problems bottom up
 store the results in a table
 Levenshtein distance
• an edit distance
• how much two strings differ
• the number of point mutations needed to change string s1 into string s2
 Point mutation
• change a letter
• insert a letter
• delete a letter
 Levenshtein distance: recursive property
• d(s1, s2) for strings s1 & s2
• d(“”, “”) = 0 // empty strings are the same
• d(s1, “”) = d(“”, s1) = s1.length()
• d(s1+ch1, s2+ch2) = minimum of…
 if( ch1 == ch2 ), d(s1, s2) // characters are the same
 if( ch1 != ch2 ), d(s1, s2) + 1 // change a letter
 d(s1 + ch1, s2) + 1 // delete the last letter from s2
 d(s1, s2 + ch2) + 1 // delete the last letter from s1
 Levenshtein distance
• could easily be written recursively
• base case
 one string is empty
• 3 recursive cases
 remove the last character from both words
 remove the last character from string s1
 remove the last character from string s2
 Levenshtein distance
• could easily be written recursively
• but it is inefficient to do so
• lots of repeated sub-problems
 remove the last character from s1, then recurse
 then remove the last character from s2, and recurse
 the same sub-problem as removing from both, then recursing
 Dynamic programming approach
 Create a table
• stores the edit distance sub-problems
• 2D array m
• s1.length() + 1 rows
• s2.length() + 1 columns
• m[i][j] = d(s1[0..i-1], s2[0..j-1])
 the distance of the first i characters of s1 to the first j characters of s2
 Dynamic programming approach
 Important note
• i & j are “count”-based indices for m, counting the # of characters compared
• i & j are 0-based indices for s1 & s2
• s1[4] is the 5th character of s1
• m[4][j] compares the 1st four chars of s1 against the 1st j chars of s2
• so s1[3] is the last char compared for m[4][j]
 Dynamic programming approach
 Fill the table
 Base cases
• m[0][0] = 0
• m[i][0] = i
• m[0][j] = j
 Dynamic programming approach
 Fill the table
 Recursive cases: m[i][j] = min(
• if( s1[i-1] == s2[j-1] ): m[i – 1][j – 1]
• else: m[i – 1][j – 1] + 1,
• m[i – 1][j] + 1,
• m[i][j – 1] + 1
)
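The base and recursive cases above can be sketched in C++ (a sketch; the function name is illustrative):

```cpp
#include <algorithm>  // std::min
#include <string>
#include <vector>

// Levenshtein distance, filled bottom-up.  m[i][j] holds the distance
// between the first i characters of s1 and the first j characters of s2.
int editDistance(const std::string& s1, const std::string& s2){
    std::size_t n = s1.size(), k = s2.size();
    std::vector<std::vector<int>> m(n + 1, std::vector<int>(k + 1));
    for(std::size_t i = 0; i <= n; i++) m[i][0] = static_cast<int>(i);  // base cases
    for(std::size_t j = 0; j <= k; j++) m[0][j] = static_cast<int>(j);
    for(std::size_t i = 1; i <= n; i++){       // fill left to right, top to bottom
        for(std::size_t j = 1; j <= k; j++){
            // change a letter (free if the characters match)
            int change = m[i-1][j-1] + (s1[i-1] == s2[j-1] ? 0 : 1);
            // min of change, delete from s1, delete from s2
            m[i][j] = std::min({ change, m[i-1][j] + 1, m[i][j-1] + 1 });
        }
    }
    return m[n][k];
}
```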
 Dynamic programming approach
 Fill the table
• recursive table entries rely only on the previous row and column
• fill from left to right, top to bottom
 Complexity
• fill an entry: O(1)
• # of entries: s1.length() * s2.length() = n * m
• n*m * O(1) = O(n*m)
 Also used to solve optimization problems
 Make a series of choices
• irreversible
 Make the “best” choice at the time
• ignoring previous choices and future choices
 Intuitive and simple to create
 Difficult to prove optimal
 Make change using US currency
• use the least number of coins
• 25c, 10c, 5c, 1c
• give quarters until infeasible
• then give dimes, then nickels, then pennies
• 83c = 3 quarters, 1 nickel, 3 pennies
 Not optimal for all denominations
• 25c, 10c, 1c
• 30c = 3x 10c, not 25c + 5x 1c
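The greedy change-maker can be sketched in a few lines (a sketch; the function name is illustrative):

```cpp
#include <vector>

// Greedy change-making: always give the largest coin that still fits.
// Coins must be listed largest first.  Returns the number of coins used;
// optimal for US denominations but not for every coin set.
int makeChange(int cents, const std::vector<int>& coins){
    int count = 0;
    for(int c : coins){        // greedy, irreversible choice at each step
        count += cents / c;    // give as many of this coin as feasible
        cents %= c;
    }
    return count;
}
```

With US coins, 83c comes out as 7 coins (3 quarters, 1 nickel, 3 pennies); with the denominations {25, 10, 1}, 30c greedily becomes 25c + 5 pennies (6 coins) even though 3 dimes would do.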
 Outline
1. start with an empty set
2. add items in sequence
 sub-outline on next slide
3. repeat (2) until the set represents a solution
 Selection procedure
• chooses the next item based on the greedy criterion
 Feasibility check
• does the new set violate the rules?
 Solution check
• is the new set the answer?
 Symbols are encoded into binary
• e.g., letters, colors, …
 Convert a fixed-length encoding
• ASCII
• pixel color data (RGB)
 to a variable-length encoding
• the most frequent symbols get the shortest encodings
• reduces the size of the encoded data
 Variable-length encoding
• how do we separate symbols?
• e.g., 0000111 => 000 01 11 => BAD
• use prefix-free codes
• no codeword is a prefix of any other
 How do we create a set of prefix-free codewords?
 Binary tree
• symbols as leaves
• left branch is 0, right branch is 1
• the path from root to leaf defines the prefix-free code
[Figure: example code tree with leaves A, B, C, D, E]
 How do we create a tree that maximizes compression?
 Huffman’s algorithm
1. create a tree for each symbol, with weight based on the frequency of the symbol
2. combine the two smallest-weight trees
 make them the left and right children of a new tree
 the weight of the new tree is the sum of its children’s weights
3. repeat step 2 until only 1 tree remains
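The three steps map naturally onto a min-priority queue of trees (a sketch; the struct and function names are illustrative):

```cpp
#include <map>
#include <queue>
#include <string>
#include <vector>

// One tree in the forest: a leaf carries a symbol, an internal node
// carries the combined weight of its two children.
struct HuffNode {
    int weight;
    char symbol;                      // meaningful only at leaves
    HuffNode* left = nullptr;
    HuffNode* right = nullptr;
};
struct ByWeight {                     // min-heap ordering on weight
    bool operator()(const HuffNode* a, const HuffNode* b) const {
        return a->weight > b->weight;
    }
};

// Huffman's algorithm: one tree per symbol, then repeatedly combine
// the two smallest-weight trees until a single tree remains.
HuffNode* buildHuffman(const std::map<char, int>& freq){
    std::priority_queue<HuffNode*, std::vector<HuffNode*>, ByWeight> pq;
    for(const auto& p : freq) pq.push(new HuffNode{p.second, p.first});
    while( pq.size() > 1 ){
        HuffNode* a = pq.top(); pq.pop();
        HuffNode* b = pq.top(); pq.pop();
        pq.push(new HuffNode{a->weight + b->weight, '\0', a, b});
    }
    return pq.top();
}

// Left branch appends '0', right branch appends '1'; the path from the
// root to a leaf is that symbol's prefix-free codeword.
void collectCodes(const HuffNode* t, const std::string& prefix,
                  std::map<char, std::string>& codes){
    if( !t->left && !t->right ){ codes[t->symbol] = prefix; return; }
    collectCodes(t->left, prefix + "0", codes);
    collectCodes(t->right, prefix + "1", codes);
}
```

For the example frequencies A=35, B=10, C=20, D=20, E=15 this yields 2-bit codes for A, C, and D and 3-bit codes for B and E, for an average length of 2.25 bits.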
[Figure: Huffman tree construction for symbols A, B, C, D, E with frequencies 35, 10, 20, 20, 15; first B and E combine (weight 25), then C and D (weight 40), then A joins the B/E tree (weight 60), and finally the two remaining trees form the root.]
Resulting codes:
Symbol  Code
A       01
B       000
C       10
D       11
E       001
 Average code length
• the sum of (codeword length * probability of the symbol) over all symbols
• for our example = 2.25
 Fixed-length would require 3 bits
• 2^3 = 8 >= 5
• 2^2 = 4 < 5
 Compression ratio
• (fixed length – average code length) / fixed length
• (3 – 2.25) / 3 = 25% reduction
 Problem
• choose a sequence of items
• from a set
• the sequence must satisfy some criterion
 State space tree
• root of the tree: no items selected
• children: all possible selections from the set
• leaf: the sequence is complete
• each path represents a possible sequence
 Backtracking
• goal: find a solution
• create & prune the tree
• a preorder, depth-first search of the state space tree
 Backtracking pruning
• non-promising node: a node whose children cannot lead to a solution
• the pruned state space tree should include only promising nodes
 Backtracking algorithm
void checkNode(Node v){
    if( promising(v) ){
        if( v is a solution ){
            stop the search and return the solution
        }
        else{
            for( each child u of v ){
                checkNode(u)
            }
        }
    }
}
 Backtracking algorithm
• the state space tree is not explicitly created and traversed
• the tree is created implicitly using recursion
• pruning happens while traversing
 E.g., n-Queens
 Place n queens on an n x n chessboard
 Queens must not threaten each other
 No two queens can be in the same row, column, or diagonal
 n-Queens state space tree
• queens cannot share a row
• for each queen we only need to choose a column
• each level in the tree represents choosing a column for the queen in the next row
• each node stores 2 numbers: row, column
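The generic backtracking skeleton specializes to n-Queens like this (a sketch; the function names and the col-vector representation are illustrative):

```cpp
#include <cstdlib>  // std::abs
#include <vector>

// col[r] is the column of the queen in row r.  A node at depth `row` is
// promising if that queen shares no column or diagonal with the rows
// above it (rows are distinct by construction).
bool promising(const std::vector<int>& col, int row){
    for(int prev = 0; prev < row; prev++){
        if( col[prev] == col[row] ||
            std::abs(col[prev] - col[row]) == row - prev )  // same diagonal
            return false;
    }
    return true;
}

// Depth-first search of the state space tree: one level per row, one
// child per column choice; non-promising nodes are pruned immediately.
bool placeQueens(std::vector<int>& col, int row, int n){
    if( row == n ) return true;      // all n queens placed: a solution
    for(int c = 0; c < n; c++){
        col[row] = c;
        if( promising(col, row) && placeQueens(col, row + 1, n) )
            return true;             // stop the search, keep this solution
    }
    return false;                    // backtrack
}
```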
[Figure: a portion of the state space tree rooted at S, with nodes labeled (row, column)]
[Figure: the pruned left side of the state space tree]
[Figure: the pruned right side of the state space tree]
 A random number is used to make choices
 Run time depends on the random numbers & the input
 Worst-case runtime may be similar to the non-random algorithm
 Why bother then?
• if inputs are not evenly distributed
• may avoid the worst case more often
• runs differently even for the same input
 Two types:
 Monte Carlo
• always runs fast
• may produce incorrect answers with small probability
 Las Vegas
• always produces the correct answer
• runs quickly with high probability
 Skip list
• Las Vegas
• a sorted linked list
• average lookup and insert of O(logN)
 Preliminary skip list
• every 2nd node has a link to the node two ahead
• every 4th node has a link to the node four ahead
• the 2^i-th node has a link to the node 2^i ahead
[Figure: skip list over the keys A, F, G, H, M, R, S, U, X]
 Lookup
• traverse the “highest” link until the next item is too large
• drop to the next highest link, continue
• once dropped to the lowest link, it should be pointing to the item
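A minimal sketch of the preliminary structure and its lookup (the node layout and builder are illustrative; a real skip list would assign levels randomly on insert, as described below):

```cpp
#include <vector>

// forward[i] is the level-i link; a node at level i points roughly 2^i ahead.
struct SkipNode {
    char key;
    std::vector<SkipNode*> forward;
};

// Build the deterministic "preliminary" skip list from sorted keys:
// the node at (1-based) position p gets one extra level per factor of 2 in p.
SkipNode* buildSkipList(const std::vector<char>& keys){
    std::size_t maxLvl = 1;
    while( (std::size_t(1) << maxLvl) <= keys.size() ) maxLvl++;
    SkipNode* head = new SkipNode{'\0', std::vector<SkipNode*>(maxLvl, nullptr)};
    std::vector<SkipNode*> last(maxLvl, head);  // last node seen at each level
    for(std::size_t p = 1; p <= keys.size(); p++){
        std::size_t lvl = 1, q = p;
        while( q % 2 == 0 && lvl < maxLvl ){ lvl++; q /= 2; }
        SkipNode* n = new SkipNode{keys[p - 1], std::vector<SkipNode*>(lvl, nullptr)};
        for(std::size_t i = 0; i < lvl; i++){ last[i]->forward[i] = n; last[i] = n; }
    }
    return head;
}

// Lookup: follow the highest link while the next key is still too small,
// drop a level when it is too large; at the lowest level, the next node
// either holds the key or the key is absent.
bool contains(const SkipNode* head, char key){
    const SkipNode* cur = head;
    for(int lvl = static_cast<int>(head->forward.size()) - 1; lvl >= 0; lvl--){
        while( cur->forward[lvl] && cur->forward[lvl]->key < key )
            cur = cur->forward[lvl];
    }
    const SkipNode* next = cur->forward[0];
    return next && next->key == key;
}
```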
[Figure: lookup of S; start at the highest level from the head, follow links while the next key is less than S, drop a level when it is not, and end at S on the lowest level.]
[Figure: lookup of T; the same search ends on the lowest level between S and U, so T is not in the list.]
 Insert
• could cause nodes to shift right
• each shifted node would need to change its “level”
 “level”: the number of pointers
 the nodes it points to and the nodes that point to it
[Figure: preliminary skip list over the keys A, F, G, H, M, R, S, U, X]
 Insert
• add randomization
• randomly choose the “level” of the new node
• preliminary: 1/2^i of the nodes are at least level i
• randomized: a new node gets level i with probability 1/2^i
[Figure: a randomized skip list over the same keys; levels are now randomly assigned]
 Insert
• during lookup, keep track of the “drops”
• drop pointers at levels <= the new node’s “level”
 point them to the new node
[Figure: the randomized skip list before inserting U]
[Figure: inserting U; the lookup records where the search dropped a level, and pointers at levels <= U’s chosen level are redirected to point to U, with U taking over their old targets.]
 Skip list complexity
 Preliminary lookup
• cuts out half the remaining nodes at each step
• O(logN)
• why? 1/2^i of the nodes are at level i
 Final lookup
• still has 1/2^i of the nodes at level i
• randomly spread through the list
• on average: O(logN)
 Skip list complexity
 Final insert
• lookup: average O(logN)
• pointer manipulations: at most i
• on average: O(logN + i) = O(logN)