Outline
1. General Design and Problem Solving Strategies
2. More about Dynamic Programming
– Example: Edit Distance
3. Backtracking (if there is time)
– Another Strategy for the Knapsack Problem
Design Strategies
Dynamic Programming Design Strategy
– Solve an “easy” sub-problem
– Store the solution
– Use the stored solution to solve a more difficult sub-problem
– Repeat until you solve the “big” hard problem
Other Strategies
– Divide and Conquer
– Brute Force
– Greedy
Design Strategies
Dynamic Programming is not divide and conquer.
Consider Floyd’s algorithm
– At no point did we break the input into two parts
– What is Floyd’s algorithm really doing?
Design Strategies
What is Floyd’s algorithm really doing?
– STEP 1: Find all the shortest paths allowing a hop through vertex A, and store the shortest paths
– STEP 2: Now, use that answer to find the shortest paths that also allow a hop through vertex B
More General Design Strategies
Top Down
– See the big picture first
– Break it into parts
– Analyze each part
– Continue breaking down sub-parts into solvable tasks
Quicksort is a classic example.
More General Design Strategies
Top Down Quicksort
– See the big picture first
  – Need to put items in the correct sorted position
– Break it into parts
  – Put the pivot in the correct position and partition the list into two parts
– Analyze each part
  – Pick a pivot for each part…
– Continue breaking down sub-parts into solvable tasks
  – Continue recursively until sub-parts are lists of size 1
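The four top-down steps above can be sketched in C. This is a minimal illustration rather than the lecture's own code: the names `quicksort`, `quicksort_range`, `partition`, and `swap_ints` are hypothetical, and picking the last element as the pivot is an arbitrary assumption (any pivot rule fits the same outline).

```c
#include <assert.h>
#include <stddef.h>

/* Swap two ints in place. */
static void swap_ints(int *a, int *b) {
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* "Break it into parts": move the pivot (assumed here to be the last
 * element) to its final sorted position and return that position.
 * Afterwards everything left of it is <= pivot, everything right is > pivot. */
static size_t partition(int *items, size_t lo, size_t hi) {
    int pivot = items[hi];
    size_t store = lo;
    for (size_t i = lo; i < hi; i++) {
        if (items[i] <= pivot) {
            swap_ints(&items[i], &items[store]);
            store++;
        }
    }
    swap_ints(&items[store], &items[hi]);
    return store;
}

/* "Analyze each part": sort items[lo..hi] (inclusive) recursively.
 * The recursion stops when a part has at most one element. */
static void quicksort_range(int *items, size_t lo, size_t hi) {
    if (lo >= hi) return;
    size_t p = partition(items, lo, hi);
    if (p > lo) quicksort_range(items, lo, p - 1);  /* avoid size_t underflow */
    quicksort_range(items, p + 1, hi);
}

/* "See the big picture first": put all n items in sorted order. */
void quicksort(int *items, size_t n) {
    assert(items != NULL || n == 0);
    if (n > 1) quicksort_range(items, 0, n - 1);
}
```

Note how the code starts from the whole list and only reaches the trivial one-element lists at the bottom of the recursion.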
More General Design Strategies
Bottom Up
– Use the solution to larger and larger problems to solve the BIG problem and see the big picture
– Use solution to small tasks to solve larger problems
– Identify easily solvable tasks
Mergesort is a classic example
More General Design Strategies
Bottom Up Mergesort
– Use the solution to larger and larger problems to solve the BIG problem and see the big picture
  – Merging the final two sorted lists
– Use solution to small tasks to solve larger problems
  – Merging sorted lists
– Identify easily solvable tasks
  – Sorting lists of size 2
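A bottom-up mergesort along these lines might look like the sketch below. It is an illustration under stated assumptions, not the lecture's code: the names `mergesort_bottom_up` and `merge_runs` are hypothetical, and the scratch buffer with a copy-back after every pass is just one simple way to arrange the merging. The first pass (width 1) produces sorted lists of size 2, later passes merge larger and larger sorted lists, and the last pass merges the final two.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs src[lo..mid) and src[mid..hi) into dst[lo..hi). */
static void merge_runs(const int *src, int *dst,
                       size_t lo, size_t mid, size_t hi) {
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        dst[k++] = (src[j] < src[i]) ? src[j++] : src[i++];
    while (i < mid) dst[k++] = src[i++];
    while (j < hi)  dst[k++] = src[j++];
}

/* Sort items[0..n) bottom up. Returns 0 on success, or -1 if the
 * scratch buffer cannot be allocated (items is left unchanged). */
int mergesort_bottom_up(int *items, size_t n) {
    assert(items != NULL || n == 0);
    if (n < 2) return 0;

    int *scratch = malloc(n * sizeof *scratch);
    if (scratch == NULL) return -1;

    /* width = length of the already-sorted runs: 1, 2, 4, ... */
    for (size_t width = 1; width < n; width *= 2) {
        for (size_t lo = 0; lo < n; lo += 2 * width) {
            size_t mid = (lo + width < n) ? lo + width : n;
            size_t hi  = (lo + 2 * width < n) ? lo + 2 * width : n;
            merge_runs(items, scratch, lo, mid, hi);
        }
        memcpy(items, scratch, n * sizeof *items);
    }

    free(scratch);
    return 0;
}
```

There is no recursion here: the code never looks at the whole list until the small tasks are already solved.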
More General Design Strategies
Is Floyd’s Algorithm Top Down or Bottom Up?
More General Design Strategies
Divide and Conquer can be either Top Down or Bottom Up
Dynamic Programming tends to be only Bottom Up.
More General Design Strategies
Consider Bottom-up Strategies
– Divide and Conquer usually merges two smaller sub-problems into a large problem
  (N/2 + N/2) → N
– Dynamic Programming usually extends the solution in some way
  N-2 → N-1
  N_simple_version → N_harder_version
More about Dynamic Programming
How does Floyd’s Algorithm extend the solution?
– N-2 → N-1
  Does it consider a smaller graph and then extend the solution to a larger graph?
– N_simple_version → N_harder_version
  Does it consider a simpler shortest path problem and extend it to a more complex shortest path problem?
More about Dynamic Programming
In a graph, it is really easy to solve the shortest path problem if you do not allow any hops (intermediate vertices)
– The adjacency matrix stores all the shortest paths (direct hop)
More about Dynamic Programming
It is also easy to solve the problem if you only allow a hop through vertex x
    if (M[a][x] + M[x][b] < M[a][b]) then
        update the distance: M[a][b] = M[a][x] + M[x][b]
O(N^2) time is required to update all the cells
Then, just repeat this process N times, once for each intermediate vertex: O(N^3) total time
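The "one hop through x, repeated for every x" idea can be sketched in C as follows. Several details are assumptions made for this illustration and do not come from the slides: the matrix is stored as a flat row-major array, a missing edge is encoded with the hypothetical sentinel `NO_EDGE`, edge weights are assumed small enough that adding two of them cannot overflow an `int`, and the function names are made up.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumption for this sketch: "no edge / no path yet" is stored as NO_EDGE. */
#define NO_EDGE INT_MAX

/* One O(N^2) pass: additionally allow a hop through vertex x.
 * M is an n-by-n matrix stored row by row; M[a * n + b] is the best
 * known distance from a to b using the hops allowed so far. */
static void allow_hop_through(int *M, size_t n, size_t x) {
    for (size_t a = 0; a < n; a++) {
        for (size_t b = 0; b < n; b++) {
            int ax = M[a * n + x];
            int xb = M[x * n + b];
            if (ax == NO_EDGE || xb == NO_EDGE)
                continue;                      /* no route through x */
            if (ax + xb < M[a * n + b])
                M[a * n + b] = ax + xb;        /* update the distance */
        }
    }
}

/* Repeat the pass N times, once per intermediate vertex: O(N^3) total.
 * On entry M is the adjacency matrix (shortest paths with no hops);
 * on return it holds the all-pairs shortest path lengths. */
void floyd(int *M, size_t n) {
    assert(M != NULL || n == 0);
    for (size_t x = 0; x < n; x++)
        allow_hop_through(M, n, x);
}
```

Each call to `allow_hop_through` reuses the stored answers from the previous calls, which is the dynamic programming step: the problem stays the same size, but the set of allowed intermediate vertices grows by one.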
Top Down vs. Bottom Up
Top Down
– Rethinking the design of existing ideas/inventions
– Managing projects that are underway
– Works really well in a Utopian world
Bottom Up
– Designing totally new ideas
– Putting together projects from scratch
– Seen more often in the real world
Bottom-up Design
Top Down
– Let’s build a flying carriage; what are the parts?
  – Lift, propulsion, steering, etc.
– Let’s build a steering mechanism; what are the parts?
  – We need a steering control
  – Umm? Wait, we need to know how the other parts work first.
– Let’s build a lift mechanism; how do we do this?
  – ???
– Let’s build a propulsion mechanism
Bottom-up Design
Bottom Up
– Discoveries:
  – This shape produces lift
  – A spinning propeller creates propulsion in the air
  – Canvas with a wood frame is light enough
– Next Step: Perhaps we can build a stable, controllable flying thing.
Bottom-up Design
Before we can analyze the big picture
We have to
– Look at some of the initial smaller problems
– See how they were solved
– See how they led to new discoveries
Problem:
– Find the Edit Distance Between Two Strings
Solutions:
– Brute Force – O(K^N)
– Greedy – No Optimal Algorithms yet
– Divide & Conquer – None discovered yet
– Dynamic Programming – O(N^2)
Edit Distance
How many edits are needed to exactly match the Target with the Pattern
Target: TCGACGTCA
Pattern: TGACGTGC
Edit Distance
How many edits are needed to exactly match the Target with the Pattern
Target:  T C G A C G T - C A
Pattern: T - G A C G T G C -
(“-” marks a character that is deleted from the other string)
Three:
– Delete C and A from the Target, and delete G from the Pattern
Edit Distance
Applications:
– Approximate String Matching
  – Spell checking
  – Google – finding similar word variations
  – DNA sequence comparison
– Pattern Recognition
Edit-distance table (columns: Target TCGACGTCA; rows: Pattern TGACGTGC)

          T  C  G  A  C  G  T  C  A
       0  1  2  3  4  5  6  7  8  9
    T  1  0  1  2  3  4  5  6  7  8
    G  2  1  2  1  2  3  4  5  6  7
    A  3  2  3  2  1  2  3  4  5  6
    C  4  3  2  3  2  1  2  3  4  5
    G  5  4  3  2  3  2  1  2  3  4
    T  6  5  4  3  4  3  2  1  2  3
    G  7  6  5  4  5  4  3  2  3  4
    C  8  7  6  5  6  5  4  3  2  3

– Optimal edit distance for TG and TCG: 1 (row G, column G)
– Optimal edit distance for TG and TCGA: 2 (row G, column A)
– Final Answer: 3 (bottom-right cell)
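Edit Distance
    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    /* Edit distance (insertions and deletions only) between
       seq1 (length n) and seq2 (length m). */
    int edit_distance(const char *seq1, int n, const char *seq2, int m)
    {
        int x, y;
        int matrix[n+1][m+1];

        for (x = 0; x <= n; x++)
            matrix[x][0] = x;           /* delete all x characters */
        for (y = 1; y <= m; y++)
            matrix[0][y] = y;           /* insert all y characters */

        for (x = 1; x <= n; x++)
            for (y = 1; y <= m; y++)
                if (seq1[x-1] == seq2[y-1])     /* C strings are 0-indexed */
                    matrix[x][y] = matrix[x-1][y-1];
                else                            /* take the cheaper edit */
                    matrix[x][y] = MIN(matrix[x][y-1] + 1, matrix[x-1][y] + 1);

        return matrix[n][m];
    }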
Edit Distance
    int matrix[n+1][m+1];
    for (x = 0; x <= n; x++)
        matrix[x][0] = x;               /* How many times is this assignment performed? */
    for (y = 0; y <= m; y++)
        matrix[0][y] = y;               /* How many times is this assignment performed? */
    for (x = 1; x <= n; x++)
        for (y = 1; y <= m; y++)
            if (seq1[x-1] == seq2[y-1]) /* How many times is this comparison performed? */
                matrix[x][y] = matrix[x-1][y-1];
            else                        /* How many times is this assignment performed? */
                matrix[x][y] = min(matrix[x][y-1] + 1, matrix[x-1][y] + 1);
    return matrix[n][m];
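One way to answer these questions is to count the steps directly. The sketch below is an instrumented variant written for illustration only: the struct `edit_counts`, the function `count_edit_distance_steps`, and the flat heap-allocated table are assumptions, not part of the slides. It takes the minimum of the two neighbouring cells, as a minimum edit distance requires. For strings of length n and m it performs (n+1) + (m+1) initial assignments and exactly n·m comparisons, so the running time is O(nm).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
struct edit_counts {
    size_t init_assignments;  /* first-column and first-row assignments */
    size_t comparisons;       /* character comparisons, always n * m    */
    size_t min_assignments;   /* cells filled via the "else" branch     */
    int distance;             /* edit distance, or -1 if malloc failed  */
};

struct edit_counts count_edit_distance_steps(const char *seq1,
                                             const char *seq2) {
    assert(seq1 != NULL && seq2 != NULL);
    size_t n = strlen(seq1), m = strlen(seq2);
    size_t cols = m + 1;                 /* row length of the flat table */
    struct edit_counts c = {0, 0, 0, -1};

    int *matrix = malloc((n + 1) * cols * sizeof *matrix);
    if (matrix == NULL) return c;

    for (size_t x = 0; x <= n; x++) {
        matrix[x * cols] = (int)x;
        c.init_assignments++;
    }
    for (size_t y = 0; y <= m; y++) {
        matrix[y] = (int)y;
        c.init_assignments++;
    }

    for (size_t x = 1; x <= n; x++) {
        for (size_t y = 1; y <= m; y++) {
            c.comparisons++;
            if (seq1[x - 1] == seq2[y - 1]) {
                matrix[x * cols + y] = matrix[(x - 1) * cols + (y - 1)];
            } else {
                int left = matrix[x * cols + (y - 1)] + 1;
                int up   = matrix[(x - 1) * cols + y] + 1;
                matrix[x * cols + y] = (left < up) ? left : up;
                c.min_assignments++;
            }
        }
    }

    c.distance = matrix[n * cols + m];
    free(matrix);
    return c;
}
```

For the Target TCGACGTCA (n = 9) and the Pattern TGACGTGC (m = 8) this gives 19 initial assignments, 72 comparisons, and a distance of 3.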
[Figure: the edit-distance table for Target TCGACGTCA and Pattern TGACGTGC (n = 8), with callouts on individual cells]
– To derive the value 5, we need to know that we are already matching two C’s and previously matching two T’s
– To derive the value 6, we need to know that we are matching two T’s
– To derive the value 7, we need to know that we …
– In the worst case, this may take n comparisons
[Figure: the completed edit-distance table for Target TCGACGTCA and Pattern TGACGTGC]
– We can’t do any more matching (period!)
– Thus, the edit distance is increased
Lesson to learn
There is no way to compute the optimal (minimum) edit distance without considering all possible matching combinations.
The only way to do that is to consider all possible sub-problems.
This is the reason the entire table must be considered.
If you can compute the optimal (minimum) edit distance using fewer than O(nm) computations, then you will be renowned!