This text is my rendering of a part of chapter 5.6 in Jurafsky & Martin (2000), plus an attempt at applying the
Minimum Edit Distance algorithm described by J&M to path finding. The text was intended for the curriculum
of a course in large text processing (the course was never realized).
Asbjørn Brændeland 2008
Dynamic Programming
"Dynamic Programming is the name for a class of algorithms, first introduced by Bellman (1957), that apply a
table driven method to solve problems by combining solutions to subproblems. This class of algorithms includes
the most commonly-used algorithms in speech and language processing, among them the minimum edit
distance algorithm for spelling error correction, the Viterbi algorithm and the forward algorithm which are
used both in speech recognition and in machine translation, and the CYK and Earley algorithm used in parsing."
(Jurafsky & Martin, 2000)
Minimum Edit Distance
The minimum edit distance algorithm can serve as a reasonably simple introductory example of the dynamic programming paradigm. The minimum edit distance may be seen as a measure of the editing required to get from
one string to another. For instance, to get from "intention" to "execution" (the example given by Jurafsky & Martin (2000)), one could tabulate the editing costs in terms of insertions and deletions only, or also in terms of
substitutions where possible.
edit oper       cur. string    acc. cost
                intention      0
delete i        ntention       1
delete n        tention        2
delete t        ention         3
delete n        etion          4
insert c        ection         5
insert u        ecution        6
insert x        xecution       7
insert e        execution      8

edit oper       cur. string    acc. cost (1 per subst.)   acc. cost (2 per subst.)
                intention      0                          0
delete i        ntention       1                          1
subst. n by e   etention       2                          3
subst. t by x   exention       3                          5
insert u        exenution      4                          6
subst. n by c   execution      5                          8
Figure 1. Three possible tabulations of the edit distance between different strings. The cost is either 1 per
operation or 1 per insertion/deletion and 2 per substitution (rightmost column).
The accumulated costs in the two tables correspond to different suggestions made by Levenshtein (1966).
(Levenshtein actually suggests disallowing substitutions, which amounts to the same as charging 2 units for a
substitution, i.e. a deletion followed by an insertion; see the rightmost column.)
The table in Figure 3 may be seen as a representation of the various paths leading from the source to the target
string. A vertical upwards movement represents a deletion of a character in the source string, a sideways rightwards movement represents an insertion of a character into the target string, while a diagonal movement represents a substitution. The crucial point here is that the cost of substituting a character for itself is zero. That is, the
cost of taking one step up or one step to the right is always 1, but the cost of moving diagonally is either 0 or 2,
depending on whether the target and source characters corresponding to the cell we are stepping
into are the same or not. To facilitate the computation (i.e. to avoid special tests at the margins of the table) each
string is prefixed with a number sign. For the same reason the zeroth column and row are tabulated in advance
(this can be seen as computing the cost of deleting the entire source string and inserting the entire target string,
respectively).
Generally the value in a cell is computed to be the minimum of these three values
- the value in the cell to the left + 1,
- the value in the cell below + 1,
- the value in the cell diagonally below and to the left + the current substitution cost (0 or 2).
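Written as a recurrence, the rule above might look as follows. The symbols are assumptions introduced here for illustration and do not appear in the original: D(i, j) is the value in the cell for target position i and source position j, t_i and s_j are the corresponding characters of the '#'-prefixed strings, and c(i, j) is the current substitution cost.

```latex
D(i,0) = i, \qquad D(0,j) = j, \qquad
D(i,j) = \min
\begin{cases}
  D(i-1,\,j) + 1          & \text{(cell to the left: insertion)} \\
  D(i,\,j-1) + 1          & \text{(cell below: deletion)} \\
  D(i-1,\,j-1) + c(i,j)   & \text{(cell diagonally below and to the left)}
\end{cases}
\qquad
c(i,j) =
\begin{cases}
  0 & \text{if } t_i = s_j \\
  2 & \text{otherwise}
\end{cases}
```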
Figure 2 shows a Java implementation of the algorithm.
private int minEditDist(String target, String source)
{
    // Both strings are expected to carry the '#' prefix, so index 0 is the margin.
    int n = target.length();
    int m = source.length();
    int[][] distance = new int[n][m];
    for (int i = 0; i < n; i++) distance[i][0] = i;   // inserting the first i target characters
    for (int j = 0; j < m; j++) distance[0][j] = j;   // deleting the first j source characters
    for (int i = 1; i < n; i++)
        for (int j = 1; j < m; j++)
            distance[i][j] =
                Math.min(distance[i - 1][j] + 1,
                    Math.min(distance[i - 1][j - 1] +
                            (target.charAt(i) == source.charAt(j) ? 0 : 2),
                        distance[i][j - 1] + 1));
    return distance[n - 1][m - 1];
}
Figure 2. The minimum edit distance algorithm.
The table in Figure 3 shows the values computed during the process of finding the minimum edit distance
between "intention" and "execution" (see footnote 1).
n ║   9 │   8 │   9 │  10 │  11 │  12 │  11 │  10 │   9 │   8
o ║   8 │   7 │   8 │   9 │  10 │  11 │  10 │   9 │ [8] │   9
i ║   7 │   6 │   7 │   8 │   9 │  10 │   9 │ [8] │   9 │  10
t ║   6 │   5 │   6 │   7 │   8 │   9 │ [8] │   9 │  10 │  11
n ║   5 │   4 │   5 │   6 │ [7] │ [8] │   9 │  10 │  11 │  10
e ║   4 │   3 │   4 │ [5] │   6 │   7 │   8 │   9 │  10 │   9
t ║   3 │   4 │ [5] │   6 │   7 │   8 │   7 │   8 │   9 │   8
n ║   2 │   3 │ [4] │   5 │   6 │   7 │   8 │   7 │   8 │   7
i ║   1 │ [2] │   3 │   4 │   5 │   6 │   7 │   6 │   7 │   8
# ║ [0] │   1 │   2 │   3 │   4 │   5 │   6 │   7 │   8 │   9
  ║   # │   e │   x │   e │   c │   u │   t │   i │   o │   n
Figure 3. Computation of the minimum edit distance. To get from the source to the target string with a
minimum of editing one has to move through the white area in the table. In particular one has to pass through the
points of cost-free substitutions. The bracketed numbers mark the middle path, characterised by a preference for
substitutions whenever possible.
(To make sense of the numbers in the gray areas one would have to imagine that one could actually get the cost
of the deletion of a character reimbursed by reinserting that character two or more steps later, and vice versa.)
Footnote 1: A comparison of the above programming code and table with the corresponding Figures 5.5 and 5.6 in Jurafsky
& Martin (2000) will reveal several minor discrepancies, which I will not try to account for.
edit oper      current string            cost   move   cell
subst n by n   # e x e c u t i o n       8      ↗      (9, 9)
subst o by o   # e x e c u t i o n       8      ↗      (8, 8)
subst i by i   # e x e c u t i o n       8      ↗      (7, 7)
subst t by t   # e x e c u t i o n       8      ↗      (6, 6)
insert u       # e x e c u t i o n       8      →      (5, 5)
insert c       # e x e c t i o n         7      →      (4, 5)
delete n       # e x e t i o n           6      ↑      (3, 5)
subst e by e   # e x e n t i o n         5      ↗      (3, 4)
insert x       # e x e n t i o n         5      →      (2, 3)
insert e       # e e n t i o n           4      →      (1, 3)
delete t       # e n t i o n             3      ↑      (0, 3)
delete n       # t e n t i o n           2      ↑      (0, 2)
delete i       # n t e n t i o n         1      ↑      (0, 1)
               # i n t e n t i o n       0             (0, 0)

edit oper      current string            cost   move   cell
subst n by n   # e x e c u t i o n       8      ↗      (9, 9)
subst o by o   # e x e c u t i o n       8      ↗      (8, 8)
subst i by i   # e x e c u t i o n       8      ↗      (7, 7)
subst t by t   # e x e c u t i o n       8      ↗      (6, 6)
delete n       # e x e c u t i o n       8      ↑      (5, 5)
insert u       # e x e c u n t i o n     7      →      (5, 4)
insert c       # e x e c n t i o n       6      →      (4, 4)
subst e by e   # e x e n t i o n         5      ↗      (3, 4)
delete t       # e x e n t i o n         5      ↑      (2, 3)
delete n       # e x t e n t i o n       4      ↑      (2, 2)
delete i       # e x n t e n t i o n     3      ↑      (2, 1)
insert x       # e x i n t e n t i o n   2      →      (2, 0)
insert e       # e i n t e n t i o n     1      →      (1, 0)
               # i n t e n t i o n       0             (0, 0)

edit oper      current string            cost   move   cell
subst n by n   # e x e c u t i o n       8      ↗      (9, 9)
subst o by o   # e x e c u t i o n       8      ↗      (8, 8)
subst i by i   # e x e c u t i o n       8      ↗      (7, 7)
subst t by t   # e x e c u t i o n       8      ↗      (6, 6)
insert u       # e x e c u t i o n       8      →      (5, 5)
subst n by c   # e x e c t i o n         7      ↗      (4, 5)
subst e by e   # e x e n t i o n         5      ↗      (3, 4)
delete t       # e x e n t i o n         5      ↑      (2, 3)
subst n by x   # e x t e n t i o n       4      ↗      (2, 2)
subst i by e   # e n t e n t i o n       2      ↗      (1, 1)
               # i n t e n t i o n       0             (0, 0)
Figure 4. The tables show two marginal paths characterized by a preference for source character deletion and
for target character insertion, respectively, and the middle path characterized by a preference for substitutions
(indicated by bracketed numbers in Figure 3). Note that the editing here progresses upwards.
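A path such as those in Figure 4 can be read off the finished table by walking back from the final cell. The following is a minimal sketch of that idea, not the author's method: the class and method names (EditPath, fill, middlePath) are hypothetical, and the walk simply takes a diagonal step whenever that step lies on a minimum-cost path. Where several paths have the same cost, it may pick a different one than the paths tabulated in Figure 4.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class EditPath {
    // Fills the table as in Figure 2; both strings carry a leading '#'.
    static int[][] fill(String target, String source) {
        int n = target.length(), m = source.length();
        int[][] d = new int[n][m];
        for (int i = 0; i < n; i++) d[i][0] = i;
        for (int j = 0; j < m; j++) d[0][j] = j;
        for (int i = 1; i < n; i++)
            for (int j = 1; j < m; j++) {
                int subst = target.charAt(i) == source.charAt(j) ? 0 : 2;
                d[i][j] = Math.min(d[i - 1][j - 1] + subst,
                          Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        return d;
    }

    // Walks back from the final cell to cell (0, 0), preferring a substitution
    // step whenever it is consistent with the stored minimum.
    static List<String> middlePath(String target, String source) {
        int[][] d = fill(target, source);
        List<String> ops = new ArrayList<>();
        int i = target.length() - 1, j = source.length() - 1;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0) {
                int subst = target.charAt(i) == source.charAt(j) ? 0 : 2;
                if (d[i][j] == d[i - 1][j - 1] + subst) {
                    ops.add("subst " + source.charAt(j) + " by " + target.charAt(i));
                    i--; j--;
                    continue;
                }
            }
            if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
                ops.add("insert " + target.charAt(i));   // came from the left
                i--;
            } else {
                ops.add("delete " + source.charAt(j));   // came from below
                j--;
            }
        }
        Collections.reverse(ops);   // report the operations in editing order
        return ops;
    }

    public static void main(String[] args) {
        for (String op : middlePath("#execution", "#intention"))
            System.out.println(op);
    }
}
```

The costs of the reported operations (1 per insertion or deletion, 0 or 2 per substitution) add up to the value in the final cell of the table.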
Pathfinding
Finding the most likely path
We now want to apply the dynamic programming paradigm to the task of planning the most effortless path from
one town to another through a network of towns and interconnecting roads. We represent the network by a directed graph where, rather than indicating the strenuousness of each road, we indicate the likelihood of each road
being chosen, relative to other possible choices, as a function of its length, quality, etc. The shorter and easier a road is,
the more likely it is to be chosen over the other roads departing from the same town. The preference likelihoods of all the roads departing from one town sum to 1.
Figure 5. A graph representation of some towns and some of the roads connecting them. Assuming we want to
get from A to H, only the roads leading in the desired direction are included in the graph. The numbers on the
arcs are the preference likelihoods of the corresponding roads.
For the computation we need a table where we can represent
- for each town
  - the roads leading from it,
  - the accumulated highest likelihood of the town being visited, and
  - the last town visited on the way there, and
- for each departing road
  - its destination and
  - its likelihood of being chosen.
town   last stop on most    accum. likelihood   departing roads
       likely path here     of being visited    destination : likelihood
A      -                    1.0                 B : 0.3    C : 0.4    D : 0.3
B      -                    0.0                 E : 0.6    F : 0.3    G : 0.1
C      -                    0.0                 E : 0.3    F : 0.5    G : 0.2
D      -                    0.0                 E : 0.1    F : 0.7    G : 0.2
E      -                    0.0                 H : 1
F      -                    0.0                 H : 1
G      -                    0.0                 H : 1
H      -                    0.0
Figure 6. The initial values in the table for computing the path most likely to be chosen traveling from town A
to town H.
The most likely path can be computed dynamically by running through the towns and the roads departing from
them, as shown in Figure 7.
- For each town t_i
  - For each departing road r_j and its destination town t_k
    - If the likelihood of getting to t_i times the likelihood of r_j being chosen
      is greater than the previously computed likelihood of getting to t_k, then
      - store the higher number in row k of the likelihood column and
      - store i in row k of the backtracking column.
Figure 7. Dynamic programming algorithm for finding most likely path.
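The steps above can be sketched in Java as follows. This is an illustrative sketch under stated assumptions rather than the author's implementation: the class MostLikelyPath and its members are hypothetical names, the road likelihoods are taken from Figure 6, and the towns are assumed to be numbered so that every road leads from a lower to a higher index (A = 0, ..., H = 7), which is what allows a single pass over the towns.

```java
import java.util.Arrays;

class MostLikelyPath {
    static final String TOWNS = "ABCDEFGH";

    // p[i][k] is the preference likelihood of the road from town i to town k
    // (0 where there is no road); the values are those of Figure 6.
    static double[][] figure6() {
        double[][] p = new double[8][8];
        p[0][1] = 0.3; p[0][2] = 0.4; p[0][3] = 0.3;   // A -> B, C, D
        p[1][4] = 0.6; p[1][5] = 0.3; p[1][6] = 0.1;   // B -> E, F, G
        p[2][4] = 0.3; p[2][5] = 0.5; p[2][6] = 0.2;   // C -> E, F, G
        p[3][4] = 0.1; p[3][5] = 0.7; p[3][6] = 0.2;   // D -> E, F, G
        p[4][7] = 1.0; p[5][7] = 1.0; p[6][7] = 1.0;   // E, F, G -> H
        return p;
    }

    final double[] likelihood;  // accumulated highest likelihood of each town being visited
    final int[] back;           // last town on the most likely path to each town (-1: none)

    MostLikelyPath(double[][] p) {
        int n = p.length;
        likelihood = new double[n];
        back = new int[n];
        Arrays.fill(back, -1);
        likelihood[0] = 1.0;                       // the journey starts in town 0
        for (int i = 0; i < n; i++)                // for each town
            for (int k = 0; k < n; k++)            // for each road departing from it
                if (p[i][k] > 0 && likelihood[i] * p[i][k] > likelihood[k]) {
                    likelihood[k] = likelihood[i] * p[i][k];
                    back[k] = i;
                }
    }

    // Follows the backtracking column from town k back to the start.
    String pathTo(int k) {
        StringBuilder sb = new StringBuilder();
        for (int t = k; t != -1; t = back[t]) sb.append(TOWNS.charAt(t));
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        MostLikelyPath mlp = new MostLikelyPath(figure6());
        System.out.println(mlp.pathTo(7) + " : " + mlp.likelihood[7]);
    }
}
```

A plain adjacency matrix is used here instead of the town-and-roads table only to keep the sketch short; the comparison and the two stores are the ones listed in the steps above.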
i   v_i   (j, k(j)) dest_k of road_j : accum. max. l.hood visit : coming from   most likely path : accum. l.hood
0   A     B: 0.3 : A      C: 0.4 : A      D: 0.3 : A                            AC : 0.4
1   B     E: 0.18 : B     F: 0.09 : B     G: 0.03 : B                           ABE : 0.18
2   C     E: 0.12 : B     F: 0.20 : C     G: 0.08 : C                           ACF : 0.20
3   D     E: 0.03 : B     F: 0.21 : D     G: 0.06 : C                           ADF : 0.21
4   E     H: 0.18 : E                                                           ABEH : 0.18
5   F     H: 0.21 : F                                                           ACFH : 0.20
6   G     H: 0.08 : G                                                           ADFH : 0.21
7   H
Figure 8. The successive updates for each town of the accumulated maximum likelihood of being visited and
the immediate prior visitee giving that likelihood.
We see from Figure 8 how towns fall in and out of the most likely path according to the accumulated
likelihoods for each new round. Figure 9 shows the final results.
     town   backtrack to   accum. l.hood   destination : likelihood
0    A      -              1.0             B : 0.3    C : 0.4    D : 0.3
1    B      0              0.3             E : 0.6    F : 0.3    G : 0.1
2    C      0              0.4             E : 0.3    F : 0.5    G : 0.2
3    D      0              0.3             E : 0.1    F : 0.7    G : 0.2
4    E      1              0.18            H : 1
5    F      3              0.21            H : 1
6    G      2              0.08            H : 1
7    H      5              0.21
Figure 9. The final results of computing the most likely path through the graph.
Finding the shortest path
We can easily adapt the above algorithm to the task of finding the shortest path through a directed graph
(a graph where every edge points in only one direction—where there are only one-way edges).
- First we adjust the graph by replacing the relative preference likelihoods by road lengths.
- Next we initialize the cells of the accumulation column to some unreachable maximum.
- Finally we adjust the computation formula so that instead of accumulating a product of road preference
likelihoods we accumulate a sum of road lengths.
A Java implementation of the shortest path finding algorithm is given in Figure 11, and the corresponding table in
Figure 12. The table has been simplified to an integer matrix m where
- m[i][i] (the top-left to bottom-right diagonal) is the length of the shortest path from v0 to vi,
- m[i][j], for j > i (the part of the row to the right of the diagonal), is the distance from vi to vj, with 0 meaning that there is no road, and
- m[i][0] (the zeroth column) is the last town on the shortest path to vi.
Figure 10. A distance version of the graph in Figure 5. The numbers on the arcs, representing road lengths, are
"inversions" of the preference likelihoods shown in Figure 5. That is, if L_i is the length and P_i is the preference
likelihood of road i, then L_i = (1 - P_i) × 10.
public void calcBestPath()
{
    for (int i = 0; i < indmax; i++)
        for (int j = i + 1; j <= indmax; j++)
            if (m[i][j] > 0 &&                   // There is a path from i to j.
                m[i][i] + m[i][j] < m[j][j])     // The path to j via i is better
            {                                    // than any path seen so far.
                m[j][j] = m[i][i] + m[i][j];     // Update total distance to j.
                m[j][0] = i;                     // Store last node on
            }                                    // the best way to j.
}
Figure 11. Dynamic programming algorithm for finding shortest path.
      A     B     C     D     E     F     G     H
A     0     A     A     A     B     D     C     F
B     7     7
C     6     0     6
D     7     0     0     7
E     0     4     7     9    11
F     0     7     5     3     0    10
G     0     9     8     8     0     0    14
H     0     0     0     0    20    20    20    30
Figure 12. The successive updates for each town of the accumulated shortest path and the immediate prior
visitee along that path. For readability the town indices are substituted by town names.
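To run the method of Figure 11 one also needs the initialisation described in the adaptation steps, and a way to read the path back out of the zeroth column. The sketch below fills in those parts under assumptions of mine: the class name ShortestPath, the helpers init and pathTo, and the value chosen as the "unreachable maximum" are all hypothetical, while the road lengths are the ones shown in Figure 12 and the matrix layout follows the code in Figure 11.

```java
class ShortestPath {
    static final int UNREACHED = 1_000_000;   // assumed "unreachable maximum"
    static final String TOWNS = "ABCDEFGH";
    static final int indmax = 7;
    static int[][] m;

    // Road lengths as shown in Figure 12; m[i][j] with j > i is the length
    // of the road from town i to town j, 0 meaning that there is no road.
    static void init() {
        m = new int[indmax + 1][indmax + 1];
        for (int i = 1; i <= indmax; i++) m[i][i] = UNREACHED;
        m[0][1] = 7;  m[0][2] = 6;  m[0][3] = 7;    // A -> B, C, D
        m[1][4] = 4;  m[1][5] = 7;  m[1][6] = 9;    // B -> E, F, G
        m[2][4] = 7;  m[2][5] = 5;  m[2][6] = 8;    // C -> E, F, G
        m[3][4] = 9;  m[3][5] = 3;  m[3][6] = 8;    // D -> E, F, G
        m[4][7] = 20; m[5][7] = 20; m[6][7] = 20;   // E, F, G -> H
    }

    // Same loop as in Figure 11.
    static void calcBestPath() {
        for (int i = 0; i < indmax; i++)
            for (int j = i + 1; j <= indmax; j++)
                if (m[i][j] > 0 && m[i][i] + m[i][j] < m[j][j]) {
                    m[j][j] = m[i][i] + m[i][j];   // new shortest distance to j
                    m[j][0] = i;                   // last town on the way to j
                }
    }

    // Follows the zeroth column from town j back to town 0.
    static String pathTo(int j) {
        StringBuilder sb = new StringBuilder();
        while (j != 0) {
            sb.append(TOWNS.charAt(j));
            j = m[j][0];
        }
        return sb.append(TOWNS.charAt(0)).reverse().toString();
    }

    public static void main(String[] args) {
        init();
        calcBestPath();
        System.out.println(pathTo(indmax) + " : " + m[indmax][indmax]);
    }
}
```

A finite sentinel is used rather than Integer.MAX_VALUE so that adding a road length to it cannot overflow.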
Relating edit minimizing and path finding
Let us see how we may relate the minimum edit distance and the pathfinding algorithms. For that purpose we
will consider a simpler edit distance example than the one above, i.e. that of finding the distance between the
strings "ABCD" and "AFCG".
D ║ 4 │ 3 │ 4 │ 3 │ 4
C ║ 3 │ 2 │ 3 │ 2 │ 3
B ║ 2 │ 1 │ 2 │ 3 │ 4
A ║ 1 │ 0 │ 1 │ 2 │ 3
# ║ 0 │ 1 │ 2 │ 3 │ 4
  ║ # │ A │ F │ C │ G
Figure 13. Computation of the minimum edit distance between "ABCD" and "AFCG".
We can represent the possible movements through the table in Figure 13 in a graph, with the cost of each move
indicated on the arcs — as shown in Figure 14.
Figure 14. Graph representation of the edit options moving from "ABCD" to "AFCG".
Since there are not more than 3 arcs leaving any node, we can represent the above graph in a 25 × 3 table and
compute the shortest path the same way we did above (or in a similar way). Instead of a 5 × 5 nested
loop where we look at 3 numbers in each iteration, we now get a 25 × 3 nested loop, so the amount of work
remains the same, as does the final outcome. Rather than using the isomorphic square table we used for computing the shortest path, we use the polymorphic table from the most likely path computation, minus the backtracking column. The algorithm will then be an adaptation of the one given in Figure 7.
- For each node n_i
  - For each directed arc a_j and its destination node n_k
    - If the minimum edit distance to n_i plus the length of a_j is
      less than the previously computed minimum edit distance to n_k, then
      - store the lesser number in cell k of the minimum edit column
Figure 16. Dynamic programming algorithm for finding minimum edit path.
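As a rough illustration of the steps above, the following sketch treats each table cell as a graph node with at most three outgoing arcs and relaxes the arcs node by node. The class name EditGraph and the row-by-row node numbering are assumptions of mine; the numbering is chosen so that every arc leads from a lower to a higher node index, which means each node's distance is final by the time its own arcs are examined.

```java
import java.util.Arrays;

class EditGraph {
    // Both strings carry a leading '#'. The node for target position i and
    // source position j gets the index i * m + j.
    static int minEditDist(String target, String source) {
        int n = target.length(), m = source.length();
        int[] dist = new int[n * m];
        Arrays.fill(dist, Integer.MAX_VALUE / 2);   // assumed "unreachable maximum"
        dist[0] = 0;                                // the node #,#
        for (int node = 0; node < n * m; node++) {
            int i = node / m, j = node % m;
            boolean same = i + 1 < n && j + 1 < m
                    && target.charAt(i + 1) == source.charAt(j + 1);
            int[][] arcs = {
                {i, j + 1, 1},                      // deletion of a source character
                {i + 1, j, 1},                      // insertion of a target character
                {i + 1, j + 1, same ? 0 : 2}        // substitution
            };
            for (int[] a : arcs) {
                if (a[0] >= n || a[1] >= m) continue;   // arc would leave the table
                int k = a[0] * m + a[1];
                if (dist[node] + a[2] < dist[k])
                    dist[k] = dist[node] + a[2];        // store the lesser number
            }
        }
        return dist[n * m - 1];
    }

    public static void main(String[] args) {
        System.out.println(minEditDist("#AFCG", "#ABCD"));
    }
}
```

For the strings of Figure 13 this visits 25 nodes with up to 3 arcs each, matching the 25 × 3 table described in the text.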
node   acc. min.    neighbor : distance
       ed. dist.
#,#    0            #,A : 1    A,A : 0    A,# : 1
#,A    1            #,B : 1    A,B : 2    A,A : 1
#,B    2            #,C : 1    A,C : 2    A,B : 1
#,C    3            #,D : 1    A,D : 2    A,C : 1
#,D    4            A,D : 1
A,#    1            A,A : 1    F,A : 2    F,# : 1
A,A    0            A,B : 1    F,B : 2    F,A : 1
A,B    1            A,C : 1    F,C : 2    F,B : 1
A,C    2            A,D : 1    F,D : 2    F,C : 1
A,D    3            F,D : 1
F,#    2            F,A : 1    C,A : 2    C,# : 1
F,A    1            F,B : 1    C,B : 2    C,A : 1
F,B    2            F,C : 1    C,C : 0    C,B : 1
F,C    3            F,D : 1    C,D : 2    C,C : 1
F,D    4            C,D : 1
C,#    3            C,A : 1    G,A : 2    G,# : 1
C,A    2            C,B : 1    G,B : 2    G,A : 1
C,B    3            C,C : 1    G,C : 2    G,B : 1
C,C    2            C,D : 1    G,D : 2    G,C : 1
C,D    3            G,D : 1
G,#    4            G,A : 1
G,A    3            G,B : 1
G,B    4            G,C : 1
G,C    3            G,D : 1
G,D    4
Figure 15. The final results of computing the minimum edit distance through a graph.
—————————————————————————————————————————————
Ref:
Jurafsky & Martin, Speech and Language Processing, Prentice Hall, 2000