An Extension of the String-to-String Correction Problem
Roy Lowrance and Robert A. Wagner
Journal of the ACM, vol. 22, no. 2, April 1975, pp. 177-183.
Speaker: 吳展碩

Edit Distance
• Three edit operations:
  – Substitution
    • abcd -> aacd (change b to a)
  – Insertion
    • abcd -> abacd (insert an a)
  – Deletion
    • abcd -> abd (delete c)
• Given two strings T and P, the problem is to determine the minimum number of edit operations needed to transform T into P.
Note: For clarity, we assume that all edit operations have the same cost.

d[i, j] = min( d[i-1, j] + 1,
               d[i, j-1] + 1,
               d[i-1, j-1] + cost(A[i]->B[j]) )

         s  a  t  u  r  d  a  y
      0  1  2  3  4  5  6  7  8
   s  1  0  1  2  3  4  5  6  7
   u  2  1  1  2  2  3  4  5  6
   n  3  2  2  2  3  3  4  5  6
   d  4  3  3  3  3  4  3  4  5
   a  5  4  3  4  4  4  4  3  4
   y  6  5  4  4  5  5  5  4  3

This example is taken from Wikipedia.
[Figure: alignment of "saturday" with "sunday"]

The Problem
• This paper extends the set of edit operations to include the operation of interchanging two adjacent characters.
  – Swap
• Example:
  T: a b c d
  P: c d a
  a b c d -> a c d -> c a d -> c d a

Trace
• A trace is a graphical specification of how the edit operations apply to each character in the two strings.
• Example:
  T: a b c d
  P: c d a

Important Properties
• The edit operations in the following cases can be replaced by other edit operations.
  [Figure: traces for abc -> bca. Labels: "2 swaps"; "insertion + deletion".]
  [Figure: traces for a ... b -> b ... c. Labels: "swap + substitution or deletion + substitution"; "2 substitutions".]
  [Figure: traces for a ... a ... b -> b ... ... ... a, with K and L marked. Labels: "swap + K deletions + L insertions"; "a trace with lower cost".]

The Algorithm
[Figure: positions i' < i in A and j' < j in B, with A[i'] = a, A[i] = b, B[j'] = b, B[j] = a]

d[i, j] = min( d[i-1, j] + 1,
               d[i, j-1] + 1,
               d[i-1, j-1] + cost(A[i]->B[j]),
               d[i'-1, j'-1] + (i-i'-1) + (j-j'-1) + 1 )

Summary
• With simple preprocessing of T and P, the problem can be solved by dynamic programming in O(|T||P|) time.
• If edit operations are allowed to have different costs:
  – Insertion (cost WI)
  – Deletion (cost WD)
  – Swap (cost WS)
  – Substitution (cost WC)
  then the algorithm works if 2 WS ≥ WI + WD.
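
The four-term recurrence from "The Algorithm" can be sketched in Python as follows. This is a minimal illustration under the unit-cost assumption used in the slides, not the paper's own code. The function name `edit_distance_with_swaps`, the sentinel row and column filled with a large value, and the `last_row` / `last_col` bookkeeping used to find i' and j' are all choices made for this sketch. Here i' is taken as the last position before i in T holding P[j], and j' as the last position before j in P holding T[i].

```python
def edit_distance_with_swaps(t: str, p: str) -> int:
    """Unit-cost edit distance with substitution, insertion, deletion and swap."""
    n, m = len(t), len(p)
    inf = n + m  # sentinel: at least as large as any real distance

    # d[i + 1][j + 1] holds the distance between t[:i] and p[:j].
    # Row 0 and column 0 are sentinels, so d[i'][j'] (i.e. "d[i'-1, j'-1]")
    # is well defined even when no earlier matching position exists.
    d = [[inf] * (m + 2) for _ in range(n + 2)]
    for i in range(n + 1):
        d[i + 1][1] = i  # delete the first i characters of t
    for j in range(m + 1):
        d[1][j + 1] = j  # insert the first j characters of p

    last_row = {}  # last_row[c]: last 1-based position of c in t before row i
    for i in range(1, n + 1):
        last_col = 0  # last 1-based position before j in p where p matched t[i-1]
        for j in range(1, m + 1):
            i1 = last_row.get(p[j - 1], 0)  # i' in the slides (0 if none)
            j1 = last_col                   # j' in the slides (0 if none)
            cost = 0 if t[i - 1] == p[j - 1] else 1
            if cost == 0:
                last_col = j
            d[i + 1][j + 1] = min(
                d[i][j + 1] + 1,                              # deletion
                d[i + 1][j] + 1,                              # insertion
                d[i][j] + cost,                               # substitution or match
                d[i1][j1] + (i - i1 - 1) + (j - j1 - 1) + 1,  # swap term
            )
        last_row[t[i - 1]] = i
    return d[n + 1][m + 1]


if __name__ == "__main__":
    print(edit_distance_with_swaps("abcd", "cda"))
    print(edit_distance_with_swaps("sunday", "saturday"))
```

For the slide's example T = abcd, P = cda, the sketch returns 3, matching the three operations shown (one deletion and two swaps). The table is filled once, cell by cell, so the running time is proportional to |T||P| when dictionary lookups are treated as constant time.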