APPENDIX A: PROPERTIES OF THE ALGORITHM

Lemma 1. For the cost matrix c = c[0..n, 0..m], let

    c[0, 0] = 0

denote the minimum cost of transforming the empty sequence into itself; let

    c[0, j] = Σ_{1 ≤ h ≤ j} d(ε, P[h])

denote the minimum cost of inserting the first j letters of P into the empty sequence; and let

    c[i, 0] = Σ_{1 ≤ h ≤ i} d(W[h], ε)

denote the minimum cost of deleting the first i letters of W so as to form the empty sequence. Then, for every i ∈ 1..n and j ∈ 1..m,

    c[i, j] = min{ c[i-1, j] + d(W[i], ε), c[i, j-1] + d(ε, P[j]), c[i-1, j-1] + d(W[i], P[j]) }.

Proof. The proof rests on the fact that insertions and deletions can be performed in any order. In particular, consider the three possible orderings of insertions and deletions that leave the operations on W[i] and/or P[j] to be performed last. One possibility is that W[i] is deleted, at a cost of d(W[i], ε) added to c[i-1, j]. A second possibility is that P[j] is inserted, at a cost of d(ε, P[j]) added to c[i, j-1]. The final possibility is that a substitution takes place (W[i] deleted and P[j] inserted), at a cost of d(W[i], P[j]) added to c[i-1, j-1]; of course, in the case that W[i] = P[j] this increment is zero. □

Based on Lemma 1, the initial values of the cost matrix are

    c[0, j] = d(ε, P[1..j]) for every j ∈ 1..m,

the cost of inserting the prefix P[1..j]. The initial values in column 1 are computed as follows: if W[i] = P[1], then c[i, 1] ← 0; else c[i, 1] ← min{ c[i-1, 1] + d(W[i], ε), d(W[i], P[1]) }. Once the boundary values in row 0 and column 1 have been correctly computed, each element c[i, j], i ∈ 1..n, j ∈ 2..m, can be computed according to the basic recurrence relation

    c[i, j] ← min{ c[i-1, j] + d(W[i], ε), c[i, j-1] + d(ε, P[j]), c[i-1, j-1] + d(W[i], P[j]) }.

Theorem 1. In refined mode, the average-time complexity of Procedure 1 is O(|W| + m) and its space complexity is O(|Σ_E|^q + |W| + m). In sketch mode, Procedure 1 is an approximate matching algorithm with average-time complexity O(c|W| log_σ m / m) and one-sided error ε = O((q/m)^t).

Proof. If the precision degree f of the pre-matching is refined, the main task is to compute d_q(x, y) in step (2). In general we cannot assume that a q-gram v can serve directly as an index; rather, we need to map each v to an integer. A natural encoding is to interpret v as an integer written in base |Σ_E| = σ. Let v = z_1 z_2 … z_q and let Σ_E = {N_0, N_1, …, N_{σ-1}}. Then the integer code of q-gram v is

    v̄ = z̄_1 σ^{q-1} + z̄_2 σ^{q-2} + ⋯ + z̄_q σ^0,   (11)

where z̄_i = j if z_i = N_j. Now let x = a_1 a_2 … a_n, and let v_i = a_i … a_{i+q-1}, 1 ≤ i ≤ n - q + 1, be the q-grams of x. Obviously,

    v̄_{i+1} = (v̄_i - ā_i σ^{q-1}) · σ + ā_{i+q}.   (12)

By computing v̄_1 directly from (11) and then applying (12) for 1 ≤ i ≤ n - q, we obtain the integer codes of all q-grams of x. The codes are accumulated in an array G[0 : σ^q - 1] by setting G[v̄_i] ← G[v̄_i] + 1 for all i; at the end, G[v̄] = G(x)[v] for each q-gram v. Moreover, we create a list L of the codes that actually occur in x. Assuming that each application of (12) takes constant time (which holds true at least for small q and σ), the total time for computing G and L is O(|x|). Let us denote these G and L by G_1 and L_1. Similarly, we get G = G_2 and L = L_2 for a string y in time O(|y|). Now step (2) takes the form

    d_q(x, y) = Σ_{v̄ ∈ L_1 ∪ L_2} |G_1[v̄] - G_2[v̄]|,

which, obviously, can be evaluated in time O(|x| + |y|). So in refined mode, the average-time complexity of Procedure 1 is O(|W| + m), and the space complexity is O(|Σ_E|^q + |W| + m).
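As a side illustration of the recurrence of Lemma 1 above, the following is a minimal Python sketch of the cost-matrix computation. It is not taken from Procedure 1: the function name cost_matrix, the use of None for the empty symbol ε, and the generic cost function d are illustrative assumptions, and the sketch uses the standard boundary in column 0 (accumulated deletion costs) rather than the searching-oriented column-1 initialization described above.

```python
def cost_matrix(W, P, d):
    """Fill the cost matrix c[0..n][0..m] of Lemma 1.

    d(a, b) is the cost of transforming symbol a into symbol b; the empty
    symbol is represented by None, so d(a, None) is a deletion cost and
    d(None, b) an insertion cost. W[i] in the text corresponds to W[i-1] here.
    """
    n, m = len(W), len(P)
    c = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):                      # row 0: insert P[1..j]
        c[0][j] = c[0][j - 1] + d(None, P[j - 1])
    for i in range(1, n + 1):                      # column 0: delete W[1..i]
        c[i][0] = c[i - 1][0] + d(W[i - 1], None)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c[i][j] = min(
                c[i - 1][j] + d(W[i - 1], None),          # delete W[i]
                c[i][j - 1] + d(None, P[j - 1]),          # insert P[j]
                c[i - 1][j - 1] + d(W[i - 1], P[j - 1]),  # substitute / match
            )
    return c
```

With unit costs, d(a, b) = 0 if a = b and 1 otherwise, c[n][m] is the Levenshtein distance; for example, cost_matrix("survey", "surgery", lambda a, b: 0 if a == b else 1)[6][7] evaluates to 2.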
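The q-gram counting of (11) and (12) and the evaluation of d_q(x, y) can be sketched as follows. This is an illustration rather than Procedure 1 itself: the names qgram_profile and d_q are invented for the example, symbols are assumed to be integers in 0..σ-1, and a dictionary is used as a sparse stand-in for the dense array G[0 : σ^q - 1] (the dense array is what gives the O(σ^q) term in the space bound).

```python
def qgram_profile(x, q, sigma):
    """Count q-gram occurrences of x, coding each q-gram as a base-sigma
    integer as in (11) and updating the code incrementally as in (12).

    x is a sequence of integer symbols in 0..sigma-1.
    Returns (G, L): G maps codes to counts, L lists the codes occurring in x.
    """
    n = len(x)
    G, L = {}, []
    if n < q:
        return G, L
    high = sigma ** (q - 1)
    v = 0
    for z in x[:q]:                 # code of the first q-gram, eq. (11)
        v = v * sigma + z
    for i in range(n - q + 1):
        if v not in G:
            G[v] = 0
            L.append(v)
        G[v] += 1
        if i + q < n:               # rolling update to the next q-gram, eq. (12)
            v = (v - x[i] * high) * sigma + x[i + q]
    return G, L


def d_q(x, y, q, sigma):
    """q-gram profile distance: sum over codes of |G1[v] - G2[v]|."""
    G1, L1 = qgram_profile(x, q, sigma)
    G2, L2 = qgram_profile(y, q, sigma)
    return sum(abs(G1.get(v, 0) - G2.get(v, 0)) for v in set(L1) | set(L2))
```

For instance, d_q([0, 1, 2, 1, 0], [0, 1, 2, 2, 0], 2, 3) evaluates to 4, since the two strings share the 2-grams (0,1) and (1,2) and differ in two 2-grams each.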
If f equals sketch, Procedure 1 uses the sampled q-gram technique to speed up the execution of the pre-matching. Let w' = w - q + 1 be the number of q-grams (counting repetitions) in a window of length w, and let Pr denote the probability that a randomly chosen q-gram is a substring of the pattern P, i.e.,

    Pr = (1/σ^q) · |{P[i..i+q-1] : i = 1, …, m - q + 1}| [25].

According to [26], if k < (m - 2 log_σ m)/(1 + 4 log_σ m), then Procedure 1 is an algorithm with one-sided error ε = O((log_σ m / m)^c) for any constant c > 0. Clearly the running time of Procedure 1 is O(cq), i.e., O(c log_σ m), for each window examined, so over the roughly |W|/m windows of W the average-time complexity of Procedure 1 in sketch mode is O(c|W| log_σ m / m). □

The following lemma shows how the table t can be used to update the state s from position i-1 in the event sequence W to position i.

Lemma 2. Let s(i) denote the string s that defines the state corresponding to position i in the event sequence W, and define the initial state s(0)[0..m] = 0 1^m, that is, s(0)[0] = 0 and s(0)[j] = 1 for every j ∈ 1..m. Then for every i ∈ 1..n and every j ∈ 1..m,

    s(i)[j] = s(i-1)[j-1] ∨ t[W[i], j],   (13)

where ∨ is the logical OR operator.

Proof. Let i be the current position in W. Observe that no calculation is performed on s(i-1)[m]. Consider any j ∈ 1..m. Suppose first that s(i-1)[j-1] = 0, so that by (7), W[i-j+1..i-1] = P[1..j-1]. If t[W[i], j] = 0, then by (8) it is moreover true that W[i] = P[j], so that we should also set s(i)[j] ← 0. Conversely, if t[W[i], j] = 1, then by (8) we have W[i] ≠ P[j], and we should set s(i)[j] ← 1. If instead s(i-1)[j-1] = 1, then W[i-j+1..i-1] ≠ P[1..j-1], so P[1..j] cannot match a substring of W ending at position i, and s(i)[j] should again be set to 1. Thus in each case the correct result is ensured by the logical OR in (13). □

Theorem 2. Suppose an event sequence W[1..n] and a set P = {P_1, P_2, …, P_r} of patterns are given, and let M = |P_1| + |P_2| + … + |P_r|. Then Procedure 2 can compute all k-approximate occurrences in W of every pattern in P, using Θ((k⌈M/w⌉ + r)n) time and Θ((k + σ)⌈M/w⌉) additional space, where w is the computer word length, ⌈·⌉ is the ceiling symbol, and Θ(·) denotes bounds on the asymptotic growth rate of the average running time.

Proof. If m ≤ w, then s[1..m] and t[h, 1..m] each occupy at most one computer word, so that the calculation of s can be viewed as a pair of constant-time operations: a one-bit shift followed by a logical OR in a one-word register. More generally (i.e., for m > w), s[1..m] and t[h, 1..m] each occupy ⌈m/w⌉ computer words, and therefore each calculation of s requires Θ(⌈m/w⌉) time. Procedure 2 must execute (7) (k+1)n times in order to locate all k-approximate matches of P in W. Each execution requires three right shifts, three AND operations and an OR operation on each word of each s_l, l = 1, 2, …, k, hence constant time per word. Observe that the array t = t[1..σ, 1..m] can be computed in the pre-matching phase, which initially sets every word to 1^m and then resets t[P[j], j] ← 0 for every j ∈ 1..m; hence the array t can be computed in time Θ(σ⌈m/w⌉ + m) and does not add to the execution time of Procedure 2. The additional bit operations required to correct the s_l^(i) can be performed in constant time and, since they are all logical operations, they can be carried out on whole words. The time complexity of steps (10) to (19) is Θ(n(M/r)), where M/r denotes the average length of a pattern P_u. Thus, the time complexity of Procedure 2 is Θ((k⌈M/w⌉ + r)n), and the (k+1) bit vectors and the array t need Θ((k + σ)⌈M/w⌉) additional space. □
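To make the word-level operations in the proofs above concrete, here is a minimal Python sketch of the state update of Lemma 2 for the exact-matching case (k = 0), with the state packed into a single machine-word-like integer. The function name shift_or_search is illustrative, and the (k+1) state vectors and multi-word handling used by Procedure 2 for k > 0 and m > w are omitted.

```python
def shift_or_search(W, P):
    """Exact (k = 0) bit-parallel search: report 1-based end positions of
    occurrences of P in W using the state update of Lemma 2.

    Bit j-1 of the integer state corresponds to s(i)[j]; a 0 bit means that
    P[1..j] matches W[i-j+1..i].
    """
    m = len(P)
    all_ones = (1 << m) - 1
    # Table t: t[a] has bit j-1 cleared exactly when P[j] = a (cf. t[P[j], j] <- 0).
    t = {}
    for j, a in enumerate(P):
        t[a] = t.get(a, all_ones) & ~(1 << j)
    s = all_ones                        # initial state: s(0)[1..m] = 1^m
    out = []
    for i, a in enumerate(W, start=1):
        # s(i)[j] = s(i-1)[j-1] OR t[W[i], j]; the shift feeds in s[0] = 0.
        s = ((s << 1) & all_ones) | t.get(a, all_ones)
        if (s & (1 << (m - 1))) == 0:   # s(i)[m] = 0: an occurrence ends at i
            out.append(i)
    return out
```

For example, shift_or_search("abracadabra", "abra") returns [4, 11], the end positions of the two occurrences. Each text position costs a constant number of word operations here; for m > w the same update is applied to each of the ⌈m/w⌉ words, as argued in the proof of Theorem 2.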
REFERENCES

[25] E. Ukkonen, "Approximate string-matching with q-grams and maximal matches," Theoretical Computer Science, vol. 92, no. 1, pp. 191–211, Jan. 1992, doi: 10.1016/0304-3975(92)90143-4.

[26] M. Kiwi, G. Navarro, and C. Telha, "On-line approximate string matching with bounded errors," Theoretical Computer Science, vol. 412, no. 45, pp. 6359–6370, Oct. 2011.