Covering problems from a formal language point of view

Marcella ANSELMO, Maria MADONIA
Ravello, 19-21 September 2003

Covering a word

Covering a word w with words in a set X.
[Figure: w covered by overlapping occurrences of words of X]
Covering = concatenations + overlaps
Example: X = ab + ba, w = abababa
[Figure: a covering of a b a b a b a]

Why study covering?

• Molecular biology: manipulating DNA molecules (e.g. fragment assembly)
• Data compression
• Computer-assisted music analysis

Literature

• Apostolico, Ehrenfeucht (1993)
• Brodal, Pedersen (2000)
• Moore, Smyth (1995)
  w is 'quasiperiodic'; x is a 'cover' of w
• Iliopoulos, Moore, Park (1993)
• Iliopoulos, Smyth (1998)
  x 'covers' w; 'set of k-covers' of w
• Sim, Iliopoulos, Park, Smyth (2001)
  p 'approximate period' of w
(complete references)
All algorithmic problems!!! (given w, find 'optimal' X)

Formal language point of view

Formal language point of view is needed!
Madonia, Salemi, Sportelli (1999) [MSS99]:
If X ⊆ A*, X^cov = set of words 'covered' by words in X
also X^cov = (X, A*), set of z-decompositions over (X, A*)
Here: Coverings are not simple generalizations of z-decompositions!

Formal Definition

A, alphabet; Ā, disjoint copy of A.
If w = a1 … an ∈ (A ∪ Ā)*, then w̄ = ān … ā1 (the bar of ā is a).
If X ⊆ A*, X̄ = { x̄ | x ∈ X }.
For w ∈ (A ∪ Ā)*, red(w) = canonical representative of the class of w in the free group.
(Ex: w = ab b̄ ba ba ā ab, red(w) = ababab)

Def. A covering (over X) of w in A* is δ = (w1, …, wn) s.t.
1. n is odd; for any odd i, wi ∈ X; for any even i, wi ∈ Ā*
2. red(w1 … wn) = w
3.
for any i, red(w1 … wi) is a prefix of w

Example: X = ab + ba, w = ababab.
δ = (ab, b̄, ba, ε, ba, ā, ab) is a covering of w over X:
1. n is odd; for any odd i, wi ∈ X; for any even i, wi ∈ Ā*
2. red(ab b̄ ba ba ā ab) = ababab
3. for any i, red(w1 … wi) is a prefix of w
[Figure: the steps of δ drawn over a b a b a b]

Concatenation, zig-zag, covering

Concatenation   submonoid       X*
Zig-zag         z-submonoid     X↑
Covering        cov-submonoid   X^cov
[Figure: cov-submonoid, z-submonoid, submonoid]

Splicing systems for X^cov

X finite: there is a splicing system S s.t. L(S) = X^cov $
COV2(X) = { x1 x2 x3 | x1, x2, x3 ∈ A*, x1 x2, x2 x3 ∈ X }
Start with: x $, x ∈ X or COV2(X)
Rules: (, x, $), x ∈ X
       (, x, x3 $), x = x1 x2, x2 x3 ∈ X
Example: X = ab + ba, w = #ababaab$ ∈ L(S)
[Figure: splicing steps for the example, with x = ab and x = ba]

Coding problems [MSS99]

How many coverings does a word have?
Example:
• X = ab + ba, w = ababab ∈ X^cov
w has many different coverings over X:
δ1 = (ab, ε, ab, ε, ab)
δ2 = (ab, b̄, ba, ε, ba, ā, ab)
δ3 = (ab, b̄, ba, ā, ab, ε, ab)
δ4 = (ab, ε, ab, b̄, ba, ā, ab)
δ5 = (ab, b̄, ba, ā, ab, b̄, ba, ā, ab)

Covering codes [MSS99]

X ⊆ A* is a covering code if any word in A* has at most one minimal covering (over X).
Example: X = ab + ba is not a covering code (remember δ1, δ2)
Example: X = aabab + abb is a covering code
Example: X = ab+a + a is a covering code

Cov-freeness

Let M ⊆ A* be a cov-submonoid.
cov-G(M) is the minimal X ⊆ A* such that M = X^cov.
M is cov-free if cov-G(M) is a covering code.
Fact: M free ⇔ M stable (well-known)
      M z-free ⇔ M z-stable (known)
Question: M cov-free ⇔ M 'cov-stable'?
We want 'cov-stability' = global notion equivalent to cov-freeness.

Toward a cov-stability definition (I)

stable: u, w, uv, vw ∈ M implies v ∈ M
z-stable: w, vw ∈ M, uv, u ∈ Z-p-s(uvw) implies v ∈ Z-p-s(uvw)
cov-stable? w, vw, uvx, uy ∈ M, for x < w and y < vw (< is the prefix order), implies vx ∈ M?
Not always!
Example: X = abcd + bcde + cdef + defg

Toward a cov-stability definition (II)

Main observation in the classical proof of (stable implies free):
• x minimal word with 2 different factorizations: the last step in one factorization differs from the last step in the other factorization
New situation with covering:
[Figure: u and w]
So we have to study the case v = ε.
Example: X = abc + bcd + cde

Cov-stability

Def. M is cov-stable if, for all w, vw, uvx, uy ∈ M with x < w and y < vw:
1. If v ≠ ε, then vz ∈ M for some z < w. Moreover vx ∈ M if y [?] v.
2. If v = ε, u ≠ ε and x [?] y, then t ∈ M for some t proper suffix of ux.
Remark: cov-stable implies stable

Cov-stable iff cov-free

Theorem: Let M be a covering submonoid. M is cov-stable ⇔ M is cov-free.
Proof: many cases and sub-cases (as in the definition!)

Some consequences

Fact 1: (cov-free ∩ cov-free) is cov-free
Fact 2: cov-free implies free (not vice versa)
Fact 3: cov-free implies very pure (not vice versa)
Fact 4: M covering submonoid, X = cov-G(M). M cov-free implies X* free.
Fact 5: cov-free [?] z-free [?] free
Remark: Covering is not a simple generalization of z-decomposition!

Cov-maximality and cov-completeness

Let X ⊆ A* be a covering code.
X is cov-complete if Fact(X^cov) = A*.
X is cov-maximal if X ⊆ X1, X1 covering code, implies X = X1.
Fact: X cov-complete [?] X cov-maximal
Remark [MSS99]: X cov-complete ⇒ X infinite (unless X = A)
Remark: complete ⇒ cov-complete (not vice versa)
        maximal ⇒ cov-maximal (not vice versa)
Example: X = ab+a + a

Counting minimal coverings

X ⊆ A*, regular language
cov_X : w → number of minimal coverings of w
A, 1DFA recognizing X
[Figure: automaton diagram with labels X, A, 1, X]
B, 2FA recognizing X^cov
Remark: B counts all coverings of w ∈ X^cov

Remark on minimal coverings

Remark: In minimal coverings, there are no 2 steps to the left under the same occurrence of a letter.
Crossing sequences in B for minimal coverings of w:
[Figure: crossing sequences over w]

A 1NFA automaton for cov_X

CS3 = crossing sequences of length ≤ 3 in which state 1 does not occur twice
δ(cs, a) = cs' if cs matches cs' on a
C = (CS3, (1), δ, (1))
Example: X = ab + ba
[Figure: automata A and C for X = ab + ba]

Some remarks

• Language recognized by C = X^cov
• X regular implies X^cov regular
• Behaviour of C is cov_X
• X regular implies cov_X rational
• X covering code iff C unambiguous (decidable) (different proof in [MSS99])
Conclusions and future work

• Formal language point of view is needed
• Covering is not a generalization of zig-zag (or z-decomposition): many new problems and results
• Further problems:
  covering codes: measure
  special cases: |X| = 1, X ⊆ A^k
  suggestions …

[Figure: w is 'quasiperiodic', x is a 'cover' of w; x 'covers' w; 'set of k-covers' of w, X ⊆ A^k]

Example: X = ab + ba
w = ababab ∈ X^cov
[Figure: a b a b a b covered by words of X]
w = ababab ∈ (X, A*)
[Figure: a z-decomposition of a b a b a b]
X^cov = (ab + ba + aba + bab)*

δ1: [Figure: steps of δ1 over a b a b a b]
δ2: [Figure: steps of δ2 over a b a b a b]
All the steps to the right are needed for covering w: δ1, δ2 are minimal coverings!

δ3: [Figure: steps of δ3 over a b a b a b]
δ4: [Figure: steps of δ4 over a b a b a b]
δ5: [Figure: steps of δ5 over a b a b a b]
All blue steps are useless for covering w: δ3, δ4, δ5 are not minimal.
We count only minimal coverings.

Toward a cov-stability definition (I)

stable: u, w, uv, vw ∈ M implies v ∈ M
[Figure: u, v, w]
z-stable: w, vw ∈ M, uv, u ∈ Z-prefix-strict(uvw) implies v ∈ Z-prefix-strict(uvw)
[Figure: u, v, w]

Example: X = abcd + bcde + cdef + defg, M = X^cov
[Figure: a b c d e f g with the factor vx marked]
Set u = ab, v = c, w = defg, x = de, y = cd.
Therefore w, vw, uvx, uy ∈ M but vx = cde ∉ M.
• Note vz = cdef ∈ M, z < w.

Example: X = abc + bcd + cde, M = X^cov
[Figure: a b c d e with u, w, x marked]
Set u = ab, v = ε, w = cde, x = cd, y = c.
Therefore w, vw, uvx, uy ∈ M but vz ∈ M for no z < w.
• Note bcd ∈ M, bcd proper suffix of ux.

Case 1.
[Figure: u, v, w, x, y with y [?] v] vz ∈ M, z < w
[Figure: u, v, w, x, y with y [?] v] vx ∈ M

Case 2.
[Figure: u, w, x, y with v = ε] t ∈ M, t proper suffix of ux
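
The formal definition of a covering and membership in X^cov can be checked mechanically. The following is a minimal Python sketch, not the automaton construction of the talk. It rests on several assumptions: barred letters are encoded as uppercase letters (b̄ becomes "B"), ε is the empty string, and the function names red, is_covering and in_xcov are hypothetical. The membership test is a simple left-to-right scan derived from the definition (a step to the left may return to any earlier position, so only the furthest position reached matters), and by convention it accepts the empty word.

```python
def red(word: str) -> str:
    """Free-group reduction; an uppercase letter stands for the barred copy."""
    stack = []
    for ch in word:
        # 'b' next to 'B' (in either order) cancels, as b and its barred copy do.
        if stack and stack[-1] != ch and stack[-1].lower() == ch.lower():
            stack.pop()
        else:
            stack.append(ch)
    return "".join(stack)


def is_covering(delta, w: str, X) -> bool:
    """Check conditions 1-3 of the definition for delta = (w1, ..., wn)."""
    if len(delta) % 2 == 0:              # condition 1: n is odd
        return False
    prefix = ""
    for i, step in enumerate(delta, start=1):
        if i % 2 == 1 and step not in X:                       # odd steps in X
            return False
        if i % 2 == 0 and step != "" and not step.isupper():   # even steps barred
            return False
        prefix = red(prefix + step)
        if not w.startswith(prefix):     # condition 3: always a prefix of w
            return False
    return prefix == w                   # condition 2: red(w1 ... wn) = w


def in_xcov(w: str, X) -> bool:
    """Decide whether w is covered by words of X (assumed convention: '' is covered)."""
    reach = 0                            # furthest position covered so far
    for start in range(len(w)):
        if start > reach:                # gap: no step can begin here
            break
        for x in X:
            if x and w.startswith(x, start):
                reach = max(reach, start + len(x))
    return reach == len(w)


if __name__ == "__main__":
    X = {"ab", "ba"}
    delta = ("ab", "B", "ba", "", "ba", "A", "ab")
    print(is_covering(delta, "ababab", X))
    print(in_xcov("abababa", X))
```

With X = abcd + bcde + cdef + defg, in_xcov accepts abcde and cdef and rejects cde, in line with the cov-stability example. Counting minimal coverings, as cov_X does, would need more than this scan.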