ON THE EXPRESSIVE POWER OF SHUFFLE PRODUCT
Antonio Restivo
Università di Palermo

A Very General Problem
Given a basis B of languages and a set O of operations, characterize the family O(B) of languages expressible from the basis B by using the operations in O.

The Family REG of Regular Languages
The basis: B = { {a} | a ∈ Σ } ∪ { {ε} }
The operations: O = {union, concatenation, (Kleene) star}
REG = O(B)
REG is also closed under all Boolean operations.

The Family SF of Star-Free Languages
The basis: B = { {a} | a ∈ Σ } ∪ { {ε} }
The operations: O = {Boolean operations, concatenation}
SF = O(B)

Shuffle Product
The shuffle of two words u and v is the set
u ш v = { u1 v1 … un vn | n ≥ 0, u1 … un = u, v1 … vn = v }
Example: ab ш ba = {abba, baab, abab, baba}
The shuffle of two languages L and K is the language
L ш K = ⋃_{u ∈ L, v ∈ K} u ш v

Expressive Power of the Shuffle
Very little is known about classes of languages closed under shuffle, and their study appears to be a difficult problem.
Such a study, apart from its theoretical interest, is also motivated by applications to the modeling of process algebras and to program verification.

The Family INT of Intermixed Languages
The basis: B = { {a} | a ∈ Σ } ∪ { {ε} }
The operations: O = {Boolean operations, concatenation, shuffle}
INT = O(B)
Theorem (Berstel, Boasson, Carton, Pin, R.): SF ⊊ INT ⊊ REG

The Problem
Problem 1: Give a (decidable) characterization of the family INT.
Proposition: INT is not a variety (in the sense of Eilenberg).
Remark: REG and SF are varieties.

Periodicity
A language L ⊆ Σ* is aperiodic, or non-counting, if there exists an integer n ≥ 0 such that, for all x, y, z ∈ Σ*, one has
x y^n z ∈ L ⇔ x y^(n+1) z ∈ L.
Theorem (M.P. Schützenberger): A regular language L is aperiodic if and only if it is star-free.

The strict inclusion SF ⊊ INT implies that the shuffle of two star-free languages is in general not star-free: «the shuffle creates periodicities».
Problem 2: Determine conditions under which the shuffle of two star-free languages is star-free.
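The definition of the shuffle of two words, and its extension to languages, can be checked mechanically on small examples. The following is a minimal sketch in Python, assuming languages are given as finite sets of strings; the names `shuffle_words` and `shuffle_languages` are illustrative and do not come from the talk.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def shuffle_words(u: str, v: str) -> frozenset:
    """Return the set u ш v of all interleavings of the words u and v."""
    if not u:
        return frozenset({v})
    if not v:
        return frozenset({u})
    # The first letter of an interleaving comes either from u or from v.
    from_u = {u[0] + w for w in shuffle_words(u[1:], v)}
    from_v = {v[0] + w for w in shuffle_words(u, v[1:])}
    return frozenset(from_u | from_v)


def shuffle_languages(L: set, K: set) -> set:
    """Return L ш K for finite languages L and K (union of all u ш v)."""
    result = set()
    for u in L:
        for v in K:
            result |= shuffle_words(u, v)
    return result


# The example from the Shuffle Product slide:
print(sorted(shuffle_words("ab", "ba")))  # ['abab', 'abba', 'baab', 'baba']
```

The recursion is exponential in the worst case, so this is only meant for experimenting with short words.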
Bounded Shuffle
Let k be a positive integer. The k-shuffle of two languages L1 and L2 is defined as follows:
L1 ш_k L2 = { u1 v1 … um vm | m ≤ k, u1 … um ∈ L1, v1 … vm ∈ L2 }.
Any k-shuffle is called a bounded shuffle.
Theorem (Castiglione, R.): SF is closed under bounded shuffle.
Corollary: The shuffle of a star-free language and a finite language is a star-free language.

Partial Commutations
Let Σ be an alphabet and let θ ⊆ Σ × Σ be a symmetric and reflexive relation, called a (partial) commutation.
Consider the congruence ∼θ of Σ* generated by the set of pairs (ab, ba) with (a, b) ∈ θ.
If L ⊆ Σ* is a language, [L]θ denotes the closure of L by ∼θ.
L is closed by ∼θ if L = [L]θ.
The closed subsets of Σ* are called trace languages.

Let L1 and L2 be two languages over the alphabet Σ.
Let Σ1 and Σ2 be two disjoint copies of the alphabet Σ (colored copies), and σi: Σ → Σi, for i = 1, 2, the corresponding bijections.
Let L'1 (L'2 resp.) be the subset of Σ1* (Σ2* resp.) corresponding to L1 (L2 resp.) under the morphism σ1 (σ2 resp.).
Let Γ = Σ1 ∪ Σ2, consider the partial commutation θ ⊆ Σ1 × Σ2, and let π: Γ* → Σ* be the morphism induced by σ1 and σ2 (delete colours).
The θ-product of L1 and L2 is
L1 ш_θ L2 = π([L'1 L'2]θ)

Example:
bacbcaabca ∈ L1, babcacbab ∈ L2
bacbcaabca babcacbab
θ = {(a,a), (a,b), (b,a), (b,b), (c,a), (c,b), (c,c)} (first components in Σ1, second components in Σ2)
bbaabcbcaabcacacbab ∈ L1 ш_θ L2

The θ-product generalizes both concatenation and shuffle:
If θ = ∅, then L1 ш_θ L2 = L1 L2.
If θ = Σ1 × Σ2, then L1 ш_θ L2 = L1 ш L2.

Given the partial commutation θ ⊆ Σ1 × Σ2, we define the partial commutation θ' ⊆ Σ × Σ as follows:
(a, b) ∈ θ' ⇔ (σ1(a), σ2(b)) ∈ θ
Theorem (Guaiana, R., Salemi): Let L1, L2 be languages over Σ, closed under θ'. If L1, L2 ∈ SF, then L1 ш_θ L2 ∈ SF.
Corollary: The shuffle of two commutative star-free languages is star-free.

If the internal commutation θ' (i.e. the commutation allowed inside each of the languages L1, L2) is the «same» as the external commutation θ (i.e. the commutations between the letters in L1 and the letters in L2), then the θ-product preserves star-freeness.

Unambiguous Star-Free Languages
A language L is the marked product of the languages L0, L1, …, Ln if
L = L0 a1 L1 a2 L2 … an Ln,
for some letters a1, a2, …, an of Σ.
A marked product L = L0 a1 L1 a2 L2 … an Ln is unambiguous if every word u of L admits a unique decomposition
u = u0 a1 u1 … an un, with u0 ∈ L0, …, un ∈ Ln.
Example: the product {a,c}* a {ε} b {b,c}* is unambiguous.

SF is the smallest Boolean algebra of languages which is closed under marked product.
The family USF of Unambiguous Star-Free languages is the smallest Boolean algebra of languages of Σ* containing the languages of the form A*, for A ⊆ Σ, which is closed under unambiguous marked product.

FO: class of languages corresponding to formulas of first order logic.
FOk: class of languages corresponding to formulas of first order logic with k variables.
Theorem (McNaughton): SF = FO
Theorem (Immerman, Kozen): FO3 = FO
Theorem (Thérien, Wilke): FO2 = USF

USF ⊊ SF ⊊ INT ⊊ REG
Theorem (Castiglione, R.): If L1, L2 ∈ USF, then L1 ш L2 ∈ SF.

Cyclic Submonoids
The languages in the class USF correspond to regular expressions in which the star operation is restricted to subsets of the alphabet.
The simplest languages not in USF are the languages of the form L = u*, where u is a word of length 2.
Such languages are the cyclic submonoids of Σ*.
We here study the shuffle of cyclic submonoids.

Theorem (Berstel, Boasson, Carton, Pin, R.): If a word u contains at least two different letters, then u* ∈ INT.
A word u ∈ Σ* is primitive if the condition u = v^n, for some word v and integer n, implies u = v and n = 1.
Theorem (McNaughton, Papert): The language u* is star-free if and only if u is a primitive word.
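Primitivity is easy to test in practice. The sketch below relies on a classical fact from combinatorics on words: a non-empty word u is primitive if and only if u occurs in uu only as a prefix and as a suffix. The function names `is_primitive` and `primitive_root` are illustrative, and treating the empty word as non-primitive is a convention assumed here.

```python
def is_primitive(u: str) -> bool:
    """Return True if u is not a power v^n with n >= 2 of some word v."""
    if not u:
        # Convention assumed here: the empty word is not primitive.
        return False
    # Search for u inside uu starting at position 1; for a primitive word
    # the first such occurrence is the suffix, at position len(u).
    return (u + u).find(u, 1) == len(u)


def primitive_root(u: str) -> str:
    """Return the unique primitive word v such that u = v^n for some n >= 1."""
    if not u:
        raise ValueError("the empty word has no primitive root")
    period = (u + u).find(u, 1)
    return u[:period]


print(is_primitive("aab"))       # True
print(is_primitive("abab"))      # False, since abab = (ab)^2
print(primitive_root("ababab"))  # ab
```

Combined with the McNaughton and Papert theorem above, `is_primitive(u)` decides whether the cyclic submonoid u* is star-free.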
Cyclic Submonoids
u = b, v = ab:
b* ш (ab)* = (b + ab)* ∈ SF
u = aab, v = bba:
((aab)* ш (bba)*) ∩ (ab)* = ((ab)^3)*
(aab)* ш (bba)* ∉ SF
Problem 3: Characterize the pairs of primitive words u, v such that u* ш v* is a star-free language.

Combinatorics on Words
Theorem (Lyndon, Schützenberger): If u and v are distinct primitive words, then the word u^n v^m is primitive for all n, m ≥ 2.
Theorem (Shyr, Yu): If u and v are distinct primitive words, then there is at most one non-primitive word in the language u+ v+.

Problem 3 is related to the search for the powers (non-primitive words) that appear in the language u+ ш v+.
Denote by Q the set of primitive words.
For u, v, w ∈ Q, let p(u,v,w) be the integer k such that
(u* ш v*) ∩ w* = (w^k)*
If (u* ш v*) ∩ w* = {ε}, then p(u,v,w) = 0.

For u, v ∈ Q, define the set of integers
P(u,v) = { p(u,v,w) | w ∈ Q }.
For instance, if u = a^10 b and v = b, then P(u,v) = {0, 1, 2, 5, 10}.
Problem 4: Given two primitive words u, v, characterize the set P(u,v) in terms of the combinatorial properties of u and v.
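The integer p(u,v,w) can be explored experimentally. The following is a minimal sketch, not a method from the talk: it decides membership in u* ш v* by simulating a nondeterministic automaton whose states are pairs (i, j) of positions inside the current copy of u and of v, and then looks for the smallest k ≥ 1 with w^k ∈ u* ш v*. The function names and the search bound `max_k` are assumptions made for illustration; a returned 0 only means that no power was found up to the bound.

```python
def in_shuffle_of_stars(x: str, u: str, v: str) -> bool:
    """Decide whether x belongs to u* ш v* (u and v non-empty words)."""
    states = {(0, 0)}  # (position in current copy of u, position in current copy of v)
    for c in x:
        nxt = set()
        for i, j in states:
            if u[i] == c:  # the letter c is read as the next letter of u
                nxt.add(((i + 1) % len(u), j))
            if v[j] == c:  # the letter c is read as the next letter of v
                nxt.add((i, (j + 1) % len(v)))
        states = nxt
        if not states:
            return False
    # Accept only if every started copy of u and of v has been completed.
    return (0, 0) in states


def p(u: str, v: str, w: str, max_k: int = 50) -> int:
    """Smallest k in 1..max_k with w^k in u* ш v*, or 0 if none is found."""
    for k in range(1, max_k + 1):
        if in_shuffle_of_stars(w * k, u, v):
            return k
    return 0


print(p("aab", "bba", "ab"))  # 3

u, v = "a" * 10 + "b", "b"
samples = ["a", "b", "ab", "aab", "aaaaab", u]
print(sorted({p(u, v, w) for w in samples}))  # [0, 1, 2, 5, 10]
```

The first call agrees with the identity ((aab)* ш (bba)*) ∩ (ab)* = ((ab)^3)*, and the sample words in the second call already produce every value listed for P(a^10 b, b).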