1 2 3 47 6 Journal of Integer Sequences, Vol. 10 (2007), Article 07.6.5 23 11 The Minimal Density of a Letter in an 883 Infinite Ternary Square-Free Word is 3215 Andrey Khalyavin Mech. & Math. Department Moscow State University 119992 Moscow Russia halyavin@land.ru Abstract The problem of determining the minimal density of a letter in an infinite ternary square-free word was investigated by Tarannikov and Ochem. In this paper we solve 883 this problem, and prove that the minimal density is equal to 3215 . 1 Introduction A word is called square-free if it cannot be written in the form axxb for two words a, b and a nonempty word x. In this paper we study the minimal density of a letter a in an infinite square-free word over the alphabet {a, b, c}. The density of a letter a in a finite where n(w) is the number of letters a in w and |w| is the length of the word w is n(w) |w| word. The density of a letter a in an infinite word is lim inf l→∞ n(wl ) |wl | where wl is the prefix of the word w of length l. Tarannikov [1] found the minimal density of a letter a with up to 4 tight digits after the decimal point. Ochem [2] improved this result, proving that the , 883 ]. In this article we completely minimal density of a letter a belongs to the interval [ 1000 3641 3215 883 solve this problem and prove that the minimal density of a letter a is equal to 3215 . We use huge computer calculations in our proof. You can find sources of the programs at http://mech.math.msu.su/department/dm/dmmc/PUBL/words.zip. 1 2 Main result 883 Our aim is to prove that the minimal density is at least ρ = 3215 . Let us construct 4 special words b1 , . . . , b4 of length 3215 that contain 883 letters a each (they are probably the same words used by Ochem [2] in order to construct an infinite word with density ρ). We call these words the basic words.The files with these basic words can be found at the URL address above. We proved the next theorem by a computation using the technique of backtracking. Theorem 1. In any square-free word of length 90000 there exist either 1) a prefix with density greater than or equal than ρ, 2) two adjacent basic words (a subword bi bj ). So, either an infinite square-free word can be split into words with densities greater than or equal than ρ or there exist two basic words one after another. In the first case the density of an infinite word is greater than or equal than ρ because the lengths of the words in the splitting are bounded by 90000. So we can assume that an infinite square-free word begins with a word bi bj . When our program that proves Theorem 1 finds two adjacent basic words in the current word w, it saves the pair (n(w), |w|) and backtracks. Let us denote by W the set of all such current words. Then we delete equal pairs (n(w), |w|) from the list and obtain 80 distinct pairs (n(w), |w|). Let us denote by L the set of these pairs. Then for all l < 200000 we calculate the maximal number p(l) such that after the concatenation of any words w ∈ W and v such that n(v) ≥ p(l), |v| = l, the word vw has the density greater than or equal than ρ. So, p(l) = max(n(w),|w|)∈L [ρ(l + |w|)] − n(w) + 1. Our second program proves the next theorem. Theorem 2. Let u be an infinite square-free word that begins with a word bi bj . Then u contains one of the following prefixes: 1) a prefix w such that 2 · 3215 < |w| < 200000 and n(w) ≥ p(|w|), 2) a prefix w such that 4 · 3215 ≤ |w| < 200000, n(w) ≥ ρ|w|, that ends with two adjacent basic words. Now, we are prepared to prove Theorem 3. An infinite square-free word that begins with a word bi bj has density not less than ρ. Proof. Let us consider an infinite square-free word u that begins with a word bi bj . Let us show that u must either have density greater than or equal than ρ or start with one of the following prefixes: 1) the prefix ww1 w2 . . . wn v such that |w| < 200000, |wi | < 200000, 2 · 3215 < |v| < 200000, n(wi ) ≥ ρ|wi |, n(wv) ≥ ρ|wv|, which ends with two adjacent basic words. 2 2) the prefix w such that 4 · 3215 ≤ |w| < 200000, n(w) ≥ ρ|w|, which ends with two adjacent basic words. Let us apply Theorem 2 to our infinite square-free word. If the second case of Theorem 2 holds, we have the second case in the above sentence. Otherwise we apply Theorem 1 to the word u without the prefix w. We obtain either A) a subword v that ends with two adjacent basic words or B) a subword w1 with the density greater than or equal than ρ. In the case A) we have the second case in our sentence since n(w) ≥ p(|w|). In the case B) we apply Theorem 1 to the word u without the prefix ww1 , and so on. If the second case of Theorem 1 never occurs then u = ww1 w2 . . . where lengths of words w and wi are bounded. Therefore the density of u is greater than or equal than ρ. If the second case of Theorem 1 occurs on the nth step, we obtain the prefix ww1 . . . wn−1 v such that n(w) ≥ p(|w|), v ∈ W and so n(wv) ≥ ρ|wv|. Therefore our word u can be split into subwords w and ww1 w2 . . . wn v with densities greater than or equal than ρ and bounded lengths of words w,wi and v. Therefore the word u also has the density greater than or equal than ρ. From Theorem 1 and Theorem 3 we obtain immediately our main Theorem: Theorem 4. The density of a letter a in an infinite square-free word is greater than or equal than ρ. Using the result from Ochem [2] and Theorem 4 we obtain that the minimal density of an infinite square-free word is equal to ρ. 3 Optimization of calculations We have spent about 12 hours of calculations (on a 2GHz CPU) in order to prove Theorems 1 and 2. In order to speed up calculations we used a special algorithm for checking the square-free property. The program proving the theorems above uses a backtracking algorithm. So we must check on every step if the current word S ends with a word of the form ww. Although the algorithm described below uses O(n2 ) time in the worst case for checking the square-free property, on average it uses sublinear time. First, we choose m. In our experiments the optimal m was about 100. Then we calculate the exact and inexact hash for all positions from the mth character to the end of the current string. Let us denote by Si the number corresponding to the ith symbol (Si = 0 for symbol a, Si = 1 for symbol b, Si = 2 for symbol c). Let us define the inexact hash for the jth position as hj = Sj + Sj−1 p + Sj−2 p2 + · · · + Sj−m+1 pm−1 ( mod P ) where p = 5 and P is the size of the hash table (we use P = 100003 for Theorem 1 and P = 200003 for Theorem 2). If there do not exist positions before the jth with the exact hash hj then we define the exact hash of the jth position equal to the inexact hash. If such positions exist, we check the last m symbols before the jth position. If the last m symbols before the jth position coincide with the last m symbols before kth position with the exact hash hj , we define the exact hash for the jth position equal to hj . In the opposite case we check all positions that have the exact hash equal to hj + 1 and repeat the previous procedure. Then we check positions with the exact hash equal to hj + 2 and 3 so on. We continue to increase the hash value until we have found a position with the same last m characters or hash value that does not correspond to any position in the string. We define the exact hash of the jth position equal to the last checked hash value. In order to check hash values quickly we use the table where for each hash value the last position with the same exact hash is written. Also for each position in the string we store the nearest position before it with the same exact hash. Thus, all positions with the same exact hash are linked to the list. Hence, the exact hash of the position uniquely defines the last m symbols before it and we can quickly update it after deleting or inserting a symbol to the end of the current string. In order to check if the current word S ends with a word of the form ww, we check possible lengths (of word w) less than m directly. Then we search all positions with exact hash equal to the exact hash of last position. Then we compare strings for these positions. We speed up this procedure by m times using exact hash values. 4 Acknowledgments The author is grateful to his scientific supervisor Prof. Yuriy Tarannikov, Dr. Roman Kolpakov and the anonymous referee for the attention to this work and helpful remarks. References [1] Y. Tarannikov. The minimal density of a letter in an infinite ternary square-free word is 0.2746... J. Integer Sequences 5(2): Article 02.2.2 (2002). [2] P. Ochem. Letter frequency in infinite repetition-free words. Proceedings of Words 2005, Montreal, Canada, September 13-17 (2005), 335-340. 2000 Mathematics Subject Classification: Primary 11B05. Keywords: combinatorics on words; square-free word; factorial languages; minimal density. (Concerned with sequence A006156.) Received June 19 2006; revised versions received May 5 2007; June 12 2007. Published in Journal of Integer Sequences, June 14 2007. Return to Journal of Integer Sequences home page. 4