The Minimal Density of a Letter in an 883 3215

advertisement
1
2
3
47
6
Journal of Integer Sequences, Vol. 10 (2007),
Article 07.6.5
23 11
The Minimal Density of a Letter in an
883
Infinite Ternary Square-Free Word is 3215
Andrey Khalyavin
Mech. & Math. Department
Moscow State University
119992 Moscow
Russia
halyavin@land.ru
Abstract
The problem of determining the minimal density of a letter in an infinite ternary
square-free word was investigated by Tarannikov and Ochem. In this paper we solve
883
this problem, and prove that the minimal density is equal to 3215
.
1
Introduction
A word is called square-free if it cannot be written in the form axxb for two words a, b
and a nonempty word x. In this paper we study the minimal density of a letter a in an
infinite square-free word over the alphabet {a, b, c}. The density of a letter a in a finite
where n(w) is the number of letters a in w and |w| is the length of the
word w is n(w)
|w|
word. The density of a letter a in an infinite word is lim inf
l→∞
n(wl )
|wl |
where wl is the prefix of
the word w of length l. Tarannikov [1] found the minimal density of a letter a with up
to 4 tight digits after the decimal point. Ochem [2] improved this result, proving that the
, 883 ]. In this article we completely
minimal density of a letter a belongs to the interval [ 1000
3641 3215
883
solve this problem and prove that the minimal density of a letter a is equal to 3215
. We
use huge computer calculations in our proof. You can find sources of the programs at
http://mech.math.msu.su/department/dm/dmmc/PUBL/words.zip.
1
2
Main result
883
Our aim is to prove that the minimal density is at least ρ = 3215
. Let us construct 4 special
words b1 , . . . , b4 of length 3215 that contain 883 letters a each (they are probably the same
words used by Ochem [2] in order to construct an infinite word with density ρ). We call
these words the basic words.The files with these basic words can be found at the URL address
above. We proved the next theorem by a computation using the technique of backtracking.
Theorem 1. In any square-free word of length 90000 there exist either
1) a prefix with density greater than or equal than ρ,
2) two adjacent basic words (a subword bi bj ).
So, either an infinite square-free word can be split into words with densities greater than
or equal than ρ or there exist two basic words one after another. In the first case the density
of an infinite word is greater than or equal than ρ because the lengths of the words in the
splitting are bounded by 90000. So we can assume that an infinite square-free word begins
with a word bi bj .
When our program that proves Theorem 1 finds two adjacent basic words in the current
word w, it saves the pair (n(w), |w|) and backtracks. Let us denote by W the set of all such
current words. Then we delete equal pairs (n(w), |w|) from the list and obtain 80 distinct
pairs (n(w), |w|). Let us denote by L the set of these pairs. Then for all l < 200000 we
calculate the maximal number p(l) such that after the concatenation of any words w ∈ W
and v such that n(v) ≥ p(l), |v| = l, the word vw has the density greater than or equal than
ρ. So, p(l) = max(n(w),|w|)∈L [ρ(l + |w|)] − n(w) + 1. Our second program proves the next
theorem.
Theorem 2. Let u be an infinite square-free word that begins with a word bi bj . Then u
contains one of the following prefixes:
1) a prefix w such that 2 · 3215 < |w| < 200000 and n(w) ≥ p(|w|),
2) a prefix w such that 4 · 3215 ≤ |w| < 200000, n(w) ≥ ρ|w|, that ends with two adjacent
basic words.
Now, we are prepared to prove
Theorem 3. An infinite square-free word that begins with a word bi bj has density not less
than ρ.
Proof. Let us consider an infinite square-free word u that begins with a word bi bj . Let
us show that u must either have density greater than or equal than ρ or start with one of
the following prefixes:
1) the prefix ww1 w2 . . . wn v such that |w| < 200000, |wi | < 200000, 2 · 3215 < |v| <
200000, n(wi ) ≥ ρ|wi |, n(wv) ≥ ρ|wv|, which ends with two adjacent basic words.
2
2) the prefix w such that 4 · 3215 ≤ |w| < 200000, n(w) ≥ ρ|w|, which ends with two
adjacent basic words.
Let us apply Theorem 2 to our infinite square-free word. If the second case of Theorem 2
holds, we have the second case in the above sentence. Otherwise we apply Theorem 1 to the
word u without the prefix w. We obtain either A) a subword v that ends with two adjacent
basic words or B) a subword w1 with the density greater than or equal than ρ. In the case
A) we have the second case in our sentence since n(w) ≥ p(|w|). In the case B) we apply
Theorem 1 to the word u without the prefix ww1 , and so on. If the second case of Theorem
1 never occurs then u = ww1 w2 . . . where lengths of words w and wi are bounded. Therefore
the density of u is greater than or equal than ρ. If the second case of Theorem 1 occurs
on the nth step, we obtain the prefix ww1 . . . wn−1 v such that n(w) ≥ p(|w|), v ∈ W and so
n(wv) ≥ ρ|wv|.
Therefore our word u can be split into subwords w and ww1 w2 . . . wn v with densities
greater than or equal than ρ and bounded lengths of words w,wi and v. Therefore the word
u also has the density greater than or equal than ρ.
From Theorem 1 and Theorem 3 we obtain immediately our main Theorem:
Theorem 4. The density of a letter a in an infinite square-free word is greater than or equal
than ρ.
Using the result from Ochem [2] and Theorem 4 we obtain that the minimal density of
an infinite square-free word is equal to ρ.
3
Optimization of calculations
We have spent about 12 hours of calculations (on a 2GHz CPU) in order to prove Theorems
1 and 2. In order to speed up calculations we used a special algorithm for checking the
square-free property.
The program proving the theorems above uses a backtracking algorithm. So we must
check on every step if the current word S ends with a word of the form ww. Although the
algorithm described below uses O(n2 ) time in the worst case for checking the square-free
property, on average it uses sublinear time. First, we choose m. In our experiments the
optimal m was about 100. Then we calculate the exact and inexact hash for all positions
from the mth character to the end of the current string. Let us denote by Si the number
corresponding to the ith symbol (Si = 0 for symbol a, Si = 1 for symbol b, Si = 2 for symbol
c). Let us define the inexact hash for the jth position as hj = Sj + Sj−1 p + Sj−2 p2 + · · · +
Sj−m+1 pm−1 ( mod P ) where p = 5 and P is the size of the hash table (we use P = 100003
for Theorem 1 and P = 200003 for Theorem 2). If there do not exist positions before the
jth with the exact hash hj then we define the exact hash of the jth position equal to the
inexact hash. If such positions exist, we check the last m symbols before the jth position.
If the last m symbols before the jth position coincide with the last m symbols before kth
position with the exact hash hj , we define the exact hash for the jth position equal to hj . In
the opposite case we check all positions that have the exact hash equal to hj + 1 and repeat
the previous procedure. Then we check positions with the exact hash equal to hj + 2 and
3
so on. We continue to increase the hash value until we have found a position with the same
last m characters or hash value that does not correspond to any position in the string. We
define the exact hash of the jth position equal to the last checked hash value. In order to
check hash values quickly we use the table where for each hash value the last position with
the same exact hash is written. Also for each position in the string we store the nearest
position before it with the same exact hash. Thus, all positions with the same exact hash
are linked to the list.
Hence, the exact hash of the position uniquely defines the last m symbols before it and
we can quickly update it after deleting or inserting a symbol to the end of the current string.
In order to check if the current word S ends with a word of the form ww, we check
possible lengths (of word w) less than m directly. Then we search all positions with exact
hash equal to the exact hash of last position. Then we compare strings for these positions.
We speed up this procedure by m times using exact hash values.
4
Acknowledgments
The author is grateful to his scientific supervisor Prof. Yuriy Tarannikov, Dr. Roman
Kolpakov and the anonymous referee for the attention to this work and helpful remarks.
References
[1] Y. Tarannikov. The minimal density of a letter in an infinite ternary square-free word is
0.2746... J. Integer Sequences 5(2): Article 02.2.2 (2002).
[2] P. Ochem. Letter frequency in infinite repetition-free words. Proceedings of Words 2005,
Montreal, Canada, September 13-17 (2005), 335-340.
2000 Mathematics Subject Classification: Primary 11B05.
Keywords: combinatorics on words; square-free word; factorial languages; minimal density.
(Concerned with sequence A006156.)
Received June 19 2006; revised versions received May 5 2007; June 12 2007. Published in
Journal of Integer Sequences, June 14 2007.
Return to Journal of Integer Sequences home page.
4
Download