Lecture 2: Characterization of entropies of multidimensional shifts of finite type
Michael Hochman, The Hebrew University of Jerusalem
Combinatorics, Automata and Number Theory, CIRM, May 2012

Recall that for an SFT $X = \mathrm{SFT}_d(A,F)$,
\[ h(X) = \lim_{n\to\infty} \frac{1}{n^d}\log N_n(X), \]
where $N_n(X) = \#\{\text{globally admissible patterns on } [1,n]^d\}$.

Theorem
A nonnegative number $\alpha$ is the entropy of a 2-dimensional SFT if and only if $\alpha = \inf\{\alpha_n\}$ for some computable sequence $(\alpha_n)_{n\in\mathbb{N}}$.

The results for higher dimensions can either be proved in the same way, or derived from this result.

Proposition
Let $F$ be a finite set of $A$-patterns and define
\[ \widetilde N_n(A,F) = \#\{a \in A^{[1,n]^2} : a \text{ is locally admissible for } F\}. \]
Then the entropy of $X = \mathrm{SFT}(A,F)$ is
\[ h(X) = \inf_n \frac{1}{n^2}\log \widetilde N_n(A,F). \]

Since the sequence $\widetilde N_n(A,F)$ is computable from $(A,F)$, this proves one direction of the theorem, namely that $h(X) \in \Pi_1$ whenever $X$ is an SFT.

Proof of proposition.
Every globally admissible pattern is in particular locally admissible, so
\[ \widetilde N_n(A,F) \ge N_n(X), \]
and therefore
\[ \inf_n \frac{1}{n^2}\log \widetilde N_n(A,F) \ge \inf_n \frac{1}{n^2}\log N_n(X) = h(X). \]
Since $h(X) = \inf_n \frac{1}{n^2}\log N_n(X)$, for the reverse inequality it is enough to show that for each $n_0$,
\[ \limsup_{n\to\infty} \frac{1}{n^2}\log \widetilde N_n(A,F) \le \frac{1}{n_0^2}\log N_{n_0}(X). \]

Fix $n_0$. For every locally but not globally admissible $a \in A^{[1,n_0]^2}$ there is an $N(a)$ such that $a$ cannot be extended to a locally admissible pattern on $[-N(a),N(a)]^2$. Let
\[ N = \max\{N(a) : a \in A^{[1,n_0]^2} \text{ is locally but not globally admissible}\}. \]

Now fix a large $n$ and partition $[1,n]^2$ as follows: divide $[1,n]\times[1,n]$ into a maximal array of $n_0\times n_0$ squares, all lying at distance at least $N$ from the complement of $[1,n]^2$. The array then covers all but at most $n^2-(n-4N)^2$ sites of $[1,n]^2$, and the number of squares in the array is at least $\lfloor (n-4N)^2/n_0^2 \rfloor$. Also note: every square $R$ in the array lies at the center of a square $R'$ of side $2N+1$ contained in $[1,n]^2$.

Suppose $b \in A^{[1,n]^2}$ is locally admissible. For each pair $R, R'$ as above, $b|_{R'}$ is locally admissible, so $b|_R$ is globally admissible. Therefore the number of possible $b$ is at most
\[ \#\{\text{colorings of the sites outside the array}\} \times \#\{\text{choices of globally admissible patterns on the squares}\} \le |A|^{n^2-(n-4N)^2}\cdot N_{n_0}(X)^{K}, \]
where $K \le n^2/n_0^2$ is the number of squares in the array. This is an upper bound on $\widetilde N_n(A,F)$. Taking logarithms and dividing by $n^2$,
\[ \frac{1}{n^2}\log \widetilde N_n(A,F) \le \frac{O(nN+N^2)}{n^2}\log|A| + \frac{1}{n_0^2}\log N_{n_0}(X). \]
Letting $n\to\infty$ gives the desired inequality
\[ \limsup_{n\to\infty}\frac{1}{n^2}\log \widetilde N_n(A,F) \le \frac{1}{n_0^2}\log N_{n_0}(X). \]
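Since each $\frac{1}{n^2}\log \widetilde N_n(A,F)$ can be computed from $(A,F)$, the proposition exhibits $h(X)$ explicitly as the infimum of a computable sequence. The following is a minimal brute-force sketch of this computation (exponential in $n^2$, for illustration only), under the simplifying assumption that $F$ consists of nearest-neighbour constraints given as sets FH and FV of forbidden horizontal and vertical pairs; the "hard squares" example at the end is likewise only illustrative.

    from itertools import product
    from math import log2

    def locally_admissible_count(A, FH, FV, n):
        """Brute-force value of Ñ_n(A,F) when F consists of forbidden
        horizontal pairs FH and forbidden vertical pairs FV."""
        count = 0
        for cells in product(A, repeat=n * n):
            a = [cells[i * n:(i + 1) * n] for i in range(n)]  # row i of the pattern
            ok = all((a[i][j], a[i][j + 1]) not in FH
                     for i in range(n) for j in range(n - 1))
            ok = ok and all((a[i][j], a[i + 1][j]) not in FV
                            for i in range(n - 1) for j in range(n))
            count += ok
        return count

    # Illustrative example ("hard squares"): no two adjacent 1s.
    A = ('0', '1')
    FH = {('1', '1')}
    FV = {('1', '1')}
    for n in range(1, 4):
        N = locally_admissible_count(A, FH, FV, n)
        print(n, N, log2(N) / n ** 2)  # (1/n^2) log Ñ_n: an upper bound for h(X) for each n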
Now let $\alpha \in \Pi_1$. Our goal is to construct a 2-dimensional SFT with entropy equal to $\alpha$. We first perform a reduction.

Definition
For $A_0 \subseteq A$ and $a \in A^{[1,n]^2}$ define
\[ \#(A_0, a) = \#\{u \in [1,n]^2 : a_u \in A_0\}. \]
For $X \subseteq A^{\mathbb{Z}^2}$ let
\[ \delta_n(A_0 \mid X) = \frac{1}{n^2}\max\{\#(A_0,a) : a = x|_{[1,n]^2} \text{ for some } x \in X\}, \]
and define the upper density of $A_0$ in $X$ by
\[ \overline{\delta}(A_0\mid X) = \limsup_{n\to\infty}\delta_n(A_0\mid X). \]

Note that for an SFT $X = \mathrm{SFT}(A,F)$,
\[ \delta_n(A_0\mid X) = \frac{1}{n^2}\max\{\#(A_0,a) : a \in A^{[1,n]^2} \text{ is globally admissible}\}. \]

Proposition
Suppose that $X = \mathrm{SFT}(A,F)$ and $h(X) = 0$, and let $A_0 \subseteq A$. Then there is an SFT $Y = \mathrm{SFT}(A',F')$ of the same dimension as $X$ such that $h(Y) = \overline{\delta}(A_0\mid X)$.

Proof.
Let $A' = A\times\{0,1\}$ and identify $A'$-patterns $a' \in (A')^E$ with pairs $(a,b)$, where $a \in A^E$ is an $A$-pattern and $b \in \{0,1\}^E$ is a $\{0,1\}$-pattern. Let
\[ F' = \{(a,b) : a \in F\} \cup \{(a,b) : a_u \in A\setminus A_0 \text{ and } b_u = 1 \text{ for some } u\}. \]
If $a \in A^{[1,n]^2}$ is globally admissible for $F$, then the number of globally admissible patterns $a' = (a,b)$ for $F'$ with first layer $a$ is $2^{\#(A_0,a)}$: the bit $b_u$ is free over the sites with $a_u \in A_0$ and must be $0$ elsewhere. Hence
\[ 2^{n^2\delta_n(A_0\mid X)} \le N_n(A',F') \le N_n(A,F)\cdot 2^{n^2\delta_n(A_0\mid X)}. \]
Taking logarithms (to base 2), dividing by $n^2$, and using the fact that $\frac{1}{n^2}\log N_n(A,F) \to h(X) = 0$, we find that, writing $Y = \mathrm{SFT}(A',F')$,
\[ h(Y) = \lim_{n\to\infty}\frac{1}{n^2}\log N_n(A',F') = \overline{\delta}(A_0\mid X). \]

Notice how the construction worked:
1. We started with $X = \mathrm{SFT}(A,F)$.
2. We formed the product alphabet $A' = A\times B$ (here $B = \{0,1\}$).
3. We added rules to $F'$ to ensure that the first layer obeys the constraints defined by $F$.
4. We added additional constraints involving the second and possibly the first layer.

Every $y \in \mathrm{SFT}(A',F')$ now has a first layer $x \in X$, and the second layer (the $B$-symbols) obeys local constraints that may depend locally on the $x$ layer (in the example above, the symbol 1 could only appear over symbols from $A_0$). This process is called superposition.

By a Turing machine we mean a finite automaton moving on a 1-dimensional array of cells. Each cell contains one read-only bit and one re-writable bit. At each time step the machine reads both bits from the current cell and, based on this input and its internal state, it
- updates the re-writable bit in the current cell,
- moves one cell left or right,
- enters a new internal state.

Formally, if $S$ is the set of internal states,
\[ T : \{0,1\}\times\{0,1\}\times S \to \{0,1\}\times\{0,1\}\times\{\leftarrow,\rightarrow\}\times S. \]
There are also
- a special initial state $s_0 \in S$, which we assume is never re-entered;
- a set of halting states $H \subseteq S$ on which $T$ is not defined.

So in fact
\[ T : \{0,1\}\times\{0,1\}\times(S\setminus H) \to \{0,1\}\times\{0,1\}\times\{\leftarrow,\rightarrow\}\times(S\setminus\{s_0\}). \]

Let $T$ be a given Turing machine. We can encode a machine at a cell as a single symbol in $\{0,1\}^2\times S$. When there is no machine at a cell, we encode the data pair together with a symbol $\Leftarrow$ or $\Rightarrow$ indicating in which direction the machine is located. We call the resulting alphabet $A_T$. A typical pair of rows of the computation then looks like this: [figure]

If a cell is immediately to the right or left of the cell containing the machine, it contains an arrow pointing to the machine. Otherwise, an arrow must point to an identical arrow. This forces every row containing a machine to have all arrows pointing to it. In particular, there is at most one machine in each row, though there may be rows without any machine, containing only arrows pointing in a common direction. The transition from row to row can be prescribed locally, because the contents of a cell are determined by $T$ and the three cells below it.

One question remains: how does the machine initialize, given that no transition leads into the initial state? By our rules we have not forbidden the following patterns: [figure]

Thus in an admissible configuration, either no row contains a machine, the data never changes, and the arrows may switch direction arbitrarily; or there is a row $i_0$ with the machine in its initial state $s_0$. All rows below $i_0$ contain no machine and behave as in the first alternative; all rows above $i_0$ are obtained from the row below by the transition rule of $T$. Note that if in some row the machine is in a halting state, there is no admissible row above it. In particular, let $a$ be a symbol encoding the initial state of the machine. Then $a$ can be extended to an admissible configuration if and only if there is a choice of initial data for which the machine does not halt. This shows that it is undecidable whether $a$ is globally admissible (Wang's theorem).
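To make the row-to-row transition concrete, here is a minimal sketch in Python of how one row of the space-time picture determines the next. The dict encoding of the transition function, the arrow strings '<=' / '=>', and the handling of the window's edges are illustrative choices, not part of the lecture's encoding.

    # Assumptions: the transition function is given as a dict
    #     delta[(read_bit, work_bit, state)] = (new_work_bit, move, new_state)
    # with move in {-1, +1} (the read-only output coordinate is dropped, since
    # that bit never changes), and `halting` is the set H of halting states.
    # A cell symbol is (read_bit, work_bit, s), where s is either a machine
    # state (the machine is at this cell) or an arrow '<=' / '=>' pointing
    # towards the machine.

    def next_row(row, delta, halting):
        """Compute the row above `row`; in the SFT this update is enforced by
        local rules, each cell being determined by the three cells below it.
        Returns None when the machine has halted (no admissible row above)."""
        new = list(row)
        for i, (r, w, s) in enumerate(row):
            if s in ('<=', '=>'):
                continue                     # not the machine's cell
            if s in halting:
                return None
            new_w, move, new_s = delta[(r, w, s)]
            new[i] = (r, new_w, '=>' if move > 0 else '<=')  # arrow points at the machine
            j = i + move
            if 0 <= j < len(row):            # ignore the window's edge in this sketch
                rj, wj, _ = row[j]
                new[j] = (rj, wj, new_s)
        return new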
Also note that the SFT $X$ we have constructed has entropy 0. This is because the pattern on the boundary of $[1,n]^2$ determines the interior, unless the machine "appears" inside the square, in which case one must also know in which column it appears (the boundary arrows determine the row). This gives a bound of the form $N_n(X) \le c^n$ for some constant $c$, so $\frac{1}{n^2}\log N_n(X) \to 0 = h(X)$.

Controlling frequencies. Given the alphabet $A_T$ associated to the Turing machine $T$, consider the set of symbols
\[ A'_T = \{a \in A_T : \text{the read-only data bit of } a \text{ is } 1\}. \]
Suppose $x$ is an admissible configuration containing a machine. This means that the machine does not halt. Notice also that the read-only data symbol is constant on each column. Therefore, for every row $i$,
\[ \#(A'_T, x|_{[1,n]\times\{i\}}) = \#\{\text{columns } 1,\dots,n \text{ whose read-only data symbol is } 1\}, \]
so the density of $A'_T$-symbols in $x$ equals the density of columns carrying read-only symbol 1.

Let $T_0$ be a Turing machine computing a sequence $\alpha_n \ge 0$, and let $\alpha = \inf \alpha_n$. Assume that $\alpha_n > \alpha + 1/\log n$ (if not, replace $\alpha_n$ by $\alpha_n + 1/\log n$). Let $T$ be the Turing machine that, for each $n$, reads the read-only data word $a_n$ on the cells from $-n$ to $n$ relative to the machine's initial position, and halts if the density of 1s in $a_n$, that is $\#(A'_T, a_n)/(2n+1)$, exceeds $\alpha_n$. Construct the SFT $X$ associated to $T$ as above.

The result: if $x$ is an admissible configuration containing a machine, then the density of $A'_T$-symbols in $x$ is at most $\alpha$. One can show that there are initial configurations of read-only data that give density exactly $\alpha$.
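To make the frequency check concrete, here is a small sketch of the machine $T$ built from $T_0$, written as ordinary Python rather than as a low-level Turing machine. The names read_only and alpha are illustrative assumptions: read_only returns the read-only bit at a given position, and alpha plays the role of $T_0$ and is assumed to already include the $1/\log n$ padding above.

    def density_check(read_only, alpha):
        """Sketch of T: for n = 1, 2, ... scan the read-only word on [-n, n]
        around the starting cell and halt as soon as the fraction of 1s in it
        exceeds alpha(n); otherwise run forever (the intended behaviour when
        the density of 1s is small enough)."""
        n = 1
        while True:
            word = [read_only(i) for i in range(-n, n + 1)]
            if sum(word) / len(word) > alpha(n):
                return 'halt'   # no admissible row can lie above a halted machine
            n += 1

In the SFT built from this machine, a configuration containing a machine is admissible only if this check never halts on its read-only data, which is what keeps the density of $A'_T$-symbols below the bounds $\alpha_n$.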
We have almost ensured that $\overline{\delta}(A'_T \mid X) = \alpha$. But we do not control densities in configurations with no machine. We must force computation to occur.

Definition
A board is a set $B \subseteq \mathbb{Z}^2$ of the following form: for some intervals $I = [a,b]$ and $J = [a',b']$, and subsets $I_0 \subseteq I$ and $J_0 \subseteq J$ that contain the endpoints but no two consecutive integers,
\[ B = (I_0\times J)\cup(I\times J_0). \]
$|I_0|$ is the width of $B$ and $|J_0|$ is its height.

Definition
Let $A_0 \subseteq A$. A configuration $x \in A^{\mathbb{Z}^2}$ has arbitrarily large boards (with respect to $A_0$) if
\[ \{u \in \mathbb{Z}^2 : x_u \in A_0\} \]
is a union $\bigcup B_i$ of boards without common or adjacent sites, and for each $n$ some of the $B_i$ have width and height at least $n$.

Proposition (Robinson)
There exists a non-empty 2-dimensional SFT $X$ on an alphabet $A_X$, and a symbol $b \in A_X$, such that every $x \in X$ has arbitrarily large boards (with respect to $\{b\}$), and $h(X) = 0$.

Let $X$ be as in the proposition and let $T$ be a Turing machine. We next describe how to construct an SFT $Y \subseteq X\times A^{\mathbb{Z}^2}$, for a suitable alphabet $A$, such that in every configuration $(x,x') \in Y$, if $B$ is a board appearing in $x$, then the pattern $x'|_B$ records the operation of $T$.

We use the "junctions" of the board (the crossings of its horizontal and vertical lines) as the cells of the machine's tape; each row of junctions is one row of tape cells. Junctions can be identified locally, so we can require that only they carry the symbols used to simulate $T$. Each junction needs to know which symbols are in the junctions to its left and right (if these exist). We allow symbols on the horizontal edges consisting of pairs of symbols from $A_T$:
- on an edge cell adjacent to a junction, the symbol of that junction appears in the coordinate adjacent to the junction;
- two adjacent cells on an edge carry the same symbol.

Now at each junction we have enough information (using its immediate neighbours to the right and left) to determine what the symbol in the junction above it should be. But we still need to get the information there. We allow symbols from $A_T$ on the vertical edges, with the rules that
- in the vertical edge cell immediately above a junction there appears the symbol from $A_T$ that should appear in the next junction above;
- if two vertical edge cells are adjacent, they carry the same symbol;
- a junction must carry the same symbol as the vertical edge cell below it, if one exists.

We require that in the bottom row of a board there is a machine in its initial state in the lower left corner. We must also introduce rules to deal with junctions on the left and right boundaries of the board (they do not have adjacent cells on the left or right, respectively). For example, we can interpret this as a special input symbol to the machine, indicating that the tape ends, and assume that the machine knows how to deal with it.

We have associated to $T$ an SFT such that each configuration in the SFT contains arbitrarily large boards, and each board of size $n\times n$ represents the computation of $T$ for $n$ steps (without halting). This SFT is empty if and only if $T$ halts on every input. This is the proof of Berger's theorem.

Returning to densities. Now suppose that $T_0$ is a Turing machine computing a sequence $\alpha_n \downarrow \alpha$, and define $T$ as before to check the densities of the input using $T_0$. Then for every $\varepsilon > 0$, in every large enough board the density of junctions with data symbol 1 cannot exceed $\alpha+\varepsilon$. Also, there are arbitrarily large boards with densities arbitrarily close to $\alpha$. (And the entropy of the SFT is still 0.)

But what about the symbols outside the boards? We can easily synchronize the read-only symbols in the boards so that they are constant on each column (they are currently constant only on the columns inside a given board). To complete the proof there is one more stage that we will not carry out in detail: we force every sufficiently large board to "sample" enough columns so that the bounds on densities in boards apply everywhere. This forces $\overline{\delta}(A'_T\mid X) = \alpha$ and completes the proof of the characterization of entropies of 2-dimensional SFTs. For more details, see [Hochman-Meyerovitch 2010].
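To recapitulate how the pieces combine (a summary in the notation above, not an additional step of the construction): the SFT $X$ just built from Robinson's tiling together with the density-checking machine $T$ has entropy 0, so applying the superposition proposition with $A_0 = A'_T$ yields an SFT whose entropy is
\[ h\bigl(\mathrm{SFT}(A',F')\bigr) = \overline{\delta}(A'_T \mid X) = \alpha = \inf_n \alpha_n, \]
which is exactly the remaining direction of the theorem.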