Computability 2022 - Recitation 1 23/10/22 - 29/10/22 1 Set Theory You already know items 1-5. These are just quick reminders. 1. A set is a collection of elements. The empty set is denoted by ∅. For every element x we write x ∈ A if x is a member of A. It is always true that either x ∈ A or x ∈ / A, and exactly one of them hold. 2. Set operations and relations: • Union: The union of two sets is the set containing all the elements from each set. Formally, A ∪ B = {x : x ∈ A or x ∈ B} • Intersection: The intersection between two sets is the set containing all the elements that are in both sets. Formally, A ∩ B = {x : x ∈ A and x ∈ B} • Complementation: The complement of a set A (with respect to a set C) is the set of all the elements not in A. • Difference: The difference between sets A, B is the set containing all the elements in A that are not in B. Formally, A \ B = {x : x ∈ A and x ∈ / B} Exercise: Prove that A \ B = A ∩ B. • Containment: We say that A ⊆ B if every element of A is an element of B. Formally, if ∀x, x ∈ A =⇒ x ∈ B • Equality: We say that A = B iff A ⊆ B and B ⊆ A. This is important!!! 3. Power set: Let A be a set, the power set of A, denoted P (A) or 2A is defined as 2A = {C : C ⊆ A} Example: If A = {a, b}, then 2A = {∅, {a}, {b}, {a, b}} 4. Cartesian product: Let A, B be two sets. The Cartesian product A × B is the set of ordered pairs from A and B. A × B = {(a, b) : a ∈ A, b ∈ B} Example: {1, 2} × {a, b, c} = {(1, a), (2, a), (1, b), (2, b), (1, c), (2, c)} 1 We allow “longer” Cartesian products, where the elements are sequences. That is: A1 × A2 × ... × An = {(a1 , ..., an ) : ∀i, ai ∈ Ai } The product A × A × ... × A is denoted An . {z } | n 5. A relation between sets S, T is a subset R ⊆ S × T . We consider mostly relations between a set A and itself. Example: ≤⊆ × . N N A relation R ⊂ A × A is called: • Reflexive if ∀a ∈ A (a, a) ∈ R. • Symmetric if ∀a, b ∈ A (a, b) ∈ R =⇒ (b, a) ∈ R • Transitive if ∀a, b, c ∈ A if (a, b) ∈ R and (b, c) ∈ R then (a, c) ∈ R. We sometime denote aRb or R(a, b) if the relation holds. Example: Let A = {1, 2, 3, 4}, Consider the relation R = {(a, b) ∈ A × A : |a − b| ≤ 1} • Is (3, 2) ∈ A? Yes. • Is R reflexive? Yes, because |a − a| = 0 ≤ 1. • Is R symmetric? Yes, because |a − b| = |b − a|. • Is R transitive? No, (1, 2) ∈ R and (2, 3) ∈ R, but (1, 3) ∈ / R. A relation that is reflexive, symmetric and transitive is called an equivalence relation. For every element a ∈ A we can look at the equivalence class of a, [a]R = {b ∈ A : (a, b) ∈ R}. For equivalence relations, S these sets form a partition of A. That is, either [a]R = [b]R or [a]R ∩ [b]R = ∅, and a∈A [a] = A Example: Let G = hV, Ei be an undirected graph, and let ∼⊆ V × V be the following relation: for every u, v ∈ V we have that u ∼ v iff there is a path in G from u to v. It is easy to see that ∼ is an equivalence relation, and its equivalence classes are exactly the connected components of G. The converse also holds: Any partition of a set defines an equivalence relation on the set. Proof: left as an exercise. 6. Cardinalities: The cardinality of a set A, denoted |A|, is a measure of how many elements there are in a set. For finite sets, |A| is the number of elements in the set. For infinite sets, the situation is slightly more complicated. For two infinite sets A, B, we say that |A| = |B| iff there exists a bijection between them. We denote | | = ℵ0 . The following claims hold: Z Q N • | | = ℵ0 . This can be seen by counting the integers as 0, 1, −1, 2, −2, 3, −3, ... • | | = ℵ0 . 
• |[0, 1]| = 2ℵ0 = |2N |. Sets that have cardinality ℵ0 are called countable. We say that |A| ≤ |B| if there exists an injection from A to B. We say that |A| < |B| if |A| ≤ |B|, and there does not exist a surjection from A to B. The claim that ℵ0 6= 2ℵ0 is non-trivial, and you will see it later. 2 2 Languages 2.1 Definitions Let Σ be a finite (non-empty) set, which we will call alphabet. The elements of Σ are called letters. Example: Σ = {a, b} We now consider Σn for n ∈ Example: N. For n = 0, we define Σ0 = {ǫ}, where ǫ is an “empty sequence”. Σ3 = {(a, a, a), (a, a, b), (a, b, a), ..., (b, b, b)}. Define Σ⋆ = ∞ [ Σn n=0 ⋆ The elements of Σ are sequences of any finite length of letters from Σ, and we refer to those as words. ǫ is called the empty word. For convenience, we don’t write words as (a, b, b, b, a), but rather as abbba. We can concatenate words: w1 · w2 = w1 w2 . Observe that since Σ⋆ has words of any finite length, then it is closed under concatenation. Example: abb · bab = abbbab A formal language over the alphabet Σ is a set L ⊆ Σ⋆ . That is, a set of words. Example: • L1 = {ǫ, a, aa, b}. A finite language. • L2 = {w : w starts with an ’a’}. An infinite language. • L3 = {ǫ}. • L4 = ∅. Is this the same as L3 ? • L5 = {w : |w| < 24}. |w| is the number of letters in w. Finite language. ⋆ It is sometimes convenient to view languages as elements in 2Σ . This is a bit confusing at first, so let’s recap: • Languages are sets of words. • Languages are subsets of Σ⋆ . ⋆ • Languages are elements of 2Σ . What operations can we do on languages? First, any operation on sets. In particular, observe that complementation is done with respect to Σ⋆ . We can also concatenate languages: L1 · L2 = {w1 · w2 : w1 ∈ L1 , w2 ∈ L2 } 3 Example: Consider the languages L1 = {w : w begins with an ’a’} and L2 = {w : w ends with a ’b’}. • What is L1 ∩ L2 ? All the words that begin with a and end with b. • What is L1 ∪ L2 ? All the words that begin with a or end with b. • What is L1 · L2 ? Same as L1 ∩ L2 . • What is L2 · L1 ? All the words that contain ba. Example: Let’s try something harder. Consider the language L = {ww : w ∈ Σ⋆ } What is the complement of L? (denoted L) Answer: L = {x = x1 · · · xn : n is odd, or n is even and x1 · · · xn/2 6= xn/2+1 · · · xn } What is L · L? Answer: L · L = {wwxx : w ∈ Σ∗ , x ∈ Σ∗ } 2.2 Counting stuff Assume Σ 6= ∅. How many words are there in Σ⋆ ? ℵ0 . Proof: We can enumerate the words first by length and then in lexicographical order (after ordering Σ arbitrarily). ⋆ How many languages are there? |2Σ | = |2N | = 2ℵ0 . Since 2ℵ0 > ℵ0 , this means that there are more ∗ languages over Σ than words over Σ. More formally, there is no surjection from Σ∗ to 2Σ . 2.3 Are all languages regular? Well, are they? The answer is no. The reason is that there are ℵ0 regular languages, but 2ℵ0 languages. Indeed, a DFA can be characterized by a finite binary string (representing its states, transitions, etc.), and since every regular language has a corresponding automaton, there are at most ℵ0 such languages. But this argument is not constructive. That is, it says that there are non-regular languages, but it doesn’t give an example. Soon, we will learn how to prove that specific languages are non-regular! his is an important point. Much of what we do in this course is explore the limitations of computational models. 3 The function δ ∗ Consider the DFA A = hQ, Σ, δ, q0 , F i depicted in Figure 1. 
Suppose that a computation in A is currently in the state q and the letter read is σ, what will be the next state? This question is answered by δ : Q × Σ → Q. The next state will be δ(q, σ). For example, we see in the figure that δ(q1 , a) = q2 . One unfortunate property of δ is that it only allows us to ask what will happen after reading one letter. What if we’d like to ask about reading, say, three letters? To answer such questions, We can define δ ∗ : Q × Σ∗ → Q as follows: ( q, if w = ǫ ∗ δ (q, w) = ∗ ′ ′ ′ ∗ δ(δ (q, w ), σ), if w = w · σ where w ∈ Σ and σ ∈ Σ 4 Let’s try this new definition out. In A, what would happen if a computation is at state q1 and the letters b and then a are read successively? δ ∗ (q1 , ba) = δ(δ ∗ (q1 , b), a) = δ(δ(δ ∗ (q1 , ǫ), b), a) = δ(δ(q1 , b), a) = δ(q0 , a) = q1 Important: Given Q and Σ, not every function Q × Σ∗ → Q is the δ ∗ function of some DFA. Thus, we may not define a DFA to just have some δ ∗ function that we came up with. Rather, we must define its δ function and then derive δ ∗ from it. a, b b q0 a a q1 q2 b Figure 1: The DFA A 4 Formally proving what the language of an automaton is (If time permits) In class, you have seen several DFAs, and mentioned what their language is. How can we be sure, however, that the language we think they recognize is actually the language they recognize? “It’s obvious”, as you know, is a dangerous expression in mathematics. When we say that something is obvious, we mean that if we are asked to, we can very easily provide the proof. In this section, we will see how to prove the correctness of a language conjecture. This is a rather painful procedure, and we will do it once here, and once in the exercise. After that, we will allow you to explain less formally why an automaton accepts a language. However, whenever you claim that a certain language is recognized by an automaton, always make sure you could prove it formally at gun point. Let Σ = {0, ..., 9, #}, and consider the language ∗ L = x#a : x ∈ {0, ..., 9} , a ∈ {0, ..., 9} , a appears in x . For example, we have 644270#5 ∈ / L, and 1165319#3 ∈ L. We want to construct a DFA A for the language, and prove its correctness formally. We start with an intuition as to how we recognize the language. This will also assist us in the correctness proof. While reading a word w, A keeps track of the numbers that were read so far, but it does not keep track of their order. Thus, there is a state corresponding to every subset of 0, ..., 9. When # is read, the state records it by moving to a new state, while still keeping track of the subset of numbers read. Then, when a digit a is read, the current state can tell us whether a was already seen. If so, we move to an accepting state, and otherwise to a rejecting sink. If additional letters are read, we move to a rejecting sink. It is quite hard to draw this DFA, as it is huge. Thus, we now describe formally the DFA A = hΣ, Q, δ, q0 , F i as follows. The state space is Q = (2{0,...,9} × {1, 2}) ∪ {qacc , qsink } with F = {qacc } and q0 = h∅, 1i. Finally, we define the transition function as follows: For a set C ∈ 2{0,...,9} , for a letter σ ∈ Σ, and for i ∈ {1, 2} we have hC, 2i i=1∧σ =# hC ∪ σ, 1i i = 1 ∧ σ ∈ {0, ..., 9} δ(hC, ii, σ) = qacc i=2∧σ ∈C qsink i=2∧σ ∈ /C Note that the last case also covers the case where we read an extra #. 5 We also define δ(qsink , σ) = δ(qacc , σ) = qsink for every σ ∈ Σ. Now that we have a fully defined DFA, we are ready to prove correctness. Proposition 4.1. L(A)=L. Proof. 
We want to prove formally that this is indeed the language of the described automaton. We do this by giving a very detailed analysis of the behavior of the DFA. Specifically, we make the following claim: ∗ After reading a word w ∈ {0, ..., 9} , the DFA A reaches a state hC, 1i where C ∈ 2{0,...,9} such that the letters that appear in w are exactly C. Formally: ∗ Claim 4.2. For a word w ∈ {0, ..., 9} , define S(w) = {σ ∈ {0, ..., 9} : σ appears in w}. Then δ ∗ (q0 , w) = hS(w), 1i. Proof (of Claim 4.2). We prove the claim by induction over |w| = n. The base case is w = ǫ, in which case we have S(w) = ∅, and indeed δ ∗ (q0 , ǫ) = q0 = h∅, 1i. ∗ We assume correctness for n, and prove for n + 1. Consider a word w = x · σ ∈ {0, ..., 9} , where |x| = n. ∗ By the induction hypothesis, it holds that δ (q0 , x) = hS(x), 1i. Observe that S(x · σ) = S(x) ∪ {σ}. Now, we have: δ ∗ (q0 , x · σ) = δ(δ ∗ (q0 , x), σ) = δ(hS(x), 1i, σ) = hS(x) ∪ {σ} , 1i = hS(w), 1i Where the equalities follow from the definition of the transition function and of δ ∗ . We now proceed with the proof of the main proposition: Consider a word of the form w#, where w ∈ {0, ..., 9}. From the claim above, we get that δ ∗ (q0 , w) = hS(w), 1i. By the definition of the transition function, we now have δ ∗ (q0 , w#) = hS(w), 2i. We are now ready to complete the proof. Let w ∈ Σ∗ . First, if w ∈ L, then w is of the form x#σ where ∗ σ appears in x and x ∈ {0, ..., 9} . By the second claim above, we have δ ∗ (q0 , w) = δ(δ ∗ (q0 , x#), σ) = δ(hS(x), 2i, σ) = qacc So L ⊆ L(A). To prove the converse containment, we split into cases. • If w does not contain a #, then by the claim above, the run of A on w does not end in the state qacc , so w ∈ / L(A), and indeed w ∈ / L. ∗ • If w = x# where x ∈ {0, ..., 9} , then again, by the second claim above, the run of A on w does not end in the state qacc , so w ∈ / L(A). ∗ • If w can be written as w = x#y where x ∈ {0, ..., 9} and |y| > 1, then by the claims above, we have δ ∗ (q0 , w) = δ ∗ (hS(x), 2i, y) Observe that since |y| > 1, then after reading the first two letters, we end up in qsink (either by going straight to qsink , or by passing through qacc ). In any case, the continuation of the run stays in qsink , so w ∈ / L(A) (we remark that if you want to be extra formal, you can prove by induction that no run leaves qsink , but this is really trivial). ∗ • Finally, if w = x#σ where x ∈ {0, ..., 9} and σ ∈ / S(x), then by the claims above, we have δ ∗ (q0 , w) = δ(hS(x), 2i, σ) = qsink so w ∈ / L(A). This completes the proof of the claim. 6 Computability - Recitation 2 30/10/22 - 05/11/22 1 1.1 NFAs with -transitions Removing -transitions We want to show that for every -NFA A there exists an equivalent NFA (without -transitions). Furthermore, we will explicitly show how to get rid of -transitions efficiently. Let A = hQ, Σ, δ, Q0 , F i be an -NFA. The intuitive idea is as follows. Consider a state q ∈ Q. If there is an -transition from q to q 0 , then whenever we reach q, we can also reach q 0 . However, there may be -transitions from q 0 , so we need to include them as well. For every q ∈ Q, let E(q) ∈ 2Q be the set E(q) = {q 0 ∈ Q | q 0 is reachable from q using only -transitions} . Note that, in particular, q always belongs to E(q), since a state is reachable from itself by a path of length 0. This path vacantly consist only of -transitions. Given the sets E(q) for every q ∈ Q, we define the NFA B: B = hQ, Σ, η, ∪q∈Q0 E(q), F i, S where η(q, σ) = s∈δ(q,σ) E(s) for every q ∈ Q and σ ∈ Σ. 
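To make the construction concrete, here is a minimal Python sketch (the dictionary representation and the names are ours, not part of the formal definition): each E(q) is computed by a simple graph search over the ε-edges, and B is then assembled exactly as defined above.

from collections import defaultdict

def eps_closure(q, eps_edges):
    """E(q): all states reachable from q using only epsilon-transitions (graph search)."""
    seen, stack = {q}, [q]
    while stack:
        s = stack.pop()
        for t in eps_edges.get(s, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def remove_eps(states, alphabet, delta, eps_edges, init_states, accepting):
    """Build the NFA B from the eps-NFA A, following the construction above.

    delta maps (state, letter) to a set of successors, eps_edges maps a state
    to the set of its epsilon-successors; missing keys mean "no transitions".
    """
    E = {q: eps_closure(q, eps_edges) for q in states}
    eta = defaultdict(set)
    for q in states:
        for a in alphabet:
            for s in delta.get((q, a), ()):
                eta[(q, a)] |= E[s]        # eta(q, a) = union of E(s) over s in delta(q, a)
    new_init = set()
    for q in init_states:
        new_init |= E[q]                   # new initial set: union of E(q) over q in Q0
    return states, alphabet, dict(eta), new_init, accepting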
In the exercise, you will prove that this construction is correct, namely, that L(B) = L(A). Furthermore, we claim that B can be computed in polynomial time given a description of A. Indeed, consider a directed graph with vertex set Q and with an edge from q to s iff A has an -transition from q to s. Computing E(q) can be done in polynomial time by running DFS from q on this graph. Hence, all of the sets E(q) can be computed in polynomial time, and so can B. 2 2.1 Closure properties using NFAs REG is Closed Under Union You have seen in class that REG is closed under union. We will now show a much simpler way of getting the same result using NFAs. Let L1 , L2 ⊆ Σ∗ , then L1 ∪ L2 = {x : x ∈ L1 or x ∈ L2 } We want to show that if L1 , L2 ∈ REG, then there exists an NFA that recognizes L1 ∪ L2 . Let A = hQ, Σ, q0 , δ, F i and B = hS, Σ, s0 , η, Gi be DFAs such that L(A) = L1 and L(B) = L2 . We can assume w.l.o.g. that S ∩ Q = ∅. We define a new NFA C = hQ ∪ S, Σ, {q0 , s0 } , α, F ∪ Gi where α is defined as follows: ( {δ(q, σ)} if q ∈ Q α(q, σ) = {η(q, σ)} if q ∈ S 1 In order to show that L1 ∪L2 = L(C), we will show two way inclusions. First, we show that L1 ∪L2 ⊆ L(C): let x = σ1 · σ2 · · · σm ∈ L1 ∪ L2 s.t. ∀i ∈ [m], σi ∈ Σ; that is, x is a word of length m in L1 ∪ L2 . So either x ∈ L1 or x ∈ L2 . w.l.o.g x ∈ L1 (the case x ∈ L2 in identical). Hence the run of A on x is accepting; that is, there exist a sequence of states r = r0 , r1 , . . . , rm ∈ Q with: r0 = q0 , rm ∈ F , and for all 0 ≤ i < m, it holds that δ(ri , σi+1 ) = ri+1 . Now by the definition of C the same run r is also a run of C. Indeed, for all 0 ≤ i < m, it holds that ri+1 ∈ α(ri , σi+1 ). Also, note that as r0 = q0 ∈ {q0 , s0 } and rm ∈ F ⊆ F ∪ G, we conclude that r is an accepting run of C on x and so x ∈ L(C). Now, we will show that L(C) ⊆ L1 ∪ L2 : let y = σ1 · σ2 · · · σm ∈ L(C) of length m. So there is an accepting run r of C on y: there are r0 ∈ {q0 , s0 } , r1 , . . . , rm ∈ Q ∪ S with rm ∈ F ∪ G and ri+1 ∈ α(ri , σi+1 ), for all 0 ≤ i < m. By the definition of C, either rm ∈ F or rm ∈ G. w.l.o.g, rm ∈ F . By the definition of α and the fact that Q ∩ S = ∅, the only states from which the transition function α can move from rm−1 to rm is if rm−1 ∈ Q (because α only has the same transitions as A, B, and Q ∩ S = ∅). We continue this way by induction, and conclude that r0 = q0 and that all the states and transitions in the run r exist in A. So the same run of δ(q0 , σ1 ) = r1 , . . . , δ(rm−1 , σm ) = rm ∈ F , is an accepting run of A on y which means that y ∈ L1 . 2.2 REG is Closed Under Concatenation We have already seen that REG is closed under complementation, union, and intersection (as well as some more complex operators). We will now consider the closure of REG under concatenation. Let L1 , L2 ⊆ Σ∗ , then L1 · L2 = {xy : x ∈ L1 , y ∈ L2 } We want to show that if L1 , L2 ∈ REG, then there exists an NFA that recognizes L1 · L2 . Since NFAs read words (rather than a concatenation of two words xy), it is easier to look at L1 · L2 as L1 · L2 = {w = σ1 · · · σn : ∃1 ≤ k ≤ n, σ1 · · · σk ∈ L1 , ∧, σk+1 · · · σn ∈ L2 } Let A = hQ, Σ, q0 , δ, F i and B = hS, Σ, s0 , η, Gi be DFAs such that L(A) = L1 and L(B) = L2 . We may assume that Q ∩ S = ∅. We define the NFA C = hQ ∪ S, Σ, {q0 } , α, Gi where α is defined as follows: For q ∈ Q and σ ∈ Σ, set α(q, σ) = {δ(q, σ)}. For s ∈ S and σ ∈ Σ, set α(s, σ) = {η(s, σ)}. In addition for every q ∈ F , we have α(q, ) = {s0 } and for every q ∈ (Q ∪ S) \ F , we have α(q, ) = ∅. 
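Before proving the claim, here is a minimal sketch of this construction in code (the tuple representation and the names are ours; the empty string stands for ε):

def concat_nfa(A, B):
    """The concatenation construction just defined: C = <Q u S, Sigma, {q0}, alpha, G>,
    with an epsilon-transition from every accepting state of A to s0.
    A and B are DFAs given as (states, alphabet, initial state, transition dict,
    accepting states), assumed to have disjoint state sets."""
    Q, sigma, q0, delta, F = A
    S, _,     s0, eta,   G = B
    assert not (set(Q) & set(S)), "we assume w.l.o.g. that Q and S are disjoint"

    def alpha(state, letter):
        if letter == '':                   # epsilon-moves: F -> s0, nothing elsewhere
            return {s0} if state in F else set()
        return {delta[(state, letter)]} if state in Q else {eta[(state, letter)]}

    return set(Q) | set(S), sigma, {q0}, alpha, set(G)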
We claim that L(C) = L1 · L2. As you have seen, a language recognized by an NFA (even with ε-transitions) is regular, so this would imply that L1 · L2 ∈ REG. Note: we used ε-transitions here. Try to apply ε-removal to this construction, and see what the resulting construction is.

We first claim that L1 · L2 ⊆ L(C). Let w ∈ L1 · L2. Then w = x · y for some x ∈ L1 and y ∈ L2. Denote the run of A on x by a_0, a_1, ..., a_{|x|} and the run of B on y by b_0, b_1, ..., b_{|y|}. Note that a_0 = q0 and b_0 = s0. Also, since A and B accept x and y, respectively, it holds that a_{|x|} ∈ F and b_{|y|} ∈ G. We claim that the concatenation a_0, a_1, ..., a_{|x|}, b_0, b_1, ..., b_{|y|} is an accepting run of C on w. This is straightforward to verify: the transition from a_{|x|} to b_0 is an ε-transition, while the other transitions are induced by δ and η.

We now claim that L(C) ⊆ L1 · L2. Let w ∈ L(C) and let t_0, ..., t_m be an accepting run of C on w. Note that t_0 = q0 and t_m ∈ G ⊆ S. Also, note that it is impossible to go from S to Q in C. Thus, there is an index k such that {t_1, ..., t_k} ⊆ Q and {t_{k+1}, ..., t_m} ⊆ S. The only way to go from Q to S in C is by an ε-transition from F to s0 (and this is the only ε-transition in this run, so |w| = m − 1). Therefore, t_k ∈ F and t_{k+1} = s0. We conclude that, taking x = w_1 ... w_k and y = w_{k+1} ... w_{m−1}, the run t_0, ..., t_k is an accepting run of A on x and the run t_{k+1}, ..., t_m is an accepting run of B on y. Thus, w = x · y with x ∈ L1 and y ∈ L2, so w ∈ L1 · L2, proving the claim.

2.3 REG is Closed Under the Kleene star

Let L ⊆ Σ*. Recall the definition of the Kleene star of L from the lecture. We now prove that if L is recognizable by an NFA, then so is L*.

Proof. Let A = ⟨Σ, Q, δ, q0, F⟩ be a DFA s.t. L(A) = L. To prove that L* ∈ REG, we build an NFA A′ s.t. L(A′) = L*. Define A′ = ⟨Σ, Q ∪ {q_start}, δ′, {q_start}, {q_start}⟩, where q_start is a new state, and where for each q ∈ Q ∪ {q_start} and σ ∈ Σ we define

δ′(q, σ) = {δ(q, σ)} if q ∈ Q, and δ′(q, σ) = ∅ if q = q_start,

and we define

δ′(q, ε) = ∅ if q ∈ Q \ F, δ′(q, ε) = {q_start} if q ∈ F, and δ′(q, ε) = {q0} if q = q_start.

We claim that L(A′) = L*. The proof is left as an exercise for the reader.

3 Recap (If time permits)

We defined the class REG as the class of languages recognized by DFAs. You have also seen in class and today that regular languages can be defined by NFAs and ε-NFAs. The plan next is to define regular languages by means of regular expressions, regexes for short. So DFAs, NFAs and regexes are equivalent in the sense that they capture the class of regular languages:

Figure 1: Equivalence of different models

As we've seen, it is easy to prove that regular languages are closed under concatenation using NFAs, and it is not clear how to prove it using DFAs - in general, choosing the model carefully can make proofs easier. As another example, you have shown easily that REG is closed under complementation using DFAs, yet the same proof does not work for NFAs. We sum up several models and properties and how hard it is to prove the property using the model:

Model \ Property | L1 complement | L1 ∩ L2                  | L1 ∪ L2          | L1 · L2          | L1*
DFA              | Easy          | Easy: product            | Easy: product    | Hard             | Hard
NFA              | Hard          | Easy: How? Also product! | Easy: seen today | Easy: seen today | Easy: seen today
Regex (later)    | Hard          | Hard                     | Easy             | Easy             | Easy

Computability - Recitation 3 6.11.22 - 12.11.22

1 Regular Expressions

By now we are fairly comfortable with regular languages. We constructed automata for them, proved some theorems and closure properties about them, etc.
As you probably imagine, regular languages are also often used in practice, and are not just a theoretical model. But how can a computer work with a regular language? Or an even simpler question - how do we describe regular languages? Sure, sometimes we can do it in English: “all the words that end with 0”. But we cannot count on computers to be able to parse such complex expressions, and know that they represent regular languages. One simple solution that we have, by now, is to describe the language as a DFA or NFA. These are simple structures that can be parsed and simulated by a computer. Are they a good way to represent a language? Well, clearly they’re good for checking, given a word, whether it’s in the language. Indeed, it is quite easy to check if an automaton accepts a word (even for an NFA). So in that aspect - yes, they are nice models. However, they are sometimes difficult to read and write for humans. Also, some automata for fairly simple properties may be huge. As you have seen in the lecture, there is a nice, compact, formulation of regular languages which we can use. This formulation is called regular expressions. A quick reminder of what you have already seen. We say that t is a regular expression over the alphabet Σ if t is one of the following. • ∅ • ϵ • a∈Σ • r ∪ s, r · s, or r∗ , where r and s are regular expressions. Another way to represent this, for Σ = {a, b}, for example, is to write: r := ∅ | ϵ | a | b | r ∪ r | r · r | r∗ This type of definition is called a grammar, and you have seen similar things in the definition of formulas in Logic 1. Example: Consider the expression (a ∪ b)∗ · bb · (a ∪ b)∗ over Σ = {a, b}. This expression represents the language of all the words that contain the substring bb. Another example: 0 · 0∗ · (1∗ ∪ 2∗ ). This is the language of words that start with at least one 0, followed by either a sequence of 1’s or 2’s. Regexes can be very complicated: (((a ∪ b) · c∗ )∗ · a) ∪ (a · b∗ ), etc. You also have seen that the semantics of regular expressions are defined by induction on the structure of the regular expression. Specifically, for a regular expression r, the language L(r) defined as follows. • L(∅) = ∅. • L(ϵ) = {ϵ}. • For a ∈ Σ, we have that L(a) = {a}. 1 • L(r ∪ s) = L(r) ∪ L(s). • L(r · s) = L(r) · L(s). • L(r∗ ) = L(r)∗ . The reason these are called regular expressions, is the following theorem: Theorem 1.1 L ∈ REG iff there exists a regular expression r such that L(r) = L. The easy direction in the proof, is showing that every regular expression has an equivalent automaton, so every language defined by a regular expression, is in REG. Lemma 1.2 For every regular expression r, there exists an NFA Ar such that L(r) = L(Ar ) Proof: The proof is by induction over the structure of r. • If r = ∅, we let Ar be the NFA that accepts the empty language. • If r = ϵ, we let Ar be the NFA that accepts {ϵ}. • If r = a ∈ Σ, we let Ar be the NFA that accepts {a}. • If r = s ∪ t, we let Ar be the NFA that accepts L(As ) ∪ L(At ). The union construction seen in class works, but a simpler construction nondeterministically chooses which automaton (of As , At ) to start in (example on blackboard). • If r = s · t, you have seen in class how to construct an automaton for concatenation, so we simply let Ar be the automaton that accepts the concatenation of L(As ) and L(At ). • If r = s∗ , we need to show that we can construct an NFA for L(As )∗ from As . This is given in the exercise. 
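As a small sanity check of the semantics (not of the NFA construction above), the inductive definition of L(r) can be followed literally to enumerate all words of L(r) up to a given length. The sketch below is only an illustration; the encoding of regexes as nested tuples is our own choice.

def lang_upto(r, n):
    """All words of length <= n in L(r), computed by following the inductive
    semantics literally: L(empty) = {}, L(eps) = {eps}, L(a) = {a},
    L(r u s) = L(r) u L(s), L(r.s) = L(r).L(s), L(r*) = L(r)*.
    A regex is a nested tuple, e.g. ('union', ('lit', 'a'), ('lit', 'b'))."""
    kind = r[0]
    if kind == 'empty':
        return set()
    if kind == 'eps':
        return {''}
    if kind == 'lit':
        return {r[1]}
    if kind == 'union':
        return lang_upto(r[1], n) | lang_upto(r[2], n)
    if kind == 'concat':
        L1, L2 = lang_upto(r[1], n), lang_upto(r[2], n)
        return {x + y for x in L1 for y in L2 if len(x + y) <= n}
    if kind == 'star':
        L1 = lang_upto(r[1], n)
        words, frontier = {''}, {''}
        while frontier:
            frontier = {x + y for x in frontier for y in L1 if len(x + y) <= n} - words
            words |= frontier
        return words
    raise ValueError("not a regular expression")

# The "contains the substring bb" example from above: (a u b)* . bb . (a u b)*
any_word = ('star', ('union', ('lit', 'a'), ('lit', 'b')))
contains_bb = ('concat', any_word, ('concat', ('concat', ('lit', 'b'), ('lit', 'b')), any_word))
assert 'abba' in lang_upto(contains_bb, 4)
assert 'abab' not in lang_upto(contains_bb, 4)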
Example 1.1 For example, let's try this on the regex (a ∪ b) · b (on the blackboard).

The second direction in the proof of Theorem 1.1 is a bit harder. We need to prove that every regular language can be defined by a regular expression.

Lemma 1.3 For every DFA A there exists a regular expression r such that L(A) = L(r).

Proof: The formal proof of this claim is to give an algorithm that converts an NFA into a regular expression. Since this algorithm is tedious, we will explain it with an example. The algorithm utilizes a new type of automaton called a generalized NFA (GNFA, for short). A GNFA is like an NFA, with the exception that the labels on the edges are not letters in Σ, but rather regular expressions over Σ. For example, see the GNFA in Figure 1. For simplicity, we will require the following conditions from a GNFA:
• There is a single initial state, with only outgoing edges.
• There is a single accepting state, with only incoming edges.
• The initial state and the accepting state are distinct.

Figure 1: GNFA Example.

We leave it as an exercise to understand why we can always make sure our GNFA satisfies these requirements, but the example should demonstrate it. The idea behind the algorithm is the following. We start with a DFA A for our language. We translate it to an equivalent GNFA, with two more states. We then start removing states from the GNFA until we end up with 2 states, at which point the edge between them will be labeled with the regex r that is equivalent to A. The clever part is understanding how to remove states, which we demonstrate with the example in Figure 2, taken from M. Sipser's "Introduction to the Theory of Computation".

Figure 2: DFA to regex.

2 The Pumping Lemma

You have seen at least two times that there are non-regular languages. In Recitation 1, you have seen that there are ℵ0 regular languages, but 2^ℵ0 languages. But this argument is not constructive. That is, it says that there are non-regular languages, but it doesn't give an example. Then, you have seen in the lecture a concrete example of a non-regular language. Specifically, you have seen that the language L = {0^n 1^n : n ∈ ℕ} is not regular using a pumping argument. The pumping argument you have seen is a specific case of a more general claim - the pumping lemma.

Lemma 2.1 (The Pumping Lemma) Let L ∈ REG, then there exists a constant p > 0 such that for every w ∈ L with |w| > p, there exist x, y, z ∈ Σ* such that w = xyz and such that:
1. |y| > 0.
2. |xy| ≤ p.
3. ∀i ∈ ℕ ∪ {0}, x y^i z ∈ L.

What can we do with the pumping lemma? Can we prove that a language is regular? No! We can only use it to show that a language is not regular.

Example 2.1 Let L1 = {1^(n^2) : n ≥ 0}. Is L1 regular? No. Let's prove it with the pumping lemma. Assume by way of contradiction that L1 is regular, and let p be a pumping constant for it. Consider the word w = 1^(p^2). By the pumping lemma, we can write w = xyz such that |xy| ≤ p. We can write x = 1^j, y = 1^k, z = 1^l such that k > 0, j + k ≤ p and j + k + l = p^2. If we pump with i = 2 we get

xy^2 z = 1^j 1^k 1^k 1^l = 1^(p^2 + k).

However, p^2 < p^2 + k ≤ p^2 + p < p^2 + 2p + 1 = (p + 1)^2.

The first inequality follows from the fact that k > 0, and the second inequality follows from the fact that j + k ≤ p. We conclude that p^2 + k is not a perfect square and thus xy^2 z is not in L1, contradicting the pumping lemma. So L1 ∉ REG.

Important remark: We can use the pumping lemma with a proof by way of contradiction in order to prove that a language L is not regular.
The converse does not hold: there are non-regular languages that satisfy all the conditions of the pumping lemma. So use with caution!

Example 2.2 Let Σ = {0, 1}. Consider the following language: L2 = {w ∈ Σ* : #0(w) = #1(w)}. Is L2 regular? We will now prove that it is not.

A proof using the pumping lemma: Assume by way of contradiction that L2 is regular, and let p be a pumping constant for it. Consider the word w = 0^p 1^p ∈ L2. Clearly, |w| > p, so we can write w = xyz such that:
• |y| > 0.
• |xy| ≤ p.
• ∀i ∈ ℕ ∪ {0}, x y^i z ∈ L2.

If we pump y with some i > 1, we get that the word x y^i z ∈ L2. Now since |xy| ≤ p, we have that xy consists only of 0s, thus we can write x = 0^j, y = 0^k, z = 0^l 1^p where j, l ≥ 0, k > 0, and j + k + l = p. Therefore, x y^i z = 0^j 0^(ik) 0^l 1^p = 0^(p+(i−1)k) 1^p. As i > 1 and k > 0, this means that the pumped word has more 0s than 1s, which is a contradiction, and so L2 ∉ REG.

An alternative proof (Optional): Consider the language L = {w ∈ {0, 1}* : w = 0^n 1^m, n, m ∈ ℕ}. Clearly, L is regular as it is the language of the regex r = 0*1* (make sure you understand why!). Assume by contradiction that L2 is regular. It holds that L ∩ L2 = {0^n 1^n : n ∈ ℕ}, thus {0^n 1^n : n ∈ ℕ} is regular, and we have reached a contradiction.

Computability - Recitation 4

1 The Pumping Lemma - Cont.

Sometimes, we need to prove or disprove the regularity of a family of languages. We show an example of such a case, and we utilize the pumping lemma for our proof.

Reminder.

ω(n) = {g : ℕ → ℕ | ∀c ∈ ℕ ∃N ∈ ℕ ∀n ≥ N, cn ≤ g(n)} = {g : ℕ → ℕ | lim_{n→∞} g(n)/n = ∞}

Lemma 1.1. Let f : ℕ → ℕ be a monotonically-increasing function such that f(n) = ω(n). Then, for all N, k ∈ ℕ, there exists n > N such that f(n + 1) − f(n) > k.

Proof. If not, there exist N, k ∈ ℕ such that for all n > N it holds that f(n + 1) − f(n) ≤ k. Note that this means that the differences are bounded (since there are only finitely many differences until N, and from that point on they are all bounded by k). So there exists a bound M ∈ ℕ such that for all n ∈ ℕ it holds that f(n + 1) − f(n) ≤ M. Then, f(2) ≤ f(1) + M, f(3) ≤ f(2) + M ≤ f(1) + 2M, ..., f(n) ≤ f(1) + (n − 1) · M. Now, divide both sides by (n − 1) and get f(n)/(n − 1) ≤ M + f(1)/(n − 1), which contradicts lim_{n→∞} f(n)/n = ∞.

Proposition 1.1. Let f : ℕ → ℕ be a monotonically-increasing function such that f(n) = ω(n). Then the language Lf = {a^f(n) : n ∈ ℕ} is not regular.

Proof. Let f(n) = ω(n). We will show that Lf does not satisfy the conditions of the pumping lemma. Assume by way of contradiction that the pumping lemma holds, and let p > 0 be the pumping constant. We apply Lemma 1.1 with N = k = p (the pumping constant), and get that there exists n > p such that f(n + 1) > f(n) + p. Since f is monotonically-increasing, it holds that f(n) ≥ n > p. Consider the word a^f(n); by the pumping lemma there exist words x, y, z such that a^f(n) = xyz, and it holds that 0 < |y| ≤ |xy| ≤ p, and for every i ∈ ℕ ∪ {0} we have that x y^i z ∈ Lf. Let m = |y|; then for i = 2 we have that a^(f(n)+m) = x y^2 z ∈ Lf. However, f(n) < f(n) + m ≤ f(n) + p < f(n + 1), so f(n) + m is not in the image of f, which is a contradiction.

Remark. This lemma cannot be used in the current exercise (Exercise 3). This is because you need to practice using the pumping lemma, and deducing that a certain language is not regular immediately from this lemma will not teach you anything...

2 The Myhill-Nerode Theorem

2.1 Myhill-Nerode classes

Let Σ be a finite alphabet, and let L ⊆ Σ* be a language.
In class, we defined the Myhill-Nerode equivalence relation ∼L ⊆ Σ∗ × Σ∗ as follows. ∀x, y ∈ Σ∗ , x ∼L y iff ∀z ∈ Σ∗ , x · z ∈ L ⇐⇒ y · z ∈ L 1 That is, x and y are equivalent if there is no separating suffix. For every w ∈ Σ∗ ,we defined [w] = {x : w ∼L x} to be the equivalence classes of w. In class, you proved the following result. Theorem 2.1. Let L ⊆ Σ∗ , then L ∈ REG iff ∼L has finitely many equivalence classes. How do we use this theorem? Unlike the pumping lemma, this theorem gives us a complete characterization of regular languages, so we can use it either to prove that a language is regular, or to prove that a language is not regular. Example: Consider the language L = {ak : k is not a power of 2}. Is this language regular? Let’s try n m to see how many equivalence classes it has. For every n ̸= m ∈ , consider the words a2 and a2 . W.l.o.g assume that n < m. Now we have that 2n + 2n = 2n+1 , but 2m + 2n = 2n (2m−n + 1) which has an odd n n m n n factor, so it is not a power of 2. Thus, we have a2 · a2 ∈ / L, but a2 · a2 ∈ L, so a2 is a separating n m suffix, so a2 and a2 are in different equivalence classes. Since this is true for all n, m ∈ , then there are infinitely many equivalence classes, so the language is not regular. N N Now, let’s look at the “positive” direction. Example: Consider the language L = {w ∈ {a, b}∗ : w ends with an a}. We calculate the equivalence classes. Let u, v ∈ {a, b}∗ . If u, v both end with an a, for every suffix x we have u · x ∈ L iff x = ϵ or x ends with a, iff v · x ∈ L. So u ∼L v. Similarly, if u, v both do not end with an a (i.e end with b or are ϵ), then for every suffix x we have that u · x ∈ L iff x ends with a, iff v · x ∈ L. So u ∼L v. Finally, if one ends with an a and one does not, then we can separate them with ϵ. We covered all the words, so we can conclude that there are exactly 2 equivalence classes: {u : u ends with a} and {u : u does not end with a} By Theorem 2.1, we get that L ∈ REG. Can we think of a DFA for L? Yes, see Figure 1. Figure 1: A DFA for L = {w ∈ {a, b}∗ : w ends with an a} In the latter example, we have seen that the number of states in the smallest DFA we could think of was exactly the number of Myhill-Nerode equivalence classes. As you can see in the proof of the Myhill-Nerode theorem, this is not surprising, and the size of the minimal DFA for a language is the number of equivalence classes. This brings us to the following question - given a DFA, can we minimize it? That is, do we have a procedure that allows us to take a DFA and output a new DFA that is equivalent, but has a minimal number of states? Fortunately, we do. You will see it thoroughly in the exercise. 2 3 Recap Questions 3.1 Pumping Quiz Question Let A = ⟨Q, {0, 1}, q0 , δ, F ⟩ be a DFA with |Q| = r states. Assume that 0r 1r ∈ L(A). Which of the following is necessarily correct: 1. L(0∗ 1∗ ) ⊆ L(A): not correct. Consider a 2-state DFA for the language of even number of 0’s. 2. L(A) ⊊ L(0∗ 1∗ ): not correct. Consider the counter example in item 1. 3. 1. is not necessarily correct, yet for every i ≥ 1, it holds that 0ir 1ir ∈ L(A): not correct. Consider the DFA in Figure 2 for i = 2, r = 3. Figure 2: A counter example 4. 1. is not necessarily correct, yet there exists k ≥ 1 such that for every i ≥ 1, it holds that 0r+ik 1r+k ∈ L(A): correct. Assume that w = 0r 1r is accepted by A, and let s = s0 , s1 , ..., s2r be the accepting run of A on w. As |Q| = r, we get that the run s0 , s1 , ..., sr visits some state twice. 
Similarly, the run sr , sr+1 , ..., s2r also visits some state twice. Meaning that we have two cycles in the run s. The first cycle is relevant to transitions labeled with 0, and the second is relevant to transitions labeled with 1. Let k1 and k2 be the lengths of the first and second cycle, respectively. Then, we take k = k1 · k2 . Note that pumping the first cycle i · k2 additional times, and the second cycle i · k1 additional times, corresponds to an accepting run of A over the word 0r+ik 1r+ik . 3 3.2 Myhill-Nerode Quiz Question Explanation on the board! 4 3.3 Myhill-Nerode Exam Question Draw a minimal DFA A for the language L = {w ∈ {a, b}∗ : (#a (w) is even) ∨ (w ends with a)}, and prove that A is minimal. Solution: the following is a minimal DFA for the language Figure 3: A minimal DFA for L = {w ∈ {a, b}∗ : (#a (w) is even) ∨ (w ends with a)} To show minimality, we show that there are at least 3 Myhill-Nerode equivalent classes (why is this sufficient?). Specifically, we show that the words ϵ, a, ab are pairwise nonequivalent. • z = ϵ separates between ϵ and ab: ϵ ∈ L and ab ∈ / L. • z = b separates between ϵ and a: b ∈ L and ab ∈ / L. • z = ϵ separates between a and ab: a ∈ L and ab ∈ / L. 5 Computability - Recitation 5 20/11/22 - 27/11/22 1 Context-Free Grammars (CFGs) So far we discussed the class of regular languages. We have seen that regular languages can be characterized (equivalently) by DFAs, NFAs, and regular expressions. We have also seen an algebraic characterization of regular languages by means of the Myhill-Nerode equivalence classes. Our study showed that while regular languages are nice and simple to define and to reason about, they are not very expressive. For example, the language {an bn : n ∈ } is not regular. It is therefore reasonable that we want a formalism that can capture more languages. One way to obtain such a model, is to come up with some augmentation to NFAs. For example by adding memory. We will take that approach later on. For now, we introduce a new type of language-formalism, which is generative. In generative models, we show how to generate words in the language, rather than how to check whether a word belongs to the language. Let’s start with an example, before we define things formally. N Example 1.1: The following is a context-free grammar: A→ 0A1 | B B→ # How do we interpret this? We start with the vatiable A, and then we use one of the two derivation rules that are available from A. We can either convert A to 0A1, or to B. The letters 0, 1 are called terminals, and they are the alphabet of the language. This kind of model allows us to generate words in the language, by repeatedly applying rules. For example, we can generate the word 000#111 as follows: A =⇒ 0A1 =⇒ 00A11 =⇒ 000A111 =⇒ 000B111 =⇒ 000#111 Can you see which rule was applied in each derivation? One can also think of the generation of a word as a parse tree. We now turn to define context-free grammars formally. Definition 1.1. A context-free grammar is a tuple G = hV, Σ, R, Si where: • V is a finite set of variables. • Σ is a finite alphabet of terminals (disjoint from V ). • R is a finite set of rules, where each rule is a variable and a string of variables and terminals (formally, R ⊆ V × (V ∪ Σ)∗ ). • S ∈ V is the start variable. 1 This is the syntax of context-free grammars. We now define the semantics. Consider strings u, v, w of variables and terminals, a variable A, and a rule A → w, then we say that ∗ uAv =⇒ uwv (read: uAv yields uwv). 
We write u ⇒* v (read: u derives v) if u = v or if there exists a finite sequence of strings u1, ..., uk such that

u ⇒ u1 ⇒ ... ⇒ uk ⇒ v.

We define the language of G to be L(G) = {w ∈ Σ* : S ⇒* w}. Note that we only consider words in Σ*. Thus, a derivation that ends in a string that contains variables is not "complete". We define the class of context-free languages (CFL, for short) as the class of languages that are generated by CFGs.

Now that we know the formalism, let's look at a couple of examples.

Example 1.2: Can you think of a context-free grammar for the language {a^n b^(2n) : n ≥ 0}?
S → aSbb | ε

Example 1.3: How about {a^i b^j : j ≥ i}?
S → aSb | bT | ε
T → Tb | ε

Example 1.4: How about {a^i b^j c^j d^i : i, j ∈ ℕ}?
S → aSd | T | ε
T → bTc | ε

We hope you get the hang of it. Let's move on.

Remark: Why do we call these grammars context free? The reason is that the left-hand side of the rules contains only single variables. So intuitively, a variable is interpreted without taking into account its surrounding variables/terminals, or its context. There are grammars which are context-sensitive, where we can have rules such as A0B1 → B11. These capture a much more expressive class of languages. We probably won't discuss those in this course.

2 Closure properties

We defined the class CFL, and the natural thing to do would be to start investigating its closure properties. We'll start with an easy one:

Theorem 2.1. If L1, L2 ∈ CFL, then L1 ∪ L2 ∈ CFL.

Proof. Let G1 = ⟨V1, Σ, R1, S1⟩ and G2 = ⟨V2, Σ, R2, S2⟩ be CFGs such that L1 = L(G1) and L2 = L(G2). Assume w.l.o.g. that V1 ∩ V2 = ∅ (otherwise we can change the names of the variables in V2). We can obtain a CFG G for L1 ∪ L2 as follows: Let S be a new variable S ∉ V1 ∪ V2, then G = ⟨V1 ∪ V2, Σ, R1 ∪ R2 ∪ {S → S1 | S2}, S⟩. That is, we simply put both of the grammars together, with the rule S → S1 | S2.

Let's kick it up a notch:

Theorem 2.2. If L1, L2 ∈ CFL, then L1 · L2 ∈ CFL.

Proof. Let G1 = ⟨V1, Σ, R1, S1⟩ and G2 = ⟨V2, Σ, R2, S2⟩ be CFGs such that L1 = L(G1) and L2 = L(G2). Assume w.l.o.g. that V1 ∩ V2 = ∅ (otherwise we can change the names of the variables in V2). We can obtain a CFG G for L1 · L2 as follows: Let S be a new variable S ∉ V1 ∪ V2, then G = ⟨V1 ∪ V2, Σ, R1 ∪ R2 ∪ {S → S1 · S2}, S⟩. That is, we simply put both of the grammars together, with the rule S → S1 · S2.

We saw that the class REG is closed under many operations on languages. What about the class CFL? You saw in class that it is closed under union but not under intersection. In the previous recitation, we proved that it is closed under concatenation and union. In the exercise you showed that CFL is closed under the Kleene star operation. What about complement?

Corollary 2.3. The class CFL is not closed under complement.

Proof. If it were, then by De Morgan's law (L1 ∩ L2 is the complement of the union of the complements of L1 and L2) it would follow that CFL is closed under intersection, and we saw in class that it isn't.

3 The Pumping Lemma for CFL

Our main tool for proving that a language is not in CFL is the Pumping Lemma for CFL's, which you've seen in class. Let's recall it.

Lemma 3.1 (The pumping lemma). Let L ∈ CFL, then there exists p ∈ ℕ such that for every w ∈ L, if |w| ≥ p then we can write w = uvxyz such that:
1. |vxy| ≤ p.
2. |vy| > 0.
3. ∀i ≥ 0, u v^i x y^i z ∈ L.

Let's use the pumping lemma!

Proposition 3.2. Let Σ = {a, b} and consider the language L = {ww | w ∈ Σ*}. Then L ∉ CFL.

Proof. Assume by way of contradiction that L ∈ CFL, and let p be its pumping constant. Let w = a^p b^p a^p b^p.
Since w ∈ L and |w| ≥ p, there exist words u, v, x, y, z ∈ Σ* such that w = uvxyz and conditions 1, 2, 3 of the lemma are satisfied. We now consider three distinct cases (drawn on the board):

• If vxy is contained in the first half of w, let s be the number of a's in vy and let t be the number of b's in vy. By condition 2, s + t > 0. By condition 3 (taking i = 0), the word uxz = a^(p−s) b^(p−t) a^p b^p should be in L; however it is not (if s + t is odd it has odd length, and otherwise its first half ends with an a while its second half ends with a b), which is a contradiction.

• The case where vxy is contained in the second half of w is handled similarly to the first case.

• If vxy is composed of parts from both halves of w, then by condition 1, vxy is contained in the last b^p of the first half of w and the first a^p of the second half of w. Again, let s be the number of a's in vy and let t be the number of b's in vy. Then uxz = a^p b^(p−t) a^(p−s) b^p ∉ L, which is a contradiction to condition 3.

This example also yields an alternative proof of the fact that CFL is not closed under complement.

Proposition 3.3. For the same language L as in the previous example, it holds that its complement L̄ is in CFL.

Proof. Note that L̄ = K1 ∪ K2 where K1 = {uw | |u| = |w| and u ≠ w, u, w ∈ Σ*} and K2 = {w ∈ Σ* | |w| is odd}. Since CFL is closed under union, it is enough to show that K1 and K2 are in CFL. Note that K2 is clearly in CFL since it is in REG, which is contained in CFL. We are thus left with proving that K1 ∈ CFL.

Let w ∈ K1 such that |w| = 2n (note that all words in K1 are of even length). There must exist some 1 ≤ i ≤ n such that w_i ≠ w_(n+i). Assume that w_i = a and w_(n+i) = b. Then we can write w = xaybz where x, y, z ∈ Σ* and |x| = i − 1, |y| = n − 1 and |z| = n − i. In general, define K1′ as the language of all words of the form xaybz or xbyaz, such that |x| = i − 1, |y| = n − 1 and |z| = n − i for some n and i such that 1 ≤ i ≤ n. Since i is arbitrary, another equivalent definition of K1′ is the language of all words xaybz or xbyaz such that |x| + |z| = |y| (make sure you see why). We have just shown that K1 ⊆ K1′. It is also easy to see that K1′ ⊆ K1. Thus, K1′ = K1. Now, to prove that K1 ∈ CFL, we describe a CFG for K1′:

S → AB | BA
A → aAa | aAb | bAa | bAb | a
B → aBa | aBb | bBa | bBb | b

The idea behind the grammar is as follows: we allow deriving arbitrary words as x and z, but for every letter we add to either of them, we also add a letter to y. We thus get that |y| = |x| + |z|, and we are done.

Corollary 3.4. CFL is not closed under complement.

Proof. This follows directly from the last two examples.

Proposition 3.5. Consider the language L = {w#x : w, x ∈ {0, 1}*, x is a substring of w}. Then L ∉ CFL.

Proof. We prove that it is not context free using the pumping lemma. Assume by way of contradiction that L ∈ CFL, and let p be the pumping constant. Consider the word w = 0^p 1^p # 0^p 1^p ∈ L. Let uvxyz be a decomposition of w such that the pumping lemma conditions hold. The trick in CFG pumping is to play with the location of the vxy part. If y ends before the #, then using i = 0 we can shorten the part before the #, which is clearly a contradiction (a long string is never a substring of a shorter string). Similarly, if v starts after the #, we can use i = 2 to lengthen the substring, which is again a contradiction. So the entire vxy part lies within the 1^p # 0^p part. If v or y contains # then by choosing i > 1 we get too many #'s - contradiction. So # appears in x. Thus, we have that v is a substring of 1^p and y is a substring of 0^p. If y = ε, then by pumping with i = 0 we get fewer than p 1's left of the #, which is a contradiction.
Otherwise, |y| > 0, so if we pump with i = 2 we get too many 0′ s in the substring - contradiction. We conclude that L is not a CFL. Proposition 3.6. Let L be a language such that L ⊆ {a}∗ . Then L satisfies the pumping lemma for context free languages then it also satisfies the pumping lemma for regular languages. Proof. Let L ⊆ {a}∗ that satisfies the pumping lemma for context free languages. Let p be the pumping constant for L. z ∈ L such that |z| ≥ p. By the pumping lemma for CFL, we know that we can write z = uvwxy such that: 1. |vwx| ≤ p 2. |vx| > 0 3. for all i ≥ 0 it holds that uv i wxi y ∈ L 4 Because L is over an unary alphabet, we can change the order of the sub-words and the word z will not change, meaning we can also write z = wvxuy. So. for all i ≥ 0, uv i wxi y = wv i xi uy = w(vx)i uy. Let’s write a little different: u′ = w, v ′ = vx, w′ = uy. and we have that u′ (v ′ )i w′ ∈ L. It’s easy to see that |u′ v ′ | ≤ p and |v ′ | > 0. So we can conclude that for the same p, the conditions of pumping lemma for regular languages holds. Example 3.1: From exam: Let L ⊆ Σ∗ . We define the equivalence relation ≈L on Σ∗ as follows: for any x, y ∈ Σ∗ we say the x ≈L y iff ∀z ∈ Σ∗ such that |z| is even it holds that xz ∈ L ⇐⇒ yz ∈ L (i.e. there is no even-length separating suffix between x, y). For example, let L = {an |nmod6 = 0} then it holds that: • a ≈L a3 since for every k ∈ N it holds that a2k+1 ∈ / L and a2k+3 ∈ /L • a2 6≈ a4 , as z = a2 is a separating suffix of even length. 1. How many equivalence classes does the relation ≈L induce on Σ∗ for the language L = (ab)∗ ? Solution: There are 2 equivalence classes: We will prove the equivalence classes are exactly L, L: Let z ∈ {a, b}∗ . Then |z| is even iff one of the following holds: • z ∈ L (this means that z = (ab)k for some k ∈ N ∪ {0} • z ∈ L and z ∈ {aa, ab, ba, bb}k1 {aa, ba, bb}k2 {aa, ab, ba, bb}k3 for some k1 , k3 ∈ N ∪ {0}, k2 ∈ N Now, let x, y ∈ L: So x = (ab)m , y = (ab)n for some m, n ∈ N ∪ {0}. Let z ∈ Σ∗ such that |z| is odd. • If z = (ab)k for some k ∈ N ∪ {0} then xz = (ab)m+k ∈ L and yz = (ab)n+k ∈ L • If z ∈ L and z ∈ {aa, ab, ba, bb}k1 {aa, ba, bb}k2 {aa, ab, ba, bb}k3 for some k1 , k3 ∈ N ∪ {0}, k2 ∈ N then xz ∈ / L and yz ∈ /L So x ≈L y Now, Let x ∈ L, and let z ∈ Σ∗ such that |z| is even. We will show that xz ∈ / L: If |x| is odd then |xz| is odd and yz ∈ / L. If |x| is even then since x ∈ / L it holds that x ∈ {aa, ab, ba, bb}k1 {aa, ba, bb}k2 {aa, ab, ba, bb}k3 for some k1 , k3 ∈ N ∪ {0}, k2 ∈ N. So xz contains as a sub-string at least one of {aa, ba, bb} and thus xz ∈ / L. Finally, for any x ∈ L, y ∈ L, ǫ is a separating suffix of even length, and we are done. 2. Let ∼L be the Myhill-Nerode relation of L. Prove that for all x, y ∈ Σ∗ it holds that x ∼L y iff x ≈L y and for every σ ∈ Σ xσ ≈L yσ Solution: Assume that x ∼L y. So there is no separating suffix of even length, and thus x ≈L y. Now, assume towards a contradiction that there exists some σ ∈ Σ such that xσ 6≈L yσ thus there exists a separating suffix of even length z ∈ Σ∗ such that w.l.o.g xσz ∈ L and yσz ∈ / L. Thus zσ is a separating suffix between x, y (of odd length) and thus x 6∼L y. Assume now that x ≈L y and for every σ ∈ Σ xσ ≈L yσ. Let z ∈ Σ∗ . If |Z| is even then xz ∈ L ⇐⇒ yz ∈ L. otherwise, |z| is odd. Write z = σw for some σ ∈ Σ, w ∈ Σ∗ , and |w| is even. 
From our assumption we know that xσ ≈L yσ, so it must hold that (xσ)w ∈ L ⇐⇒ (yσ)w ∈ L which is equivalent to x(σw) ∈ L ⇐⇒ y(σw) ∈ L which in turn xz ∈ L ⇐⇒ yz ∈ L which means that a ∼L y and we are done. 5 Example 3.2: From exam: Let L= {ai bj ci dj |i, j ∈ N ∪ {0}}. Is L1 context free? No. We will prove this suing the pumping lemma: Assume towards a contradiction that L1 ∈ CFL and let p be the pumping constant. Let w = ap bp cp dp . Clearly |w| > p. We can write w = zuxvy, |uxv| ≤ p. There are three possible cases: 1. uxv is a substring of ap bp . In this case the word zu0 xv 0 y is of the form am bn cp dp having m + n < 2p. Thus either m < p or n < p, so zu0 xv 0 y ∈ / L1 . This is a contradiction to the pumping lemma, so we are done. 2. uxv is a substring of bp cp . In this case the word zu0 xv 0 y is of the form ap bm cn dp having m + n < 2p. Thus either m < p or n < p, so zu0 xv 0 y ∈ / L1 . This is a contradiction to the pumping lemma, so we are done. 3. uxv is a substring of cp dp . In this case the word zu0 xv 0 y is of the form ap bp cm dn having m + n < 2p. Thus either m < p or n < p, so zu0 xv 0 y ∈ / L1 . This is a contradiction to the pumping lemma, so we are done. Example 3.3: From exam: Let L2 = {ai bj cj di |i, j ∈ N ∪ {0}}. Is L1 context free? Yes! It suffices to show that there is a context-free grammar G such that L(G) = L2 . We define G = h{S, T }, {a, b}, R.Si where R is: S → aSd|T T → bT c|ǫ 6 Computability - Recitation 6 27/11/22 - 3/12/22 1 Turing Machines We have seen that NFAs and DFAs recognize the class of regular languages. This class is rather small, and the computational model is indeed quite weak - a finite state machine with no memory. As evidence, we have seen that even languages such as {an bn : n ∈ } are not regular. So how can we give automata more “power” to recognize more interesting languages? One of the obvious ways is to add a memory model. Thus, we now replace our stack with a list, or an array. This model is called a Turing Machine. N 1.1 Definitions Turing Machines, (TM, for short) are finite automata, equipped with an infinite tape, bounded on the left, and a read/write head. The idea is that the input is written on the tape, and the machine can read and write on the tape, as well as move between states. The tape acts as the memory. Formally: Definition 1.1. A (deterministic) Turing Machine is a 7-tuple M = hQ, Σ, Γ, δ, q0 , qaccept , qreject i where 1. Q is a finite set of states. 2. Σ is a finite alphabet not containing the blank symbol xy. 3. Γ is a finite tape alphabet such that Σ ⊆ Γ and xy ∈ Γ. 4. q0 , qaccept , qreject ∈ Q, qaccept 6= qreject . 5. δ : Q × Γ → Q × Γ × {R, L}. This is the syntax of a TM, and now we need to define its semantics. That is, how are words accepted and rejected. Intuitively, given a word w = w1 · · · wn ∈ Σ∗ , a TM M starts with w written on the tape. It then follows the rules of the transition function, using the state and the symbol which is written on the tape under the location of the head. During the run, the head moves left and right, and might change the contents of the tape, which will affect the run in later times. If, at any point, the machine reaches the state qaccept (resp. qreject ), it halts and accepts (resp. rejects). To define this formally, we first need to define what a configuration of a TM is. A configuration of a TM is a piece of information which encodes everything that is needed in order to continue the run of the TM. 
Formally, a configuration is a string uqσv, where the contents of the input tape is uσv, and the head is currently below the letter σ. Note that since the tape is infinite to the right, we write v only up to the point from which there are only xy symbols. We now define when two configurations are consecutive. • We say that configuration uqabv yields the configuration ucq ′ bv if δ(q, a) = (q ′ , c, R). 1 • We say that configuration uaqbv yields the configuration uq ′ acv if δ(q, b) = (q ′ , c, L). • A special case is when there is a left transition at the left end of the tape. For a transition δ(q, a) = (q ′ , b, L), the configuration qau yields the configuration q ′ bu. Note that there is no “notification” that we have reached the end of the tape. We are now ready to define runs on words. For a word w ∈ Σ∗ , a partial run of M on w is a finite sequence of configurations c1 · · · ck , where c1 = q0 w, and for every 1 ≤ i < k, ci yields ci+1 . A partial run is an accepting run (resp. rejecting run) if the state in the last configuration is qaccept (resp. qreject ). Finally, a word is accepted (resp. rejected) if there is an accepting (resp. rejecting) run of M on it. Note there may be words on which there is no accepting run and no rejecting run. This will become very important later on. Finally, we define L(M ) = {w : M accepts w}. An Important Remark - Configurations. Every computational model that is based on a finite state control (e.g. DFAs, NFAs, PDAs, TMs) has some notion of configurations. A configuration is some data such that knowing this data is enough to continue the run (or all runs) of the model. For example, in NFAs, knowing the current state is enough to continue the run on a word. In PDAs, however, knowing the state is not enough - we also need to know the entire contents of the stack (think why is it not enough to know just the top of the stack?). And as we have just seen, in TMs we need to know the entire contents of the tape, the location of the head, and the state (again - think why excluding each of these elements will prevent us from defining the runs of a TM). Given the notion of a configuration, we can talk about sequences of configurations. These are known as computations (or runs), and are the formal model we use to reason about computations. Note how beautifully generic this notion is, and how it fits both our conceptual notion of a computation, as well as our need for a simple formal definition. 1.2 Expressive power The first interesting question that rises is whether this model gives us any additional computational power over CFL. The (very) surprising answer is that not only do we get more computational power, but we get computational power equivalent to a computer. In fact, the Church-Turing thesis states that anything that can be computed on any physically-feasible computational model can be simulated by a TM. 1.3 Computing functions So we see that TMs can recognize languages that are not context free. What else can they do? In order to get accustomed to TMs, let’s see some other uses. Apart from recognizing languages, TMs can also compute functions. For example, we will now see a TM that computes the functions f (n) = n + 1, given a binary representation of n. As we will see shortly, when designing such a TM, we immediately encounter a small problem with the model, namely: how can we tell when we are in the leftmost cell? 
Indeed, recall that when the head reaches the leftmost cell of the tape, there is no indication of it, and we want some indication to know that we are there. It would be nice to assume that there is some symbol $ before the input. To assume this, we first show how to move the entire input one cell to the right, and to put a marker at the left end cell. This is done with the 4 state TM in Figure 1. Note that this is not a full TM, just a piece of one. You can think of it as a subroutine. Now we are ready to compute a function. Example 1.4: This time, we will only give a low level description of the TM, and it is left as an exercise to describe the TM fully. In low level, the machine starts in q0 , and scans the tape to the right, until it reaches xy, it then moves left and goes to state q1 . In q1 , if it sees 1, it changes it to 0 and continues left. If it sees 0, it changes into 2 a → a, R q1 a → $, R xy → a, L q0 b → a, R a → b, R b → $, R xy → b, L q2 a → a, L q3 b → b, L b → b, R Figure 1: TM that moves the entire input one cell to the right, and puts a $ at the beginning. 1 and halts. If $ is encountered, meaning that the input was of the form 111...1, we change $ to 1, and then move the entire input one cell to the right, and write $ again to the left (to keep things nice and clean). 2 Robustness of the TM model We have already seen several examples of computational model equivalence: DFA, NFA and regular expression are equivalent. The idea of computational model equivalence also extends to TMs. In fact, a nice property of the TM computational model is that it is robust: By changing minor technical properties of the TM model, we usually get an equivalent computational model. Some informal examples: • A bidirectionally unbounded TM (BUTM) is similar to a TM, except that its tape is unbounded to the right and also to the left (as opposed to unbounded just to the right). The BUTM model is equivalent to a TM. • A stay TM (STM) is similar to a TM, but its transition function is of the type Q×Γ → Q×Γ×{L, R, S}, where S tells the head to stay in the same cell. The STM model is also equivalent to a TM. • A 2-dimensional TM has a tape that’s an infinite square grid. It is also equivalent to a TM. One particularly useful variant of a TM is a two-taped TM. In the next section, we define this model and prove its equivalence to a TM. Before we do so, we need to ask ourselves what does it mean for a TM to be equivalent to another model. Recall that a TM running on an input w can either accept, reject or not halt on w. Thus, we need to generalize our previous definition of computational model equivalence. Definition 2.1. Two machines (not necessarily of the same model) M and N are equivalent if for every w ∈ Σ∗ the following hold: • M accepts w iff N accepts w. • M rejects w iff N rejects w. • M does not halt on w iff N does not halt on w. 3 Definition 2.2. Two computational models X and Y are equivalent if the following hold: • For every machine of type X there exists an equivalent machine of type Y. • For every machine of type Y there exists an equivalent machine of type X . 3 TM with 2 tapes A two-taped TM is an ordinary Turing machine, only with two tapes. Each tape has its own head for reading and writing. Initially the input appears on tape 1, and the second tape starts out blank. The transition function is changed to allow for reading, writing, and moving the heads on both tapes simultaneously. 
Formally, δ : Q × Γ2 → Q × Γ2 × {R, L}2 The expression δ(q, γ1 , γ2 ) = (q ′ , γ1′ , γ2′ , L, R) means that if the machine is in state q, head 1 is on γ1 and head 2 is on γ2 , then the machine moves to q ′ , writes γ1′ on tape 1 and moves head 1 left, and writes γ2′ on tape 2 and moves head 2 right. An important thing to notice is that the “useful” part of two tapes is not the additional tape, but rather the additional head. Indeed, mapping two tapes onto one is pretty easy. However, adding another head is a whole new feature. It allows synchronous movement, which, a-priori, seems more powerful than a single head. It is obvious (i.e. left as an exercise) that two-tape machines are at least as expressive as ordinary TMs. The interesting point is that the models are equivalent. Theorem 3.1. For every two-tape TM M , there exists an equivalent TM M ′ . Proof. There are several ways to prove this theorem. The general idea is to somehow simulate the actions of the two tapes using only one tape. Perhaps the most intuitive solution is to just write the two tapes consecutively on one tape, separated by some special character #. It is not hard to prove that such a construction works. Here, we show a different solution. Let M = hQ, Σ, Γ, δ, q0 , qacc , qrej i be the two-tape TM. We construct ′ ′ , qrej i as follows. The tape alphabet Γ′ will simulate the two tapes, as well as the M ′ = hQ′ , Σ, Γ′ , δ ′ , q0′ , qacc positions of the two heads. Γ′ = (Γ × Γ × {0, 1} × {0, 1}) ∪ Σ ∪ {xy} The letter (a, b, 0, 1) means that at this position, the first tape contains “a”, the second contains “b”, and head 2 is also in this position, while head 1 is not. The single tape machine operates as follows. 1. The machine starts by going over the input word, and replacing each occurrence of σ ∈ Σ with (σ, xy, 0, 0). It then goes back to the left, and marks the first cell as (σ, xy, 1, 1). That is, both heads are at this position. 2. The simulation phase: the machine encodes in its states the state of the two-tape machine. At every stage, the machine scans the tape from left to right, searching for the letter where head 1 is on. Once it is found, the machine remembers that letter (encodes it in its state). It then continues to the right, and starts scanning back to find head 2. Thus, once it is back to the left, the machine encoded the letters under the two reading heads. So the machine is now ready to decide what to do based on δ (since it “knows” q, γ1 , and γ2 ). Again, the machine scans the tape from left to right, searching for the letter where head 1 is on. Once it is found the machine updates the tape according to δ, by changing the letter of the first tape (that is, the first component of the state), and moving head 1 left or right (which means turning it to 0 on 4 the current component, and to 1 on an adjacent component). Then, the machine goes to right of the tape, and goes back left, doing the same thing with head 2. When it reaches the left end of the tape (xy), it has finished updating one move of the tape. It now repeats, until M states that it goes to qacc or qrej , in which case M ′ does the same. Note that while Q′ and Γ′ are larger than Q and Γ respectively, they are both still finite. Think of M ′ as an emulator for M . The configuration of M ′ holds an encoding of the configuration of M configuration. Whenever M makes a step which changes its configuration from c to c′ , M ′ makes a series of steps in order to change its encoding of c to an encoding of c′ . 
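To make the bookkeeping of this construction concrete, here is a minimal Python sketch of the combined-tape encoding: each cell of the single tape stores the two tape symbols together with two head-marker bits, and one step of the two-tape machine is carried out by locating the marked cells, writing, and moving the markers. This is our own illustration, not part of the course material; the names (BLANK, encode, step, delta) are arbitrary, and the left-to-right scans that M′ actually performs are abstracted here into direct indexing.

```python
BLANK = "_"  # stands in for the blank tape symbol

def encode(w):
    """Phase 1 of the construction: replace the input by combined cells; both heads start at cell 0."""
    tape = [[c, BLANK, 0, 0] for c in w] or [[BLANK, BLANK, 0, 0]]
    tape[0][2] = tape[0][3] = 1
    return tape

def step(tape, q, delta):
    """One step of the two-tape machine M, carried out on the combined tape.
    delta maps (q, g1, g2) to (q', g1', g2', d1, d2) with d1, d2 in {"L", "R"}."""
    find = lambda bit: next(i for i, cell in enumerate(tape) if cell[bit])
    i1, i2 = find(2), find(3)                     # locate the two head markers
    q_new, g1, g2, d1, d2 = delta[(q, tape[i1][0], tape[i2][1])]
    tape[i1][0], tape[i2][1] = g1, g2             # write under both heads
    for i, bit, d in ((i1, 2, d1), (i2, 3, d2)):  # move each head marker
        tape[i][bit] = 0
        j = max(i - 1, 0) if d == "L" else i + 1  # a left move at cell 0 stays put
        if j == len(tape):
            tape.append([BLANK, BLANK, 0, 0])     # extend the simulated tapes on demand
        tape[j][bit] = 1
    return q_new
```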
Remark: It is very difficult to get much more formal without getting dirty with indices and various end-cases. This is problematic in the context of a course: “am I being formal enough for the graders?”, but it is even worse in “real life”: “am I being formal enough for the result to be true?”. One of the main skills you need to acquire is the ability to see potential problems, and to explain why the proof overcomes them.
An important question we often ask when translating between variants of a model is: what is the cost of this translation? In automata, this question had a very clear meaning - what is the blowup in the number of states/transitions. In TMs, however, this question is no longer simple. First, we may ask what the size of the new TM is. Even this is not trivial - what do we mean by “size”? The number of states? Transitions? The alphabet? Also, a completely different aspect of cost comes up - what is the runtime of the new machine on a word? As you have seen in class, TMs may run for a long time even on a short word (in fact, they may not stop at all). Typically, the more interesting question for TMs is the latter - how the runtime on a word is affected by a change in the model (although the other questions are also widely studied).
For the construction above, observe that in every step of the original TM, we need to scan the tape twice. That is, we do O(n) operations for every operation of the original machine, where n is the length of the used portion of the (longer of the) two tapes. Assume the original machine ran for t steps on a word w. Within t steps, the original machine could write at most t symbols on each tape. So the maximal distance between the two heads at every step is at most t, and thus in every step of the original machine we make at most O(t) steps. Thus, the total runtime is t · O(t) = O(t^2). In particular, note that this is polynomial in the runtime of the machine. This will become important later on in the course. Finally, observe another interesting blowup in this translation - we increased the size of the alphabet from |Γ| to roughly 4|Γ|^2. We could have done the same procedure for k tapes (instead of 2), for the price of an alphabet of size roughly 2^k · |Γ|^k - exponential in k - and with a runtime of O(t^2) (think why). Thus, from now on, we may use machines with a fixed number of tapes. It is important to note that while a k-tape machine is equivalent to a TM for any constant k, the value of k must be constant. In other words, it must not depend on the input.
4 Closure properties of R and RE
For a TM M, we say that M recognizes a language L if L(M) = L. We say that M decides L if L(M) = L and M halts on every input. That is, for every w ∈ Σ∗ we have that M accepts w if w ∈ L and M rejects w if w ∉ L. We define RE = {L : L is recognizable by a TM}, R = {L : L is decidable by a TM}, and coRE = {L : L̄ is recognizable by a TM}, where L̄ = Σ∗ \ L denotes the complement of L. In class you will see that R = RE ∩ coRE. The natural thing to do now is to study under which operations these classes are closed. Let's start with something easy:
Proposition 4.1. If L ∈ R then L̄ ∈ R.
Proof. Let M be a machine that decides L; we swap qacc and qrej in M. Since M always halts, for every w its run on w reaches either qacc or qrej. Using this, it is easy to prove the correctness of this modification.
Note that we crucially use the fact that M is a decider, and that M is deterministic. Indeed, for nondeterministic machines (which we will see later on in the course), or for machines that only recognize their language, this no longer works.
Proposition 4.2. If L1, L2 ∈ R, then L1 ∪ L2 ∈ R.
Proof. Let M1 , M2 be TMs that decide L1 , L2 respectively. We construct a TM M for L1 ∪ L2 as follows. The main idea is to run M1 on w, and if it rejects, run M2 . If either machine accepts, we accept. Otherwise we reject. To give a more detailed description, we proceed as follows. M starts by marking a special delimiter # left of the input w. It then copies w to the left of # (assume a bidirectional tape). M then passes control to M1 , with # acting as the left bound of the tape. If, at any point, M1 moves to its accepting state, M also accepts. If M1 moves to the rejecting state, M erases the contents of the tape up to #, then copies back w to the right of #, and passes control to M2 and answers the same. Correctness is relatively clear here - M accepts only if at least of of M1 or M2 accepts. Also, all the “overhead” operations are things we know how to do with a TM - erase the input and copy strings. Note that again, we rely on the fact that M1 and M2 are deciders, since otherwise M1 may get stuck, and we will never try to run M2 . So can we show that RE is also closed under union? As it turns out - yes, but we need to be somewhat more clever. Proposition 4.3. If L1 , L2 ∈ RE, then L1 ∪ L2 ∈ RE. Proof. Let M1 be TM that recognizes L1 , and M2 a TM that recognizes L2 . We need to somehow run both M1 and M2 in parallel. There are several ways of doing that. Perhaps the simplest is to use two tapes, and simulate each machine independently on its own tape. This is rather simple: start by copying the word to the second tape, and then apply the transition function of each TM in its separate tape. While this works, we take a different approach here. What really happens here? If we consider the TM that is equivalent to the 2-tape TM we used here, we actually see that in a way, we simply run the machines one step at a time. That is, M1 runs for a step, then M2 , and so forth. This is a sort of parallel run, and it’s impressive that we can simulate a parallel run with the (serial) model of TM. We take a slightly different approach here. We construct a TM M that recognizes L1 ∪ L2 as follows. Start by copying the input to a “safe” place, left of the original input. Additionally, we store a counter i somewhere on the tape (left of the input, for example). We now proceed to run M1 on the input for i steps. How can we do that? well - if you’re re-reading this after seeing a universal machine, then this is trivial. Otherwise, we modify M1 such that after taking a transition, it marks the place of the head on the tape, then goes to the left of the tape, updates a step-counter, compares it to i, and if it hasn’t reached it yet, goes back to where the head was, and continues for another step. After M1 runs for i steps, if it accepts, then M accepts. Otherwise, clean the tape, copy w back to the right of #, and run M2 on w for i steps. If it accepts, we can accept. Otherwise, increment i by 1, and repeat the process. To prove correctness, note that if w ∈ L1 ∪ L2 , then there exists some n > 0 such that either M1 or M2 accepts w within n steps. Thus, when the counter value i reaches n, our machine will accept w. For the second direction, if M accepts w, then clearly either M1 or M2 accept w, so w ∈ L1 ∪ L2 , and we are done. Note that we do not prove that M always halts, as indeed - it may not halt at all! 6 5 Parallel runs Running parallel computations is crucial in order to show closure properties of certain classes, as well as to recognize certain languages. 
This technique usually comes in handy when we work with machines that are not deciders. The general problem is that if we are given a machine, and we are not guaranteed that it is a decider, we do not want to simply let it run, because it may never halt. Instead, we only run it for a certain amount of steps, thus giving us time to decide if we want to do something else if the machine does not halt after a certain number of steps. Parallel runs are not a specific concept, but rather a general scheme. We demonstrate the idea here by showing that RE is closed under concatenation. Theorem 5.1. Let L1 , L2 ∈ RE, then L1 · L2 ∈ RE. Proof. Let M1 , M2 be TMs that recognize L1 , L2 respectively. Observe that we cannot assume M1 and M2 halt on inputs that are not in their languages. Let w ∈ Σ∗ be the input. The naive solution idea is to try every partition of w into w = uv, and then run M1 on u and M2 on v. If both accept, then w ∈ L1 · L2 and we accept. Otherwise, we try the next partition. However, this is not as simple as it sounds - when we try the first partition, it could be the case that M1 does not halt. In this case, our TM also does not halt, and we will never check other partitions! We break the problem into two parts. First, we construct a new machine M3 . This machine reads a word of the form u#v (if the word is not of this form, it rejects). Upon reading this word, M3 runs M1 on u, and then, if M1 accepts, erases the tape and runs M2 on v. Thus, M3 recognizes the language L3 = {u#v : u ∈ L1 , v ∈ L2 } Note that M3 is not necessarily a decider! Now, we are left with a simple task: given w, check whether we can split w as w = uv such that M3 accepts u#v. As we stated earlier, running M3 serially on every partition is not good enough, since it might not halt on the first one, even though a later partition is accepted. So we want to run M3 “in parallel” on all the partitions. This is done as follows. We construct a TM M as follows: We add a new character ⊥ to the alphabet of M (in addition to ΓM1 , ΓM2 , ΓM3 ). ⊥ will act as a delimiter. Start by copying the input to a safe place, left of the original input, followed by ⊥. Next store a counter right after the first ⊥ and initialize it to 1 (i will keep track of how many steps the simulation is to run), then add another ⊥. Now add yet another counter j, initialized to |w|, followed by yet another ⊥. j will keep track of the current partition. The rest of the tape will be used to simulate the run of M3 on every partition of the input w for i steps. Simulating M3 this way requires operating on the 4th part of the tape (after the last ⊥) the same as M3 operates, but after every transition of M3 , the counter i is reduced by 1 (while keeping a copy of its original value to use in the next partition) and compared to 0. If it is 0, M stops simulating M3 , and moves to the next partition. The simulation of M3 works as follows. M reads the value i of the counter, it then simulates M3 on every partition of w, but only for i steps. When all the partitions have been simulated, the counter value is increased by 1 and the same process begins. If, at any point, M3 accepts, then M accepts as well. Otherwise, M does not halt. (keeping track on the current partition is done using the counter j. At every change of partition, reduce j by 1 and compare to 0. If 0 is reached, the current simulation is done. The counter j is the restored to |w| and i is increased by 1. Now, if w ∈ L1 · L2 , then there exists a partition w = uv such that u#v is accepted by M3 . 
Since it is accepted, there exists a finite number k such that M3 accepts u#v after k iterations. So M also accepts w after the counter i reaches value k. Conversely, if M accepts w, then one of the partitions is accepted by M3 , so w ∈ L1 · L2 . 6 Universal TM In the late 80’s, for those who remember, hand-held video games were very popular. These devices ran a single game. In the early 90’s, came the GAMEBOYr , and there was a big hype, which changed the 7 hand-held game industry forever. What was the big change? The main advantage of the GAMEBOY was allowing different programs to run on the same machine. This ability seems trivial in computers today (for those who remember the old floppy disk, perhaps it doesn’t). Enough history, back to TMs. A TM runs a single program, by definition. However, for various uses, we want TMs to be able to run different programs. Can we do that? The answer, as expected (after claiming TMs are as strong as computers), is yes. 6.1 Encoding The first observation we need to make is that we can encode a TM as a finite string over a finite alphabet. We will show how to do that shortly. A universal TM is a TM that can take as input an encoding of a TM and simulate it on some input (given or fixed). We start by describing how we can encode a TM. We will use the alphabet {0, 1, #}. Let M = hQ, Σ, Γ, δ, q0 , qacc , qrej i. We encode M as follows. The states are encoded as increasing binary numbers. That is, if Q = {q1 , ..., qn }, then the encoding starts with 0#1#10#11#100#... We end the state string with a triple ###. Next, we encode Σ and Γ. We encode each letter with a binary string of length ⌈log |Γ|⌉ (we will later see why). We first encode the letters of Σ as increasing binary numbers, separated by #. To encode Γ, we start from the last symbol we encode from Σ, and encode Γ \ Σ as binary numbers. We separate Σ and Γ with ###. Example 6.2: If Σ = {a, b} and Γ = {xy, 0, 1} ∪ Σ then we have |Γ| = 5. Thus, we use 3 bits to encode each letter. The encoding will be 000#001#010#011#101 Corresponding to a, b, xy, 0, 1. Next, we encode δ. A single transition δ(q, γ) = (q ′ , γ ′ , L) is encoded as a tuple hqi#hγi#hq ′ i#hγ ′ i#hLi## Where hLi = 0 and hRi = 1. We end the description of δ with ###. Finally, we encode q0 , qacc , qrej with their binary encoding, separated by ###. Example 6.3: Assume hqi = 101, hγi = 010, hq ′ i = 1 and hγ ′ i = 100. Assume we have the transition δ(q, γ) = (q ′ , γ ′ , R), then the encoding will contain 101#010#1#100#1## Given a TM, we denote it’s encoding by hM i. We note here that in the same spirit, we can encode many other different objects, such as NFAs, graphs, matrices, logic formulas, and many more, which you will see during this course. 6.4 A universal machine We want to construct a TM U that, given an encoding hM, wi of a TM M and a word w, can simulate the run of M on w. That is, U accepts/rejects/gets stuck if M accepts/reject/gets stuck on w, respectively. To construct U , we use a TM with 3 tapes. The first tape will hold the description of M , the second will hold the working tape of M , and the third tape will be used to hold the current state of M , and for calculations. 8 U starts with hM i written on the first tape. Let’s also assume that we also get as input a word w. We want to run M on w. For that, we assume that we are actually given an encoding of w, according to the encoding of the alphabet of M . That is, w ∈ {0, 1, #}∗ , and every # separates consecutive letters. 
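As a small, concrete illustration of these encoding conventions, the following Python snippet assigns fixed-width binary codes of length ⌈log |Γ|⌉ to the letters, input letters first, and encodes a word over Σ as a string in {0, 1, #}∗. This is our own sketch with made-up function names, not something defined in the notes.

```python
from math import ceil, log2

def letter_codes(sigma, gamma_rest):
    """Fixed-width binary codes of length ceil(log2 |Gamma|): Sigma first, then Gamma \\ Sigma."""
    letters = list(sigma) + list(gamma_rest)
    width = ceil(log2(len(letters)))
    return {a: format(i, "b").zfill(width) for i, a in enumerate(letters)}

def encode_word(w, code):
    """Encode a word over Sigma in {0,1,#}*: one fixed-width block per letter, separated by #."""
    return "#".join(code[a] for a in w)

# In the spirit of Example 6.2: Sigma = {a, b}, Gamma = Sigma plus blank, 0 and 1 (|Gamma| = 5).
code = letter_codes("ab", ["_", "0", "1"])   # 3 bits per letter, e.g. code["a"] == "000"
print(encode_word("abba", code))             # prints 000#001#001#000
```

Note that U relies only on the fact that all letters occupy the same number of bits, so overwriting one letter with another never changes the length of the encoded tape.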
U starts by finding the beginning of w and copying it to tape 2. It then restores head 2 to the beginning of w. Next, U finds q0 in the description of M and writes it on tape 3. At the beginning of every iteration, we assume that head 2 is pointing to the letter that the head of M is supposed to point. U operates as follows. First, it scans tape 3 and compares it to qacc and qrej . If one of the comparisons succeeds, U acts accordingly (accept/reject). Next, U scans all the transitions in the description of δ, and for each one it compares the first two components to tapes 3 and 2 respectively. That is, U searches for the appropriate transition. Once a transition is found, U finds the letter that should be written, and replaces it with the current letter in tape 2. Recall that “letters” in M are encoded as strings of the same length in hM i, so replacing the letter will not require us to shrink or push tape 2. Next, U scans where it should move the head to (R or L), and moves head 2 to the next letter. Finally, U finds the new state M should go to, and writes it on tape 3. Recap on the operation of U : 1. Scan tape 1 to find the beginning of w. 2. Copy w to tape 2, reset heads 1 and 2. 3. Scan tape 1 to find q0 , copy q0 to tape 3. Reset heads 1,3. 4. In every iteration, repeat the following. (a) Compare tape 3 to qacc , qrej , and act accordingly if successful. (b) Scan tape 1 and find the beginning of δ. (c) Compare tape 3 and current letter in tape 2 until appropriate transition is found. (d) Replace current letter in tape 2 with letter from transition. (e) Move head of tape 2 left or right according to transition, to next letter. (f) Replace content of tape 3 with new state. Observe that we have constructed a single machine, not a general abstract machine. There are many specific implementations of such universal machines. An important thing to notice about the construction is that we can make small changes to create useful variants of U , for example: • A machine that runs M on w and acts the opposite of M . • A machine that runs M on w for a certain number of steps (not infinitely). • A machine that runs M on w only as long as M stays in the limits of a certain portion of the tape. Final remark: there are many computational models that you have not seen in this course (and probably won’t see). One measure for the strength of a model is whether it can simulate a TM. 6.5 Remark about TM comparing two numbers The careful reader might notice that in the last two proofs we used indices, and had to compare the current index to 0, and stop if the numbers are equal. But(!) how does a TM determine if two numbers are indeed equal? We did not formally explain this yet. Indeed, we can define a TM that accomplishes this task, and even describe the states!. A detailed explanation can be found here: TM machine as comparator of numbers :) 9 Computability - Recitation 7 4/12/22 - 11/12/22 1 Closure properties of R and RE For a TM M , we say that M recognizes a language L if L(M ) = L. We say that M decides L if L(M ) = L and M halts on every input. That is, for every w ∈ Σ∗ we have that M accepts w if w ∈ L and M rejects w if w ∈ / L. We define RE = {L : L is recognizable by a TM}, R = {L : L is decidable by a TM}, and coRE = L : L is recognizable by a TM . In class you will see that R = RE ∩ coRE. The natural thing to do now is to study under which operations these classes are closed. Let’s start with something easy: Proposition 1.1. If L ∈ R then L ∈ R. Proof. 
Let M be a machine that decides L, we swap qacc and qrej in M . Since M always halts, then for every w, its run on w either gets to qacc or to qrej . Using this, it is easy to prove the correctness of this modification. Note that we crucially use the fact that M is a decider, and that M is deterministic. Indeed, for nondeterministic machines (which we will see later on in the course), or for machines that only recognize their language, this no longer works. Proposition 1.2. If L1 , L2 ∈ R, then L1 ∪ L2 ∈ R. Proof. Let M1 , M2 be TMs that decide L1 , L2 respectively. We construct a TM M for L1 ∪ L2 as follows. The main idea is to run M1 on w, and if it rejects, run M2 . If either machine accepts, we accept. Otherwise we reject. To give a more detailed description, we proceed as follows. M starts by marking a special delimiter # left of the input w. It then copies w to the left of # (assume a bidirectional tape). M then passes control to M1 , with # acting as the left bound of the tape. If, at any point, M1 moves to its accepting state, M also accepts. If M1 moves to the rejecting state, M erases the contents of the tape up to #, then copies back w to the right of #, and passes control to M2 and answers the same. Correctness is relatively clear here - M accepts only if at least of of M1 or M2 accepts. Also, all the “overhead” operations are things we know how to do with a TM - erase the input and copy strings. Note that again, we rely on the fact that M1 and M2 are deciders, since otherwise M1 may get stuck, and we will never try to run M2 . So can we show that RE is also closed under union? As it turns out - yes, but we need to be somewhat more clever. Proposition 1.3. If L1 , L2 ∈ RE, then L1 ∪ L2 ∈ RE. 1 Proof. Let M1 be TM that recognizes L1 , and M2 a TM that recognizes L2 . We need to somehow run both M1 and M2 in parallel. There are several ways of doing that. Perhaps the simplest is to use two tapes, and simulate each machine independently on its own tape. This is rather simple: start by copying the word to the second tape, and then apply the transition function of each TM in its separate tape. While this works, we take a different approach here. What really happens here? If we consider the TM that is equivalent to the 2-tape TM we used here, we actually see that in a way, we simply run the machines one step at a time. That is, M1 runs for a step, then M2 , and so forth. This is a sort of parallel run, and it’s impressive that we can simulate a parallel run with the (serial) model of TM. We take a slightly different approach here. We construct a TM M that recognizes L1 ∪ L2 as follows. Start by copying the input to a “safe” place, left of the original input. Additionally, we store a counter i somewhere on the tape (left of the input, for example). We now proceed to run M1 on the input for i steps. How can we do that? well - if you’re re-reading this after seeing a universal machine, then this is trivial. Otherwise, we modify M1 such that after taking a transition, it marks the place of the head on the tape, then goes to the left of the tape, updates a step-counter, compares it to i, and if it hasn’t reached it yet, goes back to where the head was, and continues for another step. After M1 runs for i steps, if it accepts, then M accepts. Otherwise, clean the tape, copy w back to the right of #, and run M2 on w for i steps. If it accepts, we can accept. Otherwise, increment i by 1, and repeat the process. 
To prove correctness, note that if w ∈ L1 ∪ L2 , then there exists some n > 0 such that either M1 or M2 accepts w within n steps. Thus, when the counter value i reaches n, our machine will accept w. For the second direction, if M accepts w, then clearly either M1 or M2 accept w, so w ∈ L1 ∪ L2 , and we are done. Note that we do not prove that M always halts, as indeed - it may not halt at all! 2 Parallel runs Running parallel computations is crucial in order to show closure properties of certain classes, as well as to recognize certain languages. This technique usually comes in handy when we work with machines that are not deciders. The general problem is that if we are given a machine, and we are not guaranteed that it is a decider, we do not want to simply let it run, because it may never halt. Instead, we only run it for a certain amount of steps, thus giving us time to decide if we want to do something else if the machine does not halt after a certain number of steps. Parallel runs are not a specific concept, but rather a general scheme. We demonstrate the idea here by showing that RE is closed under concatenation. Theorem 2.1. Let L1 , L2 ∈ RE, then L1 · L2 ∈ RE. Proof. Let M1 , M2 be TMs that recognize L1 , L2 respectively. Observe that we cannot assume M1 and M2 halt on inputs that are not in their languages. Let w ∈ Σ∗ be the input. The naive solution idea is to try every partition of w into w = uv, and then run M1 on u and M2 on v. If both accept, then w ∈ L1 · L2 and we accept. Otherwise, we try the next partition. However, this is not as simple as it sounds - when we try the first partition, it could be the case that M1 does not halt. In this case, our TM also does not halt, and we will never check other partitions! We break the problem into two parts. First, we construct a new machine M3 . This machine reads a word of the form u#v (if the word is not of this form, it rejects). Upon reading this word, M3 runs M1 on u, and then, if M1 accepts, erases the tape and runs M2 on v. Thus, M3 recognizes the language L3 = {u#v : u ∈ L1 , v ∈ L2 } Note that M3 is not necessarily a decider! Now, we are left with a simple task: given w, check whether we can split w as w = uv such that M3 accepts u#v. 2 As we stated earlier, running M3 serially on every partition is not good enough, since it might not halt on the first one, even though a later partition is accepted. So we want to run M3 “in parallel” on all the partitions. This is done as follows. We construct a TM M as follows: We add a new character ⊥ to the alphabet of M (in addition to ΓM1 , ΓM2 , ΓM3 ). ⊥ will act as a delimiter. Start by copying the input to a safe place, left of the original input, followed by ⊥. Next store a counter right after the first ⊥ and initialize it to 1 (i will keep track of how many steps the simulation is to run), then add another ⊥. Now add yet another counter j, initialized to |w|, followed by yet another ⊥. j will keep track of the current partition. The rest of the tape will be used to simulate the run of M3 on every partition of the input w for i steps. Simulating M3 this way requires operating on the 4th part of the tape (after the last ⊥) the same as M3 operates, but after every transition of M3 , the counter i is reduced by 1 (while keeping a copy of its original value to use in the next partition) and compared to 0. If it is 0, M stops simulating M3 , and moves to the next partition. The simulation of M3 works as follows. 
M reads the value i of the counter, it then simulates M3 on every partition of w, but only for i steps. When all the partitions have been simulated, the counter value is increased by 1 and the same process begins. If, at any point, M3 accepts, then M accepts as well. Otherwise, M does not halt. (keeping track on the current partition is done using the counter j. At every change of partition, reduce j by 1 and compare to 0. If 0 is reached, the current simulation is done. The counter j is the restored to |w| and i is increased by 1. Now, if w ∈ L1 · L2 , then there exists a partition w = uv such that u#v is accepted by M3 . Since it is accepted, there exists a finite number k such that M3 accepts u#v after k iterations. So M also accepts w after the counter i reaches value k. Conversely, if M accepts w, then one of the partitions is accepted by M3 , so w ∈ L1 · L2 . 3 Universal TM In the late 80’s, for those who remember, hand-held video games were very popular. These devices ran a single game. In the early 90’s, came the GAMEBOYr , and there was a big hype, which changed the hand-held game industry forever. What was the big change? The main advantage of the GAMEBOY was allowing different programs to run on the same machine. This ability seems trivial in computers today (for those who remember the old floppy disk, perhaps it doesn’t). Enough history, back to TMs. A TM runs a single program, by definition. However, for various uses, we want TMs to be able to run different programs. Can we do that? The answer, as expected (after claiming TMs are as strong as computers), is yes. 3.1 Encoding The first observation we need to make is that we can encode a TM as a finite string over a finite alphabet. We will show how to do that shortly. A universal TM is a TM that can take as input an encoding of a TM and simulate it on some input (given or fixed). We start by describing how we can encode a TM. We will use the alphabet {0, 1, #}. Let M = hQ, Σ, Γ, δ, q0 , qacc , qrej i. We encode M as follows. The states are encoded as increasing binary numbers. That is, if Q = {q1 , ..., qn }, then the encoding starts with 0#1#10#11#100#... We end the state string with a triple ###. Next, we encode Σ and Γ. We encode each letter with a binary string of length ⌈log |Γ|⌉ (we will later see why). We first encode the letters of Σ as increasing binary numbers, separated by #. To encode Γ, we start from the last symbol we encode from Σ, and encode Γ \ Σ as binary numbers. We separate Σ and Γ with ###. 3 Example 3.2: If Σ = {a, b} and Γ = {xy, 0, 1} ∪ Σ then we have |Γ| = 5. Thus, we use 3 bits to encode each letter. The encoding will be 000#001#010#011#101 Corresponding to a, b, xy, 0, 1. Next, we encode δ. A single transition δ(q, γ) = (q ′ , γ ′ , L) is encoded as a tuple hqi#hγi#hq ′ i#hγ ′ i#hLi## Where hLi = 0 and hRi = 1. We end the description of δ with ###. Finally, we encode q0 , qacc , qrej with their binary encoding, separated by ###. Example 3.3: Assume hqi = 101, hγi = 010, hq ′ i = 1 and hγ ′ i = 100. Assume we have the transition δ(q, γ) = (q ′ , γ ′ , R), then the encoding will contain 101#010#1#100#1## Given a TM, we denote it’s encoding by hM i. We note here that in the same spirit, we can encode many other different objects, such as NFAs, graphs, matrices, logic formulas, and many more, which you will see during this course. 3.4 A universal machine We want to construct a TM U that, given an encoding hM, wi of a TM M and a word w, can simulate the run of M on w. 
That is, U accepts/rejects/gets stuck if M accepts/reject/gets stuck on w, respectively. To construct U , we use a TM with 3 tapes. The first tape will hold the description of M , the second will hold the working tape of M , and the third tape will be used to hold the current state of M , and for calculations. U starts with hM i written on the first tape. Let’s also assume that we also get as input a word w. We want to run M on w. For that, we assume that we are actually given an encoding of w, according to the encoding of the alphabet of M . That is, w ∈ {0, 1, #}∗ , and every # separates consecutive letters. U starts by finding the beginning of w and copying it to tape 2. It then restores head 2 to the beginning of w. Next, U finds q0 in the description of M and writes it on tape 3. At the beginning of every iteration, we assume that head 2 is pointing to the letter that the head of M is supposed to point. U operates as follows. First, it scans tape 3 and compares it to qacc and qrej . If one of the comparisons succeeds, U acts accordingly (accept/reject). Next, U scans all the transitions in the description of δ, and for each one it compares the first two components to tapes 3 and 2 respectively. That is, U searches for the appropriate transition. Once a transition is found, U finds the letter that should be written, and replaces it with the current letter in tape 2. Recall that “letters” in M are encoded as strings of the same length in hM i, so replacing the letter will not require us to shrink or push tape 2. Next, U scans where it should move the head to (R or L), and moves head 2 to the next letter. Finally, U finds the new state M should go to, and writes it on tape 3. Recap on the operation of U : 1. Scan tape 1 to find the beginning of w. 2. Copy w to tape 2, reset heads 1 and 2. 3. Scan tape 1 to find q0 , copy q0 to tape 3. Reset heads 1,3. 4. In every iteration, repeat the following. 4 (a) Compare tape 3 to qacc , qrej , and act accordingly if successful. (b) Scan tape 1 and find the beginning of δ. (c) Compare tape 3 and current letter in tape 2 until appropriate transition is found. (d) Replace current letter in tape 2 with letter from transition. (e) Move head of tape 2 left or right according to transition, to next letter. (f) Replace content of tape 3 with new state. Observe that we have constructed a single machine, not a general abstract machine. There are many specific implementations of such universal machines. An important thing to notice about the construction is that we can make small changes to create useful variants of U , for example: • A machine that runs M on w and acts the opposite of M . • A machine that runs M on w for a certain number of steps (not infinitely). • A machine that runs M on w only as long as M stays in the limits of a certain portion of the tape. Final remark: there are many computational models that you have not seen in this course (and probably won’t see). One measure for the strength of a model is whether it can simulate a TM. 3.5 Remark about TM comparing two numbers The careful reader might notice that in the last two proofs we used indices, and had to compare the current index to 0, and stop if the numbers are equal. But(!) how does a TM determine if two numbers are indeed equal? We did not formally explain this yet. Indeed, we can define a TM that accomplishes this task, and even describe the states!. 
A detailed explanation can be found here: TM machine as comparator of numbers :) 4 Nondeterministic TMs A nondeterministic TM (NTM, for short) is exactly the same as a TM, with the only difference that δ : Q\{qacc , qrej } × Γ → 2Q×Γ×{R,L} \∅ Thus, at every stage, the machine can nondeterministically choose a transition. As in regular automata, this means that every word has many possible runs. Formally, we say that a configuration d follows configuration c if there exists a transition rule in δ according to which d follows from c as in the deterministic case. We say that w ∈ L(M ) if w has an accepting run. This implies that when M runs on w, there could be runs that reject and runs that get stuck, but if there is even one run that accepts, the word is accepted. The NTM M is considered a decider if for every input w, every run of M on w is a halting run. Example 4.1 Consider the language C = {hni : n is a composite number}. First, let’s see why C is recognizable by an NTM: an obvious answer is that C is decidable. But let’s use the non-determinism ad-hoc. Given a number n, an NTM can non-deterministically write on the tape every number p between 2, ..., n − 1 (one number per non-deterministic choice), and check whether p divides n - if so, accept. Thus, n is accepted iff there exists an accepting run, which happens iff there exists a factor of n. 5 4.2 Equivalence of NTMs and TMs TMs are a particular case of NTMs, so they are at most as powerful. That is, every language that can be recognized by a TM can also be recognized by an NTM. The interesting question is whether the converse it true. In the context of computability, the answer is yes - every NTM has a TM that recognizes the same language. Before proving this, we go through some notions and definitions. 4.2.1 Runtrees of NTMs Let N = hQ, Σ, Γ, δ, q0 , qrej , qacc i be an NTM, and let w ∈ Σ∗ be an input for N . The runtree of N w.r.t w, denoted TN,w = hV, Ei, is formally defined as follows. Let C denote the set of all configurations of N . We have the following: • V ⊆ C × (N ∪ {0}). That is, every vertex hc, ii corresponds to the configuration c of N that lies in the level i of the tree TN,w . • The root of TN,w is hq0 w, 0i. That is, the root corresponds to the initial configuration of N on w, q0 w, and it has no incoming edges. S • E ⊆ i≥0 (C × {i}) × (C × {i + 1}) is such that forall i ≥ 0, it holds that E(hc, ii, hd, i + 1i) iff there exists a transition rule in δ according to which d follows from c. That is, from every vertex that corresponds to the configuration c there is an edge to a vertex (in the next level) that correcponds to the configuration d whenever d is a consecutive configuration of c. For simplicity, we abuse notation and refer to a vertex by the configuration it corresponds to. For example, when we say that the vertex v is an accepting configuration, then we mean that v is of the form hc, ii, where c is an accepting configuration. Intuitively, the run tree TN,w encodes all the runs of N on w - a path from the root q0 w to some node in the tree corresponds to a partial run of N on w, a path from the root to a leaf corresponds to an accepting or a rejecting run, and infinite paths from the root correspond to non-halting runs. Note that different vertices in the tree can correspond to the same configuration. This is both because a configuration can be reached via different runs and because as a run of the machine can be be stuck in a loop, in which case the configuration re-appears in the same path of the tree. 
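The edges of the runtree are easy to picture in code. The following Python sketch (our own, using an ad-hoc representation of configurations as triples rather than the strings uqσv used above) computes the children of a configuration, i.e. all configurations that follow it by one nondeterministic step; the number of children is at most max |δ(q, γ)|, a constant that depends only on N.

```python
BLANK = "_"

def successors(config, delta):
    """All configurations that follow `config` in one step of the NTM N; these are exactly
    the children of `config` in the runtree.  A configuration is (state, tape, head), and
    delta maps (q, symbol) to a set of (q', written symbol, direction) triples."""
    q, tape, head = config
    symbol = tape[head] if head < len(tape) else BLANK
    children = []
    for q_new, write, direction in delta.get((q, symbol), set()):
        new_tape = list(tape) + [BLANK] * (head + 1 - len(tape))
        new_tape[head] = write
        new_head = max(head - 1, 0) if direction == "L" else head + 1   # left end: stay put
        children.append((q_new, "".join(new_tape), new_head))
    return children
```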
Finally, as we are interested in the runs of N on w, we may assume that all the vertices in the tree are reachable from the root q0 w. Also note that if a configuration is a halting configuration (that is, its state is qrej or qacc ), then it is a leaf in the tree. Finally, note that while V may be infinite, it is finite when N is a decider. remark Let N be a fixed NTM. Then there is a constant k that depends on |hN i| such that for every input w for N , k bounds the branching degree of TN,w , that it, the maximal number of children of a node in the tree is at most k. The following lemma follows form the definitions, and is left as an exercise. Lemma 4.1. An NTM N is a decider iff all its runtrees are finite. Proof. Left to the reader. The hard direction is to show that if N is a decider, then for every input w, the runtree TM,w is finite. To show that, you can use Remark 4.2.1 and König’s lemma. Theorem 4.2. For every decider NTM N , there exists a decider TM D with L(N ) = L(D). Proof. Let N = hQ, Σ, Γ, δ, q0 , qrej , qacc i be a decider NTM. We decsribe an equivalent TM D. The idea is as follows. Given input w, the machine D scans that runtree TN,w to find an accepting configuration. However, note that D has no explicit description of TN,w , so D needs to apply the scan given only w and the decription of N (which can be hardcoded in D). For this, we introduce the notion of addresses of vertices in TN,w . Let k be a constant that bounds the branching degree of every runtree of N , and consider the alphabet Σk = {1, 2, . . . , k}. We can think of every word u in Σ∗k as an address of a vertex in TN,w . For example, the 6 word u = ǫ is the address of the root q0 w, and the word u = 13 is the address of the configuration d that we reach from the root by following the path q0 w → c → d in the tree, where c is the first1 child of q0 w, and d is the third child of c. Note that there are words u ∈ Σ∗k that do not describe an address of a vertex in the tree, and we denote them by invalid adresses. Also, note that given a word u ∈ Σ∗k , the machine D can check whether u is a valid address to an accepting configuration in TN,w . Indeed, given an address u = u1 · u2 · · · ui , D can write down the current configuration c0 = q0 w which is the initial configuration of N on w. Then, D checks (according to δ) whether c0 has a u1 ’th following configuration, c1 . If c1 exists, D writes it down and proceeds similarly to the letter u2 in order to compute the configuration c2 . If at some point, D discovers that the following configuration according to the address u does not exist, then we know that u is invalid. Otherwise, once D computes the configuration ci , we check whether it is accepting. We’re now ready to define D. Given input w, the machine D operates in iterations. In the i’th iteration: 1. D writes down all the addresses u ∈ Σ∗k of length i in lexicographic order. 2. D goes over the writeen addresses and checks whether one of them describes a path from the root q0 w to an accepting configuration in TN,w . If such an address is found, then D accepts. Otherwise, proceed to 3. 3. If all addresses u of length i are invalid, D rejects. Otherwise, proceed to the next iteration. Intuitively, considering the addresses in minlex order corresponds to scanning the runtree TN,w in a BFS manner. Correctness follows easily. Indeed, if w ∈ L(N ), then the runtree TM,w has an accepting configuration and eventually D’s scan finds it and accepts. 
Conversely, if D accepts, then there is some u that describes an address of an accepting configuration in TN,w and thus w ∈ L(N ). Finally, note that D is a decider. Indeed, if there is a word w ∈ / L(N ), then there is no accepting configuration in TN,w . Now by Lemma 4.1, TN,w has a finite height. Hence, there is some t such that all the words of length t over Σk describe invalid addresses and thus D cannot have more than t iterations. Hence, it has to reject w. A very important point that needs to be made regarding this construction is its runtime cost. Assuming the NTM N has at most k consecutive configurations for each configuration, and that it performs t steps on some input, the construction yields a TM with a runtime of O(t2 · k t ) = 2O(t) steps determined by the size of the tree and the length of an encoding of a run. Thus, this translation from NTMs to TMs is not efficient. Can we make it efficient? The surprising answer is that we don’t know. Finally, note that the above construction works if we assume that the machine N is not a decider, that is, we can generalize the previous theorem to the following. Theorem 4.3. For every NTM N , there exists a TM D with L(N ) = L(D) and: 1. If N is a decider, then D is a decider. 2. For every word w ∈ / L(N ), it holds that D halts on w iff all the runs of N on w are halting runs. Indeed, if we apply the same construction for a non decider TM N that has non-halting runs and accepting ones, then D will eventually find an accepting configuration in the runtree as it scans the tree in a BFS manner. Also, note that D does not halt when N has an infinite run but has no accepting runs. 1 Note that we are assuming that there is an order on the consecutive configurations. This is okay, as the description of N (that is encoded in D) defines such an order. 7 Computability - Recitation 8 December 13, 2022 1 The concept of mapping reduction We start with a reminder. Let L1 , L2 ⊆ Σ∗ . We say that L1 is mapping-reducible to L2 , and denote L1 ≤m L2 , if there exists a computable function f : Σ∗ → Σ∗ such that for every x ∈ Σ∗ it holds that x ∈ L1 iff f (x) ∈ L2 . f is then called a reduction from L1 to L2 . We remind that since f is computable, this means that there exists a TM T such that given input x ∈ Σ∗ , T always halts with f (x) ∈ Σ∗ written on the tape. The intuition behind this, is that L2 is “harder” than L1 , in the sense that if we have a TM M that decides L2 , we can decide L1 as follows. Given input x to L1 , run T on x to obtain T (x), then run M on T (x) and answer the same.1 The following theorem demonstrates the intuition. Theorem 1.1. Let L1 , L2 ⊆ Σ∗ such that L1 ≤m L2 . The following holds: 1. If L2 ∈ RE then L1 ∈ RE. 2. If L2 ∈ co-RE then L1 ∈ co-RE. Proof. For the first part, assume that L2 ∈ RE, so there exists a TM M that recognizes L2 . By our assumption, there exists a reduction f : Σ∗ → Σ∗ from L1 to L2 . We define a TM N , that given input x works as follows: • Compute y = f (x). • Simulate M on the input y. It holds that N accepts x iff M accepts f (x) iff f (x) ∈ L2 iff x ∈ L1 . So N recognizes L1 , and L1 ∈ RE. As for the second part, notice that a reduction from L1 to L2 is also a reduction from L1 and L2 , so the second part can be deduced by applying the first part of the claim to the languages L1 and L2 . So how do we use this theorem? Given a language L, if we think that L ∈ / RE, we look for a language to reduce from. As you will see in the exercise, reductions are transitive. 
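Part 1 of Theorem 1.1 is really just function composition. Here it is as a tiny Python combinator; this is a sketch of ours with illustrative names, where recognizer_for_L2 stands for a procedure that, like the TM M in the proof, may fail to halt.

```python
def recognizer_from_reduction(reduction, recognizer_for_L2):
    """Given a computable reduction f from L1 to L2 and a recognizer for L2,
    build a recognizer for L1: on input x, compute y = f(x) and simulate the
    L2-recognizer on y.  It accepts x iff f(x) is in L2 iff x is in L1, and it
    may fail to halt exactly when the L2-recognizer fails to halt on f(x)."""
    return lambda x: recognizer_for_L2(reduction(x))
```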
Since reductions are transitive, every language for which we prove undecidability increases our arsenal. Currently, our main “weapons” are ATM ∈ RE \ co-RE and ĀTM ∈ co-RE \ RE, where ATM = {⟨M, w⟩ | w ∈ L(M)} and ĀTM is its complement.
2 Classifying languages into computability classes
We have seen that RE ∩ co-RE = R. Hence, every language belongs to exactly one of the following four sets: R, RE \ R, co-RE \ R, or the complement of RE ∪ co-RE (that is, neither RE nor co-RE). The tools that we have allow us to classify many languages into the correct set. Let's see some examples.
(Footnote: observe that this is a somewhat “strong” notion - we not only use M to solve L1, we do it with a single use at the end. There is a weaker notion of reduction called Turing reductions.)
2.1 ALLTM
Define the language ALLTM = {⟨M⟩ : L(M) = Σ∗}.
Proposition 2.1. ALLTM ∉ RE ∪ co-RE.
Proof. We split the proof into two claims.
Claim: ATM ≤m ALLTM (and thus, ALLTM ∉ co-RE).
Construction: The reduction proceeds as follows. On input ⟨M, w⟩ for ATM, the reduction returns ⟨K⟩ where K is a machine that, on input x, simulates M on w and answers the same (i.e. if M accepts, so does K, and if M rejects, so does K; clearly if M does not halt, neither does K). Note that K ignores the input x.
Correctness: If ⟨M, w⟩ ∈ ATM then K accepts every input, so L(K) = Σ∗, so ⟨K⟩ ∈ ALLTM. Conversely, if ⟨M, w⟩ ∉ ATM, then M either gets stuck on or rejects w. In any case, K does not accept any x, so L(K) = ∅ ≠ Σ∗, so ⟨K⟩ ∉ ALLTM. Thus, the reduction is correct.
Computability: Finally, we still need to show that the reduction is computable. This is often the confusing part, since it's usually clear that it's computable, but you don't know how formal you need to get. The answer is that you should explain the nontrivial parts. In this case: the reduction is computable since from the encoding of ⟨M, w⟩ we construct the new machine K by making it a universal machine and hard-coding its input to be ⟨M, w⟩, so we can create ⟨K⟩ using an algorithm (and so using a TM).
Remark 2.2. Note that the reduction didn't run M on w, it only constructed ⟨K⟩ from ⟨M, w⟩.
We now proceed to show the second reduction, and it's a bit trickier. The problem is that we want to simulate M on w, but we actually want to do something if M does not accept w. Now, if M rejects w, that's fine. We'll wait until it rejects, and go about our business. However, what happens if M does not halt on w? Well, then we need some tricks.
Claim: ĀTM ≤m ALLTM (and thus, ALLTM ∉ RE).
Construction: The reduction proceeds as follows. On input ⟨M, w⟩, the reduction constructs ⟨K⟩ where K is a machine that, on input x, simulates M on w for |x| steps. If, during this, M accepts, then K rejects. Otherwise, K accepts.
Correctness: If ⟨M, w⟩ ∈ ĀTM then M does not accept w, and in particular, M does not accept w within |x| steps, for any x. Thus, K accepts every input, so L(K) = Σ∗, and so ⟨K⟩ ∈ ALLTM. Conversely, if ⟨M, w⟩ ∉ ĀTM, then M accepts w, and therefore there exists some n ∈ ℕ such that M accepts w within n steps. Then, for every x such that |x| > n, K rejects x, so L(K) ≠ Σ∗, so ⟨K⟩ ∉ ALLTM, so the reduction is correct.
Computability: Finally, this reduction is computable for the same reasons the previous construction was computable - we can compute ⟨K⟩ from ⟨M, w⟩, for example by hard-coding all the parts of ⟨K⟩ that are independent of ⟨M, w⟩ into the TM computing the reduction.
2.2 USELESS
Let USELESS = {⟨M⟩ : there exists a state q ∉ {qacc, qrej} in M that is never reached, on any input}. We claim that USELESS ∈ co-RE \ R.
First, to show that USELESS ∈ co-RE it’s enough to show that given input ⟨M ⟩, we can always reject if ⟨M ⟩ ∈ / USELESS. Given input ⟨M ⟩, a TM can simulate M on every input in parallel (incrementally 2 ), while keeping track of visited states. If, at any point, every state was visited, then ⟨M ⟩ ∈ / USELESS and we can reject. Otherwise we do not halt. Now we want to show that USELESS ∈ / RE. We show that by showing AT M ≤m USELESS. Construction: The reduction machine, T , works as follows. Given input ⟨M, w⟩, T construct the machine H that works as follows. On input x, H simulates M on w (without restricting the steps). If M accepts w, H moves to a new state, from which it traverses every state of itself, and then accepts. If M rejects w, then H rejects. Correctness: If ⟨M, w⟩ ∈ AT M , then M does not accept w. Thus, H never reaches the special traversing state, so ⟨H⟩ ∈ USELESS. Conversely, if ⟨M, w⟩ ∈ / AT M , then M accepts w, so H always reaches the traversal state, and thus visits every state in the machine. So ⟨H⟩ ∈ / USELESS. Computability: Here we face a gap that was left from the construction: how to construct a traversingstate? It’s not trivial, and it’s the sort of think we must explain in order for this answer to be correct. The idea is to write on the tape a special symbol @, and from every state in H make a transition to a “next” state upon reading @, (having ordered the states arbitrarily), without moving the head.3 The traversing-state writes this special symbol on the tape, and starts the traversal. This promises that every state is visited, but that it won’t be used before we get to the special state. There is a small technical problem here - we also want to make sure there is some input on which H visits qreject , but this cannot be part of the traversal. To solve this, we modify H such that on a certain x (e.g x = ϵ), it goes straight to qreject . Finally, we remark that the rest of the construction is computable - the simulation part is as we have seen in class, and the traversal part is easy to compute - simply order the states of the machine and add the appropriate transitions. 2.3 From 2017 moed B exam The following problem is taken from last year’s exam. The school solution given here uses the language HALTTM = {⟨M, w⟩ | M halts on w}. It is not a hard exercise to see that HALTTM ∈ RE \ R. enumerating Σ∗ , run 1 step on the first word, then 2 steps on the first two words, 3 on the first three, and so on. you don’t work with a TM that has a “stay” option - add the symbol on two cells, and move left and right repeatedly. 2 By 3 If 3 4 3 Rice’s Theorem We now formulate the notion of some of the reductions we’ve encountered so far to a theorem, due to Henry Gordon Rice. Theorem 3.1 (Rice’s Theorem). Let P be a nontrivial semantic property of TMs, then LP = {⟨M ⟩ : M ∈ P } is undecidable. We need to explain what this means, of course. A semantic property P of TMs is a set of TM’s, with the following property: for every two TMs M1 , M2 , if L(M1 ) = L(M2 ), then M1 ∈ P iff M2 ∈ P . Intuitively, P is a set of machines, but is defined through their languages. A semantic property P is nontrivial if there exist two machines M1 , M2 such that M1 ∈ P and M2 ∈ / P. That is, P is not all of the machines, nor no machines. To prove the theorem, we prove the following lemma. Lemma 3.2. Let P be a nontrivial semantic property of TMs, such that T∅ ∈ / P (where L(T∅ ) = ∅), then AT M ≤m LP . Proof. We show a reduction f from AT M to LP . 
We are given an input ⟨M, w⟩ for AT M , and we need to construct an input f (⟨M, w⟩) = ⟨T ⟩ such that ⟨M, w⟩ ∈ AT M iff ⟨T ⟩ ∈ LP . Let H be a machine such that H ∈ P . We know that such a machine exists, because P is nontrivial. The reduction f works as follows. Given M, w, construct the machine T such that works as follows. 1. On input x, simulate M on w. If it halts and rejects, reject. If it accepts, proceed to 2. 2. Simulate H on x. If it accepts, accept, if it rejects, reject (otherwise we get stuck). Now we need to prove formally that this reduction is correct. First, we need to explain why it is computable. This is easy, since we know we can simulate a TM on a word, so we can do it for both M on w and H on x. Now we need to show why it works. First, if ⟨M, w⟩ ∈ AT M , then M accepts w, so L(T ) = L(H), since H ∈ P and P is a semantic property, this means that T ∈ P , so ⟨T ⟩ ∈ LP . Conversely, if ⟨M, w⟩ ∈ / AT M , then T does not accept anything. So L(T ) = ∅, but T∅ ∈ / P , so T ∈ / P , so ⟨T ⟩ ∈ / LP . We conclude that ⟨M, w⟩ ∈ AT M iff f (⟨M, w⟩) ∈ LP , so AT M ≤m LP . We can now prove the theorem. Proof of Theorem 3.1. Let P be a nontrivial semantic property. If T∅ ∈ / P , then from the lemma we conclude that LP ∈ / co-RE, and in particular - undecidable. / co-RE, so LP ∈ / RE, and in Otherwise, consider LP , then from the lemma we conclude that LP ∈ particular - undecidable. The useful thing in Rice’s theorem is clearly the lemma, not the theorem, since it gives us more information. Example: Let L = {⟨M ⟩ : ∀w ∈ Σ∗ , w ∈ L(M ) ⇐⇒ wwR wwR ∈ L(M )} This is a nontrivial semantic property (make sure you understand why). Also, ⟨T∅ ⟩ ∈ L, so from the lemma we get that L ∈ / RE. An important point: The lemma does not tell us anything about the converse. That is, we may conclude that L ∈ / RE, but we do not know whether L ∈ co-RE or not. For this we need creative tricks, and there is no general known way to tell. 5 Computability - Recitation 9 December 27, 2022 1 One More Reduction Given two languages L1 and L2 , we say that they 10-agree if there are at least 10 distinct words w1 , . . . , w10 such that for every 1 ≤ i ≤ 10, we have wi ∈ L1 iff wi ∈ L2 . Let’s classify the language: L = {⟨M1 , M2 ⟩ : L(M1 ) and L(M2 ) 10-agree} We show that L ∈ RE ∪ co-RE. We will describe two reductions. One from the language HALTϵTM = {⟨M ⟩ : M halts on ϵ} and one from its complement. You have seen in class that HALTϵTM ∈ RE \ R (and hence HALTϵTM ∈ co-RE \ R). We start with the reduction HALTϵTM ≤m L, which shows that L ∈ / co-RE. • Construction: Given ⟨M ⟩, the reduction outputs ⟨M1 , M2 ⟩ where M1 , M2 are defined as follows. M1 immediately accepts (for every input). M2 ignores its input, simulates M on ϵ, and if M halts, M2 accepts (otherwise, M2 runs forever). • Correctness: Suppose ⟨M ⟩ ∈ HALTϵTM . Then M halts on ϵ, so both M1 and M2 accept every word. We have L(M1 ) = L(M2 ), so they 10-agree (in fact they agree on every word), hence ⟨M1 , M2 ⟩ ∈ L. In the other direction, suppose ⟨M ⟩ ∈ / HALTϵTM . Then M does not halt on ϵ, so M2 does not accept any word. We have L(M1 ) = Σ∗ and L(M2 ) = ∅. They disagree on every word, so in particular they do not 10-agree, and ⟨M1 , M2 ⟩ ∈ / L. • Computability: M1 can be constructed with q0 = qacc . M2 can be constructed with two parts: the first part erases the input, and the second part is a copy of M . To show that L ∈ / RE, we use a similar reduction from HALTϵTM . It works the same as above, except that M1 now rejects immediately. 
• Correctness: Suppose ⟨M ⟩ ∈ HALTϵTM . Then M does not halt on ϵ, so both M1 and M2 never accept (for every input). We have L(M1 ) = L(M2 ), so they 10-agree (in fact they agree on every word), hence ⟨M1 , M2 ⟩ ∈ L. In the other direction, suppose ⟨M ⟩ ∈ / HALTϵTM . Then M halts on ϵ, so M2 accepts every word. We have L(M1 ) = ∅ ∗ and L(M2 ) = Σ . They disagree on every word, so in particular they do not 10-agree, and ⟨M1 , M2 ⟩ ∈ / L. 2 Rice’s Theorem We now formulate the notion of some of the reductions we’ve encountered so far to a theorem, due to Henry Gordon Rice. Theorem 2.1 (Rice’s Theorem). Let P be a nontrivial semantic property of TMs, then LP = {⟨M ⟩ : M ∈ P } is undecidable. 1 We need to explain what this means, of course. A semantic property P of TMs is a set of TM’s, with the following property: for every two TMs M1 , M2 , if L(M1 ) = L(M2 ), then M1 ∈ P iff M2 ∈ P . Intuitively, P is a set of machines, but is defined through their languages. A semantic property P is nontrivial if there exist two machines M1 , M2 such that M1 ∈ P and M2 ∈ / P . That is, P is not all of the machines, nor no machines. To prove the theorem, we prove the following lemma. Lemma 2.2. Let P be a nontrivial semantic property of TMs, such that T∅ ∈ / P (where L(T∅ ) = ∅), then AT M ≤m LP . Proof. We show a reduction f from AT M to LP . We are given an input ⟨M, w⟩ for AT M , and we need to construct an input f (⟨M, w⟩) = ⟨T ⟩ such that ⟨M, w⟩ ∈ AT M iff ⟨T ⟩ ∈ LP . Let H be a machine such that H ∈ P . We know that such a machine exists, because P is nontrivial. The reduction f works as follows. Given M, w, construct the machine T such that works as follows. 1. On input x, simulate M on w. If it halts and rejects, reject. If it accepts, proceed to 2. 2. Simulate H on x. If it accepts, accept, if it rejects, reject (otherwise we get stuck). Now we need to prove formally that this reduction is correct. First, we need to explain why it is computable. This is easy, since we know we can simulate a TM on a word, so we can do it for both M on w and H on x. Now we need to show why it works. First, if ⟨M, w⟩ ∈ AT M , then M accepts w, so L(T ) = L(H), since H ∈ P and P is a semantic property, this means that T ∈ P , so ⟨T ⟩ ∈ LP . Conversely, if ⟨M, w⟩ ∈ / AT M , then T does not accept anything. So L(T ) = ∅, but T∅ ∈ / P , so T ∈ / P , so ⟨T ⟩ ∈ / LP . We conclude that ⟨M, w⟩ ∈ AT M iff f (⟨M, w⟩) ∈ LP , so AT M ≤m LP . We can now prove the theorem. Proof of Theorem 2.1. Let P be a nontrivial semantic property. If T∅ ∈ / P , then from the lemma we conclude that LP ∈ / co-RE, and in particular - undecidable. Otherwise, consider LP , then from the lemma we conclude that LP ∈ / co-RE, so LP ∈ / RE, and in particular undecidable. The useful thing in Rice’s theorem is clearly the lemma, not the theorem, since it gives us more information. Example: Let L = {⟨M ⟩ : ∀w ∈ Σ∗ , w ∈ L(M ) ⇐⇒ wwR wwR ∈ L(M )} This is a nontrivial semantic property (make sure you understand why). Also, ⟨T∅ ⟩ ∈ L, so from the lemma we get that L ∈ / RE. An important point: The lemma does not tell us anything about the converse. That is, we may conclude that L ∈ / RE, but we do not know whether L ∈ co-RE or not. For this we need creative tricks, and there is no general known way to tell. 3 NP and Equivalent Definitions There are two equivalent definitions for NP. The first definition is NP = ∞ [ NTIME(nk ). k=0 That is, NP is the set of languages that can be decided by an NTM in polynomial time. 
Example: A positive integer n is called composite if there are integers a, b > 1 such that ab = n. Let Σ = {0, 1} and
COMPOSITE = {w : w is a composite number in binary}
An NTM N can decide COMPOSITE in polynomial time as follows. Suppose that the input w is a binary representation of the number n. Note that the input size is |w| = ⌊log n⌋ + 1. N uses its nondeterminism to guess a factor a between 2 and n − 1. It writes the first bit of a nondeterministically (0 or 1) on the tape, then N moves to the right and guesses the next bit of a, and so on. Note that numbers up to n can be represented using at most |w| bits, so it takes N a polynomial number of steps to write ⟨a⟩. Once N has written ⟨a⟩, it checks whether a is a factor of n. If so, N accepts. Otherwise, it rejects. Checking whether a is a factor of n can be done deterministically in polynomial time (for example, using long division and checking whether there is a remainder). Therefore COMPOSITE ∈ NP.
In this algorithm we had to nondeterministically guess a. If instead we were given a, we could verify in polynomial time that it is indeed a factor, and deduce that w ∈ COMPOSITE. We then say that ⟨a⟩ is a witness to the fact that w ∈ COMPOSITE. We now show a second definition of NP which generalizes this observation.
Remark: We could have defined N to guess both a and b, and then accept iff ab = n. In that case the witness is the pair ⟨a, b⟩. Note that this is still polynomial in |w|.
We say that a TM V is a verifier for a language L if L = {w : There exists c such that ⟨w, c⟩ ∈ L(V)}. In our example, the verifier is a machine V that decides whether the witness a is a factor of n. The second, equivalent definition of NP is:
NP = {L : There is a verifier V for L that runs on input ⟨w, c⟩ in time polynomial in |w|}
We require V to run in time polynomial in |w| because we are only interested in polynomial witnesses. Equivalently, we could define NP to be the set of all languages L that have a polynomial verifier. That is, L ∈ NP iff there is a machine V that runs in polynomial time and:
L = {w : There is c such that |c| is polynomial in |w| and ⟨w, c⟩ ∈ L(V)}
3.1 Equivalence
Theorem 3.1. A language L can be decided by an NTM in polynomial time iff there is a polynomial verifier for L.
Proof. In the first direction, suppose there is a polynomial verifier V for L. Since it is polynomial, there is k ∈ ℕ such that V runs in at most |w|^k steps, where |w| is the input size. Define an NTM N for L that works as follows: for an input w, guess a witness c of length at most |w|^k and run V on ⟨w, c⟩. If V accepts, N accepts. Otherwise, N rejects.
In the other direction, suppose there is an NTM N that runs in polynomial time and decides L. Construct a polynomial verifier V for L: V accepts ⟨w, r⟩ iff r is an accepting run of N on w. Since N runs in polynomial time, |r| is polynomial in |w|. V verifies that r is a valid accepting run by checking that the initial configuration is q0w, the last configuration contains qacc, and each configuration yields the next one using a transition of N.
Computability - Recitation 10 January 2, 2023
1 NP and NP-Completeness
1.1 Reminders
We have seen two equivalent definitions for NP. The first definition is
NP = ⋃_{k=0}^{∞} NTIME(n^k).
That is, NP is the set of languages that can be decided by an NTM in polynomial time. For the second definition, we say that a TM V is a verifier for a language L if L = {w : There exists c such that ⟨w, c⟩ ∈ L(V)}.
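As a concrete illustration of the verifier definition, here is a minimal Python sketch of a verifier for the COMPOSITE example above; the function name and the convention of passing the input and the witness as binary strings are ours, not part of the notes.

```python
def verify_composite(w: str, c: str) -> bool:
    """Verifier for COMPOSITE: accept <w, c> iff the witness c encodes
    (in binary) a nontrivial factor of the number encoded by w."""
    n = int(w, 2)  # the input number
    a = int(c, 2)  # the claimed factor, i.e. the witness
    # a is a valid witness iff 1 < a < n and a divides n;
    # this check runs in time polynomial in |w|
    return 1 < a < n and n % a == 0

# 15 (= 0b1111) is composite, witnessed by the factor 3 (= 0b11)
assert verify_composite("1111", "11")
# 7 (= 0b111) is prime, so no witness works; 2 in particular fails
assert not verify_composite("111", "10")
```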
A language L is in NP if it has a verifier V which runs on input ⟨w, c⟩ in time polynomial in |w|.
A polynomial time reduction is a mapping reduction computable by a TM that runs in time polynomial in its input. If there is a polynomial time reduction from a language K to a language L, we write K ≤p L. We have also seen that
Lemma 1.1. If L ∈ P and K ≤p L then K ∈ P.
1.2 NP-Completeness
Definition 1.2. A language L is NP-Hard if for every K ∈ NP it holds that K ≤p L.
You will prove in the exercise that the relation ≤p is transitive. Hence, if L is NP-Hard and L ≤p J then J is also NP-Hard.
Definition 1.3. If L is NP-Hard and also L ∈ NP, we say that L is NP-Complete.
We do not know whether P = NP. However, we do know that the NP-Complete languages are the hardest languages in NP. The following theorem formalizes this:
Theorem 1.4. If there exists an NP-Complete language in P then P = NP. Equivalently: if P ≠ NP then every NP-Complete language is not in P.
Proof. We prove the first form of the statement. Assume that L ∈ P is NP-Complete. Consider a language K ∈ NP. Since L is NP-Hard, K ≤p L. Since L ∈ P, it follows from Lemma 1.1 that K ∈ P. Thus, NP ⊆ P. Since P ⊆ NP, we conclude that P = NP.
Our goal in this recitation is to get to know many new exciting languages, and to use polynomial reductions to prove that these languages are NP-Complete.
2 The clique, vertex cover and dominating set problems
2.1 CLIQUE
Recall that a clique in a graph G = (V, E) is a set C ⊆ V such that for every x, y ∈ C where x ≠ y, we have {x, y} ∈ E. We define the language
CLIQUE = {⟨G, k⟩ : The graph G has a clique of size k}.
You will see in class that CLIQUE is NP-Complete. Today, we will use this fact.
2.2 VC
We now consider the vertex-cover problem. Given a graph G = ⟨V, E⟩, a vertex cover in G is a set C ⊆ V such that for every e ∈ E there exists x ∈ C such that x ∈ e. That is, a vertex cover is a set of vertices that touches every edge in the graph. We consider the problem
VC = {⟨G, k⟩ : G has a vertex cover of size at most k}.
Proposition 2.1. VC is NP-Complete.
Proof. First, we show that VC ∈ NP. We do this by describing a polynomial time verifier for VC. Given input ⟨G, k⟩, the witness that ⟨G, k⟩ ∈ VC is a set of vertices S. The verifier checks that |S| ≤ k, and that for every e ∈ E there exists v ∈ S such that e touches v. This is done as follows: first, the verifier counts the size of S (which is at most the size of V, so this takes time polynomial in |⟨G⟩|). Then, the verifier traverses all the edges in the graph, and for each edge, compares it against all the vertices in S. This takes O(|E| · |S|) = O(|E| · |V|), which is polynomial in the size of G. The verifier accepts iff indeed |S| ≤ k and S is a vertex cover.
We now proceed to show that VC is NP-Hard, by showing a reduction from CLIQUE (which by now we know is NP-Hard).
Construction: Given input ⟨G, k⟩, the reduction outputs ⟨Ḡ, n − k⟩, where Ḡ is the complement graph of G (that is, an edge exists in Ḡ iff it does not exist in G), and n is the number of vertices in G.
Runtime: This is clearly polynomial, since all we need to do is traverse every edge and "flip" it. That takes O(|V|²) flips, and we assume polynomial-time access to each edge. Also, computing n − k given k can be done in polynomial time.
Correctness: This is the interesting part. We claim that G has a clique of size k iff Ḡ has a vertex cover of size at most n − k. Let G = ⟨V, E⟩. Then, Ḡ = ⟨V, Ē⟩. If G has a clique C of size k, consider the set C̄ = V \ C. Since |C| = k, we have |C̄| = n − k.
We claim that C̄ is a vertex cover in Ḡ. Indeed, let e = {x, y} ∈ Ē. If x, y ∉ C̄, then x, y ∈ C, but {x, y} ∈ Ē, so {x, y} ∉ E, so C is not a clique, which is a contradiction. We conclude that every edge of Ḡ has a vertex in C̄, so C̄ is a vertex cover of size n − k, so ⟨Ḡ, n − k⟩ ∈ VC.
Conversely, if Ḡ has a vertex cover S of size n − k (why don't we need "at most"?), consider the set S̄ = V \ S. Since |S| = n − k, we have |S̄| = k. We claim that S̄ is a clique in G. Indeed, let x, y ∈ S̄. If {x, y} ∉ E, then {x, y} ∈ Ē, but x, y ∉ S, so S is not a vertex cover of Ḡ, which is a contradiction. We conclude that every pair of vertices in S̄ is connected by an edge, so S̄ is a clique of size k, so ⟨G, k⟩ ∈ CLIQUE.
2.3 DS
Given an undirected graph G = ⟨V, E⟩, a dominating set in G is a set D ⊆ V such that for every v ∈ V, either v ∈ D or there exists u ∈ D such that {u, v} ∈ E. That is, a dominating set is a set of vertices that is within distance 1 of every vertex. Let
DS = {⟨G, k⟩ : G has a dominating set of size at most k}.
Proposition 2.2. DS is NP-Complete.
Proof. First, we show that DS ∈ NP. We do this by describing a polynomial time verifier for DS: the witness is a dominating set in the graph, and given a dominating set D ⊆ V, it is easy to verify that every vertex is indeed connected to a vertex in D, by going over all the edges that touch vertices in D and making sure we reach all the vertices in V. Thus, DS ∈ NP.
To prove that DS is NP-Hard, we should probably show a reduction from some NP-Hard problem. Any suggestions? We show that VC ≤p DS.
Construction: Given input G = ⟨V, E⟩ and k ∈ ℕ, the reduction T first goes over the vertices and counts all the vertices that are not connected to any edge. Denote this number by f. Next, T outputs the pair ⟨G′, k′⟩ where G′ = ⟨V′, E′⟩ is obtained from G as follows. For every edge e ∈ E, we add a new vertex ve, and define V′ = V ∪ {ve}_{e∈E}. As for the edges, for every edge e = {u, v} ∈ E we add two edges: {v, ve} and {u, ve}. We thus have:
E′ = E ∪ {{v, ve}, {u, ve} : e = {u, v} ∈ E}
Finally, we define k′ = k + f.
Runtime: Finding the isolated vertices takes at most O(|E|) time, and constructing the new graph involves adding |E| vertices and connecting them with 2|E| new edges. Even if we have to construct an edge matrix, this is polynomial (indeed, quadratic) in |V| + |E|. Finally, computing k + f is clearly polynomial, so the reduction is polynomial.
Correctness: For the first direction, assume G has a vertex cover C of size k. Let F be the set of isolated vertices in V; we claim that F ∪ C is a dominating set of size at most k′ = k + f in G′. Indeed, the size of F ∪ C is at most k + f. Let v ∈ V′. If v ∈ F then v ∈ F ∪ C. If v ∈ V \ F, then since C is a vertex cover, either v ∈ C or v belongs to an edge {u, v} such that u ∈ C; in both cases v is within distance 1 of C ∪ F. If v ∈ V′ \ V, then v = ve for some edge e = {x, y} ∈ E. Since C is a vertex cover, w.l.o.g. x ∈ C, and since there is an edge {x, ve} by our construction, v is within distance 1 of C. We conclude that C ∪ F is a dominating set in G′ of size at most k′, so ⟨G′, k′⟩ ∈ DS.
Conversely, assume G′ has a dominating set Ds of size at most k′. First, observe that every isolated vertex in V (which is also isolated in G′) must be in Ds. Let D be the set Ds without the isolated vertices. Thus, |D| ≤ k′ − f = k. Next, we claim that w.l.o.g. all the vertices in D are from V (and not from V′ \ V). Indeed, assume D contains a vertex ve for some e = {x, y} ∈ E; then ve is connected only to x and y.
Thus, replacing ve with x can only dominate more vertices (x, y, and ve are still within distance 1 of D). Now that we may assume D ⊆ V, we claim that D is a vertex cover in G. Indeed, consider an edge e = {x, y} ∈ E. Then either x or y is in D, since x and y are the only vertices within distance 1 of ve, and D is a dominating set (so it must dominate ve). We conclude that G has a vertex cover of size at most k, and we are done.
3 CNF and 3-CNF satisfiability (if time permits)
A CNF (conjunctive normal form) formula is a conjunction of disjunctions of literals ("AND of ORs"). For example,
ϕ = (x1 ∨ x2) ∧ (x4 ∨ x2 ∨ x3 ∨ x5 ∨ x1) ∧ (x5)
is in CNF. Each disjunction, such as (x1 ∨ x2), is called a clause. A 3-CNF formula is a CNF formula in which every clause has exactly 3 literals. For example, ϕ above is in CNF but not in 3-CNF. The following is in 3-CNF:
ϕ′ = (x1 ∨ x2 ∨ x3) ∧ (x4 ∨ x2 ∨ x1) ∧ (x5 ∨ x4 ∨ x4)
A formula is called satisfiable if there is an assignment to its variables such that the formula evaluates to true. The satisfiability problem is to determine whether a formula is satisfiable. Formally, define the languages:
CNF-SAT = {⟨ϕ⟩ : ϕ is a satisfiable CNF formula}
3-SAT = {⟨ϕ⟩ : ϕ is a satisfiable 3-CNF formula}
These languages are NP-Complete. They are in NP, because given an assignment (witness), a TM can verify in polynomial time that the formula is indeed satisfied. You will see in class that CNF-SAT is NP-Hard. Since every 3-CNF formula is a CNF formula, there is a very simple reduction from 3-SAT to CNF-SAT (what does it need to do?). We now show a polynomial reduction in the other direction.
Proposition 3.1. We have CNF-SAT ≤p 3-SAT.
Proof. We show how to translate CNF to 3-CNF in a way that preserves satisfiability. Clauses smaller than 3 can be padded (for example, (x1 ∨ x2) is equivalent to (x1 ∨ x2 ∨ x2)). A larger clause can be replaced with multiple clauses of size 3 that together preserve satisfiability, as shown below.
Construction: Given input ⟨ϕ⟩, where ϕ is a CNF formula, the reduction constructs a 3-CNF formula ϕ′ and returns ⟨ϕ′⟩. The formula ϕ′ is built as follows. For every clause ct in ϕ:
1. If ct contains exactly 3 literals, add it to ϕ′.
2. Otherwise, if ct contains fewer than 3 literals:
(a) If ct contains exactly 1 literal, ct = (l1): add the clause (l1 ∨ l1 ∨ l1) to ϕ′.
(b) If ct contains exactly 2 literals, ct = (l1 ∨ l2): add the clause (l1 ∨ l2 ∨ l2) to ϕ′.
3. If ct contains more than 3 literals, ct = (l1 ∨ l2 ∨ ... ∨ lk): introduce new variables y^t_1, ..., y^t_{k−1} and add the following clauses to ϕ′:
(l1 ∨ y^t_1 ∨ y^t_1) ∧ (l2 ∨ ¬y^t_1 ∨ y^t_2) ∧ (l3 ∨ ¬y^t_2 ∨ y^t_3) ∧ ··· ∧ (l_{k−1} ∨ ¬y^t_{k−2} ∨ y^t_{k−1}) ∧ (lk ∨ ¬y^t_{k−1} ∨ ¬y^t_{k−1})
Runtime: For each clause, the reduction writes a polynomial number of clauses of size 3 in the output. Therefore the reduction runs in time polynomial in |⟨ϕ⟩|.
Correctness: First, assume that ⟨ϕ⟩ ∈ CNF-SAT. Denote the variables of ϕ by x1, ..., xn, and let a be a satisfying assignment for ϕ. We prove that ⟨ϕ′⟩ ∈ 3-SAT by building a satisfying assignment b for it. First, let b agree with a on the variables x1, ..., xn. Now, consider a clause ct = (l1 ∨ ... ∨ lk) in ϕ. Since ct is satisfied by a, there exists some i such that li is a literal satisfied by a. The assignment b then assigns the value TRUE to the variables y^t_1, ..., y^t_{i−1}, and FALSE to the variables y^t_i, ..., y^t_{k−1}. We note that this assignment satisfies all of the clauses that our construction derived from ct. Hence, the assignment b satisfies ϕ′.
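Before proving the converse direction, here is a minimal Python sketch of the clause-splitting construction above. The encoding of literals as nonzero integers (+i for x_i, −i for ¬x_i) and the way fresh variables are numbered are our own conventions, not part of the notes.

```python
def cnf_to_3cnf(clauses, num_vars):
    """Translate a CNF formula into an equisatisfiable 3-CNF formula,
    following the clause-splitting construction from the recitation.
    A clause is a list of nonzero ints: +i stands for x_i, -i for NOT x_i."""
    out = []
    next_var = num_vars                   # fresh variables y^t_1, y^t_2, ...
    for clause in clauses:
        k = len(clause)
        if k == 1:                        # (l1)       -> (l1 v l1 v l1)
            out.append([clause[0]] * 3)
        elif k == 2:                      # (l1 v l2)  -> (l1 v l2 v l2)
            out.append([clause[0], clause[1], clause[1]])
        elif k == 3:
            out.append(list(clause))
        else:                             # k > 3: chain with k-1 fresh variables
            y = [next_var + i + 1 for i in range(k - 1)]
            next_var += k - 1
            out.append([clause[0], y[0], y[0]])
            for j in range(1, k - 1):
                out.append([clause[j], -y[j - 1], y[j]])
            out.append([clause[k - 1], -y[k - 2], -y[k - 2]])
    return out, next_var

# The 5-literal clause (x1 v x2 v x3 v x4 v x5) becomes 5 clauses over x1..x5, y6..y9.
print(cnf_to_3cnf([[1, 2, 3, 4, 5]], 5))
```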
For the other direction, assume that ⟨ϕ′⟩ is satisfiable by some assignment b. Let a be the restriction of b to the variables of ϕ. We claim that a satisfies ϕ, and therefore ⟨ϕ⟩ ∈ CNF-SAT. Consider a clause ct = (l1 ∨ ... ∨ lk) of ϕ. Suppose by contradiction that a does not satisfy ct. Hence, none of the literals l1, ..., lk are satisfied by b. However, b does satisfy the corresponding clauses in ϕ′:
(l1 ∨ y^t_1 ∨ y^t_1) ∧ (l2 ∨ ¬y^t_1 ∨ y^t_2) ∧ (l3 ∨ ¬y^t_2 ∨ y^t_3) ∧ ··· ∧ (l_{k−1} ∨ ¬y^t_{k−2} ∨ y^t_{k−1}) ∧ (lk ∨ ¬y^t_{k−1} ∨ ¬y^t_{k−1})
Since l1, ..., lk are not satisfied by b, this implies that b must satisfy the clauses
(y^t_1 ∨ y^t_1) ∧ (¬y^t_1 ∨ y^t_2) ∧ (¬y^t_2 ∨ y^t_3) ∧ ··· ∧ (¬y^t_{k−2} ∨ y^t_{k−1}) ∧ (¬y^t_{k−1} ∨ ¬y^t_{k−1})
(why?). Observe that b must assign TRUE to y^t_1 and FALSE to y^t_{k−1}. Thus, there exists i such that b assigns TRUE to y^t_i and FALSE to y^t_{i+1} (why?). Consequently, b does not satisfy the clause (¬y^t_i ∨ y^t_{i+1}), resulting in a contradiction. Hence, a satisfies ϕ.
Computability - Recitation 11 January 9, 2023
1 Completeness in co-NP
Because of the asymmetric nature of NTMs, if a language L is in NP, it is not clear whether its complement L̄ is in NP. From the perspective of verifiers, we know that if L ∈ NP, then we can "convince" someone that x ∈ L by providing a witness, but we don't necessarily know how to convince them that x ∉ L (think of SAT, for example: we don't know of a short way to convince someone that a Boolean formula is not satisfiable). This is the reason we define the class co-NP of languages whose complement is in NP. It is generally believed that NP ≠ co-NP, but proving that would prove that P ≠ NP, as appears in the exercise.
Similarly to NP, we can define co-NP-hardness and completeness.
Definition 1.1. A language L is co-NP-hard if for every K ∈ co-NP it holds that K ≤p L. It is co-NP-complete if also L ∈ co-NP.
Claim 1.2. L is NP-hard iff L̄ is co-NP-hard.
Corollary 1.3. L is NP-complete iff L̄ is co-NP-complete.
Proof. In one direction, if L is NP-hard, then for every K ∈ co-NP it holds that K̄ ≤p L (since K̄ ∈ NP). The same reduction shows that also K ≤p L̄, thus L̄ is co-NP-hard. The other direction is similar.
Example. A Boolean formula φ is called a tautology if every assignment of true/false values to the variables yields a true value. It is called a contradiction if every assignment yields a false value. We claim that the languages
CONTRADICTION = {⟨φ⟩ : φ is a contradiction}
TAUTOLOGY = {⟨φ⟩ : φ is a tautology}
are co-NP-complete. First, they are polynomially reducible to each other by f(⟨φ⟩) = ⟨¬φ⟩ (the same reduction works in both directions). Second, we know that SAT is NP-complete. Therefore, the complement of SAT, which is CONTRADICTION, is co-NP-complete.¹
Remark. We can also define (NP ∩ co-NP)-completeness; however, it is not known whether such a language exists. In Figure 1, the 3 possibilities for the relationship between P, NP and co-NP are depicted.
¹ Strictly speaking, this is not an equality, since the complement of SAT also contains all strings that are not a well-formatted encoding of a formula. This, however, is not a big issue, and there are several ways to fix this (think how!).
[Figure 1 (diagram): three panels — Option 1: If P = NP; Option 2: If P ≠ NP, but coNP = NP; Option 3: If coNP ≠ NP.]
Figure 1: The 3 possibilities for the relationship between P, NP and co-NP. Make sure you understand all the relationships (consult the exercise for the reason that NPC is disjoint from coNP in Option 3). The reason for the ≈ notation in Option 1 is that NPC does not contain ∅ and Σ*.
Note that it is not known whether P = NP ∩ co-NP or not.
2 Problems involving Hamiltonian paths
Recall that a Hamiltonian Path in a graph G is a path that passes through each of G's vertices exactly once. Similarly, a Hamiltonian Cycle is a cycle that passes through each vertex exactly once. We define six languages related to the notion of Hamiltonian paths and cycles. These are:
• D-ST-HAMPATH = {⟨G, s, t⟩ : G is a directed graph that has a Hamiltonian path from s to t}
• D-HAMPATH = {⟨G⟩ : G is a directed graph that has a Hamiltonian path}
• D-HAMCYCLE = {⟨G⟩ : G is a directed graph that has a Hamiltonian cycle}
• The languages U-ST-HAMPATH, U-HAMPATH and U-HAMCYCLE, defined analogously for undirected graphs.
It turns out that all six of these languages are NP-complete. In this recitation, we will prove that D-ST-HAMPATH is NP-complete. In the exercise, you will use this fact in order to show that some of the other languages are NP-complete as well.
To show that D-ST-HAMPATH is in NP, it is enough to show that there exists a polynomial-time verifier for it. This is easy: a verifier gets as input ⟨G, s, t⟩ and a sequence of n vertices (where n is the number of vertices in G). It then checks that the sequence is a Hamiltonian path in G from s to t. Clearly this can be done in polynomial time.
We now show that D-ST-HAMPATH is NP-hard, by showing that 3-SAT ≤p D-ST-HAMPATH. The notes are taken from M. Sipser's "Introduction to the Theory of Computation". Sipser calls the D-ST-HAMPATH language by the name "HAMPATH".
We will now use the fact that D-ST-HAMPATH is NP-hard to prove that U-ST-HAMPATH is NP-hard as well.
Theorem 2.1. U-ST-HAMPATH is NP-complete.
Proof. We need to show two things: first, that U-ST-HAMPATH ∈ NP, and second, that it is NP-hard.
To show that U-ST-HAMPATH is in NP, it is enough to show that there exists a polynomial-time verifier for it. This is easy: a verifier gets as input ⟨G, s, t⟩ and a sequence of n vertices (where n is the number of vertices in G). It then checks that the sequence is a Hamiltonian path in G from s to t. Clearly this can be done in polynomial time.
We now show that U-ST-HAMPATH is NP-hard, by showing that D-ST-HAMPATH ≤p U-ST-HAMPATH.
Construction: Let ⟨G, s, t⟩ be an input for D-ST-HAMPATH, with G = ⟨V, E⟩. The reduction T constructs the input ⟨G′, s_in, t_out⟩ to U-ST-HAMPATH, where G′ is defined as follows.
1. For every vertex v ∈ V, T introduces the vertices v_in, v_mid and v_out, and the edges {v_in, v_mid} and {v_mid, v_out}.
2. For every edge (u, v) ∈ E we define the edge {u_out, v_in}.
Thus, G′ = ⟨V′, E′⟩ where V′ = {v_in, v_mid, v_out : v ∈ V} and E′ = {{v_in, v_mid}, {v_mid, v_out} : v ∈ V} ∪ {{u_out, v_in} : (u, v) ∈ E}.
Runtime: Clearly the reduction is polynomial, since the size of G′ is 3 times the size of G, and computing it is straightforward.
Correctness: For the easy direction, assume ⟨G, s, t⟩ ∈ D-ST-HAMPATH, and let s, u1, u2, ..., uk, t be a directed Hamiltonian path in G. The path induces the following path in G′:
s_in, s_mid, s_out, u1_in, u1_mid, u1_out, ..., uk_in, uk_mid, uk_out, t_in, t_mid, t_out
Since the original path contained all the vertices, so does the induced path. Thus, ⟨G′, s_in, t_out⟩ ∈ U-ST-HAMPATH.
For the hard direction, assume that ⟨G′, s_in, t_out⟩ ∈ U-ST-HAMPATH. Thus, there is a Hamiltonian path in G′ from s_in to t_out, but we do not know what this path "looks like". We proceed with the following claim: a Hamiltonian path that starts at s_in and ends at t_out does not contain a directed traversal of the form (v_in, u_out).
That is, we never go "backward" on edges. The proof of this claim is by contradiction. Assume that there is such a directed traversal, and let (v_in, u_out) be the first one in the path. Since the path is Hamiltonian, we must visit v_mid at some point. If we already visited it, then we must have visited it from v_out (since we are just now visiting v_in). But how did we reach v_out? It must have been from some x_in (since these are the only possible edges left). This is a contradiction to the minimality of (v_in, u_out). Thus, we must visit v_mid after u_out, but that means we reach it from v_out, at which point we get stuck, and in particular we cannot end in t_out. We conclude that the edges in the path are traversed only in the forms (v_in, v_mid), (v_mid, v_out) and (v_out, u_in). Thus, the Hamiltonian path is of the form
s_in, s_mid, s_out, u1_in, u1_mid, u1_out, ..., uk_in, uk_mid, uk_out, t_in, t_mid, t_out
which can easily be mapped to a Hamiltonian path in G.
Remark: We used the word "we" in this proof a lot. Usually, this is legitimate writing, even in mathematical texts. However, if you are not careful, it can lead to trouble (from ambiguity to computing uncomputable functions). So use it wisely.
Computability - Recitation 12 January 17, 2023
1 Savitch's Theorem
Recall that we have seen the following containments: NP ⊆ PSPACE and PSPACE ⊆ EXPTIME. Similarly, NPSPACE ⊆ NEXPTIME, and clearly PSPACE ⊆ NPSPACE. Similarly to the famous P vs. NP question, we may wonder: what is the relation between PSPACE and NPSPACE? Quite surprisingly, in the space case we actually have a definite answer, which is obtained via Savitch's theorem. Savitch's theorem (due to Walter Savitch, 1970) is one of the earliest results on space complexity and one of the few conclusive results we have in complexity theory. It is stated as follows.
Theorem 1.1 (Savitch). For every function f : ℕ → ℕ such that f(n) = Ω(log n), it holds that NSPACE(f(n)) ⊆ SPACE(f²(n)).
We will actually prove the theorem under the condition f(n) ≥ n. This is only to slightly simplify things. Next week we will define logarithmic space complexity, and see that the proof works for f(n) = Ω(log n). Another simplifying assumption we make is that f(n) is constructible in O(f(n)) space. That is, there is a TM M that takes as input 1^n and halts with f(n) on the tape, while using at most O(f(n)) tape cells. This assumption actually loses generality, but there are ways to overcome this problem (see Sipser for details).
Let f : ℕ → ℕ be a function, and let N be an NTM deciding a language A in NSPACE(f(n)). We construct a TM M that decides A in SPACE(f²(n)). Before we start the proof, let's understand the problem. Given a word w of length n, every run of N on w terminates after using at most f(n) cells. From configuration-counting arguments, every run of N on w runs for at most 2^{O(f(n))} steps. To simulate this naively with a deterministic TM, we need to encode runs, or at least encode the "address" of a run - the nondeterministic choices that need to be made in order to trace the run. Since the depth of the run tree may be up to 2^{O(f(n))}, so is the length of an address in the run tree. Thus, a naive simulation may take up to 2^{O(f(n))} space, which is too much.
Instead, we use a different approach. Assume N has a single accepting configuration on w. Then, M accepts w iff there exists a "path" in the "configuration graph" from the initial configuration c0 to the accepting one c_acc, where this path is of length at most t = 2^{O(f(n))}.
Thus, there must exist a configuration cm such that the run reaches cm from c0 in at most t/2 steps, and reaches c_acc from cm in at most t/2 steps. We can now recursively solve these smaller problems. What do we need to write on the tape? At every level of the recursion, we need to write two configurations and t. A configuration is of length O(f(n)), and 1 ≤ t ≤ 2^{O(f(n))}, so we can encode t in O(f(n)) cells. Thus, at every level we need O(f(n)) cells. Furthermore, at every level, t is half of what it was in the level before. Since initially t = 2^{O(f(n))}, we need only O(f(n)) levels. We conclude that we need O(f²(n)) space.
Proof. Let f : ℕ → ℕ be a function, and let N be an NTM deciding a language A in NSPACE(f(n)). We construct a TM M that decides A in SPACE(f²(n)). We start by describing the procedure can-yield(c1, c2, t), which takes two configurations c1, c2 and a number t (in binary), and outputs whether N can get from configuration c1 to configuration c2 in at most t steps. The procedure works as follows:
can-yield(c1, c2, t)
1. If t = 1, then test directly whether c1 = c2 or whether c1 yields c2 in one step according to the (nondeterministic) rules of N. Accept if either test succeeds; reject if both fail.
2. If t > 1, then for every configuration cm of N on w using space f(n):
(a) Run can-yield(c1, cm, t/2).
(b) Run can-yield(cm, c2, t/2).
(c) If both accept, then accept.
3. Reject.
Assume w.l.o.g. that N has a single accepting configuration on every word w. We do not lose generality, since every machine N has such an equivalent machine (that also works in NSPACE(f(n))), obtained by changing the accepting state to a state that erases all the tape before accepting. Let d ∈ ℕ be such that N has at most 2^{d·f(n)} configurations. We now construct M as follows. First, M computes f(n) within O(f(n)) space and writes the result. Then, M simulates can-yield(c0, c_accept, 2^{d·f(n)}), where c0 is the initial configuration of N on w. In order to simulate can-yield, M keeps track of every level of the recursion by holding c1, c2, t and whether the first call has accepted (if it is currently simulating step 2(b) of the algorithm). In total, we need 3 · O(f(n)) + 1 = O(f(n)) space at every level. The recursion depth is log t = O(f(n)), and thus M uses at most O(f²(n)) space. Finally, the procedure can-yield clearly determines whether c2 is reachable from c1 in at most t steps, and thus our machine decides the correct language.
Now assume f is a polynomial. Then f²(n) is also a polynomial. This implies the following very important corollary.
Corollary 1.2. PSPACE = NPSPACE.
Also, since PSPACE is a deterministic class, it is closed under complementation (prove it!), so we have that
NPSPACE = PSPACE = co-PSPACE = co-NPSPACE
2 PSPACE and TQBF
We show that a language called TQBF is PSPACE-complete.
2.1 TQBF in PSPACE
For this section, we identify 0 with False and 1 with True. A quantified Boolean formula is a Boolean formula preceded by ∃ and ∀ quantifiers on the variables. A fully quantified formula is one in which every variable is under the scope of a quantifier. A fully quantified formula is always either true or false.
Example: The formula ∀x∃y((x ∨ y) ∧ (¬x ∨ ¬y)) is true, since if x = 0 we can take y = 1, and if x = 1 we can take y = 0. On the other hand, the formula ∃y∀x((x ∨ y) ∧ (¬x ∨ ¬y)) is false, since for both y = 0 and y = 1 we can find an x (namely x = y) such that one of the conjuncts does not hold.
We define the language
TQBF = {⟨ϕ⟩ : ϕ is a true fully quantified Boolean formula}.
This language is sometimes known as QSAT (quantified SAT). We want to show that TQBF ∈ PSPACE. We devise the following recursive algorithm (TM) T.
T: On input ⟨ϕ⟩, a fully quantified Boolean formula:
1. If ϕ contains no quantifiers, then it is an expression with only constants, so evaluate ϕ and accept if it is true; otherwise, reject.
2. If ϕ equals ∃xψ, recursively call T on ψ, first with 0 substituted for x and then with 1 substituted for x. If either result is accept, then accept; otherwise, reject.
3. If ϕ equals ∀xψ, recursively call T on ψ, first with 0 substituted for x and then with 1 substituted for x. If both results are accept, then accept; otherwise, reject.
Algorithm T obviously (well, you can prove by induction) decides TQBF. To analyze its space complexity, we observe that the depth of the recursion is at most the number of variables m. Indeed, we make (at most) two recursive calls for every quantifier. At each level of the recursion we need to store the formula with the replaced variables, which takes O(n) space, where n is the length of the formula. At the bottom of the recursion, we need to evaluate the formula, which takes another O(n) space. Therefore, T runs in space O(m · n) = O(n²) (since m = O(n)).
2.2 Encoding configurations using Boolean formulas
Towards proving that TQBF is PSPACE-complete, we need to give some background on how to encode configurations of a Turing machine using Boolean formulas. This section might be a bit cumbersome, but after that we will have great tools in our hands!
Let M be a TM, and let s be a tape size. We encode a configuration that uses at most s tape cells using Boolean variables as follows:
• For each i ∈ [s] and a ∈ Γ, we have a variable x_{i,a}, such that x_{i,a} = 1 iff a is written in the i-th cell.
• For each i ∈ [s], we have a variable y_i, such that y_i = 1 iff the head of the machine is over the i-th cell.
• For each q ∈ Q, we have a variable z_q, such that z_q = 1 iff the configuration is in state q.
We denote the tuple of all those variables by c. We claim the following "encoding" theorem.
Theorem 2.1.
1. There exists a Boolean formula ϕ_valid(c) that evaluates to 1 iff c is a valid encoding of a configuration.
2. There exists a Boolean formula ϕ(c1, c2) that evaluates to 1 iff c2 is a consecutive configuration to c1.
Moreover, both formulas can be computed in time polynomial in s.
Proof.
1. We define
ϕ_valid(c) = ⋀_{i∈[s]} ⋁_{a∈Γ} (x_{i,a} ∧ ⋀_{b∈Γ\{a}} ¬x_{i,b}) ∧ ⋁_{i∈[s]} (y_i ∧ ⋀_{j∈[s]\{i}} ¬y_j) ∧ ⋁_{q∈Q} (z_q ∧ ⋀_{r∈Q\{q}} ¬z_r)
and one should note that the length of the formula is O(s²) (Q and Γ are both constants).
2. For each i ∈ [s], a ∈ Γ, q ∈ Q, if δ(q, a) = (r, b, R) then we define
ψ_{i,a,q}(c1, c2) = (x^1_{i,a} ∧ y^1_i ∧ z^1_q) → (x^2_{i,b} ∧ y^2_{i+1} ∧ z^2_r ∧ ⋀_{j∈[s]\{i}} ⋀_{d∈Γ} (x^1_{j,d} ↔ x^2_{j,d}))
and if δ(q, a) = (r, b, L), we do the same thing with i − 1. We remark several exceptions in the definition of ψ_{i,a,q}:
(a) If the head is on the first cell and the move is L, then the head stays.
(b) If the head is on the s-th cell and the move is R, then there is no valid consecutive configuration.
(c) If the configuration is accepting, then we require that the configuration stays the same.
Finally, we define
ϕ(c1, c2) = ϕ_valid(c1) ∧ ϕ_valid(c2) ∧ ⋀_{i∈[s]} ⋀_{a∈Γ} ⋀_{q∈Q} ψ_{i,a,q}(c1, c2)
and one should note that the length of the formula is O(s²).
Remark. A similar construction also works for an NTM.
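To see why ϕ_valid really has length O(s²), here is a small Python sketch that generates it as a string for a given tape size s, alphabet Γ, and state set Q; the variable-naming scheme (strings such as "x_3_a") and the operator symbols are our own conventions, not part of the notes.

```python
def exactly_one(variables):
    """Formula (as a string) saying that exactly one of `variables` is true:
    a disjunction over the choices, each conjoined with the negations of the rest."""
    return "(" + " | ".join(
        "(" + " & ".join([v] + ["~" + u for u in variables if u != v]) + ")"
        for v in variables) + ")"

def phi_valid(s, Gamma, Q):
    """phi_valid(c): every cell holds exactly one symbol, the head is at
    exactly one position, and the machine is in exactly one state."""
    cells = " & ".join(exactly_one([f"x_{i}_{a}" for a in Gamma])
                       for i in range(1, s + 1))
    head = exactly_one([f"y_{i}" for i in range(1, s + 1)])
    state = exactly_one([f"z_{q}" for q in Q])
    return " & ".join([cells, head, state])

# For fixed Gamma and Q, the length grows like O(s^2) (dominated by the head part).
for s in (4, 8, 16):
    print(s, len(phi_valid(s, ["0", "1", "B"], ["q0", "qacc", "qrej"])))
```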
2.3 TQBF is PSPACE-hard
So now we know that TQBF is in PSPACE. In fact, TQBF is PSPACE-complete, but we have not yet defined what it means to be PSPACE-complete. Well, PSPACE-hardness is defined with respect to polynomial time reductions. The reason is that we want to study the relation between P and PSPACE, so any reduction stronger than P (such as PSPACE reductions) would not give us any information regarding problems in P, since we could solve the problem mid-reduction. This is a delicate point, and we suggest you come back to it after you feel comfortable with the rest of the material.
Definition 2.2. A language K is PSPACE-hard if for every L ∈ PSPACE it holds that L ≤p K.
Theorem 2.3. TQBF is PSPACE-hard.
Proof. We show that L ≤p TQBF for every L ∈ PSPACE. Let L ∈ PSPACE, and let M be a TM that decides L using at most s(n) space, where s is some polynomial. Assume w.l.o.g. that M has a single accepting configuration on every word w. We do not lose generality, since every machine M has such an equivalent machine (that also works in polynomial space), obtained by changing the accepting state to a state that erases all the tape before accepting.
Given a word w, our reduction will output a formula η such that η has value true iff M accepts w. How is this done? Using the tools that we developed in the last section. Let n = |w|, s = s(n), c0 the start configuration, and c_acc the accepting configuration. A very naïve attempt would be to define
∃c1, c2, ..., ct ((c1 = c0) ∧ (ct = c_acc) ∧ ⋀_{i=1}^{t−1} ϕ(ci, ci+1)).
However, the running time t can be exponential in s (thus, in n), and this would yield a formula of exponential length. We can use recursion to solve this issue. M accepts a word w iff there is a path of length at most t from c0 to c_acc, where t is the running time of M and is exponential in s(n). This happens iff there is a path from c0 to some cm of length t/2, and a path from cm to c_acc of length t/2 (we may assume w.l.o.g. that t is a power of two).
First attempt: The first attempt at a reduction is a wrong one, but it is in the right direction. We construct, inductively, a formula ϕ_t(c0, c_acc) which states that c_acc is reachable from c0 within at most t steps. More generally, we construct a formula ϕ_k(c1, c2) which states that c2 is reachable from c1 within at most k steps. The formula is constructed as follows. First, if k = 1, then ϕ_1(c1, c2) is simply the formula ϕ(c1, c2) that we constructed earlier, stating that c2 is consecutive to c1. Then, we define:
ϕ_k(c1, c2) = ∃cm (ϕ_{k/2}(c1, cm) ∧ ϕ_{k/2}(cm, c2))
Brilliant, is it not? The formula simply asks if there exists a configuration that functions as the middle configuration of the run. Clearly η = ϕ_t(c0, c_acc) is true iff M accepts w. Also, since t is single-exponential in s(n) (that is, t = 2^{O(s(n))}), the recursion depth in constructing the formula is polynomial. So what's wrong? Well, while the recursion depth is polynomial, the construction tree of the formula has degree 2, which means that it is a binary tree of polynomial depth, so it has an exponential number of nodes. That is, the formula is still too big. How can we overcome this?
Correct solution: Here comes a clever trick, which is the crux of the proof. Instead of asking whether there is a cm that works with c1 and with c2, we combine these two cases into one, as follows:
ϕ_k(c1, c2) = ∃cm ∀c3, c4 ((((c3 = c1) ∧ (c4 = cm)) ∨ ((c3 = cm) ∧ (c4 = c2))) → ϕ_{k/2}(c3, c4))
Now the degree of the tree is 1, and it is still of polynomial depth, so the length of ϕ is polynomial, and we are done.
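The size argument is the heart of the proof, so here is a small Python sketch that compares the two constructions by computing the length of ϕ_k under each; treating ϕ(c1, c2) as an opaque block of some fixed size `base`, and ignoring the exact overhead of the quantifiers and equality tests, is our simplification.

```python
def naive_size(k, base):
    """Length of phi_k in the first attempt: phi_{k/2} is written out twice,
    so the recursion tree is binary and the formula blows up."""
    if k == 1:
        return base                                # |phi(c1, c2)| = O(s^2)
    return 2 * naive_size(k // 2, base) + base     # two copies + quantifier overhead

def clever_size(k, base):
    """Length of phi_k with the 'exists c_m, forall c3, c4' trick:
    only one copy of phi_{k/2} is written, so the growth is linear in log k."""
    if k == 1:
        return base
    return clever_size(k // 2, base) + base        # one copy + quantifier overhead

base = 100      # stands in for |phi(c1, c2)|, which is polynomial in s(n)
t = 2 ** 20     # stands in for the 2^{O(s(n))} bound on the running time
print(naive_size(t, base))    # about t * base: exponential in s(n)
print(clever_size(t, base))   # about log2(t) * base: polynomial in s(n)
```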
Remark. Note that we required a formula in TQBF to have all its quantifiers at the beginning (this is sometimes called prenex normal form). However, our construction is not in such a form, because of the condition on c3 and c4. This, however, is not a concern, because we can push the quantifiers out using the rules α → ∃xβ ≡ ∃x(α → β) and α → ∀xβ ≡ ∀x(α → β) (valid when x does not appear in α).
Computability - Recitation 13 21/1/23 - 28/1/23
1 The Time Hierarchy Theorem
Let's review this great theorem.
Definition 1.1. A function t : ℕ → ℕ, where t(n) = Ω(n log n), is called time-constructible if the function that maps 1^n (n in unary form) to the binary representation of t(n) is computable in time O(t(n)).
Our goal is to prove the following theorem:
Theorem 1.2. Let t : ℕ → ℕ be a time-constructible function. Then there exists a language L that is decidable in O(t(n)) time, but not decidable in o(t(n)/log t(n)) time.
The proof consists of two parts. The first is to show that we can perform a simulation of a given TM "efficiently enough". Then, we use a standard diagonalization argument to prove the theorem. Both parts are not easy to comprehend. Let's start with the simulation.
Claim 1.3. There exists a TM S that, given ⟨M, t, w⟩ as input, computes the configuration of M when run on w for t steps, and does so in time (t log t) · p(|⟨M⟩|), where p is some fixed polynomial.
Let's understand the claim first, before going over the proof. Should we be surprised by it? At first glance - not really. We need to simulate M on w for t steps. Simulating one step shouldn't be really expensive, but it does depend on the length of ⟨M⟩, since we need to scan the encoding. This is the p(|⟨M⟩|) part. Second, we need to keep a counter and update it every step. This is the log t part. Finally, we do it for t steps, which is the t factor.
If we had 3 tapes, we would be done: just keep ⟨M⟩ on one tape, as well as the current state; keep the counter on a second tape, and the simulation tape on a third. But we only have one tape, so we need much more careful "accounting". The problem with simply keeping all this information consecutively is that if the simulation tape is really long, then in every iteration we need to go very far to find the encoding of M and the counter. We solve this by increasing the alphabet, so that every letter represents 3 letters. So in a way, we now have 3 tapes. But remember we only have 1 head! So it's essentially 3 tapes, but all the heads act in sync. So all we need to do is perform the simulation on the third tape, but every time the head moves, we move the encoding of M and the counter so that they stay "close by". So in every simulation step, we really only need to scan the encoding of ⟨M⟩ to figure out where to move, and to decrease the counter, which is what we wanted.
Good, so now we want to prove Theorem 1.2. We start by defining the language we want to use. This is a very ad-hoc language.
L = { ⟨M⟩#0^k : M does not accept ⟨M⟩#0^k within t′(n, m) steps, where n = |⟨M⟩#0^k|, m = |⟨M⟩| and t′(n, m) = t(n)/(p(m) · log t(n)) }
You can already sense the diagonalization here - we want machines that reject themselves. This always leads to trouble...
We start by showing that L can be decided in time O(t(n)). We define a Turing machine T. On input x, T acts as follows:
1. Parse x as ⟨M⟩#0^k. If the parsing fails, reject.
2. Compute the binary representation of n = |x| and m = |⟨M⟩|.
3. Compute t = t(n).
4. Compute t′ = t′(n, m) = t/(p(m) · log t).
5. Simulate M on x for t′ steps.
6. If M accepted x, then reject. Otherwise, accept.
It is clear that T decides L. It remains to analyze its running time. We do it for each step separately. Step 1 takes O(n) time. Step 2 takes O(n log n) time. Step 3 takes O(t(n)) time. Step 4 takes time polynomial in log(t(n)) (note that m ≤ n ≤ t(n)). As for step 5, it takes (t′ log t′) · p(m) time to simulate, and it holds that
(t′ log t′) · p(m) ≤ (t′ log t) · p(m) = t(n).
In total, since t(n) = Ω(n log n), we get a total time of O(t(n)), as wanted.
Now for the real point of the proof - assume by way of contradiction that L is decidable in o(t(n)/log t(n)) time. Let M be a machine that decides L in r(n) = o(t(n)/log t(n)) time. Let n be large enough such that n ≥ m + 1 and r(n) < t(n)/(p(m) · log t(n)), where m = |⟨M⟩|. There exists such an n since M is now fixed. Let k ≥ 0 be such that |⟨M⟩#0^k| = n, and consider the behavior of M on ⟨M⟩#0^k. If M accepts ⟨M⟩#0^k within r(n) steps, then ⟨M⟩#0^k ∉ L, so M should not accept it - contradiction. If M rejects ⟨M⟩#0^k within r(n) steps, then ⟨M⟩#0^k ∈ L, but this is again a contradiction, and we are done!
Corollary 1.4. For any two real numbers 1 ≤ ε1 < ε2, we have TIME(n^{ε1}) ⊊ TIME(n^{ε2}).
Corollary 1.5. P ⊊ EXPTIME.
Proof. For every k, it holds that n^k = O(2^n), so TIME(n^k) ⊆ TIME(2^n), and therefore P ⊆ TIME(2^n). By Theorem 1.2, we know that TIME(2^n) ⊊ TIME(2^{2n}) ⊆ EXPTIME.
2 The Space Hierarchy Theorem
Since we understand space better than time, the analogous hierarchy theorem is cleaner, and provides a tighter bound.
Definition 2.1. A function s : ℕ → ℕ, where s(n) = Ω(log n), is called space-constructible if the function that maps 1^n (n in unary form) to the binary representation of s(n) is computable in space O(s(n)).
Theorem 2.2. Let s : ℕ → ℕ be a space-constructible function. Then there exists a language L that is decidable in O(s(n)) space, but not decidable in o(s(n)) space.
In the exercise you are asked to prove this theorem. To help you in your task, we now discuss space-efficient simulation.
Simulation: As before, we need to carefully discuss simulation complexity, this time with respect to space complexity. Recall that we consider a two-tape model, in which we have a read-only input tape and a read-and-write work tape. Now, suppose we have ⟨M, w, s, t⟩ as input. We would like to efficiently simulate the run of M on w for t steps, using no more than s tape cells. The simulator allocates the work space for the following purposes:
1. The current state of the simulated machine. Bounded by |⟨M⟩|.
2. s simulated tape cells, each bounded by |⟨M⟩|, so at most s · |⟨M⟩| space in total.
3. A pointer for keeping track of the current position of the input tape head (remember that we cannot modify the input tape, and copying ⟨w⟩ onto the work tape would require too much space). This requires log |w| space.
4. A counter for t. Requires log t space.
So in total we have O(log |w| + s · |⟨M⟩| + log t) space. We omit the details of how the simulator works, but you should figure them out for yourself. To summarize:
Claim 2.3. There exists a TM S that, given ⟨M, w, s, t⟩ as input,
• If the run of M on w for t steps does not use more than s tape cells, then S computes the configuration of M when run on w for t steps.
• Otherwise, S outputs "fail".
Moreover, this is done in space O(log |w| + s · |⟨M⟩| + log t).
Remark.
Even though we assumed that the input for S is written on the input tape, it can also be the case that some of the inputs are written on the work tape, and then the claim states how much additional space we need.
3 SCC is NL-Complete
We define SCC := {⟨G⟩ : G is a strongly connected directed graph}. We prove that SCC is NL-complete. We recall that we have seen in class that PATH is NL-complete.
Let's show first that SCC ∈ NL. There are two ways:
• By Immerman's theorem (NL = coNL), it is sufficient to show that the complement of SCC is in NL. Nondeterministically choose a pair of vertices s, t ∈ V, and run a machine that solves the complement of PATH on ⟨G, s, t⟩ (such a machine exists in NL, again by Immerman's theorem); accept iff it accepts.
• Alternatively, iterate over all pairs u, v ∈ V. For each pair, nondeterministically guess a path from u to v; if the guess fails, reject, and if it succeeds, proceed to the next pair. If we succeed in guessing a path for every pair, accept.
Next, we want to show that SCC is NL-Hard. We do this by showing that PATH ≤L SCC. Let ⟨G, s, t⟩ be an instance of the PATH problem. We construct a graph G′ by adding edges so that s has an incoming edge from every other vertex and t has an outgoing edge to every other vertex. We claim that ⟨G′⟩ ∈ SCC ⟺ ⟨G, s, t⟩ ∈ PATH. Assume there is a path π from s to t in G, and consider any two vertices u, v of G′. Then there is a path u → s → t → v. Conversely, assume that there is no path from s to t in G. Then s and t are still not connected in G′, as all the added edges go into s or out of t, hence G′ is not strongly connected.
Complexity: to add the edges, our reduction loops over all pairs of vertices, and so it uses only logarithmic space to keep track of the pair of vertices it is currently working with.
4 2SCC is NL-Complete
We define 2SCC = {⟨G⟩ : G is a directed graph with exactly two strongly connected components}. We prove that 2SCC is NL-complete.
We first show that 2SCC ∈ NL. A TM that decides 2SCC using at most logarithmic space on the work tape works as follows:
• Nondeterministically choose two vertices u, v ∈ V; they will be used as representatives of the two components. Run a machine that checks that ⟨G, u, v⟩ is in the complement of PATH (such a machine exists in NL by Immerman's theorem). If it accepts, proceed; otherwise reject. This verifies that there are at least two components.
• It is left to verify that there are at most two components. For every w ∈ V, check whether there is a path from w to u and from u to w. If so, continue to the next w. Otherwise, check the same for v. If both checks fail, reject (this means w may belong to a third component).
To show that 2SCC is NL-Hard we show that SCC ≤L 2SCC. The reduction works as follows: on input ⟨G⟩, return ⟨G′⟩, where G′ is constructed from G by adding a new isolated vertex. This way, if G is strongly connected, G′ has exactly two strongly connected components. Otherwise, if G is not strongly connected, it has k ≥ 2 strongly connected components, which means that G′ has k + 1 ≥ 3 strongly connected components, and is therefore not in 2SCC. The reduction requires only logarithmic space on the work tape, since ⟨G⟩ can be copied to the output tape one vertex/edge at a time, and there is only one extra vertex to add.
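To close, here is a minimal Python sketch of the PATH ≤L SCC reduction described above; the representation of a graph as (n, edges) with vertices 0, ..., n−1 is our own convention, and the loop structure mimics how a log-space transducer would emit the output one edge at a time.

```python
def path_to_scc(n, edges, s, t):
    """Reduction PATH <=_L SCC: output G' = G together with an edge from
    every vertex into s and an edge from t to every vertex.  G' is strongly
    connected iff G has a directed path from s to t."""
    out = list(edges)              # first stream the edges of G unchanged
    for v in range(n):             # a log-space machine only needs the
        if v != s:                 # counter v on its work tape
            out.append((v, s))     # every vertex can reach s
        if v != t:
            out.append((t, v))     # t can reach every vertex
    return n, out                  # duplicate edges are harmless here

# Example: 0 -> 1 -> 2 is a path from s = 0 to t = 2,
# so the output graph is strongly connected.
print(path_to_scc(3, [(0, 1), (1, 2)], 0, 2))
```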