
Recitation Compendium

Computability 2022 - Recitation 1
23/10/22 - 29/10/22
1 Set Theory
You already know items 1-5. These are just quick reminders.
1. A set is a collection of elements. The empty set is denoted by ∅. For every element x we write x ∈ A if x is a member of A. It is always true that either x ∈ A or x ∉ A, and exactly one of them holds.
2. Set operations and relations:
• Union: The union of two sets is the set containing all the elements from each set. Formally,
A ∪ B = {x : x ∈ A or x ∈ B}
• Intersection: The intersection between two sets is the set containing all the elements that are in
both sets. Formally,
A ∩ B = {x : x ∈ A and x ∈ B}
• Complementation: The complement of a set A (with respect to a set C) is the set of all the
elements not in A.
• Difference: The difference between sets A, B is the set containing all the elements in A that are
not in B. Formally,
A \ B = {x : x ∈ A and x ∉ B}
Exercise: Prove that A \ B = A ∩ B̄ (where B̄ is the complement of B).
• Containment: We say that A ⊆ B if every element of A is an element of B. Formally, if
∀x, x ∈ A =⇒ x ∈ B
• Equality: We say that A = B iff A ⊆ B and B ⊆ A. This is important!!!
3. Power set: Let A be a set, the power set of A, denoted P(A) or 2^A, is defined as
2^A = {C : C ⊆ A}
Example:
If A = {a, b}, then 2^A = {∅, {a}, {b}, {a, b}}
4. Cartesian product: Let A, B be two sets. The Cartesian product A × B is the set of ordered pairs from
A and B.
A × B = {(a, b) : a ∈ A, b ∈ B}
Example:
{1, 2} × {a, b, c} = {(1, a), (2, a), (1, b), (2, b), (1, c), (2, c)}
We allow “longer” Cartesian products, where the elements are sequences. That is:
A1 × A2 × ... × An = {(a1 , ..., an ) : ∀i, ai ∈ Ai }
The product A × A × · · · × A (n times) is denoted A^n.
5. A relation between sets S, T is a subset R ⊆ S × T. We consider mostly relations between a set A and itself. Example: ≤ ⊆ ℕ × ℕ.
A relation R ⊂ A × A is called:
• Reflexive if ∀a ∈ A (a, a) ∈ R.
• Symmetric if ∀a, b ∈ A (a, b) ∈ R =⇒ (b, a) ∈ R
• Transitive if ∀a, b, c ∈ A if (a, b) ∈ R and (b, c) ∈ R then (a, c) ∈ R.
We sometimes denote aRb or R(a, b) if the relation holds.
Example:
Let A = {1, 2, 3, 4}, and consider the relation
R = {(a, b) ∈ A × A : |a − b| ≤ 1}
• Is (3, 2) ∈ R? Yes.
• Is R reflexive? Yes, because |a − a| = 0 ≤ 1.
• Is R symmetric? Yes, because |a − b| = |b − a|.
• Is R transitive? No: (1, 2) ∈ R and (2, 3) ∈ R, but (1, 3) ∉ R. (A short code sketch at the end of this section checks these properties.)
A relation that is reflexive, symmetric and transitive is called an equivalence relation. For every element a ∈ A we can look at the equivalence class of a, [a]R = {b ∈ A : (a, b) ∈ R}. For equivalence relations, these sets form a partition of A. That is, either [a]R = [b]R or [a]R ∩ [b]R = ∅, and ⋃_{a∈A} [a]R = A.
Example: Let G = ⟨V, E⟩ be an undirected graph, and let ∼ ⊆ V × V be the following relation: for
every u, v ∈ V we have that u ∼ v iff there is a path in G from u to v.
It is easy to see that ∼ is an equivalence relation, and its equivalence classes are exactly the connected
components of G.
The converse also holds: Any partition of a set defines an equivalence relation on the set.
Proof: left as an exercise.
6. Cardinalities: The cardinality of a set A, denoted |A|, is a measure of how many elements there are in
a set. For finite sets, |A| is the number of elements in the set. For infinite sets, the situation is slightly
more complicated. For two infinite sets A, B, we say that |A| = |B| iff there exists a bijection between
them. We denote |ℕ| = ℵ0. The following claims hold:
• |ℤ| = ℵ0. This can be seen by counting the integers as 0, 1, −1, 2, −2, 3, −3, ...
• |ℚ| = ℵ0.
• |[0, 1]| = 2^ℵ0 = |2^ℕ|.
Sets that have cardinality ℵ0 are called countable.
We say that |A| ≤ |B| if there exists an injection from A to B. We say that |A| < |B| if |A| ≤ |B|, and
there does not exist a surjection from A to B.
The claim that ℵ0 ≠ 2^ℵ0 is non-trivial, and you will see it later.
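The following is a minimal Python sketch of items 2-5 (the variable names and the use of Python's built-in set type are our own choices); it computes the basic set operations, the power set, the Cartesian product, and checks the three relation properties for the relation R from the example above.

from itertools import chain, combinations, product

A, B = {1, 2, 3, 4}, {3, 4, 5}

# Item 2: basic set operations.
print(A | B, A & B, A - B)                 # union, intersection, difference

# Item 3: the power set 2^A, as a set of frozensets.
def power_set(s):
    return {frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))}

print(len(power_set(A)))                   # 2^|A| = 16

# Item 4: the Cartesian product A x B as a set of ordered pairs.
print(set(product(A, B)))

# Item 5: the relation R = {(a, b) : |a - b| <= 1} on A, and its properties.
R = {(a, b) for a in A for b in A if abs(a - b) <= 1}
reflexive  = all((a, a) in R for a in A)
symmetric  = all((b, a) in R for (a, b) in R)
transitive = all((a, c) in R for (a, b) in R for (b2, c) in R if b == b2)
print(reflexive, symmetric, transitive)    # True True False: (1,2),(2,3) are in R but (1,3) is not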
2 Languages
2.1 Definitions
Let Σ be a finite (non-empty) set, which we will call alphabet. The elements of Σ are called letters.
Example:
Σ = {a, b}
We now consider Σ^n for n ∈ ℕ. For n = 0, we define Σ^0 = {ǫ}, where ǫ is an “empty sequence”.
Example:
Σ^3 = {(a, a, a), (a, a, b), (a, b, a), ..., (b, b, b)}.
Define
Σ⋆ = ⋃_{n=0}^{∞} Σ^n
The elements of Σ⋆ are sequences of any finite length of letters from Σ, and we refer to those as words. ǫ is
called the empty word. For convenience, we don’t write words as (a, b, b, b, a), but rather as abbba.
We can concatenate words: w1 · w2 = w1 w2 . Observe that since Σ⋆ has words of any finite length, then
it is closed under concatenation.
Example:
abb · bab = abbbab
A formal language over the alphabet Σ is a set L ⊆ Σ⋆ . That is, a set of words.
Example:
• L1 = {ǫ, a, aa, b}. A finite language.
• L2 = {w : w starts with an ’a’}. An infinite language.
• L3 = {ǫ}.
• L4 = ∅. Is this the same as L3 ?
• L5 = {w : |w| < 24}. |w| is the number of letters in w. Finite language.
It is sometimes convenient to view languages as elements in 2^Σ⋆. This is a bit confusing at first, so let’s recap:
• Languages are sets of words.
• Languages are subsets of Σ⋆.
• Languages are elements of 2^Σ⋆.
What operations can we do on languages? First, any operation on sets. In particular, observe that complementation is done with respect to Σ⋆ . We can also concatenate languages:
L1 · L2 = {w1 · w2 : w1 ∈ L1 , w2 ∈ L2 }
Example:
Consider the languages L1 = {w : w begins with an ’a’} and L2 = {w : w ends with a ’b’}.
• What is L1 ∩ L2 ? All the words that begin with a and end with b.
• What is L1 ∪ L2 ? All the words that begin with a or end with b.
• What is L1 · L2 ? Same as L1 ∩ L2 .
• What is L2 · L1 ? All the words that contain ba.
Example:
Let’s try something harder. Consider the language
L = {ww : w ∈ Σ⋆ }
What is the complement of L (denoted L̄)?
Answer:
L̄ = {x = x1 · · · xn : n is odd, or n is even and x1 · · · x_{n/2} ≠ x_{n/2+1} · · · xn}
What is L · L?
Answer:
L · L = {wwxx : w ∈ Σ∗ , x ∈ Σ∗ }
2.2 Counting stuff
Assume Σ ≠ ∅. How many words are there in Σ⋆? ℵ0.
Proof: We can enumerate the words first by length and then in lexicographical order (after ordering Σ
arbitrarily).
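A minimal sketch of this enumeration (the generator below is our own illustration): it yields the words of Σ⋆ first by length and then lexicographically, which is exactly a bijection between Σ⋆ and ℕ.

from itertools import count, islice, product

def words(sigma):
    """Yield every word over sigma, ordered by length and then lexicographically."""
    letters = sorted(sigma)                    # order the alphabet arbitrarily (here: sorted)
    for n in count(0):                         # n = 0, 1, 2, ...
        for tup in product(letters, repeat=n): # all words of length n, in lexicographic order
            yield "".join(tup)

print(list(islice(words({"a", "b"}), 8)))      # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa']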
How many languages are there? |2^Σ⋆| = |2^ℕ| = 2^ℵ0. Since 2^ℵ0 > ℵ0, this means that there are more languages over Σ than words over Σ. More formally, there is no surjection from Σ∗ to 2^Σ∗.
2.3 Are all languages regular?
Well, are they? The answer is no. The reason is that there are ℵ0 regular languages, but 2^ℵ0 languages. Indeed, a DFA can be characterized by a finite binary string (representing its states, transitions, etc.), and since every regular language has a corresponding automaton, there are at most ℵ0 such languages.
But this argument is not constructive. That is, it says that there are non-regular languages, but it doesn’t give an example. Soon, we will learn how to prove that specific languages are non-regular! This is an important point. Much of what we do in this course is explore the limitations of computational models.
3 The function δ∗
Consider the DFA A = ⟨Q, Σ, δ, q0, F⟩ depicted in Figure 1. Suppose that a computation in A is currently in
the state q and the letter read is σ, what will be the next state? This question is answered by δ : Q × Σ → Q.
The next state will be δ(q, σ). For example, we see in the figure that δ(q1 , a) = q2 .
One unfortunate property of δ is that it only allows us to ask what will happen after reading one
letter. What if we’d like to ask about reading, say, three letters? To answer such questions, we can define
δ∗ : Q × Σ∗ → Q as follows:

δ∗(q, w) = q                      if w = ǫ
δ∗(q, w) = δ(δ∗(q, w′), σ)        if w = w′ · σ where w′ ∈ Σ∗ and σ ∈ Σ
Let’s try this new definition out. In A, what would happen if a computation is at state q1 and the letters
b and then a are read successively?
δ ∗ (q1 , ba) = δ(δ ∗ (q1 , b), a) = δ(δ(δ ∗ (q1 , ǫ), b), a) = δ(δ(q1 , b), a) = δ(q0 , a) = q1
Important: Given Q and Σ, not every function Q × Σ∗ → Q is the δ ∗ function of some DFA. Thus, we
may not define a DFA to just have some δ ∗ function that we came up with. Rather, we must define its δ
function and then derive δ ∗ from it.
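A minimal sketch of δ∗ exactly as defined above, for a DFA given as a transition dictionary. The concrete dictionary below is only an assumption: it is consistent with the transitions mentioned in the text (δ(q0, a) = q1, δ(q1, a) = q2, δ(q1, b) = q0), and the remaining entries are guesses, since Figure 1 is not reproduced here.

def delta_star(delta, q, w):
    """Compute delta*(q, w) by the recursive definition above."""
    if w == "":                                     # delta*(q, epsilon) = q
        return q
    # w = w' . sigma  =>  delta*(q, w) = delta(delta*(q, w'), sigma)
    return delta[(delta_star(delta, q, w[:-1]), w[-1])]

delta = {("q0", "a"): "q1", ("q0", "b"): "q0",      # the entries for q0-on-b and q2 are assumptions
         ("q1", "a"): "q2", ("q1", "b"): "q0",
         ("q2", "a"): "q2", ("q2", "b"): "q2"}
print(delta_star(delta, "q1", "ba"))                # q1, matching the computation above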
Figure 1: The DFA A.
4 Formally proving what the language of an automaton is
(If time permits)
In class, you have seen several DFAs, and mentioned what their language is. How can we be sure, however,
that the language we think they recognize is actually the language they recognize? “It’s obvious”, as you
know, is a dangerous expression in mathematics. When we say that something is obvious, we mean that if we
are asked to, we can very easily provide the proof. In this section, we will see how to prove the correctness
of a language conjecture. This is a rather painful procedure, and we will do it once here, and once in
the exercise. After that, we will allow you to explain less formally why an automaton accepts a language.
However, whenever you claim that a certain language is recognized by an automaton, always make sure you
could prove it formally at gun point.
Let Σ = {0, ..., 9, #}, and consider the language
L = {x#a : x ∈ {0, ..., 9}∗, a ∈ {0, ..., 9}, a appears in x}.
For example, we have 644270#5 ∉ L, and 1165319#3 ∈ L.
We want to construct a DFA A for the language, and prove its correctness formally. We start with an
intuition as to how we recognize the language. This will also assist us in the correctness proof. While reading
a word w, A keeps track of the numbers that were read so far, but it does not keep track of their order. Thus,
there is a state corresponding to every subset of {0, ..., 9}. When # is read, the state records it by moving
to a new state, while still keeping track of the subset of numbers read. Then, when a digit a is read, the
current state can tell us whether a was already seen. If so, we move to an accepting state, and otherwise to
a rejecting sink. If additional letters are read, we move to a rejecting sink.
It is quite hard to draw this DFA, as it is huge. Thus, we now describe formally the DFA A = ⟨Σ, Q, δ, q0, F⟩ as follows. The state space is Q = (2^{0,...,9} × {1, 2}) ∪ {qacc, qsink} with F = {qacc} and q0 = ⟨∅, 1⟩. Finally, we define the transition function as follows: For a set C ∈ 2^{0,...,9}, for a letter σ ∈ Σ, and for i ∈ {1, 2} we have

δ(⟨C, i⟩, σ) = ⟨C, 2⟩           if i = 1 ∧ σ = #
δ(⟨C, i⟩, σ) = ⟨C ∪ {σ}, 1⟩     if i = 1 ∧ σ ∈ {0, ..., 9}
δ(⟨C, i⟩, σ) = qacc             if i = 2 ∧ σ ∈ C
δ(⟨C, i⟩, σ) = qsink            if i = 2 ∧ σ ∉ C
Note that the last case also covers the case where we read an extra #.
We also define δ(qsink , σ) = δ(qacc , σ) = qsink for every σ ∈ Σ.
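A minimal sketch of this DFA (the encoding of states as Python values is our choice; the transition logic follows the definition of δ above).

DIGITS = set("0123456789")

def delta(state, sigma):
    if state in ("q_acc", "q_sink"):            # delta(q_sink, .) = delta(q_acc, .) = q_sink
        return "q_sink"
    C, i = state
    if i == 1 and sigma == "#":
        return (C, 2)                           # record that '#' was read
    if i == 1 and sigma in DIGITS:
        return (C | {sigma}, 1)                 # keep collecting the set of digits seen so far
    if i == 2 and sigma in C:
        return "q_acc"
    return "q_sink"                             # i = 2 and sigma not in C (also covers an extra '#')

def accepts(w):
    state = (frozenset(), 1)                    # q0 = <emptyset, 1>
    for sigma in w:
        state = delta(state, sigma)
    return state == "q_acc"

print(accepts("1165319#3"), accepts("644270#5"))    # True False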
Now that we have a fully defined DFA, we are ready to prove correctness.
Proposition 4.1. L(A)=L.
Proof. We want to prove formally that this is indeed the language of the described automaton. We do this
by giving a very detailed analysis of the behavior of the DFA. Specifically, we make the following claim:
After reading a word w ∈ {0, ..., 9}∗, the DFA A reaches a state ⟨C, 1⟩ where C ∈ 2^{0,...,9} such that the letters that appear in w are exactly C. Formally:
Claim 4.2. For a word w ∈ {0, ..., 9}∗, define S(w) = {σ ∈ {0, ..., 9} : σ appears in w}. Then δ∗(q0, w) = ⟨S(w), 1⟩.
Proof (of Claim 4.2). We prove the claim by induction over |w| = n.
The base case is w = ǫ, in which case we have S(w) = ∅, and indeed δ∗(q0, ǫ) = q0 = ⟨∅, 1⟩.
We assume correctness for n, and prove for n + 1. Consider a word w = x · σ ∈ {0, ..., 9}∗, where |x| = n. By the induction hypothesis, it holds that δ∗(q0, x) = ⟨S(x), 1⟩. Observe that S(x · σ) = S(x) ∪ {σ}. Now, we have:
δ∗(q0, x · σ) = δ(δ∗(q0, x), σ) = δ(⟨S(x), 1⟩, σ) = ⟨S(x) ∪ {σ}, 1⟩ = ⟨S(w), 1⟩
where the equalities follow from the definition of the transition function and of δ∗.
We now proceed with the proof of the main proposition: Consider a word of the form w#, where
w ∈ {0, ..., 9}∗. From the claim above, we get that δ∗(q0, w) = ⟨S(w), 1⟩. By the definition of the transition function, we now have δ∗(q0, w#) = ⟨S(w), 2⟩.
We are now ready to complete the proof. Let w ∈ Σ∗. First, if w ∈ L, then w is of the form x#σ where σ appears in x and x ∈ {0, ..., 9}∗. By the second claim above, we have
δ∗(q0, w) = δ(δ∗(q0, x#), σ) = δ(⟨S(x), 2⟩, σ) = qacc
So L ⊆ L(A). To prove the converse containment, we split into cases.
• If w does not contain a #, then by the claim above, the run of A on w does not end in the state qacc, so w ∉ L(A), and indeed w ∉ L.
• If w = x# where x ∈ {0, ..., 9}∗, then again, by the second claim above, the run of A on w does not end in the state qacc, so w ∉ L(A).
• If w can be written as w = x#y where x ∈ {0, ..., 9}∗ and |y| > 1, then by the claims above, we have
δ∗(q0, w) = δ∗(⟨S(x), 2⟩, y)
Observe that since |y| > 1, then after reading the first two letters, we end up in qsink (either by going straight to qsink, or by passing through qacc). In any case, the continuation of the run stays in qsink, so w ∉ L(A) (we remark that if you want to be extra formal, you can prove by induction that no run leaves qsink, but this is really trivial).
• Finally, if w = x#σ where x ∈ {0, ..., 9}∗ and σ ∉ S(x), then by the claims above, we have
δ∗(q0, w) = δ(⟨S(x), 2⟩, σ) = qsink
so w ∉ L(A).
This completes the proof of the claim.
Computability - Recitation 2
30/10/22 - 05/11/22
1 NFAs with ǫ-transitions
1.1 Removing ǫ-transitions
We want to show that for every ǫ-NFA A there exists an equivalent NFA (without ǫ-transitions). Furthermore, we will explicitly show how to get rid of ǫ-transitions efficiently.
Let A = ⟨Q, Σ, δ, Q0, F⟩ be an ǫ-NFA. The intuitive idea is as follows. Consider a state q ∈ Q. If there is an ǫ-transition from q to q′, then whenever we reach q, we can also reach q′. However, there may be ǫ-transitions from q′, so we need to include them as well. For every q ∈ Q, let E(q) ∈ 2^Q be the set
E(q) = {q′ ∈ Q | q′ is reachable from q using only ǫ-transitions}.
Note that, in particular, q always belongs to E(q), since a state is reachable from itself by a path of length 0. This path vacuously consists only of ǫ-transitions.
Given the sets E(q) for every q ∈ Q, we define the NFA B:
B = ⟨Q, Σ, η, ⋃_{q∈Q0} E(q), F⟩,
where η(q, σ) = ⋃_{s∈δ(q,σ)} E(s) for every q ∈ Q and σ ∈ Σ.
In the exercise, you will prove that this construction is correct, namely, that L(B) = L(A). Furthermore,
we claim that B can be computed in polynomial time given a description of A. Indeed, consider a directed
graph with vertex set Q and with an edge from q to s iff A has an ǫ-transition from q to s. Computing E(q)
can be done in polynomial time by running DFS from q on this graph. Hence, all of the sets E(q) can be
computed in polynomial time, and so can B.
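A minimal sketch of this construction (the dictionary-based encoding is ours, and ǫ-transitions are stored under the key ""): E(q) is computed by a DFS over the ǫ-edges, and η and the new set of initial states are built exactly as above.

def epsilon_closure(delta, q):
    """E(q): all states reachable from q using only epsilon-transitions (DFS)."""
    stack, seen = [q], {q}
    while stack:
        s = stack.pop()
        for t in delta.get((s, ""), set()):     # "" encodes an epsilon-transition
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def remove_epsilon(states, sigma, delta, initial, accepting):
    E = {q: epsilon_closure(delta, q) for q in states}
    eta = {(q, a): set().union(*[E[s] for s in delta.get((q, a), set())])
           for q in states for a in sigma}
    new_initial = set().union(*[E[q] for q in initial])
    return eta, new_initial, accepting

# Tiny example: q0 --eps--> q1 --a--> q1; the NFA B then starts in E(q0) = {q0, q1}.
delta = {("q0", ""): {"q1"}, ("q1", "a"): {"q1"}}
eta, init, acc = remove_epsilon({"q0", "q1"}, {"a"}, delta, {"q0"}, {"q1"})
print(sorted(init), eta[("q1", "a")])           # ['q0', 'q1'] {'q1'}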
2 Closure properties using NFAs
2.1 REG is Closed Under Union
You have seen in class that REG is closed under union. We will now show a much simpler way of getting
the same result using NFAs.
Let L1 , L2 ⊆ Σ∗ , then
L1 ∪ L2 = {x : x ∈ L1 or x ∈ L2 }
We want to show that if L1, L2 ∈ REG, then there exists an NFA that recognizes L1 ∪ L2. Let A = ⟨Q, Σ, q0, δ, F⟩ and B = ⟨S, Σ, s0, η, G⟩ be DFAs such that L(A) = L1 and L(B) = L2. We can assume w.l.o.g. that S ∩ Q = ∅.
We define a new NFA C = ⟨Q ∪ S, Σ, {q0, s0}, α, F ∪ G⟩ where α is defined as follows:
α(q, σ) = {δ(q, σ)}   if q ∈ Q
α(q, σ) = {η(q, σ)}   if q ∈ S
In order to show that L1 ∪ L2 = L(C), we will show the two inclusions. First, we show that L1 ∪ L2 ⊆ L(C):
let x = σ1 · σ2 · · · σm ∈ L1 ∪ L2 s.t. ∀i ∈ [m], σi ∈ Σ; that is, x is a word of length m in L1 ∪ L2. So either x ∈ L1 or x ∈ L2. W.l.o.g. x ∈ L1 (the case x ∈ L2 is identical). Hence the run of A on x is accepting; that is, there exists a sequence of states r = r0, r1, . . . , rm ∈ Q with: r0 = q0, rm ∈ F, and for all 0 ≤ i < m, it
holds that δ(ri , σi+1 ) = ri+1 . Now by the definition of C the same run r is also a run of C. Indeed, for all
0 ≤ i < m, it holds that ri+1 ∈ α(ri , σi+1 ). Also, note that as r0 = q0 ∈ {q0 , s0 } and rm ∈ F ⊆ F ∪ G, we
conclude that r is an accepting run of C on x and so x ∈ L(C).
Now, we will show that L(C) ⊆ L1 ∪ L2 : let y = σ1 · σ2 · · · σm ∈ L(C) of length m. So there is an accepting
run r of C on y: there are r0 ∈ {q0 , s0 } , r1 , . . . , rm ∈ Q ∪ S with rm ∈ F ∪ G and ri+1 ∈ α(ri , σi+1 ), for
all 0 ≤ i < m. By the definition of C, either rm ∈ F or rm ∈ G. W.l.o.g., rm ∈ F. By the definition of α and the fact that Q ∩ S = ∅, the only way the transition function α can move from rm−1 to rm ∈ F ⊆ Q is if rm−1 ∈ Q (because α only contains the transitions of A and B, and Q ∩ S = ∅). We continue this way by induction, and conclude that r0 = q0 and that all the states and transitions in the run r exist in A. So the same run, with δ(q0, σ1) = r1, . . . , δ(rm−1, σm) = rm ∈ F, is an accepting run of A on y, which means that y ∈ L1.
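A minimal sketch of the union construction and of simulating the resulting NFA (the representation of a DFA as a (transition dictionary, initial state, accepting set) triple is ours; the state sets are assumed disjoint, as in the proof).

def union_nfa(dfa1, dfa2):
    (d1, q01, F1), (d2, q02, F2) = dfa1, dfa2
    alpha = {k: {v} for k, v in d1.items()}          # alpha(q, s) = {delta(q, s)} for q in Q
    alpha.update({k: {v} for k, v in d2.items()})    # alpha(s, s') = {eta(s, s')} for s in S
    return alpha, {q01, q02}, F1 | F2                # C starts in both initial states

def nfa_accepts(alpha, initial, accepting, w):
    current = set(initial)
    for sigma in w:
        current = set().union(*[alpha.get((q, sigma), set()) for q in current])
    return bool(current & accepting)

# A recognizes words ending with 'a'; B recognizes words ending with 'b'.
A = ({("A0", "a"): "A1", ("A0", "b"): "A0", ("A1", "a"): "A1", ("A1", "b"): "A0"}, "A0", {"A1"})
B = ({("B0", "a"): "B0", ("B0", "b"): "B1", ("B1", "a"): "B0", ("B1", "b"): "B1"}, "B0", {"B1"})
alpha, init, acc = union_nfa(A, B)
print(nfa_accepts(alpha, init, acc, "ab"), nfa_accepts(alpha, init, acc, ""))    # True False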
2.2 REG is Closed Under Concatenation
We have already seen that REG is closed under complementation, union, and intersection (as well as some
more complex operators). We will now consider the closure of REG under concatenation. Let L1 , L2 ⊆ Σ∗ ,
then
L1 · L2 = {xy : x ∈ L1 , y ∈ L2 }
We want to show that if L1 , L2 ∈ REG, then there exists an NFA that recognizes L1 · L2 . Since NFAs read
words (rather than a concatenation of two words xy), it is easier to look at L1 · L2 as
L1 · L2 = {w = σ1 · · · σn : ∃ 1 ≤ k ≤ n, σ1 · · · σk ∈ L1 ∧ σk+1 · · · σn ∈ L2}
Let A = ⟨Q, Σ, q0, δ, F⟩ and B = ⟨S, Σ, s0, η, G⟩ be DFAs such that L(A) = L1 and L(B) = L2. We may assume that Q ∩ S = ∅. We define the NFA C = ⟨Q ∪ S, Σ, {q0}, α, G⟩ where α is defined as follows: For q ∈ Q and σ ∈ Σ, set α(q, σ) = {δ(q, σ)}. For s ∈ S and σ ∈ Σ, set α(s, σ) = {η(s, σ)}. In addition, for every q ∈ F, we have α(q, ǫ) = {s0}, and for every q ∈ (Q ∪ S) \ F, we have α(q, ǫ) = ∅. We claim that L(C) = L1 · L2. As you have seen, a language recognized by an NFA (even with ǫ-transitions) is regular, so this would imply that L1 · L2 ∈ REG.
Note: we used ǫ-transitions here. Try to apply ǫ-removal to this construction, and see what the resulting construction is.
We first claim that L1 · L2 ⊆ L(C). Let w ∈ L1 · L2 . Then w = x · y for some x ∈ L1 and y ∈ L2 .
Denote the run of A on x by a0 , a1 , . . . , a|x| and the run of B on y by b0 , b1 , . . . , b|y| . Note that a0 = q0 and
b0 = s0 . Also, since A and B accept x and y, respectively, it holds that a|x| ∈ F and b|y| ∈ G. We claim
that the concatenation a0, a1, . . . , a|x|, b0, b1, . . . , b|y| is an accepting run of C on w. This is straightforward to verify: the transition from a|x| to b0 is an ǫ-transition, while the other transitions are induced by δ and η.
We now claim that L(C) ⊆ L1 · L2. Let w ∈ L(C) and let t0, . . . , tm be an accepting run of C on w. Note that t0 = q0 and tm ∈ G ⊆ S. Also, note that it is impossible to go from S to Q in C. Thus, there is an index k such that {t1, . . . , tk} ⊆ Q and {tk+1, . . . , tm} ⊆ S. The only way to go from Q to S in C is by an ǫ-transition from F to s0 (and this is the only ǫ-transition in this run, so |w| = m − 1). Therefore, tk ∈ F and tk+1 = s0. We conclude that, taking x = w1 . . . wk and y = wk+1 . . . wm−1, the run t0, . . . , tk is
an accepting run of A on x and the run tk+1 , . . . , tm is an accepting run of B on y. Thus, w = x · y with
x ∈ L1 and y ∈ L2 so w ∈ L1 · L2 proving the claim.
2.3 REG is Closed Under the Kleene star
Let L ⊆ Σ∗ . Recall the definition of the Kleene star of L from the lecture. We now prove that if L is
recognizable by an NFA, then so is L∗ .
Proof. Let A = ⟨Σ, Q, δ, q0, F⟩ be a DFA s.t. L(A) = L. To prove that L∗ ∈ REG, we build an NFA (with ǫ-transitions) A′ s.t. L(A′) = L∗. Define A′ = ⟨Σ, Q ∪ {qstart}, δ′, {qstart}, {qstart}⟩, where qstart is a new state, and for each q ∈ Q ∪ {qstart} and σ ∈ Σ we define

δ′(q, σ) = {δ(q, σ)}   if q ∈ Q
δ′(q, σ) = ∅           if q = qstart

and we define

δ′(q, ǫ) = ∅           if q ∈ Q \ F
δ′(q, ǫ) = {qstart}     if q ∈ F
δ′(q, ǫ) = {q0}         if q = qstart

We claim that L(A′) = L∗. The proof is left as an exercise for the reader.
3 Recap
(If time permits) We defined the class REG as the class of languages recognized by DFAs. You also have seen in class and today that regular languages can be defined by NFAs and ǫ-NFAs.
The plan next is to define regular languages by means of regular expressions, regexes for short. So DFAs,
NFAs and regexes are equivalent in the sense that they capture the class of regular languages:
Figure 1: Equivalence of different models
As we’ve seen, it is easy to prove that regular languages are closed under concatenation using NFAs, and
it is not clear how to prove it using DFAs - in general choosing the model carefully can make proofs easier.
As another example, you have shown easily that REG is closed under complementation using DFAs, yet the
same proof does not work for NFAs. We sum up several models and properties and how hard it is to prove
the property using the model:
Model \ Property   L̄1     L1 ∩ L2                    L1 ∪ L2            L1 · L2            L1∗
DFA                Easy   Easy: product              Easy: product      Hard               Hard
NFA                Hard   Easy: How? Also product!   Easy: seen today   Easy: seen today   Easy: seen today
Regex (later)      Hard   Hard                       Easy               Easy               Easy
Computability - Recitation 3
6.11.22 - 12.11.22
1 Regular Expressions
By now we are fairly comfortable with regular languages. We constructed automata for them, proved some
theorems and closure properties about them, etc. As you probably imagine, regular languages are also
often used in practice, and are not just a theoretical model. But how can a computer work with a regular
language? Or an even simpler question - how do we describe regular languages? Sure, sometimes we can do
it in English: “all the words that end with 0”. But we cannot count on computers to be able to parse such
complex expressions, and know that they represent regular languages. One simple solution that we have,
by now, is to describe the language as a DFA or NFA. These are simple structures that can be parsed and
simulated by a computer. Are they a good way to represent a language?
Well, clearly they’re good for checking, given a word, whether it’s in the language. Indeed, it is quite
easy to check if an automaton accepts a word (even for an NFA). So in that aspect - yes, they are nice
models. However, they are sometimes difficult to read and write for humans. Also, some automata for fairly
simple properties may be huge.
As you have seen in the lecture, there is a nice, compact, formulation of regular languages which we can
use. This formulation is called regular expressions. A quick reminder of what you have already seen. We
say that t is a regular expression over the alphabet Σ if t is one of the following.
• ∅
• ϵ
• a∈Σ
• r ∪ s, r · s, or r∗ , where r and s are regular expressions.
Another way to represent this, for Σ = {a, b}, for example, is to write:
r := ∅ | ϵ | a | b | r ∪ r | r · r | r∗
This type of definition is called a grammar, and you have seen similar things in the definition of formulas in
Logic 1.
Example:
Consider the expression (a ∪ b)∗ · bb · (a ∪ b)∗ over Σ = {a, b}. This expression represents the
language of all the words that contain the substring bb.
Another example: 0 · 0∗ · (1∗ ∪ 2∗ ). This is the language of words that start with at least one 0, followed by
either a sequence of 1’s or 2’s.
Regexes can be very complicated: (((a ∪ b) · c∗ )∗ · a) ∪ (a · b∗ ), etc.
You also have seen that the semantics of regular expressions are defined by induction on the structure of
the regular expression. Specifically, for a regular expression r, the language L(r) is defined as follows.
• L(∅) = ∅.
• L(ϵ) = {ϵ}.
• For a ∈ Σ, we have that L(a) = {a}.
• L(r ∪ s) = L(r) ∪ L(s).
• L(r · s) = L(r) · L(s).
• L(r∗ ) = L(r)∗ .
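A minimal sketch of these semantics (the tuple encoding of expressions is our own): it computes all words of L(r) up to a length bound by following the inductive definition, which is enough to experiment with small examples.

def lang(r, max_len):
    """Words of L(r) of length <= max_len. r is ("empty",), ("eps",), ("sym", a),
    ("union", r1, r2), ("concat", r1, r2) or ("star", r1)."""
    op = r[0]
    if op == "empty":
        return set()
    if op == "eps":
        return {""}
    if op == "sym":
        return {r[1]} if max_len >= 1 else set()
    if op == "union":
        return lang(r[1], max_len) | lang(r[2], max_len)
    if op == "concat":
        L1, L2 = lang(r[1], max_len), lang(r[2], max_len)
        return {u + v for u in L1 for v in L2 if len(u + v) <= max_len}
    if op == "star":
        L1, result, frontier = lang(r[1], max_len), {""}, {""}
        while frontier:                       # keep appending one more factor from L(r1)
            frontier = {u + v for u in frontier for v in L1
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result

# (a ∪ b)* · bb · (a ∪ b)*  -- the words that contain the substring bb.
a, b = ("sym", "a"), ("sym", "b")
any_star = ("star", ("union", a, b))
r = ("concat", ("concat", any_star, ("concat", b, b)), any_star)
print(sorted(lang(r, 3)))                     # ['abb', 'bb', 'bba', 'bbb']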
The reason these are called regular expressions is the following theorem:
Theorem 1.1 L ∈ REG iff there exists a regular expression r such that L(r) = L.
The easy direction in the proof is showing that every regular expression has an equivalent automaton, so every language defined by a regular expression is in REG.
Lemma 1.2 For every regular expression r, there exists an NFA Ar such that L(r) = L(Ar )
Proof:
The proof is by induction over the structure of r.
• If r = ∅, we let Ar be the NFA that accepts the empty language.
• If r = ϵ, we let Ar be the NFA that accepts {ϵ}.
• If r = a ∈ Σ, we let Ar be the NFA that accepts {a}.
• If r = s ∪ t, we let Ar be the NFA that accepts L(As ) ∪ L(At ). The union construction seen in class
works, but a simpler construction nondeterministically chooses which automaton (of As , At ) to start
in (example on blackboard).
• If r = s · t, you have seen in class how to construct an automaton for concatenation, so we simply let
Ar be the automaton that accepts the concatenation of L(As ) and L(At ).
• If r = s∗ , we need to show that we can construct an NFA for L(As )∗ from As . This is given in the
exercise.
Example 1.1
For example, let’s try this on the regex (a ∪ b) · b (on the blackboard).
The second direction in the proof of Theorem 1.1 is a bit harder. We need to prove that every regular
language can be defined by a regular expression.
Lemma 1.3 For every DFA A there exists a regular expression r such that L(A) = L(r).
Proof: The formal proof of this claim is to give an algorithm that converts an NFA into a regular expression.
Since this algorithm is tedious, we will explain it with an example.
The algorithm utilizes a new type of automaton called a generalized NFA (GNFA, for short). A GNFA
is like an NFA, with the exception that the labels on the edges are not letters in Σ, but rather regular
expressions over Σ. For example, see the GNFA in Figure 1.
For simplicity, we will require the following conditions from a GNFA:
• There is a single initial state, with only outgoing edges.
• There is a single accepting state, with only incoming edges.
• The initial state and the accepting state are distinct.
Figure 1: GNFA Example.
We leave it as an exercise to understand why we can always make sure our GNFA satisfies these requirements,
but the example should demonstrate it.
The idea behind the algorithm is the following. We start with a DFA A for our language. We translate it
to an equivalent GNFA, with two more states. We then start removing states from the GNFA until we end
up with 2 states, at which point the edge between them will be labeled with the regex r that is equivalent
to A.
The clever part is understanding how to remove states, which we demonstrate with the example in
Figure 2, taken from M.Sipser’s “Introduction to the Theory of Computation”.
Figure 2: DFA to regex.
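A minimal sketch of the state-elimination step on a GNFA (the string representation of edge labels and the helper names are ours): removing a state k adds, for every pair of remaining states i, j, the regex R(i,k) · R(k,k)* · R(k,j) as a union to the existing label R(i,j).

def union_re(r, s):
    if r is None: return s
    if s is None: return r
    return f"({r})∪({s})"

def concat_re(*parts):
    parts = [p for p in parts if p is not None]        # None here stands for a missing factor
    return "".join(f"({p})" for p in parts)

def rip(R, states, k):
    """Remove state k; R maps (i, j) to a regex string, or None when there is no edge."""
    loop = R.get((k, k))
    star = f"({loop})*" if loop is not None else None  # no self-loop: the star part is just epsilon
    newR = {}
    for i in states:
        for j in states:
            if i == k or j == k:
                continue
            through_k = None
            if R.get((i, k)) is not None and R.get((k, j)) is not None:
                through_k = concat_re(R[(i, k)], star, R[(k, j)])
            newR[(i, j)] = union_re(R.get((i, j)), through_k)
    return newR

# Eliminating the middle state of  s --a--> 1 --a--> t  with a self-loop b on state 1:
R = {("s", "1"): "a", ("1", "1"): "b", ("1", "t"): "a"}
print(rip(R, {"s", "1", "t"}, "1")[("s", "t")])        # (a)((b)*)(a)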
2 The Pumping Lemma
You have seen at least twice that there are non-regular languages. In Recitation 1, you have seen that there are ℵ0 regular languages, but 2^ℵ0 languages. But this argument is not constructive. That is, it says that there are non-regular languages, but it doesn’t give an example. Then, you have seen in the lecture a concrete example of a non-regular language. Specifically, you have seen that the language L = {0^n 1^n : n ∈ ℕ} is not regular using a pumping argument. The pumping argument you have seen is a specific case of a more general claim - the pumping lemma.
Lemma 2.1 (The Pumping Lemma) Let L ∈ REG, then there exists a constant p > 0 such that for every w ∈ L such that |w| > p, there exist x, y, z ∈ Σ∗ such that w = xyz and such that:
1. |y| > 0.
2. |xy| ≤ p
3. ∀i ∈ ℕ ∪ {0}, xy^i z ∈ L.
What can we do with the pumping lemma? Can we prove that a language is regular? No! We can only
use it to show that a language is not regular.
Example 2.1
Let L1 = {1^{n²} : n ≥ 0}. Is L1 regular? No. Let’s prove it with the pumping lemma. Assume by way of contradiction that L1 is regular, and let p be a pumping constant for it. Consider the word w = 1^{p²}. By the pumping lemma, we can write w = xyz such that |xy| ≤ p. We can write x = 1^j, y = 1^k, z = 1^l such that k > 0, j + k ≤ p and j + k + l = p². If we pump with i = 2 we get
xy²z = 1^j 1^k 1^k 1^l = 1^{p² + k}
However,
p² < p² + k ≤ p² + p < p² + 2p + 1 = (p + 1)²
The first inequality follows from the fact that k > 0, and the second inequality follows from the fact that j + k ≤ p. We conclude that p² + k is not a perfect square and thus xy²z is not in L1, contradicting the pumping lemma. So L1 ∉ REG.
Important remark: We can use the pumping lemma with a proof by way of contradiction in order to
prove that a language L is not regular. The converse does not hold: there are non-regular languages that
satisfy all the conditions of the pumping lemma. So use with caution!
Example 2.2
Let Σ = {0, 1}. Consider the following language:
L2 = {w ∈ Σ∗ : #0(w) = #1(w)}
Is L2 regular? We will now prove that it is not.
A proof using the pumping lemma. Assume by way of contradiction that L2 is regular, and let p be a pumping constant for it. Consider the word w = 0^p 1^p ∈ L2. Clearly, |w| > p, so we can write w = xyz such that:
• |y| > 0.
• |xy| ≤ p
• ∀i ∈ ℕ ∪ {0}, xy^i z ∈ L2.
If we pump y with some i > 1, we get that the word xy^i z ∈ L2. Now since |xy| ≤ p, we have that xy consists only of 0s, thus we can write x = 0^j, y = 0^k, z = 0^l 1^p where j, l ≥ 0, k > 0, and j + k + l = p. Therefore, xy^i z = 0^j 0^{ik} 0^l 1^p = 0^{p+(i−1)k} 1^p. As i > 1 and k > 0, this means that #0(xy^i z) > #1(xy^i z), which is a contradiction, and so L2 ∉ REG.
An alternative proof (Optional) Consider the language L = {w ∈ {0, 1}∗ : w = 0^n 1^m, n, m ∈ ℕ}. Clearly, L is regular as it is the language of the regex r = 0∗1∗ (make sure you understand why!). Assume by contradiction that L2 is regular. It holds that L ∩ L2 = {0^n 1^n : n ∈ ℕ}, thus (since REG is closed under intersection) {0^n 1^n : n ∈ ℕ} would be regular, and we have reached a contradiction.
Computability - Recitation 4
1 The Pumping Lemma - Cont.
Sometimes, we need to prove or disprove the regularity of a family of languages. We show an example for
such a case, and we utilize the pumping lemma for our proof.
Reminder.
ω(n) = {g : ℕ → ℕ | ∀c ∈ ℕ ∃N ∈ ℕ ∀n ≥ N, cn ≤ g(n)}
     = {g : ℕ → ℕ | lim_{n→∞} g(n)/n = ∞}
N
Lemma 1.1. Let f : → be a monotonically-increasing function such that f (n) = ω(n).
Then, for all N, k ∈ , there exists n > N such that f (n + 1) − f (n) > k.
N
Proof. If not, there exists N, k ∈ such that for all n > N it holds that f (n + 1) − f (n) ≤ k. Note that it
means that the differences are bounded (since there are only finite number of differences until N , and from
this point they are all bounded by k). So there exists a bound M ∈ such that for all n ∈ it holds that
f (n + 1) − f (n) ≤ M . Then, f (2) ≤ f (1) + M , f (3) ≤ f (2) + M ≤ f (1) + 2M , ..., f (n) ≤ f (1) + (n − 1) · M .
(n)
f (1)
Now, divide both sides by (n − 1) and get fn−1
≤ M + n−1
which contradicts limn→∞ f (n)/n = ∞.
N
Proposition 1.1. Let f : ℕ → ℕ be a monotonically-increasing function such that f(n) = ω(n). Then the language Lf = {a^{f(n)} : n ∈ ℕ} is not regular.
Proof. Let f(n) = ω(n). We will show that Lf does not satisfy the conditions of the pumping lemma. Assume by way of contradiction that the pumping lemma holds, and let p > 0 be the pumping constant. We apply Lemma 1.1 with N = k = p (the pumping constant), and get that there exists n > p such that f(n + 1) > f(n) + p. Since f is monotonically-increasing, it holds that f(n) ≥ n > p. Consider the word a^{f(n)}; then by the pumping lemma there exist words x, y, z such that a^{f(n)} = xyz, and it holds that 0 < |y| ≤ |xy| ≤ p, and for every i ∈ ℕ ∪ {0} we have that xy^i z ∈ Lf. Let m = |y|; then for i = 2 we have that a^{f(n)+m} = xy²z ∈ Lf. However, f(n) < f(n) + m ≤ f(n) + p < f(n + 1), so f(n) + m is not in the image of f, which is a contradiction.
Remark. This lemma cannot be used in the current exercise (Exercise 3). This is because you need to
practice using the pumping lemma, and deducing that a certain language is not regular immediately from this
lemma will not teach you anything...
2 The Myhill-Nerode Theorem
2.1 Myhill-Nerode classes
Let Σ be a finite alphabet, and let L ⊆ Σ∗ be a language. In class, we defined the Myhill-Nerode equivalence
relation ∼L ⊆ Σ∗ × Σ∗ as follows.
∀x, y ∈ Σ∗ , x ∼L y iff ∀z ∈ Σ∗ , x · z ∈ L ⇐⇒ y · z ∈ L
That is, x and y are equivalent if there is no separating suffix. For every w ∈ Σ∗, we defined [w] = {x : w ∼L x} to be the equivalence class of w. In class, you proved the following result.
Theorem 2.1. Let L ⊆ Σ∗ , then L ∈ REG iff ∼L has finitely many equivalence classes.
How do we use this theorem? Unlike the pumping lemma, this theorem gives us a complete characterization of regular languages, so we can use it either to prove that a language is regular, or to prove that a
language is not regular.
Example:
Consider the language L = {a^k : k is not a power of 2}. Is this language regular? Let’s try to see how many equivalence classes it has. For every n ≠ m ∈ ℕ, consider the words a^{2^n} and a^{2^m}. W.l.o.g. assume that n < m. Now we have that 2^n + 2^n = 2^{n+1}, but 2^m + 2^n = 2^n(2^{m−n} + 1), which has an odd factor, so it is not a power of 2. Thus, we have a^{2^n} · a^{2^n} ∉ L, but a^{2^m} · a^{2^n} ∈ L, so a^{2^n} is a separating suffix, so a^{2^n} and a^{2^m} are in different equivalence classes. Since this is true for all n, m ∈ ℕ, there are infinitely many equivalence classes, so the language is not regular.
Now, let’s look at the “positive” direction.
Example:
Consider the language L = {w ∈ {a, b}∗ : w ends with an a}. We calculate the equivalence
classes. Let u, v ∈ {a, b}∗ . If u, v both end with an a, for every suffix x we have u · x ∈ L iff x = ϵ or x ends
with a, iff v · x ∈ L. So u ∼L v. Similarly, if u, v both do not end with an a (i.e end with b or are ϵ), then for
every suffix x we have that u · x ∈ L iff x ends with a, iff v · x ∈ L. So u ∼L v. Finally, if one ends with an
a and one does not, then we can separate them with ϵ. We covered all the words, so we can conclude that
there are exactly 2 equivalence classes:
{u : u ends with a} and {u : u does not end with a}
By Theorem 2.1, we get that L ∈ REG. Can we think of a DFA for L? Yes, see Figure 1.
Figure 1: A DFA for L = {w ∈ {a, b}∗ : w ends with an a}
In the latter example, we have seen that the number of states in the smallest DFA we could think of was
exactly the number of Myhill-Nerode equivalence classes. As you can see in the proof of the Myhill-Nerode
theorem, this is not surprising, and the size of the minimal DFA for a language is the number of equivalence
classes.
This brings us to the following question - given a DFA, can we minimize it? That is, do we have a
procedure that allows us to take a DFA and output a new DFA that is equivalent, but has a minimal number
of states? Fortunately, we do.
You will see it thoroughly in the exercise.
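As a preview, here is a minimal sketch of one standard minimization idea, Moore-style partition refinement (the code is our illustration, not the procedure from the exercise): states are repeatedly split until no two states in the same block can be distinguished by a single letter. For a complete DFA whose states are all reachable, the final number of blocks equals the number of Myhill-Nerode classes of its language.

def minimal_partition(states, sigma, delta, accepting):
    """Coarsest partition of the states of a complete DFA into indistinguishable blocks."""
    blocks = [b for b in (set(accepting), set(states) - set(accepting)) if b]
    changed = True
    while changed:
        changed = False
        block_of = {q: i for i, b in enumerate(blocks) for q in b}
        new_blocks = []
        for b in blocks:
            groups = {}
            for q in b:       # group the states of b by the blocks their successors land in
                signature = tuple(block_of[delta[(q, a)]] for a in sorted(sigma))
                groups.setdefault(signature, set()).add(q)
            new_blocks.extend(groups.values())
            changed = changed or len(groups) > 1
        blocks = new_blocks
    return blocks

# Our guess at the two-state DFA of Figure 1 (words ending with 'a'); it is already minimal.
delta = {("q0", "a"): "q1", ("q0", "b"): "q0", ("q1", "a"): "q1", ("q1", "b"): "q0"}
print(minimal_partition({"q0", "q1"}, {"a", "b"}, delta, {"q1"}))    # [{'q1'}, {'q0'}]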
3 Recap Questions
3.1 Pumping Quiz Question
Let A = ⟨Q, {0, 1}, q0, δ, F⟩ be a DFA with |Q| = r states. Assume that 0^r 1^r ∈ L(A). Which of the following is necessarily correct:
1. L(0∗1∗) ⊆ L(A):
not correct. Consider a 2-state DFA for the language of an even number of 0’s.
2. L(A) ⊊ L(0∗1∗):
not correct. Consider the counter example in item 1.
3. 1. is not necessarily correct, yet for every i ≥ 1, it holds that 0^{ir} 1^{ir} ∈ L(A):
not correct. Consider the DFA in Figure 2 for i = 2, r = 3.
Figure 2: A counter example
4. 1. is not necessarily correct, yet there exists k ≥ 1 such that for every i ≥ 1, it holds that 0^{r+ik} 1^{r+ik} ∈ L(A):
correct. Assume that w = 0^r 1^r is accepted by A, and let s = s0, s1, ..., s_{2r} be the accepting run of A on w. As |Q| = r, we get that the run s0, s1, ..., s_r visits some state twice. Similarly, the run s_r, s_{r+1}, ..., s_{2r} also visits some state twice. Meaning that we have two cycles in the run s. The first cycle is relevant to transitions labeled with 0, and the second is relevant to transitions labeled with 1. Let k1 and k2 be the lengths of the first and second cycle, respectively. Then, we take k = k1 · k2. Note that pumping the first cycle i · k2 additional times, and the second cycle i · k1 additional times, corresponds to an accepting run of A over the word 0^{r+ik} 1^{r+ik}.
3.2 Myhill-Nerode Quiz Question
Explanation on the board!
3.3 Myhill-Nerode Exam Question
Draw a minimal DFA A for the language L = {w ∈ {a, b}∗ : (#a (w) is even) ∨ (w ends with a)}, and prove
that A is minimal.
Solution: the following is a minimal DFA for the language
Figure 3: A minimal DFA for L = {w ∈ {a, b}∗ : (#a (w) is even) ∨ (w ends with a)}
To show minimality, we show that there are at least 3 Myhill-Nerode equivalence classes (why is this sufficient?). Specifically, we show that the words ϵ, a, ab are pairwise nonequivalent.
• z = ϵ separates between ϵ and ab: ϵ ∈ L and ab ∉ L.
• z = b separates between ϵ and a: b ∈ L and ab ∉ L.
• z = ϵ separates between a and ab: a ∈ L and ab ∉ L.
Computability - Recitation 5
20/11/22 - 27/11/22
1 Context-Free Grammars (CFGs)
So far we discussed the class of regular languages. We have seen that regular languages can be characterized
(equivalently) by DFAs, NFAs, and regular expressions. We have also seen an algebraic characterization of
regular languages by means of the Myhill-Nerode equivalence classes.
Our study showed that while regular languages are nice and simple to define and to reason about, they
are not very expressive. For example, the language {a^n b^n : n ∈ ℕ} is not regular.
It is therefore reasonable that we want a formalism that can capture more languages. One way to obtain
such a model, is to come up with some augmentation to NFAs. For example by adding memory. We will
take that approach later on.
For now, we introduce a new type of language-formalism, which is generative. In generative models,
we show how to generate words in the language, rather than how to check whether a word belongs to the
language.
Let’s start with an example, before we define things formally.
Example 1.1:
The following is a context-free grammar:
A → 0A1 | B
B → #
How do we interpret this? We start with the variable A, and then we use one of the two derivation rules
that are available from A. We can either convert A to 0A1, or to B. The letters 0, 1 are called terminals,
and they are the alphabet of the language.
This kind of model allows us to generate words in the language, by repeatedly applying rules. For
example, we can generate the word 000#111 as follows:
A =⇒ 0A1 =⇒ 00A11 =⇒ 000A111 =⇒ 000B111 =⇒ 000#111
Can you see which rule was applied in each derivation?
One can also think of the generation of a word as a parse tree.
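A minimal sketch of this derivation process (the dictionary encoding of the rules is ours): repeatedly replace the leftmost variable by one of its right-hand sides; the particular choices below reproduce the derivation of 000#111.

rules = {"A": ["0A1", "B"], "B": ["#"]}        # the grammar A -> 0A1 | B, B -> #

def apply_leftmost(sentential, rule_index):
    """Replace the leftmost variable in the sentential form using the chosen rule."""
    for i, symbol in enumerate(sentential):
        if symbol in rules:                    # variables are exactly the keys of `rules`
            return sentential[:i] + rules[symbol][rule_index] + sentential[i + 1:]
    return sentential                          # no variable left: the derivation is complete

s = "A"
for choice in [0, 0, 0, 1, 0]:                 # A => 0A1 => 00A11 => 000A111 => 000B111 => 000#111
    s = apply_leftmost(s, choice)
    print(s)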
We now turn to define context-free grammars formally.
Definition 1.1. A context-free grammar is a tuple G = ⟨V, Σ, R, S⟩ where:
• V is a finite set of variables.
• Σ is a finite alphabet of terminals (disjoint from V ).
• R is a finite set of rules, where each rule is a variable and a string of variables and terminals (formally, R ⊆ V × (V ∪ Σ)∗).
• S ∈ V is the start variable.
This is the syntax of context-free grammars. We now define the semantics.
Consider strings u, v, w of variables and terminals, a variable A, and a rule A → w; then we say that uAv =⇒ uwv (read: uAv yields uwv). We write u =⇒∗ v if u = v or if there exists a finite sequence of strings u1, ..., uk such that
u =⇒ u1 =⇒ ... =⇒ uk =⇒ v
We define the language of G to be L(G) = {w ∈ Σ∗ : S =⇒∗ w}. Note that we only consider words in Σ∗. Thus, a derivation that ends in a string that contains variables is not “complete”.
We define the class of context-free languages (CFL, for short) as the class of languages that are generated by CFGs.
Now that we know the formalism, let’s look at a couple of examples.
Example 1.2: Can you think of a context-free grammar for the language {a^n b^{2n} : n ≥ 0}?
S → aSbb | ǫ
Example 1.3: How about {a^i b^j : j ≥ i}?
S → aSb | bT | ǫ
T → Tb | ǫ
Example 1.4: How about {a^i b^j c^j d^i : i, j ∈ ℕ}?
S → aSd | T | ǫ
T → bTc | ǫ
We hope you get the hang of it. Let’s move on.
Remark: Why do we call these grammars context free? The reason is that the left-hand side of the
rules contains only single variables. So intuitively, a variable is interpreted without taking into account its
surrounding variables/terminals, or its context. There are grammars which are context-sensitive, where we
can have rules such as A0B1 → B11. These capture a much more expressive class of languages. We probably
won’t discuss those in this course.
2 Closure properties
We defined the class CFL, and the natural thing to do would be to start investigating its closure properties.
We’ll start with an easy one:
Theorem 2.1. If L1, L2 ∈ CFL, then L1 ∪ L2 ∈ CFL.
Proof. Let G1 = ⟨V1, Σ, R1, S1⟩ and G2 = ⟨V2, Σ, R2, S2⟩ be CFGs such that L1 = L(G1) and L2 = L(G2). Assume w.l.o.g. that V1 ∩ V2 = ∅ (otherwise we can change the names of the variables in V2). We can obtain a CFG G for L1 ∪ L2 as follows:
Let S be a new variable, S ∉ V1 ∪ V2; then G = ⟨V1 ∪ V2, Σ, R1 ∪ R2 ∪ {S → S1 | S2}, S⟩. That is, we simply put both of the grammars together, with the rule S → S1 | S2.
Let’s kick it up a notch:
Theorem 2.2. If L1, L2 ∈ CFL, then L1 · L2 ∈ CFL.
Proof. Let G1 = ⟨V1, Σ, R1, S1⟩ and G2 = ⟨V2, Σ, R2, S2⟩ be CFGs such that L1 = L(G1) and L2 = L(G2). Assume w.l.o.g. that V1 ∩ V2 = ∅ (otherwise we can change the names of the variables in V2). We can obtain a CFG G for L1 · L2 as follows:
Let S be a new variable, S ∉ V1 ∪ V2; then G = ⟨V1 ∪ V2, Σ, R1 ∪ R2 ∪ {S → S1 · S2}, S⟩. That is, we simply put both of the grammars together, with the rule S → S1 · S2.
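A minimal sketch of both constructions, for grammars stored as dictionaries mapping a variable to a list of right-hand sides (this representation and the fresh variable name are our assumptions; as in the proofs, the variable sets are assumed disjoint).

def cfg_union(rules1, start1, rules2, start2, new_start="S_new"):
    rules = {**rules1, **rules2}
    rules[new_start] = [[start1], [start2]]     # S -> S1 | S2
    return rules, new_start

def cfg_concat(rules1, start1, rules2, start2, new_start="S_new"):
    rules = {**rules1, **rules2}
    rules[new_start] = [[start1, start2]]       # S -> S1 S2
    return rules, new_start

# {a^n b^n} and {c^m}; right-hand sides are lists of symbols, [] is epsilon.
G1 = ({"S1": [["a", "S1", "b"], []]}, "S1")     # S1 -> a S1 b | epsilon
G2 = ({"S2": [["c", "S2"], []]}, "S2")          # S2 -> c S2 | epsilon
rules, start = cfg_union(*G1, *G2)
print(rules[start])                             # [['S1'], ['S2']]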
We saw that the class REG is closed under many operations on languages. What about the class CFL? You saw in class that it is closed under union but not under intersection. Above, we proved that it is closed under concatenation and union. In the exercise you showed that CFL is closed under the Kleene star operation. What about complement?
Corollary 2.3. The class CFL is not closed under complement.
Proof. If it were, then by De Morgan’s law (L1 ∩ L2 is the complement of L̄1 ∪ L̄2) it would follow that CFL is closed under intersection, and we saw in class that it isn’t.
3 The Pumping Lemma for CFL
Our main tool for proving that a language is not in CFL is the Pumping Lemma for CFL’s, which you’ve
seen in class. Let’s recall it.
Lemma 3.1 (The pumping lemma). Let L ∈ CFL, then there exists p ∈ ℕ such that for every w ∈ L, if |w| ≥ p then we can write w = uvxyz such that:
1. |vxy| ≤ p.
2. |vy| > 0.
3. ∀i ∈ ℕ ∪ {0}, uv^i xy^i z ∈ L.
Let’s use the pumping lemma!
Proposition 3.2. Let Σ = {a, b} and consider the language L = {ww | w ∈ Σ∗}. Then L ∉ CFL.
Proof. Assume by way of contradiction that L ∈ CFL, and let p be its pumping constant. Let w = a^p b^p a^p b^p. Since w ∈ L and |w| ≥ p, there exist words u, v, x, y, z ∈ Σ∗ such that w = uvxyz and conditions 1, 2, 3 of the lemma are satisfied.
We now consider three distinct cases (drawn on the board):
• If vxy is contained in the first half of w, let s be the number of a’s in vy and let t be the number of b’s in vy. By condition 2, s + t > 0. By condition 3, the word uxz = a^{p−s} b^{p−t} a^p b^p should be in L, which is a contradiction: if its length is odd it is clearly not in L, and if it is even, its first half ends with an a while its second half ends with a b, so it is not of the form xx.
• The case where vxy is contained in the second half of w is handled similarly to the first case.
• If vxy is composed of parts from both halves of w, then by condition 1, vxy is contained in the last b^p of the first half of w and the first a^p of the second half of w. Again, let s be the number of a’s in vy and let t be the number of b’s in vy. Then uxz = a^p b^{p−t} a^{p−s} b^p ∉ L, which is a contradiction to condition 3.
This example also yields an alternative proof of the fact that CFL is not closed under complement.
Proposition 3.3. For the same language L as in the previous example, it holds that L̄ ∈ CFL.
Proof. Note that L̄ = K1 ∪ K2 where K1 = {uw | |u| = |w| and u ≠ w ∈ Σ∗} and K2 = {w ∈ Σ∗ | |w| is odd}. Since CFL is closed under union, it is enough to show that K1 and K2 are in CFL. Note that K2 is clearly in CFL since it is in REG, which is contained in CFL. We are thus left with proving that K1 ∈ CFL.
Let w ∈ K1 such that |w| = 2n (note that all words in K1 are of even length). There must exist some 1 ≤ i ≤ n such that wi ≠ wn+i. Assume that wi = a and wn+i = b (the other case is symmetric). Then we can write w = xaybz where x, y, z ∈ Σ∗ and |x| = i − 1, |y| = n − 1 and |z| = n − i. In general, define K1′ as the language of all words of the form xaybz or xbyaz, such that |x| = i − 1, |y| = n − 1 and |z| = n − i for some n and i such that 1 ≤ i ≤ n. Since i is arbitrary, another equivalent definition for K1′ is the language of all words xaybz or xbyaz such that |x| + |z| = |y| (make sure you see why). We have just shown that K1 ⊆ K1′. It is also easy to see that K1′ ⊆ K1. Thus, K1′ = K1. Now, to prove that K1 ∈ CFL, we describe a CFG for K1′.
S → AB | BA
A → aAa | aAb | bAa | bAb | a
B → aBa | aBb | bBa | bBb | b
The idea behind the grammar is as follows: we allow deriving any words as x and z, but for every letter we add to either of them, we also add a letter to y. We thus get that |y| = |x| + |z|, and we are done.
Corollary 3.4. CFL is not closed under complement.
Proof. This follows directly from the last two examples.
Proposition 3.5. Consider the language
L = {w#x : w, x ∈ {0, 1}∗, x is a substring of w}.
Then L ∉ CFL.
Proof. We prove that it is not context free using the pumping lemma. Assume by way of contradiction that L ∈ CFL, and let p be the pumping constant. Consider the word w = 0^p 1^p # 0^p 1^p ∈ L. Let uvxyz be a decomposition of w such that the pumping lemma conditions hold.
The trick in CFG pumping is to play with the location of the vxy part. If y ends before the #, then using i = 0 we can shorten the part before #, which is clearly a contradiction (a long string is never a substring of a shorter string).
Similarly, if v starts after the #, we can use i = 2 to lengthen the substring, which is again a contradiction.
So the entire vxy part is on the 1^p # 0^p part. If # ∈ v or # ∈ y, then if we choose i > 1 we get too many #’s - contradiction. So # ∈ x. Thus, we have that v is a substring of 1^p and y is a substring of 0^p. If y = ǫ, then by pumping with i = 0 we get fewer than p 1’s left of the #, which is a contradiction. Otherwise, |y| > 0, so if we pump with i = 2 we get too many 0’s in the substring - contradiction.
We conclude that L is not a CFL.
Proposition 3.6. Let L be a language such that L ⊆ {a}∗. If L satisfies the pumping lemma for context-free languages, then it also satisfies the pumping lemma for regular languages.
Proof. Let L ⊆ {a}∗ be a language that satisfies the pumping lemma for context-free languages, and let p be the pumping constant for L. Let z ∈ L such that |z| ≥ p. By the pumping lemma for CFL, we know that we can write z = uvwxy such that:
1. |vwx| ≤ p
2. |vx| > 0
3. for all i ≥ 0 it holds that uv^i wx^i y ∈ L
Because L is over a unary alphabet, we can change the order of the sub-words and the word z will not change, meaning we can also write z = wvxuy.
So, for all i ≥ 0, uv^i wx^i y = wv^i x^i uy = w(vx)^i uy. Let’s write this a little differently: u′ = w, v′ = vx, w′ = uy, and we have that u′(v′)^i w′ ∈ L.
It’s easy to see that |u′v′| ≤ p and |v′| > 0. So we can conclude that for the same p, the conditions of the pumping lemma for regular languages hold.
Example 3.1: From exam:
Let L ⊆ Σ∗. We define the equivalence relation ≈L on Σ∗ as follows: for any x, y ∈ Σ∗ we say that x ≈L y iff ∀z ∈ Σ∗ such that |z| is even it holds that xz ∈ L ⇐⇒ yz ∈ L (i.e. there is no even-length separating suffix between x, y).
For example, let L = {a^n | n mod 6 = 0}; then it holds that:
• a ≈L a^3, since for every k ∈ ℕ it holds that a^{2k+1} ∉ L and a^{2k+3} ∉ L.
• a^2 ≉L a^4, as z = a^2 is a separating suffix of even length.
1. How many equivalence classes does the relation ≈L induce on Σ∗ for the language L = (ab)∗?
Solution: There are 2 equivalence classes. We will prove that the equivalence classes are exactly L and L̄.
Let z ∈ {a, b}∗. Then |z| is even iff one of the following holds:
• z ∈ L (this means that z = (ab)^k for some k ∈ ℕ ∪ {0}).
• z ∉ L and z ∈ {aa, ab, ba, bb}^{k1} {aa, ba, bb}^{k2} {aa, ab, ba, bb}^{k3} for some k1, k3 ∈ ℕ ∪ {0}, k2 ∈ ℕ.
Now, let x, y ∈ L: so x = (ab)^m, y = (ab)^n for some m, n ∈ ℕ ∪ {0}. Let z ∈ Σ∗ such that |z| is even.
• If z = (ab)^k for some k ∈ ℕ ∪ {0}, then xz = (ab)^{m+k} ∈ L and yz = (ab)^{n+k} ∈ L.
• If z ∉ L and z ∈ {aa, ab, ba, bb}^{k1} {aa, ba, bb}^{k2} {aa, ab, ba, bb}^{k3} for some k1, k3 ∈ ℕ ∪ {0}, k2 ∈ ℕ, then xz ∉ L and yz ∉ L.
So x ≈L y.
Now, let x ∉ L, and let z ∈ Σ∗ such that |z| is even. We will show that xz ∉ L:
If |x| is odd then |xz| is odd and xz ∉ L.
If |x| is even then, since x ∉ L, it holds that x ∈ {aa, ab, ba, bb}^{k1} {aa, ba, bb}^{k2} {aa, ab, ba, bb}^{k3} for some k1, k3 ∈ ℕ ∪ {0}, k2 ∈ ℕ. So the decomposition of xz into consecutive blocks of two letters contains at least one block from {aa, ba, bb}, and thus xz ∉ L. In particular, any two words outside L are ≈L-equivalent.
Finally, for any x ∈ L, y ∉ L, ǫ is a separating suffix of even length, and we are done.
2. Let ∼L be the Myhill-Nerode relation of L. Prove that for all x, y ∈ Σ∗ it holds that x ∼L y iff x ≈L y and for every σ ∈ Σ, xσ ≈L yσ.
Solution: Assume that x ∼L y. So there is no separating suffix of even length, and thus x ≈L y. Now, assume towards a contradiction that there exists some σ ∈ Σ such that xσ ≉L yσ; thus there exists a separating suffix of even length z ∈ Σ∗ such that w.l.o.g. xσz ∈ L and yσz ∉ L. Thus σz is a separating suffix between x, y (of odd length), and thus x ≁L y.
Assume now that x ≈L y and for every σ ∈ Σ, xσ ≈L yσ. Let z ∈ Σ∗. If |z| is even then xz ∈ L ⇐⇒ yz ∈ L. Otherwise, |z| is odd. Write z = σw for some σ ∈ Σ, w ∈ Σ∗, where |w| is even. From our assumption we know that xσ ≈L yσ, so it must hold that (xσ)w ∈ L ⇐⇒ (yσ)w ∈ L, which is equivalent to x(σw) ∈ L ⇐⇒ y(σw) ∈ L, which in turn gives xz ∈ L ⇐⇒ yz ∈ L, which means that x ∼L y and we are done.
Example 3.2: From exam:
Let L1 = {a^i b^j c^i d^j | i, j ∈ ℕ ∪ {0}}. Is L1 context free?
No. We will prove this using the pumping lemma: Assume towards a contradiction that L1 ∈ CFL and let p be the pumping constant. Let w = a^p b^p c^p d^p. Clearly |w| > p. We can write w = zuxvy with |uxv| ≤ p and |uv| > 0. There are three possible cases:
1. uxv is a substring of a^p b^p. In this case the word zu^0 xv^0 y is of the form a^m b^n c^p d^p with m + n < 2p. Thus either m < p or n < p, so zu^0 xv^0 y ∉ L1. This is a contradiction to the pumping lemma, so we are done.
2. uxv is a substring of b^p c^p. In this case the word zu^0 xv^0 y is of the form a^p b^m c^n d^p with m + n < 2p. Thus either m < p or n < p, so zu^0 xv^0 y ∉ L1. This is a contradiction to the pumping lemma, so we are done.
3. uxv is a substring of c^p d^p. In this case the word zu^0 xv^0 y is of the form a^p b^p c^m d^n with m + n < 2p. Thus either m < p or n < p, so zu^0 xv^0 y ∉ L1. This is a contradiction to the pumping lemma, so we are done.
Example 3.3: From exam:
Let L2 = {a^i b^j c^j d^i | i, j ∈ ℕ ∪ {0}}. Is L2 context free?
Yes! It suffices to show that there is a context-free grammar G such that L(G) = L2. We define G = ⟨{S, T}, {a, b, c, d}, R, S⟩ where R is:
S → aSd | T
T → bTc | ǫ
Computability - Recitation 6
27/11/22 - 3/12/22
1 Turing Machines
We have seen that NFAs and DFAs recognize the class of regular languages. This class is rather small, and
the computational model is indeed quite weak - a finite state machine with no memory. As evidence, we
have seen that even languages such as {a^n b^n : n ∈ ℕ} are not regular.
So how can we give automata more “power” to recognize more interesting languages? One of the obvious
ways is to add a memory model. Thus, we now replace our stack with a list, or an array. This model is
called a Turing Machine.
1.1 Definitions
Turing Machines, (TM, for short) are finite automata, equipped with an infinite tape, bounded on the left,
and a read/write head. The idea is that the input is written on the tape, and the machine can read and
write on the tape, as well as move between states. The tape acts as the memory.
Formally:
Definition 1.1. A (deterministic) Turing Machine is a 7-tuple M = ⟨Q, Σ, Γ, δ, q0, qaccept, qreject⟩ where
1. Q is a finite set of states.
2. Σ is a finite alphabet not containing the blank symbol ⊔.
3. Γ is a finite tape alphabet such that Σ ⊆ Γ and ⊔ ∈ Γ.
4. q0, qaccept, qreject ∈ Q, qaccept ≠ qreject.
5. δ : Q × Γ → Q × Γ × {R, L}.
This is the syntax of a TM, and now we need to define its semantics. That is, how are words accepted
and rejected.
Intuitively, given a word w = w1 · · · wn ∈ Σ∗ , a TM M starts with w written on the tape. It then follows
the rules of the transition function, using the state and the symbol which is written on the tape under the
location of the head. During the run, the head moves left and right, and might change the contents of the
tape, which will affect the run in later times. If, at any point, the machine reaches the state qaccept (resp.
qreject ), it halts and accepts (resp. rejects).
To define this formally, we first need to define what a configuration of a TM is. A configuration of a TM
is a piece of information which encodes everything that is needed in order to continue the run of the TM.
Formally, a configuration is a string uqσv, where the contents of the input tape is uσv, and the head is
currently below the letter σ. Note that since the tape is infinite to the right, we write v only up to the point
from which there are only ⊔ symbols.
We now define when two configurations are consecutive.
• We say that configuration uqabv yields the configuration ucq ′ bv if δ(q, a) = (q ′ , c, R).
• We say that configuration uaqbv yields the configuration uq ′ acv if δ(q, b) = (q ′ , c, L).
• A special case is when there is a left transition at the left end of the tape. For a transition δ(q, a) =
(q ′ , b, L), the configuration qau yields the configuration q ′ bu. Note that there is no “notification” that
we have reached the end of the tape.
We are now ready to define runs on words.
For a word w ∈ Σ∗ , a partial run of M on w is a finite sequence of configurations c1 · · · ck , where c1 = q0 w,
and for every 1 ≤ i < k, ci yields ci+1 .
A partial run is an accepting run (resp. rejecting run) if the state in the last configuration is qaccept (resp.
qreject ). Finally, a word is accepted (resp. rejected) if there is an accepting (resp. rejecting) run of M on it.
Note there may be words on which there is no accepting run and no rejecting run. This will become very
important later on. Finally, we define L(M ) = {w : M accepts w}.
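A minimal sketch of these semantics (the encoding of a configuration as a (tape, head, state) triple is ours): one step applies δ, including the special case of a left move at the left end of the tape, and a run repeats steps until qaccept or qreject is reached.

BLANK = "_"                                     # stands for the blank symbol

def step(delta, tape, head, state):
    """Apply one transition and return the next configuration."""
    state, written, direction = delta[(state, tape[head])]
    tape[head] = written
    if direction == "R":
        head += 1
        if head == len(tape):
            tape.append(BLANK)                  # the tape is unbounded to the right
    elif head > 0:                              # a left move at the left end keeps the head in place
        head -= 1
    return tape, head, state

def run(delta, q0, q_acc, q_rej, w, max_steps=10_000):
    tape, head, state = list(w) or [BLANK], 0, q0
    for _ in range(max_steps):
        if state == q_acc or state == q_rej:
            return state == q_acc
        tape, head, state = step(delta, tape, head, state)
    return None                                 # did not halt within the step bound

# A toy TM that accepts exactly the words starting with 'a'.
delta = {("q0", "a"): ("q_acc", "a", "R"),
         ("q0", "b"): ("q_rej", "b", "R"),
         ("q0", BLANK): ("q_rej", BLANK, "R")}
print(run(delta, "q0", "q_acc", "q_rej", "abb"), run(delta, "q0", "q_acc", "q_rej", ""))   # True False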
An Important Remark - Configurations. Every computational model that is based on a finite state
control (e.g. DFAs, NFAs, PDAs, TMs) has some notion of configurations. A configuration is some data
such that knowing this data is enough to continue the run (or all runs) of the model.
For example, in NFAs, knowing the current state is enough to continue the run on a word. In PDAs,
however, knowing the state is not enough - we also need to know the entire contents of the stack (think why is it not enough to know just the top of the stack?). And as we have just seen, in TMs we need to know
the entire contents of the tape, the location of the head, and the state (again - think why excluding each
of these elements will prevent us from defining the runs of a TM). Given the notion of a configuration, we
can talk about sequences of configurations. These are known as computations (or runs), and are the formal
model we use to reason about computations. Note how beautifully generic this notion is, and how it fits
both our conceptual notion of a computation, as well as our need for a simple formal definition.
1.2 Expressive power
The first interesting question that rises is whether this model gives us any additional computational power
over CFL. The (very) surprising answer is that not only do we get more computational power, but we get
computational power equivalent to a computer. In fact, the Church-Turing thesis states that anything that
can be computed on any physically-feasible computational model can be simulated by a TM.
1.3 Computing functions
So we see that TMs can recognize languages that are not context free. What else can they do? In order to
get accustomed to TMs, let’s see some other uses.
Apart from recognizing languages, TMs can also compute functions. For example, we will now see a TM
that computes the function f(n) = n + 1, given a binary representation of n.
As we will see shortly, when designing such a TM, we immediately encounter a small problem with the
model, namely: how can we tell when we are in the leftmost cell?
Indeed, recall that when the head reaches the leftmost cell of the tape, there is no indication of it, and
we want some indication to know that we are there. It would be nice to assume that there is some symbol
$ before the input. To assume this, we first show how to move the entire input one cell to the right, and to
put a marker at the left end cell. This is done with the 4 state TM in Figure 1. Note that this is not a full
TM, just a piece of one. You can think of it as a subroutine.
Now we are ready to compute a function.
Example 1.4: This time, we will only give a low level description of the TM, and it is left as an exercise
to describe the TM fully.
[Figure 1 (state diagram omitted): TM that moves the entire input one cell to the right, and puts a $ at the beginning.]
In low level, the machine starts in q0 , and scans the tape to the right until it reaches the blank ⊔. It then moves
left and goes to state q1 . In q1 , if it sees 1, it changes it to 0 and continues left. If it sees 0, it changes it to
1 and halts. If $ is encountered, meaning that the input was of the form 111...1, we change $ to 1, and then
move the entire input one cell to the right, and write $ again to the left (to keep things nice and clean).
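To check that the low-level description indeed computes n + 1, here is the same procedure written directly in Python over the string of bits (this is only the arithmetic the machine carries out, not a TM):

```python
# The increment procedure performed by the TM, most significant bit first.
def increment(bits: str) -> str:
    cells = list(bits)
    i = len(cells) - 1              # the head has scanned right and now moves left
    while i >= 0 and cells[i] == "1":
        cells[i] = "0"              # 1 -> 0, keep moving left
        i -= 1
    if i >= 0:
        cells[i] = "1"              # found a 0: flip it and halt
    else:
        cells.insert(0, "1")        # the input was 111...1: one more cell is needed (this is the $ case)
    return "".join(cells)

assert increment("1011") == "1100" and increment("111") == "1000"
```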
2 Robustness of the TM model
We have already seen several examples of computational model equivalence: DFAs, NFAs and regular expressions are all equivalent.
The idea of computational model equivalence also extends to TMs. In fact, a nice property of the TM
computational model is that it is robust: By changing minor technical properties of the TM model, we
usually get an equivalent computational model. Some informal examples:
• A bidirectionally unbounded TM (BUTM) is similar to a TM, except that its tape is unbounded to the
right and also to the left (as opposed to unbounded just to the right). The BUTM model is equivalent
to a TM.
• A stay TM (STM) is similar to a TM, but its transition function is of the type Q×Γ → Q×Γ×{L, R, S},
where S tells the head to stay in the same cell. The STM model is also equivalent to a TM.
• A 2-dimensional TM has a tape that’s an infinite square grid. It is also equivalent to a TM.
One particularly useful variant of a TM is a two-taped TM. In the next section, we define this model and
prove its equivalence to a TM. Before we do so, we need to ask ourselves what it means for a TM to be
equivalent to a machine of another model. Recall that a TM running on an input w can either accept, reject or not halt
on w. Thus, we need to generalize our previous definition of computational model equivalence.
Definition 2.1. Two machines (not necessarily of the same model) M and N are equivalent if for every
w ∈ Σ∗ the following hold:
• M accepts w iff N accepts w.
• M rejects w iff N rejects w.
• M does not halt on w iff N does not halt on w.
Definition 2.2. Two computational models X and Y are equivalent if the following hold:
• For every machine of type X there exists an equivalent machine of type Y.
• For every machine of type Y there exists an equivalent machine of type X .
3 TM with 2 tapes
A two-taped TM is an ordinary Turing machine, only with two tapes. Each tape has its own head for reading
and writing. Initially the input appears on tape 1, and the second tape starts out blank. The transition
function is changed to allow for reading, writing, and moving the heads on both tapes simultaneously.
Formally,
δ : Q × Γ2 → Q × Γ2 × {R, L}2
The expression
δ(q, γ1 , γ2 ) = (q ′ , γ1′ , γ2′ , L, R)
means that if the machine is in state q, head 1 is on γ1 and head 2 is on γ2 , then the machine moves to q ′ ,
writes γ1′ on tape 1 and moves head 1 left, and writes γ2′ on tape 2 and moves head 2 right.
An important thing to notice is that the “useful” part of two tapes is not the additional tape, but rather
the additional head. Indeed, mapping two tapes onto one is pretty easy. However, adding another head is
a whole new feature. It allows synchronous movement, which, a-priori, seems more powerful than a single
head.
It is obvious (i.e. left as an exercise) that two-tape machines are at least as expressive as ordinary TMs.
The interesting point is that the models are equivalent.
Theorem 3.1. For every two-tape TM M , there exists an equivalent TM M ′ .
Proof. There are several ways to prove this theorem. The general idea is to somehow simulate the actions
of the two tapes using only one tape.
Perhaps the most intuitive solution is to just write the two tapes consecutively on one tape, separated
by some special character #. It is not hard to prove that such a construction works.
Here, we show a different solution. Let M = hQ, Σ, Γ, δ, q0 , qacc , qrej i be the two-tape TM. We construct
M ′ = hQ′ , Σ, Γ′ , δ ′ , q0 ′ , qacc ′ , qrej ′ i as follows. The tape alphabet Γ′ will simulate the two tapes, as well as the
positions of the two heads.
Γ′ = (Γ × Γ × {0, 1} × {0, 1}) ∪ Σ ∪ {⊔}
The letter (a, b, 0, 1) means that at this position, the first tape contains “a”, the second contains “b”, and
head 2 is also in this position, while head 1 is not.
The single tape machine operates as follows.
1. The machine starts by going over the input word, and replacing each occurrence of σ ∈ Σ with
(σ, ⊔, 0, 0). It then goes back to the left, and marks the first cell as (σ, ⊔, 1, 1). That is, both heads
are at this position.
2. The simulation phase: the machine encodes in its states the state of the two-tape machine. At every
stage, the machine scans the tape from left to right, searching for the letter where head 1 is on. Once
it is found, the machine remembers that letter (encodes it in its state). It then continues to the right,
and starts scanning back to find head 2. Thus, once it is back to the left, the machine has encoded (in its state) the
letters under the two reading heads. So the machine is now ready to decide what to do based on δ
(since it “knows” q, γ1 , and γ2 ).
Again, the machine scans the tape from left to right, searching for the letter where head 1 is on. Once
it is found, the machine updates the tape according to δ: it changes the letter of the first tape (that
is, the first component of the tuple), and moves head 1 left or right (which means turning its marker to 0
in the current cell, and to 1 in an adjacent cell). Then, the machine goes to the right of the used portion of the
tape, and scans back left, doing the same with head 2. When it reaches the left end of the tape
(⊔), it has finished simulating one step of M . It now repeats, until M states that it goes to qacc
or qrej , in which case M ′ does the same.
Note that while Q′ and Γ′ are larger than Q and Γ respectively, they are both still finite.
Think of M ′ as an emulator for M . The configuration of M ′ holds an encoding of the configuration of
M . Whenever M makes a step which changes its configuration from c to c′ , M ′ makes a series
of steps in order to change its encoding of c to an encoding of c′ .
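To visualize the alphabet Γ′ , here is a small sketch (in Python, with the blank written as "_"): each cell of the single tape packs one symbol from each of the two tapes together with two bits saying whether the corresponding head is on that cell.

```python
BLANK = "_"

# Pack two tapes (given as strings) and the two head positions h1, h2 into a single
# list of 4-tuples -- exactly the Gamma' letters (symbol1, symbol2, head1-here, head2-here).
def pack(tape1, h1, tape2, h2):
    n = max(len(tape1), len(tape2), h1 + 1, h2 + 1)
    t1, t2 = tape1.ljust(n, BLANK), tape2.ljust(n, BLANK)
    return [(t1[i], t2[i], int(i == h1), int(i == h2)) for i in range(n)]

# Head 1 on cell 0 of "ab", head 2 on cell 2 of "aba":
print(pack("ab", 0, "aba", 2))
# [('a', 'a', 1, 0), ('b', 'b', 0, 0), ('_', 'a', 0, 1)]
```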
Remark: It is very difficult to get much more formal, without having to get dirty with indices and various
end-cases. This is problematic in the context of a course: ”am I being formal enough for the graders?”, but
it is even worse in “real life”: “am I being formal enough for the result to be true?”. One of the main skills
you need to acquire is the ability to see potential problems, and to explain why the proof overcomes them.
An important question we often ask when translating between variants of a model is what is the cost of
this translation? In automata, this question had a very clear meaning - what is the blowup in the number
of states/transitions. In TMs, however, this question is no longer simple. First, we may ask what the size of
the new TM is. Even this is not trivial - what do we mean by “size”? The number of states? transitions?
alphabet? Also, a completely different aspect of cost comes up - what is the runtime of the new machine on
a word? As you have seen in class, TMs may run for a long time on even a short word (in fact, they may
not stop at all).
Typically, the more interesting question for TMs is the latter - how is the runtime on a word affected by
a change in model (although the other questions are also widely studied).
For the construction above, observe that in every step of the original TM, we need to scan the tape
twice. That is, we do O(n) operations for every operation of the original machine, where n is the length
of the used portion of the longer of the two tapes. Assume the original machine ran for t steps on a word w. Within t
steps, the original machine could write at most t symbols in each tape. So the maximal distance between
the two heads at every step is t. So in every step we make at most O(t) steps. Thus, the total runtime is
t · O(t) = O(t²). In particular, note that this is polynomial in the runtime of the machine. This will become
important later on in the course.
Finally, observe another interesting blowup in this translation - we increased the size of the alphabet
from |Γ| to 4|Γ|² . We could have done the same procedure for k tapes (instead of 2), for the price of 2^k |Γ|^k
- exponential in k, and with a runtime of O(t²) (think why).
Thus, from now on, we may use machines with a fixed number of tapes. It is important to note that
while a k-tape machine is equivalent to a TM for any constant k, the value of k must be constant. In other
words, it must not depend on the input.
Computability - Recitation 7
4/12/22 - 11/12/22
1 Closure properties of R and RE
For a TM M , we say that M recognizes a language L if L(M ) = L. We say that M decides L if L(M ) = L
and M halts on every input. That is, for every w ∈ Σ∗ we have that M accepts w if w ∈ L and M rejects
w if w ∉ L.
We define RE = {L : L is recognizable by a TM}, R = {L : L is decidable by a TM}, and
coRE = {L : the complement of L is recognizable by a TM}. In class you will see that R = RE ∩ coRE.
The natural thing to do now is to study under which operations these classes are closed.
Let’s start with something easy:
Proposition 1.1. If L ∈ R then L̄ ∈ R (that is, R is closed under complementation).
Proof. Let M be a machine that decides L, we swap qacc and qrej in M . Since M always halts, then for
every w, its run on w either gets to qacc or to qrej . Using this, it is easy to prove the correctness of this
modification.
Note that we crucially use the fact that M is a decider, and that M is deterministic. Indeed, for
nondeterministic machines (which we will see later on in the course), or for machines that only recognize
their language, this no longer works.
Proposition 1.2. If L1 , L2 ∈ R, then L1 ∪ L2 ∈ R.
Proof. Let M1 , M2 be TMs that decide L1 , L2 respectively.
We construct a TM M for L1 ∪ L2 as follows. The main idea is to run M1 on w, and if it rejects, run M2 .
If either machine accepts, we accept. Otherwise we reject. To give a more detailed description, we proceed
as follows.
M starts by marking a special delimiter # left of the input w. It then copies w to the left of # (assume
a bidirectional tape). M then passes control to M1 , with # acting as the left bound of the tape. If, at any
point, M1 moves to its accepting state, M also accepts. If M1 moves to the rejecting state, M erases the
contents of the tape up to #, then copies back w to the right of #, and passes control to M2 and answers
the same.
Correctness is relatively clear here - M accepts iff at least one of M1 or M2 accepts. Also, all the
“overhead” operations are things we know how to do with a TM - erase the input and copy strings.
Note that again, we rely on the fact that M1 and M2 are deciders, since otherwise M1 may get stuck,
and we will never try to run M2 . So can we show that RE is also closed under union? As it turns out - yes,
but we need to be somewhat more clever.
Proposition 1.3. If L1 , L2 ∈ RE, then L1 ∪ L2 ∈ RE.
Proof. Let M1 be a TM that recognizes L1 , and M2 a TM that recognizes L2 .
We need to somehow run both M1 and M2 in parallel. There are several ways of doing that. Perhaps
the simplest is to use two tapes, and simulate each machine independently on its own tape. This is rather
simple: start by copying the word to the second tape, and then apply the transition function of each TM in
its separate tape. While this works, we take a different approach here.
What really happens here? If we consider the TM that is equivalent to the 2-tape TM we used here, we
actually see that in a way, we simply run the machines one step at a time. That is, M1 runs for a step, then
M2 , and so forth. This is a sort of parallel run, and it’s impressive that we can simulate a parallel run with
the (serial) model of TM.
We take a slightly different approach here.
We construct a TM M that recognizes L1 ∪ L2 as follows. Start by copying the input to a “safe” place,
left of the original input. Additionally, we store a counter i somewhere on the tape (left of the input, for
example). We now proceed to run M1 on the input for i steps.
How can we do that? Well - if you're re-reading this after seeing a universal machine, then this is trivial.
Otherwise, we modify M1 such that after taking a transition, it marks the place of the head on the tape,
then goes to the left of the tape, updates a step-counter, compares it to i, and if it hasn’t reached it yet,
goes back to where the head was, and continues for another step.
After M1 runs for i steps, if it accepts, then M accepts. Otherwise, clean the working part of the tape, copy w
back from the safe copy, and run M2 on w for i steps. If it accepts, M accepts. Otherwise, increment i by 1, and
repeat the process.
To prove correctness, note that if w ∈ L1 ∪ L2 , then there exists some n > 0 such that either M1 or M2
accepts w within n steps. Thus, when the counter value i reaches n, our machine will accept w.
For the second direction, if M accepts w, then clearly either M1 or M2 accept w, so w ∈ L1 ∪ L2 , and
we are done.
Note that we do not prove that M always halts, as indeed - it may not halt at all!
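The step-bounded dovetailing above can be summarized in a few lines. The sketch below is in Python; run_for_steps(M, w, i) is an assumed helper (essentially a step-bounded universal simulation, which we only construct later) that returns "accept", "reject" or "running" after simulating M on w for i steps.

```python
def recognize_union(M1, M2, w, run_for_steps):
    i = 1
    while True:
        for M in (M1, M2):
            if run_for_steps(M, w, i) == "accept":
                return "accept"        # some machine accepted within i steps
        i += 1                         # neither accepted yet: retry with a larger step budget
        # if w is in neither language, this loop runs forever -
        # exactly like the constructed TM, which does not halt on such w
```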
2 Parallel runs
Running parallel computations is crucial in order to show closure properties of certain classes, as well as to
recognize certain languages. This technique usually comes in handy when we work with machines that are
not deciders. The general problem is that if we are given a machine, and we are not guaranteed that it is a
decider, we do not want to simply let it run, because it may never halt. Instead, we run it only for a bounded
number of steps, which lets us decide to do something else if the machine has not halted within that bound.
Parallel runs are not a specific concept, but rather a general scheme. We demonstrate the idea here by
showing that RE is closed under concatenation.
Theorem 2.1. Let L1 , L2 ∈ RE, then L1 · L2 ∈ RE.
Proof. Let M1 , M2 be TMs that recognize L1 , L2 respectively. Observe that we cannot assume M1 and M2
halt on inputs that are not in their languages.
Let w ∈ Σ∗ be the input. The naive idea is to try every partition of w into w = uv, and then
run M1 on u and M2 on v. If both accept, then w ∈ L1 · L2 and we accept. Otherwise, we try the next
partition. However, this is not as simple as it sounds - when we try the first partition, it could be the case
that M1 does not halt. In this case, our TM also does not halt, and we will never check other partitions!
We break the problem into two parts. First, we construct a new machine M3 . This machine reads a word
of the form u#v (if the word is not of this form, it rejects). Upon reading this word, M3 runs M1 on u, and
then, if M1 accepts, erases the tape and runs M2 on v. Thus, M3 recognizes the language
L3 = {u#v : u ∈ L1 , v ∈ L2 }
Note that M3 is not necessarily a decider! Now, we are left with a simple task: given w, check whether we
can split w as w = uv such that M3 accepts u#v.
As we stated earlier, running M3 serially on every partition is not good enough, since it might not halt
on the first one, even though a later partition is accepted. So we want to run M3 “in parallel” on all the
partitions. This is done as follows.
We construct a TM M as follows: We add a new character ⊥ to the alphabet of M (in addition to
ΓM1 , ΓM2 , ΓM3 ). ⊥ will act as a delimiter. Start by copying the input to a safe place, left of the original
input, followed by ⊥. Next store a counter right after the first ⊥ and initialize it to 1 (i will keep track of
how many steps the simulation is to run), then add another ⊥. Now add yet another counter j, initialized to
|w|, followed by yet another ⊥. j will keep track of the current partition. The rest of the tape will be used
to simulate the run of M3 on every partition of the input w for i steps. Simulating M3 this way requires
operating on the 4th part of the tape (after the last ⊥) the same as M3 operates, but after every transition
of M3 , the counter i is reduced by 1 (while keeping a copy of its original value to use in the next partition)
and compared to 0. If it is 0, M stops simulating M3 , and moves to the next partition.
The simulation of M3 works as follows. M reads the value i of the counter, it then simulates M3 on
every partition of w, but only for i steps. When all the partitions have been simulated, the counter value is
increased by 1 and the same process begins. If, at any point, M3 accepts, then M accepts as well. Otherwise,
M does not halt. (Keeping track of the current partition is done using the counter j: at every change of
partition, reduce j by 1 and compare it to 0. If 0 is reached, the current round of simulations is done; the counter j is
then restored to |w| and i is increased by 1.)
Now, if w ∈ L1 · L2 , then there exists a partition w = uv such that u#v is accepted by M3 . Since it is
accepted, there exists a finite number k such that M3 accepts u#v after k iterations. So M also accepts w
after the counter i reaches value k. Conversely, if M accepts w, then one of the partitions is accepted by
M3 , so w ∈ L1 · L2 .
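The same scheme in sketch form, with the same assumed run_for_steps helper as in the union proof; accepts_split plays the role of running M3 on u#v with a step budget of i.

```python
def recognize_concat(M1, M2, w, run_for_steps):
    def accepts_split(u, v, i):
        # stands in for simulating M3 on u#v for (at most) i steps on each part
        return (run_for_steps(M1, u, i) == "accept"
                and run_for_steps(M2, v, i) == "accept")
    i = 1
    while True:
        for cut in range(len(w) + 1):            # the counter j from the proof
            if accepts_split(w[:cut], w[cut:], i):
                return "accept"
        i += 1                                    # no partition accepted within i steps yet
```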
3 Universal TM
In the late 80’s, for those who remember, hand-held video games were very popular. These devices ran
a single game. In the early 90's came the GAMEBOY®, and there was a big hype, which changed the
hand-held game industry forever. What was the big change? The main advantage of the GAMEBOY was
allowing different programs to run on the same machine. This ability seems trivial in computers today (for
those who remember the old floppy disk, perhaps it doesn’t).
Enough history, back to TMs. A TM runs a single program, by definition. However, for various uses, we
want TMs to be able to run different programs. Can we do that? The answer, as expected (after claiming
TMs are as strong as computers), is yes.
3.1 Encoding
The first observation we need to make is that we can encode a TM as a finite string over a finite alphabet.
We will show how to do that shortly.
A universal TM is a TM that can take as input an encoding of a TM and simulate it on some input
(given or fixed).
We start by describing how we can encode a TM. We will use the alphabet {0, 1, #}. Let M =
hQ, Σ, Γ, δ, q0 , qacc , qrej i. We encode M as follows. The states are encoded as increasing binary numbers.
That is, if Q = {q1 , ..., qn }, then the encoding starts with
0#1#10#11#100#...
We end the state string with a triple ###. Next, we encode Σ and Γ. We encode each letter with a
binary string of length ⌈log |Γ|⌉ (we will later see why). We first encode the letters of Σ as increasing binary
numbers, separated by #. To encode Γ, we continue from the last number used for Σ, and encode Γ \ Σ
as increasing binary numbers. We separate Σ and Γ with ###.
Example 3.2: If Σ = {a, b} and Γ = {⊔, 0, 1} ∪ Σ, then |Γ| = 5. Thus, we use 3 bits to encode
each letter. The encoding will be
000#001#010#011#100
corresponding to a, b, ⊔, 0, 1 respectively.
Next, we encode δ. A single transition δ(q, γ) = (q ′ , γ ′ , L) is encoded as a tuple
hqi#hγi#hq ′ i#hγ ′ i#hLi##
Where hLi = 0 and hRi = 1. We end the description of δ with ###.
Finally, we encode q0 , qacc , qrej with their binary encoding, separated by ###.
Example 3.3: Assume hqi = 101, hγi = 010, hq ′ i = 1 and hγ ′ i = 100. Assume we have the transition
δ(q, γ) = (q ′ , γ ′ , R), then the encoding will contain
101#010#1#100#1##
Given a TM, we denote its encoding by hM i. We note here that in the same spirit, we can encode many
other different objects, such as NFAs, graphs, matrices, logic formulas, and many more, which you will see
during this course.
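As a small sanity check of the format, here is a sketch in Python of how a single transition is glued into the encoding (the state and letter codes are assumed to be given already as fixed-length binary strings); it reproduces Example 3.3.

```python
def encode_transition(q, gamma, q_new, gamma_new, direction):
    d = "0" if direction == "L" else "1"          # <L> = 0, <R> = 1
    return "#".join([q, gamma, q_new, gamma_new, d]) + "##"

# Reproducing Example 3.3:
assert encode_transition("101", "010", "1", "100", "R") == "101#010#1#100#1##"
```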
3.4 A universal machine
We want to construct a TM U that, given an encoding hM, wi of a TM M and a word w, can simulate the
run of M on w. That is, U accepts/rejects/gets stuck if M accepts/rejects/gets stuck on w, respectively.
To construct U , we use a TM with 3 tapes. The first tape will hold the description of M , the second
will hold the working tape of M , and the third tape will be used to hold the current state of M , and for
calculations.
U starts with hM i written on the first tape. Assume that we also get as input a word w. We
want to run M on w. For that, we assume that we are actually given an encoding of w, according to the
encoding of the alphabet of M . That is, w ∈ {0, 1, #}∗ , and every # separates consecutive letters.
U starts by finding the beginning of w and copying it to tape 2. It then restores head 2 to the beginning
of w. Next, U finds q0 in the description of M and writes it on tape 3.
At the beginning of every iteration, we assume that head 2 is pointing to the letter that the head of M
is supposed to point to. U operates as follows. First, it scans tape 3 and compares it to qacc and qrej . If one
of the comparisons succeeds, U acts accordingly (accept/reject).
Next, U scans all the transitions in the description of δ, and for each one it compares the first two
components to tapes 3 and 2 respectively. That is, U searches for the appropriate transition. Once a
transition is found, U finds the letter that should be written, and writes it over the current letter on tape
2. Recall that “letters” in M are encoded as strings of the same length in hM i, so replacing the letter will
not require us to shrink or push tape 2.
Next, U scans where it should move the head to (R or L), and moves head 2 to the next letter.
Finally, U finds the new state M should go to, and writes it on tape 3.
Recap on the operation of U :
1. Scan tape 1 to find the beginning of w.
2. Copy w to tape 2, reset heads 1 and 2.
3. Scan tape 1 to find q0 , copy q0 to tape 3. Reset heads 1,3.
4. In every iteration, repeat the following.
(a) Compare tape 3 to qacc , qrej , and act accordingly if successful.
(b) Scan tape 1 and find the beginning of δ.
(c) Compare tape 3 and current letter in tape 2 until appropriate transition is found.
(d) Replace current letter in tape 2 with letter from transition.
(e) Move head of tape 2 left or right according to transition, to next letter.
(f) Replace content of tape 3 with new state.
Observe that we have constructed a single machine, not a general abstract machine. There are many
specific implementations of such universal machines.
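For intuition only, here is a direct Python analogue of U's main loop. It works on a dictionary representation of δ (as in the earlier configuration sketch) rather than on the string encoding hM i, but one iteration has the same structure as steps 4(a)-(f) above.

```python
def simulate(delta, q0, q_acc, q_rej, w, blank="_"):
    tape, head, state = list(w) or [blank], 0, q0
    while state not in (q_acc, q_rej):                        # step 4(a)
        state, write, move = delta[(state, tape[head])]        # 4(b)-(c): find the transition, remember the new state
        tape[head] = write                                      # 4(d): replace the current letter
        head = head + 1 if move == "R" else max(head - 1, 0)    # 4(e): move the head
        if head == len(tape):
            tape.append(blank)                                  # extend the tape with a blank when needed
    return "accept" if state == q_acc else "reject"
```

As with U itself, this loop does not terminate when the simulated machine does not halt.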
An important thing to notice about the construction is that we can make small changes to create useful
variants of U , for example:
• A machine that runs M on w and acts the opposite of M .
• A machine that runs M on w for a certain number of steps (not infinitely).
• A machine that runs M on w only as long as M stays in the limits of a certain portion of the tape.
Final remark: there are many computational models that you have not seen in this course (and probably
won’t see). One measure for the strength of a model is whether it can simulate a TM.
3.5 Remark about TM comparing two numbers
The careful reader might notice that in the last two proofs we used indices, and had to compare the current
index to 0, and stop if the numbers are equal. But(!) how does a TM determine if two numbers are indeed
equal? We did not formally explain this yet. Indeed, we can define a TM that accomplishes this task, and even
describe the states! A detailed explanation can be found here: TM machine as comparator of numbers :)
4 Nondeterministic TMs
A nondeterministic TM (NTM, for short) is exactly the same as a TM, with the only difference that
δ : (Q \ {qacc , qrej }) × Γ → 2^(Q×Γ×{R,L}) \ {∅}
Thus, at every stage, the machine can nondeterministically choose a transition. As in regular automata, this
means that every word has many possible runs. Formally, we say that a configuration d follows configuration
c if there exists a transition rule in δ according to which d follows from c as in the deterministic case.
We say that w ∈ L(M ) if w has an accepting run. This implies that when M runs on w, there could be
runs that reject and runs that get stuck, but if there is even one run that accepts, the word is accepted. The
NTM M is considered a decider if for every input w, every run of M on w is a halting run.
Example 4.1
Consider the language C = {hni : n is a composite number}. First, let’s see why C is recognizable by an
NTM: an obvious answer is that C is decidable. But let’s use the non-determinism ad-hoc. Given a number
n, an NTM can non-deterministically write on the tape a number p among 2, . . . , n − 1 (one number per
non-deterministic branch), and check whether p divides n - if so, accept. Thus, n is accepted iff there exists
an accepting run, which happens iff there exists a nontrivial factor of n.
4.2 Equivalence of NTMs and TMs
TMs are a particular case of NTMs, so they are at most as powerful. That is, every language that can be
recognized by a TM can also be recognized by an NTM. The interesting question is whether the converse is
true. In the context of computability, the answer is yes - every NTM has a TM that recognizes the same
language. Before proving this, we go through some notions and definitions.
4.2.1 Runtrees of NTMs
Let N = hQ, Σ, Γ, δ, q0 , qrej , qacc i be an NTM, and let w ∈ Σ∗ be an input for N . The runtree of N w.r.t w,
denoted TN,w = hV, Ei, is formally defined as follows. Let C denote the set of all configurations of N . We
have the following:
• V ⊆ C × (N ∪ {0}). That is, every vertex hc, ii corresponds to the configuration c of N that lies in the
level i of the tree TN,w .
• The root of TN,w is hq0 w, 0i. That is, the root corresponds to the initial configuration of N on w, q0 w,
and it has no incoming edges.
• E ⊆ ∪i≥0 (C × {i}) × (C × {i + 1}) is such that for all i ≥ 0, it holds that E(hc, ii, hd, i + 1i) iff there
exists a transition rule in δ according to which d follows from c. That is, from every vertex that
corresponds to the configuration c there is an edge to a vertex (in the next level) that corresponds to
the configuration d whenever d is a consecutive configuration of c.
For simplicity, we abuse notation and refer to a vertex by the configuration it corresponds to. For
example, when we say that the vertex v is an accepting configuration, then we mean that v is of the form
hc, ii, where c is an accepting configuration.
Intuitively, the run tree TN,w encodes all the runs of N on w - a path from the root q0 w to some node in
the tree corresponds to a partial run of N on w, a path from the root to a leaf corresponds to an accepting or
a rejecting run, and infinite paths from the root correspond to non-halting runs. Note that different vertices
in the tree can correspond to the same configuration. This is both because a configuration can be reached via
different runs and because a run of the machine can be stuck in a loop, in which case the configuration
re-appears in the same path of the tree. Finally, as we are interested in the runs of N on w, we may assume
that all the vertices in the tree are reachable from the root q0 w. Also note that if a configuration is a halting
configuration (that is, its state is qrej or qacc ), then it is a leaf in the tree. Finally, note that while V may
be infinite, it is finite when N is a decider.
Remark. Let N be a fixed NTM. Then there is a constant k that depends only on hN i such that for every
input w for N , k bounds the branching degree of TN,w ; that is, the maximal number of children of a node
in the tree is at most k.
The following lemma follows from the definitions, and is left as an exercise.
Lemma 4.1. An NTM N is a decider iff all its runtrees are finite.
Proof. Left to the reader. The hard direction is to show that if N is a decider, then for every input w, the
runtree TN,w is finite. To show that, you can use the remark above and König’s lemma.
Theorem 4.2. For every decider NTM N , there exists a decider TM D with L(N ) = L(D).
Proof. Let N = hQ, Σ, Γ, δ, q0 , qrej , qacc i be a decider NTM. We describe an equivalent TM D. The idea
is as follows. Given input w, the machine D scans the runtree TN,w to find an accepting configuration.
However, note that D has no explicit description of TN,w , so D needs to apply the scan given only w and the
description of N (which can be hardcoded in D). For this, we introduce the notion of addresses of vertices in
TN,w . Let k be a constant that bounds the branching degree of every runtree of N , and consider the alphabet
Σk = {1, 2, . . . , k}. We can think of every word u in Σ∗k as an address of a vertex in TN,w . For example, the
word u = ǫ is the address of the root q0 w, and the word u = 13 is the address of the configuration d that
we reach from the root by following the path q0 w → c → d in the tree, where c is the first1 child of q0 w,
and d is the third child of c. Note that there are words u ∈ Σ∗k that do not describe an address of a vertex
in the tree, and we call them invalid addresses. Also, note that given a word u ∈ Σ∗k , the machine
D can check whether u is a valid address to an accepting configuration in TN,w . Indeed, given an address
u = u1 · u2 · · · ui , D can write down the current configuration c0 = q0 w which is the initial configuration of
N on w. Then, D checks (according to δ) whether c0 has a u1 ’th following configuration, c1 . If c1 exists, D
writes it down and proceeds similarly to the letter u2 in order to compute the configuration c2 . If at some
point, D discovers that the following configuration according to the address u does not exist, then we know
that u is invalid. Otherwise, once D computes the configuration ci , we check whether it is accepting.
We’re now ready to define D. Given input w, the machine D operates in iterations. In the i’th iteration:
1. D writes down all the addresses u ∈ Σ∗k of length i in lexicographic order.
2. D goes over the written addresses and checks whether one of them describes a path from the root q0 w
to an accepting configuration in TN,w . If such an address is found, then D accepts. Otherwise, proceed
to 3.
3. If all addresses u of length i are invalid, D rejects. Otherwise, proceed to the next iteration.
Intuitively, considering the addresses in minlex order corresponds to scanning the runtree TN,w in a
BFS manner. Correctness follows easily. Indeed, if w ∈ L(N ), then the runtree TN,w has an accepting
configuration and eventually D’s scan finds it and accepts. Conversely, if D accepts, then there is some u
that describes an address of an accepting configuration in TN,w and thus w ∈ L(N ). Finally, note that D
is a decider. Indeed, if there is a word w ∉ L(N ), then there is no accepting configuration in TN,w . Now
by Lemma 4.1, TN,w has a finite height. Hence, there is some t such that all the words of length t over Σk
describe invalid addresses and thus D cannot have more than t iterations. Hence, it has to reject w.
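A sketch of D's search in Python. Here children(config) is an assumed helper that returns the successor configurations of a configuration of N in some fixed order, is_accepting tests whether a configuration is accepting, and k bounds the branching degree; none of these names are part of the formal construction.

```python
from itertools import product

def simulate_ntm(initial, children, is_accepting, k):
    i = 0
    while True:
        i += 1
        any_valid = False
        for address in product(range(1, k + 1), repeat=i):   # all addresses of length i
            config, valid = initial, True
            for u in address:                                  # follow the address down the runtree
                succ = children(config)
                if u > len(succ):
                    valid = False                              # the address is invalid
                    break
                config = succ[u - 1]
            if valid:
                any_valid = True
                if is_accepting(config):
                    return "accept"
        if not any_valid:      # every address of length i is invalid: the runtree has height < i
            return "reject"
```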
A very important point that needs to be made regarding this construction is its runtime cost. Assuming
the NTM N has at most k consecutive configurations for each configuration, and that it performs t steps on
some input, the construction yields a TM with a runtime of O(t² · k^t ) = 2^O(t) steps, determined by the size of
the tree and the length of an encoding of a run. Thus, this translation from NTMs to TMs is not efficient.
Can we make it efficient? The surprising answer is that we don’t know.
Finally, note that the above construction also works if the machine N is not a decider; that
is, we can generalize the previous theorem to the following.
Theorem 4.3. For every NTM N , there exists a TM D with L(N ) = L(D) and:
1. If N is a decider, then D is a decider.
2. For every word w ∉ L(N ), it holds that D halts on w iff all the runs of N on w are halting runs.
Indeed, if we apply the same construction for a non decider TM N that has non-halting runs and accepting
ones, then D will eventually find an accepting configuration in the runtree as it scans the tree in a BFS
manner. Also, note that D does not halt when N has an infinite run but has no accepting runs.
1 Note that we are assuming that there is an order on the consecutive configurations. This is okay, as the description of N
(that is encoded in D) defines such an order.
Computability - Recitation 8
December 13, 2022
1 The concept of mapping reduction
We start with a reminder. Let L1 , L2 ⊆ Σ∗ . We say that L1 is mapping-reducible to L2 , and denote
L1 ≤m L2 , if there exists a computable function f : Σ∗ → Σ∗ such that for every x ∈ Σ∗ it holds that x ∈ L1
iff f (x) ∈ L2 . f is then called a reduction from L1 to L2 . We remind that since f is computable, this means
that there exists a TM T such that given input x ∈ Σ∗ , T always halts with f (x) ∈ Σ∗ written on the tape.
The intuition behind this, is that L2 is “harder” than L1 , in the sense that if we have a TM M that
decides L2 , we can decide L1 as follows. Given input x to L1 , run T on x to obtain T (x), then run M on
T (x) and answer the same.1
The following theorem demonstrates the intuition.
Theorem 1.1. Let L1 , L2 ⊆ Σ∗ such that L1 ≤m L2 . The following holds:
1. If L2 ∈ RE then L1 ∈ RE.
2. If L2 ∈ co-RE then L1 ∈ co-RE.
Proof. For the first part, assume that L2 ∈ RE, so there exists a TM M that recognizes L2 . By our
assumption, there exists a reduction f : Σ∗ → Σ∗ from L1 to L2 . We define a TM N , that given input x
works as follows:
• Compute y = f (x).
• Simulate M on the input y.
It holds that N accepts x iff M accepts f (x) iff f (x) ∈ L2 iff x ∈ L1 . So N recognizes L1 , and L1 ∈ RE.
As for the second part, notice that a reduction from L1 to L2 is also a reduction from L̄1 to L̄2 , so the
second part can be deduced by applying the first part of the claim to the languages L̄1 and L̄2 .
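In code, the machine N built in the first part of the proof is a one-liner (a sketch; f and recognize_L2 are assumed to be given as callables):

```python
# N on input x: compute y = f(x), then behave exactly like the recognizer for L2.
def recognize_L1(x, f, recognize_L2):
    return recognize_L2(f(x))   # accepts x  iff  f(x) is in L2  iff  x is in L1
```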
So how do we use this theorem? Given a language L, if we think that L ∉ RE, we look for a language
to reduce from. As you will see in the exercise, reductions are transitive. This means that with every
language for which we prove undecidability, we increase our arsenal. Currently, our main “weapons” are
ATM ∈ RE \ co-RE and ĀTM ∈ co-RE \ RE, where ATM = {⟨M, w⟩ | w ∈ L(M )}.
2 Classifying languages into computability classes
We have seen that RE ∩ co-RE = R. Hence, every language belongs to exactly one of the following four sets:
R, RE \ R, co-RE \ R, or the complement of RE ∪ co-RE (that is, neither in RE nor in co-RE). The tools that we have allow us to classify many languages into the
correct set. Let’s see some examples.
1 Observe that this is somewhat a “strong” notion - we not only use M to solve L1 , we do it with a single use at the end.
There is a weaker notion of reduction called Turing reductions.
2.1 ALLTM
Define the language ALLTM = {⟨M ⟩ : L(M ) = Σ∗ }.
Proposition 2.1. ALLTM ∉ RE ∪ co-RE.
Proof. We split the proof into two claims.
Claim: ATM ≤m ALLTM (and thus, ALLTM ∉ co-RE).
Construction: The reduction proceeds as follows. On input ⟨M, w⟩ for ATM , the reduction returns ⟨K⟩
where K is a machine that on input x, simulates M on w and answers the same (i.e. if M accepts, so does
K, and if M rejects, so does K; clearly, if M does not halt, then neither does K). Note that K ignores the input x.
Correctness: If ⟨M, w⟩ ∈ ATM then K accepts every input, so L(K) = Σ∗ , so ⟨K⟩ ∈ ALLTM .
Conversely, if ⟨M, w⟩ ∉ ATM , then M either gets stuck or rejects w. In any case, K does not accept x,
so L(K) = ∅ ≠ Σ∗ , so ⟨K⟩ ∉ ALLTM . Thus, the reduction is correct.
Computability: Finally, we still need to show that the reduction is computable. This is often the confusing
part, since it’s usually clear that it’s computable, but you don’t know how formal you need to get. The
answer is that you should explain the nontrivial parts. In this case:
The reduction is computable since from the encoding of ⟨M, w⟩ we construct the new machine K by
making it a universal machine and hard-coding its input to be ⟨M, w⟩, so we can create ⟨K⟩ using an
algorithm (and so using a TM).
Remark 2.2. Note that the reduction didn’t run M on w, it only constructed ⟨K⟩ from ⟨M, w⟩.
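One way to internalize Remark 2.2: the reduction is itself a program that only writes down the text of another program. The sketch below is illustrative Python; run_M_on_w is a made-up name standing for a universal simulation, and nothing here is ever executed by the reduction.

```python
def reduction(M_encoding, w):
    K_source = f"""
def K(x):                                        # K ignores its input x
    return run_M_on_w({M_encoding!r}, {w!r})     # accept iff M accepts w
"""
    return K_source   # the reduction outputs a description of K; it never runs M on w
```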
We now proceed to show the second reduction, and it’s a bit trickier. The problem is that we want to
simulate M on w, but we actually want to do something if M does not accept w. Now, if M rejects w,
that’s fine. We’ll wait until it rejects, and go about our business. However, what happens if M does not halt
on w? Well, then we need some tricks.
Claim: ĀTM ≤m ALLTM (and thus, ALLTM ∉ RE).
Construction: The reduction proceeds as follows. On input ⟨M, w⟩, the reduction constructs ⟨K⟩ where
K is a machine that on input x, simulates M on w for |x| steps. If, during this, M accepts, then K rejects.
Otherwise, K accepts.
Correctness: If ⟨M, w⟩ ∈ ĀTM then M does not accept w, and in particular, M does not accept w within
|x| steps, for all x. Thus, K accepts every input, so L(K) = Σ∗ , and so ⟨K⟩ ∈ ALLTM .
Conversely, if ⟨M, w⟩ ∉ ĀTM , then M accepts w, and therefore there exists some n ∈ ℕ such that M
accepts w within n steps. Then, for every x such that |x| > n, K rejects x, so L(K) ≠ Σ∗ , so ⟨K⟩ ∉ ALLTM ,
so the reduction is correct.
Computability: Finally, this reduction is computable for the same reasons the previous construction was
computable - we can compute ⟨K⟩ from ⟨M, w⟩, for example by hard-coding all the parts of ⟨K⟩ that are
independent of ⟨M, w⟩ into the TM computing the reduction.
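The second construction, sketched with the same assumed step-bounded simulator as in the earlier sketches (run_for_steps returns "accept", "reject" or "running"):

```python
# K runs M on w for exactly |x| steps, so K always halts.
def make_K(M, w, run_for_steps):
    def K(x):
        if run_for_steps(M, w, len(x)) == "accept":
            return "reject"          # M accepted w within |x| steps
        return "accept"              # M did not accept w within |x| steps
    return K
```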
2.2 USELESS
Let USELESS = {⟨M ⟩ : there exists a state q ∉ {qacc , qrej } in M that is never reached, on any input}. We
claim that USELESS ∈ co-RE \ R. First, to show that USELESS ∈ co-RE it’s enough to show that given input
⟨M ⟩, we can always reject if ⟨M ⟩ ∉ USELESS. Given input ⟨M ⟩, a TM can simulate M on every input in
parallel (incrementally2 ), while keeping track of visited states. If, at any point, every state was visited,
then ⟨M ⟩ ∉ USELESS and we can reject. Otherwise we do not halt.
Now we want to show that USELESS ∉ RE. We show this by showing ĀTM ≤m USELESS.
Construction: The reduction machine, T , works as follows. Given input ⟨M, w⟩, T constructs the machine
H that works as follows. On input x, H simulates M on w (without restricting the steps). If M accepts w,
H moves to a new state, from which it traverses every state of itself, and then accepts. If M rejects w, then
H rejects.
Correctness: If ⟨M, w⟩ ∈ ĀTM , then M does not accept w. Thus, H never reaches the special traversing
state, so ⟨H⟩ ∈ USELESS. Conversely, if ⟨M, w⟩ ∉ ĀTM , then M accepts w, so H always reaches the traversal
state, and thus visits every state in the machine. So ⟨H⟩ ∉ USELESS.
Computability: Here we face a gap that was left from the construction: how to construct a traversing state? It’s not trivial, and it’s the sort of thing we must explain in order for this answer to be correct.
The idea is to write on the tape a special symbol @, and from every state in H make a transition to
a “next” state upon reading @ (having ordered the states arbitrarily), without moving the head.3 The
traversing-state writes this special symbol on the tape, and starts the traversal.
This guarantees that every state is visited, but also that these transitions are not used before we get to the special state (since @ is never written earlier).
There is a small technical problem here - we also want to make sure there is some input on which H
visits qreject , but this cannot be part of the traversal. To solve this, we modify H such that on a certain x
(e.g x = ϵ), it goes straight to qreject .
Finally, we remark that the rest of the construction is computable - the simulation part is as we have
seen in class, and the traversal part is easy to compute - simply order the states of the machine and add the
appropriate transitions.
2.3 From 2017 moed B exam
The following problem is taken from the 2017 Moed B exam. The school solution given here uses the language
HALTTM = {⟨M, w⟩ | M halts on w}. It is not a hard exercise to see that HALTTM ∈ RE \ R.
2 By enumerating Σ∗ : run 1 step on the first word, then 2 steps on the first two words, 3 on the first three, and so on.
3 If you don’t work with a TM that has a “stay” option - add the symbol on two cells, and move left and right repeatedly.
3 Rice’s Theorem
We now generalize some of the reductions we have encountered so far into a theorem, due to Henry
Gordon Rice.
Theorem 3.1 (Rice’s Theorem). Let P be a nontrivial semantic property of TMs, then
LP = {⟨M ⟩ : M ∈ P }
is undecidable.
We need to explain what this means, of course. A semantic property P of TMs is a set of TMs, with the
following property: for every two TMs M1 , M2 , if L(M1 ) = L(M2 ), then M1 ∈ P iff M2 ∈ P . Intuitively, P
is a set of machines, but it is defined through their languages.
A semantic property P is nontrivial if there exist two machines M1 , M2 such that M1 ∈ P and M2 ∉ P .
That is, P contains neither all machines nor no machines.
To prove the theorem, we prove the following lemma.
Lemma 3.2. Let P be a nontrivial semantic property of TMs such that T∅ ∉ P (where L(T∅ ) = ∅). Then
ATM ≤m LP .
Proof. We show a reduction f from AT M to LP . We are given an input ⟨M, w⟩ for AT M , and we need to
construct an input f (⟨M, w⟩) = ⟨T ⟩ such that ⟨M, w⟩ ∈ AT M iff ⟨T ⟩ ∈ LP . Let H be a machine such that
H ∈ P . We know that such a machine exists, because P is nontrivial. The reduction f works as follows.
Given M, w, construct the machine T that works as follows.
1. On input x, simulate M on w. If it halts and rejects, reject.
If it accepts, proceed to 2.
2. Simulate H on x. If it accepts, accept, if it rejects, reject (otherwise we get stuck).
Now we need to prove formally that this reduction is correct. First, we need to explain why it is computable.
This is easy, since we know we can simulate a TM on a word, so we can do it for both M on w and H on x.
Now we need to show why it works. First, if ⟨M, w⟩ ∈ ATM , then M accepts w, so L(T ) = L(H). Since
H ∈ P and P is a semantic property, this means that T ∈ P , so ⟨T ⟩ ∈ LP .
Conversely, if ⟨M, w⟩ ∉ ATM , then T does not accept anything. So L(T ) = ∅, but T∅ ∉ P , so T ∉ P , so
⟨T ⟩ ∉ LP . We conclude that ⟨M, w⟩ ∈ ATM iff f (⟨M, w⟩) ∈ LP , so ATM ≤m LP .
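A sketch of the machine T from the lemma (illustrative Python; simulate_M_on_w stands for an unbounded simulation of M on w, which may never return, and run_H is the fixed machine H ∈ P):

```python
def make_T(simulate_M_on_w, run_H):
    def T(x):
        # Step 1: simulate M on w. If M does not halt, this call never returns and T accepts nothing.
        if simulate_M_on_w() != "accept":
            return "reject"
        # Step 2: M accepted w, so from here on T behaves exactly like H, hence L(T) = L(H).
        return run_H(x)
    return T
```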
We can now prove the theorem.
Proof of Theorem 3.1. Let P be a nontrivial semantic property. If T∅ ∉ P , then from the lemma we conclude
that LP ∉ co-RE, and in particular it is undecidable.
Otherwise, consider L̄P ; then from the lemma (applied to the complement property, which does not contain T∅ ) we
conclude that L̄P ∉ co-RE, so LP ∉ RE, and in particular it is undecidable.
The useful thing in Rice’s theorem is clearly the lemma, not the theorem, since it gives us more information.
Example: Let L = {⟨M ⟩ : ∀w ∈ Σ∗ , w ∈ L(M ) ⇐⇒ ww^R ww^R ∈ L(M )}. This is a nontrivial semantic
property (make sure you understand why). Also, ⟨T∅ ⟩ ∈ L, so from the lemma (applied to the complement
property) we get that L ∉ RE.
An important point: The lemma does not tell us anything about the converse. That is, we may conclude that L ∉ RE, but we do not know whether L ∈ co-RE or not. For this we need creative tricks, and there is no known general way to tell.
Computability - Recitation 9
December 27, 2022
1 One More Reduction
Given two languages L1 and L2 , we say that they 10-agree if there are at least 10 distinct words w1 , . . . , w10 such that
for every 1 ≤ i ≤ 10, we have wi ∈ L1 iff wi ∈ L2 . Let’s classify the language:
L = {⟨M1 , M2 ⟩ : L(M1 ) and L(M2 ) 10-agree}
We show that L ∉ RE ∪ co-RE. We will describe two reductions. One from the language
HALTϵTM = {⟨M⟩ : M halts on ϵ}
and one from its complement. You have seen in class that HALTϵTM ∈ RE \ R (and hence its complement is in co-RE \ R). We start with the reduction HALTϵTM ≤m L, which shows that L ∉ co-RE.
• Construction: Given ⟨M ⟩, the reduction outputs ⟨M1 , M2 ⟩ where M1 , M2 are defined as follows. M1 immediately
accepts (for every input). M2 ignores its input, simulates M on ϵ, and if M halts, M2 accepts (otherwise, M2 runs
forever).
• Correctness: Suppose ⟨M ⟩ ∈ HALTϵTM . Then M halts on ϵ, so both M1 and M2 accept every word. We have
L(M1 ) = L(M2 ), so they 10-agree (in fact they agree on every word), hence ⟨M1 , M2 ⟩ ∈ L.
In the other direction, suppose ⟨M⟩ ∉ HALTϵTM. Then M does not halt on ϵ, so M2 does not accept any word. We have L(M1) = Σ∗ and L(M2) = ∅. They disagree on every word, so in particular they do not 10-agree, and ⟨M1, M2⟩ ∉ L.
• Computability: M1 can be constructed with q0 = qacc . M2 can be constructed with two parts: the first part
erases the input, and the second part is a copy of M .
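As an informal illustration (added here, not part of the original solution), one can picture the two output machines as Python functions, where the given machine M is modelled as a black-box function that may never return when called:

    def reduction(M):
        # Output of the reduction on input <M>: the pair of machines <M1, M2>.
        # Calling M(w) models running M on w, and may run forever.
        def M1(x):
            return True           # M1 accepts every input immediately
        def M2(x):
            M("")                 # simulate M on the empty word; may run forever
            return True           # M halted on epsilon, so M2 accepts x
        return M1, M2

If M halts on ϵ, both functions accept everything; otherwise M2 accepts nothing, exactly as in the correctness argument above.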
To show that L ∉ RE, we use a similar reduction, this time from the complement of HALTϵTM. It works the same as above, except that M1 now rejects immediately.
• Correctness: Suppose ⟨M⟩ is in the complement of HALTϵTM. Then M does not halt on ϵ, so both M1 and M2 never accept (on any input). We have L(M1) = L(M2) = ∅, so they 10-agree (in fact they agree on every word), hence ⟨M1, M2⟩ ∈ L.
In the other direction, suppose ⟨M⟩ is not in the complement, i.e., M halts on ϵ. Then M2 accepts every word. We have L(M1) = ∅ and L(M2) = Σ∗. They disagree on every word, so in particular they do not 10-agree, and ⟨M1, M2⟩ ∉ L.
2 Rice's Theorem
We now distill the pattern behind some of the reductions we have encountered so far into a theorem, due to Henry Gordon Rice.
Theorem 2.1 (Rice’s Theorem). Let P be a nontrivial semantic property of TMs, then
LP = {⟨M ⟩ : M ∈ P }
is undecidable.
We need to explain what this means, of course. A semantic property P of TMs is a set of TMs with the following property: for every two TMs M1, M2, if L(M1) = L(M2), then M1 ∈ P iff M2 ∈ P. Intuitively, P is a set of machines, but it is defined through their languages.
A semantic property P is nontrivial if there exist two machines M1, M2 such that M1 ∈ P and M2 ∉ P. That is, P contains some machines, but not all of them.
To prove the theorem, we prove the following lemma.
Lemma 2.2. Let P be a nontrivial semantic property of TMs such that T∅ ∉ P (where L(T∅) = ∅). Then ATM ≤m LP.
Proof. We show a reduction f from ATM to LP. We are given an input ⟨M, w⟩ for ATM, and we need to construct an input f(⟨M, w⟩) = ⟨T⟩ such that ⟨M, w⟩ ∈ ATM iff ⟨T⟩ ∈ LP. Let H be a machine such that H ∈ P. We know that such a machine exists, because P is nontrivial. The reduction f works as follows. Given M, w, construct the machine T that works as follows.
1. On input x, simulate M on w. If it halts and rejects, reject. If it accepts, proceed to 2.
2. Simulate H on x. If it accepts, accept; if it rejects, reject (otherwise we get stuck).
Now we need to prove formally that this reduction is correct. First, we need to explain why it is computable. This is easy, since we know we can simulate a TM on a word, so we can do it for both M on w and H on x.
Now we need to show why it works. First, if ⟨M, w⟩ ∈ ATM, then M accepts w, so L(T) = L(H). Since H ∈ P and P is a semantic property, this means that T ∈ P, so ⟨T⟩ ∈ LP.
Conversely, if ⟨M, w⟩ ∉ ATM, then T does not accept anything. So L(T) = ∅, but T∅ ∉ P, so T ∉ P, so ⟨T⟩ ∉ LP. We conclude that ⟨M, w⟩ ∈ ATM iff f(⟨M, w⟩) ∈ LP, so ATM ≤m LP.
We can now prove the theorem.
Proof of Theorem 2.1. Let P be a nontrivial semantic property. If T∅ ∉ P, then from the lemma we conclude that LP ∉ co-RE, and in particular it is undecidable.
Otherwise, T∅ ∈ P. Consider the complementary property (the set of all TMs not in P): it is also a nontrivial semantic property and it does not contain T∅, so from the lemma we conclude that the language {⟨M⟩ : M ∉ P} is not in co-RE. This language is essentially the complement of LP, so LP ∉ RE, and in particular it is undecidable.
The useful thing in Rice’s theorem is clearly the lemma, not the theorem, since it gives us more information.
Example:
Let L = {⟨M⟩ : ∀w ∈ Σ∗, w ∈ L(M) ⇐⇒ ww^R ww^R ∈ L(M)}. This is a nontrivial semantic property (make sure you understand why). Also, ⟨T∅⟩ ∈ L, so, as in the second case of the proof above, we get that L ∉ RE.
An important point: The lemma does not tell us anything about the converse. That is, we may conclude that L ∉ RE, but we do not know whether L ∈ co-RE or not. For this we need creative tricks, and there is no known general way to tell.
3 NP and Equivalent Definitions
There are two equivalent definitions for NP. The first definition is
NP = ⋃_{k=0}^{∞} NTIME(n^k).
That is, NP is the set of languages that can be decided by an NTM in polynomial time.
Example: A positive integer n is called composite if there are integers a, b > 1 such that ab = n. Let Σ = {0, 1} and define:
COMPOSITE = {w : w is a composite number in binary}
An NTM N can decide COMPOSITE in polynomial time as follows. Suppose that the input w is a binary representation
of the number n. Note that the input size is |w| = ⌊log n⌋ + 1. N uses its nondeterminism to guess a factor a between 2
and n − 1. It writes the first bit of a nondeterministically (0 or 1) on the tape, then N moves to the right and guesses the
next bit of a, and so on. Note that numbers up to n can be represented using at most |w| bits, so it takes N a polynomial
number of steps to write ⟨a⟩. Once N has written ⟨a⟩, it checks whether a is a factor of n. If so, N accepts. Otherwise,
it rejects.
Checking whether a is a factor of n can be done deterministically in polynomial time (for example, using long division
and checking whether there is a remainder). Therefore COMPOSITE ∈ NP.
In this algorithm we had to nondeterministically guess a. If instead we were given a, we could verify that it is indeed
a factor, and deduce that w ∈ COMPOSITE in polynomial time. We then say that ⟨a⟩ is a witness to the fact that
w ∈ COMPOSITE. We show a second definition of NP which generalizes this observation.
Remark: We could have defined N to guess both a and b, and then accept iff ab = n. In that case the witness is the
pair ⟨a, b⟩. Note that this is still polynomial in |w|.
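For concreteness, here is a minimal Python sketch of the verification step, added for illustration (the function name is ours): given the number n and a candidate witness a, it checks in polynomial time that a is a nontrivial factor.

    def verify_composite(n, a):
        # Accept iff the witness a is a factor of n with 1 < a < n.
        if a <= 1 or a >= n:
            return False
        return n % a == 0      # division with remainder, polynomial in the length of n

    # 91 = 7 * 13 is composite, and a = 7 is a valid witness; a = 6 is not.
    assert verify_composite(91, 7)
    assert not verify_composite(91, 6)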
We say that a TM V is a verifier for a language L if
L = {w : There exists c such that ⟨w, c⟩ ∈ L(V )}.
In our example, the verifier is a machine V that decides whether the witness a is a factor of n.
The second, equivalent definition of NP is:
NP = {L : There is a verifier V for L that runs on input ⟨w, c⟩ in time polynomial in |w|}
We require V to run in time polynomial in |w| because we are only interested in polynomial witnesses. Equivalently,
we could define NP to be the set of all languages L that have a polynomial verifier. That is, L ∈ NP iff there is a machine
V that runs in polynomial time and:
L = {w : There is c such that |c| is polynomial in |w| and ⟨w, c⟩ ∈ L(V )}
3.1 Equivalence
Theorem 3.1. A language L can be decided by an NTM in polynomial time iff there is a polynomial verifier for L.
Proof. In the first direction, suppose there is a polynomial verifier V for L. Since it’s polynomial, there is k ∈ N such
that V runs in at most |w|k steps, where |w| is the input size. Define an NTM N for L that works as follows: for an
input w, guess a witness c of length at most |w|k and run V on ⟨w, c⟩. If V accepts, N accepts. Otherwise, N rejects.
In the other direction, suppose there is an NTM N that runs in polynomial time and decides L. Construct a polynomial
verifier V for L: V accepts ⟨w, r⟩ iff r is an accepting run of N on w. Since N runs in polynomial time, |r| is polynomial
in |w|. V verifies that r is a valid accepting run by checking that the initial configuration is q0 w, the last configuration
contains qacc , and each configuration yields the next one using a transition of N .
Computability - Recitation 10
January 2, 2023
1 NP and NP-Completeness
1.1 Reminders
We have seen two equivalent definitions for NP. The first definition is
NP = ⋃_{k=0}^{∞} NTIME(n^k).
That is, NP is the set of languages that can be decided by an NTM in polynomial time.
For the second definition, we say that a TM V is a verifier for a language L if
L = {w : There exists c such that ⟨w, c⟩ ∈ L(V )}.
A language L is in NP if it has a verifier V which runs on input ⟨w, c⟩ in time polynomial in |w|.
A polynomial time reduction is a mapping reduction computable by a TM that runs in polynomial time in its input.
If there is a polynomial time reduction from a language K to a language L, we write K ≤p L.
We have also seen that
Lemma 1.1. If L ∈ P and K ≤p L then K ∈ P.
1.2 NP-Completeness
Definition 1.2. A language L is NP-Hard if for every K ∈ NP it holds that K ≤p L.
You will prove in the exercise that the relation ≤p is transitive. Hence, if L is NP-Hard and L ≤p J then J is also
NP-Hard.
Definition 1.3. If L is NP-Hard and also L ∈ NP, we say that L is NP-Complete.
We do not know whether P = NP. However, we do know that the NP-Complete languages are the hardest languages
in NP. The following theorem formalizes this:
Theorem 1.4. If there exists an NP-Complete language in P then P = NP.
Equivalently:
If P ̸= NP then every NP-Complete language is not in P.
Proof. We prove the first form of the statement. Assume that L ∈ P is NP-Complete. Consider a language K ∈ NP. Since
L is NP-Hard, K ≤p L. Since L ∈ P, it follows from Lemma 1.1 that K ∈ P. Thus, NP ⊆ P. Since P ⊆ NP, we conclude
that P = NP.
Our goal in this recitation is to get to know many new exciting languages, and use polynomial reductions to prove
that these languages are NP-Complete.
2 The clique, vertex cover and dominating set problems
2.1 CLIQUE
Recall that a clique in a graph G = (V, E) is a set C ⊆ V such that for every x, y ∈ C where x ̸= y, we have {x, y} ∈ E.
We define the language.
CLIQUE = {⟨G, k⟩ : The graph G has a clique of size k.}
You will see in class that CLIQUE is NP-Complete. Today, we will use this fact.
2.2 VC
We now consider the vertex-cover problem. Given a graph G = ⟨V, E⟩, a vertex cover in G is a set C ⊆ V such that for
every e ∈ E there exists x ∈ C such that x ∈ e. That is, a vertex cover is a set of vertices that touch every edge in the
graph.
We consider the problem VC = {⟨G, k⟩ : G has a vertex cover of size at most k}.
Proposition 2.1. VC is NP-Complete.
Proof. First, we show that VC ∈ NP. We do this by describing a polynomial time verifier for VC. Given input ⟨G, k⟩, the
witness that ⟨G, k⟩ ∈ VC is a set of vertices S. The verifier checks that |S| ≤ k, and that for every e ∈ E there exists
v ∈ S such that e touches v. This is done as follows: first, the verifier counts the size of S (which is at most the size of
V , and thus takes polynomial time in |⟨G⟩|). Then, the verifier traverses all the edges in the graph, and for each edge,
compares it against all the vertices in S. This takes O(|E| · |S|) = O(|E| · |V |), which is polynomial in the size of G. The
verifier accepts iff indeed |S| ≤ k and S is a vertex cover.
We now proceed to show that VC is NP-Hard, by showing a reduction from CLIQUE (which by now we know is
NP-Hard).
Construction: Given input ⟨G, k⟩, the reduction outputs ⟨Ḡ, n − k⟩, where Ḡ is the complement graph of G (that is, an edge exists in Ḡ iff it does not exist in G), and n is the number of vertices in G.
Runtime: This is clearly polynomial, since all we need to do is traverse every edge and “flip” it. That takes O(|V |2 )
flips, and we assume a polynomial access to each edge. Also, computing n − k given k can be done in polynomial time.
Correctness: This is the interesting part. We claim that G has a clique of size k iff Ḡ has a vertex cover of size at most n − k. Let G = ⟨V, E⟩. Then, Ḡ = ⟨V, Ē⟩. If G has a clique C of size k, consider the set C̄ = V \ C. Since |C| = k, we have |C̄| = n − k. We claim that C̄ is a vertex cover in Ḡ. Indeed, let e = {x, y} ∈ Ē. If x, y ∉ C̄, then x, y ∈ C, but {x, y} ∈ Ē, so {x, y} ∉ E, so C is not a clique, which is a contradiction. We conclude that every edge of Ḡ has a vertex in C̄, so C̄ is a vertex cover of size n − k, so ⟨Ḡ, n − k⟩ ∈ VC.
Conversely, if Ḡ has a vertex cover S of size n − k (why don't we need "at most"?), consider the set S̄ = V \ S. Since |S| = n − k, we have |S̄| = k. We claim that S̄ is a clique in G. Indeed, let x, y ∈ S̄. If {x, y} ∉ E, then {x, y} ∈ Ē, but x, y ∉ S, so S is not a vertex cover of Ḡ, which is a contradiction. We conclude that every pair of vertices in S̄ is connected in G, so S̄ is a clique of size k, so ⟨G, k⟩ ∈ CLIQUE.
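The reduction itself is mechanical; here is a short Python sketch added for illustration (graphs are represented as a vertex set and a set of frozenset edges, a representation chosen only for this example):

    def clique_to_vc(V, E, k):
        # Map <G, k> to <complement of G, n - k>.
        comp_E = {frozenset({u, v}) for u in V for v in V
                  if u != v and frozenset({u, v}) not in E}
        return V, comp_E, len(V) - k

    # A triangle has a clique of size 3; its complement (no edges) has a vertex cover of size 0 = 3 - 3.
    V = {1, 2, 3}
    E = {frozenset({1, 2}), frozenset({1, 3}), frozenset({2, 3})}
    print(clique_to_vc(V, E, 3))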
2.3 DS
Given an undirected graph G = ⟨V, E⟩, a dominating set in G is a set D ⊆ V such that for every v ∈ V , either v ∈ D
or there exists u ∈ D such that {u, v} ∈ E. That is, a set of vertices that is in distance 1 from every other vertex. Let
DS = {⟨G, k⟩ : G has a dominating set of size k}.
Proposition 2.2. DS is NP-Complete.
Proof. First, we show that DS ∈ NP. We do this by describing a polynomial time verifier for DS: the witness is the
dominating set in the graph: given a set D ⊆ V, it is easy to verify that every vertex is either in D or connected to a vertex in D, by going over all the edges that touch vertices in D and making sure that, together with D itself, we reach all the vertices in V.
Thus, DS ∈ NP.
To prove that DS is NP-Hard, we should probably show a reduction from some NP-Hard problem. Any suggestions?
We show that VC ≤p DS.
Construction: Given input G = ⟨V, E⟩ and k ∈ N, T first goes over the vertices and counts all the vertices that are
not connected to any edge. Denote this number by f . Next, T outputs the pair ⟨G′ , k ′ ⟩ where G′ = ⟨V ′ , E ′ ⟩ is obtained
from G as follows. For every edge e ∈ E, we add a new vertex ve , and define V ′ = V ∪ {ve }e∈E . As for the edges, for
every edge e = {u, v} ∈ E we add two edges: {v, ve } and {u, ve }. We thus have:
E ′ = E ∪ {{v, ve }, {u, ve } : e = {u, v} ∈ E}
Finally, we define k ′ = k + f .
Runtime: Finding isolated vertices takes O(|E|) at most, and constructing the new graph involves adding |E|
vertices, and connecting them with new 2|E| edges. Even if we have to construct an edge matrix, this is polynomial
(indeed, quadratic) in |V | + |E|. Finally, computing k + f is clearly polynomial, so the reduction is polynomial.
Correctness: For the first direction, assume G has a vertex cover C of size k. Let F be the set of isolated vertices in V. We claim that F ∪ C is a dominating set of size at most k′ = k + f in G′. Indeed, the size of F ∪ C is at most k + f. Let v ∈ V′. If v ∈ F then v ∈ F ∪ C. If v ∈ V \ F, then since C is a vertex cover, either v ∈ C or v is in an edge {u, v} such that u ∈ C. Thus, every vertex of V is within distance 1 of C ∪ F. If v ∈ V′ \ V, then v = ve for some edge {x, y} ∈ E. Since C is a vertex cover, w.l.o.g. x ∈ C; since there is an edge {x, ve} by our construction, v is within distance 1 of C. We conclude that F ∪ C is a dominating set in G′ of size at most k′, so ⟨G′, k′⟩ ∈ DS.
Conversely, assume G′ has a dominating set Ds of size at most k ′ . First, observe that every isolated vertex in V
(which is also isolated in V ′ ) must be in Ds. Let D be the set Ds without the isolated vertices. Thus, |D| ≤ k ′ − f ≤ k.
Next, we claim that w.l.o.g. all the vertices in D are from V (and not from V′ \ V). Indeed, assume D contains a vertex ve for some e = {x, y} ∈ E; then ve is connected only to x and y. Thus, replacing ve with x can only dominate more vertices (x, y, and ve are still within distance 1 of D).
Now that we may assume D ⊆ V, we claim that D is a vertex cover in G. Indeed, consider an edge {x, y} ∈ E; then either x or y is in D, since (besides ve itself) they are the only vertices within distance 1 of ve, and D is a dominating set.
We conclude that G has a vertex cover of size at most k, and we are done.
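Here too the construction is easy to write down explicitly. The following Python sketch (added for illustration, with the same graph representation as in the previous example) adds a vertex ve per edge and outputs k′ = k + f:

    def vc_to_ds(V, E, k):
        # Map <G, k> to <G', k + f>, where f is the number of isolated vertices.
        isolated = {v for v in V if all(v not in e for e in E)}
        V2, E2 = set(V), set(E)
        for e in E:
            u, v = tuple(e)
            ve = ("edge", u, v)              # the new vertex v_e for the edge e = {u, v}
            V2.add(ve)
            E2.add(frozenset({u, ve}))
            E2.add(frozenset({v, ve}))
        return V2, E2, k + len(isolated)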
3 CNF and 3-CNF satisfiability (if time permits)
A CNF (conjunctive normal form) formula is a conjunction of disjunctions of literals ("AND of ORs"). For example,
ϕ = (x1 ∨ x2 ) ∧ (x4 ∨ x2 ∨ x3 ∨ x5 ∨ x1 ) ∧ (x5 )
is in CNF. Each disjunction, such as (x1 ∨ x2 ), is called a clause. A 3-CNF formula is a CNF formula in which every
clause has exactly 3 literals. For example, ϕ above is CNF but not in 3-CNF. The following is 3-CNF:
ϕ′ = (x1 ∨ x2 ∨ x3) ∧ (x4 ∨ x2 ∨ x1) ∧ (x5 ∨ x4 ∨ x4)
A formula is called satisfiable if there is an assignment to its variables such that the formula evaluates to true. The
satisfiability problem is to determine whether a formula is satisfiable. Formally, define the languages:
CNF-SAT = {⟨ϕ⟩ : ϕ is a satisfiable CNF formula}
3-SAT = {⟨ϕ⟩ : ϕ is a satisfiable 3-CNF formula}
These languages are NP-Complete. They are in NP, because given an assignment (witness), a TM can verify in
polynomial time that the formula is indeed satisfied. You will see in class that CNF-SAT is NP-Hard.
Since every 3-CNF formula is CNF, there is a very simple reduction from 3-SAT to CNF-SAT (what does it need to
do?). We now show a polynomial reduction in the other direction.
Proposition 3.1. We have CNF-SAT ≤p 3-SAT.
Proof. We show how to translate CNF to 3-CNF in a way that preserves satisfiability. Clauses smaller than 3 can be padded (for example, (x1 ∨ x2) is equivalent to (x1 ∨ x2 ∨ x2)). A larger clause can be replaced with multiple clauses
of size 3 that are constructed to be equivalent to it, as shown below.
Construction: Given input ⟨ϕ⟩, where ϕ is a CNF formula, the reduction constructs a 3-CNF formula ϕ′ and returns ⟨ϕ′⟩. The formula ϕ′ is built as follows. For every clause ct in ϕ:
1. If ct contains exactly 3 literals, add it to ϕ′ .
2. Otherwise, if ct contains fewer than 3 literals:
(a) If ct contains exactly 1 literal, ct = (l1 ): add the clause (l1 ∨ l1 ∨ l1 ) to ϕ′ .
(b) If ct contains exactly 2 literals, ct = (l1 ∨ l2 ): add the clause (l1 ∨ l2 ∨ l2 ) to ϕ′ .
3. If ct contains more than 3 literals, ct = (l1 ∨ l2 ∨ . . . ∨ lk), introduce new variables y1^t, . . . , y_{k−1}^t and add the following clauses to ϕ′:
(l1 ∨ y1^t ∨ y1^t) ∧ (l2 ∨ ¬y1^t ∨ y2^t) ∧ (l3 ∨ ¬y2^t ∨ y3^t) ∧ · · · ∧ (l_{k−1} ∨ ¬y_{k−2}^t ∨ y_{k−1}^t) ∧ (lk ∨ ¬y_{k−1}^t ∨ ¬y_{k−1}^t)
Runtime: For each clause, the reduction writes a polynomial number of clauses of size 3 in the output. Therefore the reduction runs in time polynomial in |⟨ϕ⟩|.
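The clause-by-clause translation above is easy to implement. Here is a Python sketch added for illustration, where a literal is a nonzero integer and −i denotes the negation of variable i (a common encoding, not something fixed by the exercise):

    def cnf_to_3cnf(clauses):
        next_var = max((abs(l) for c in clauses for l in c), default=0) + 1
        out = []
        for c in clauses:
            k = len(c)
            if k == 1:
                out.append([c[0]] * 3)                        # (l1 or l1 or l1)
            elif k == 2:
                out.append([c[0], c[1], c[1]])                # (l1 or l2 or l2)
            elif k == 3:
                out.append(list(c))
            else:
                y = list(range(next_var, next_var + k - 1))   # fresh y_1, ..., y_{k-1}
                next_var += k - 1
                out.append([c[0], y[0], y[0]])
                for j in range(1, k - 1):
                    out.append([c[j], -y[j - 1], y[j]])
                out.append([c[-1], -y[-1], -y[-1]])
        return out

Each original clause of size k > 3 contributes exactly k clauses of size 3, so the output is linear in the input.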
Correctness: First, assume that ⟨ϕ⟩ ∈ CNF-SAT. Denote the variables of ϕ by x1 , . . . , xn , and let a be a satisfying
assignment for ϕ. We prove that ⟨ϕ′ ⟩ ∈ 3-SAT by building a satisfying assignment b for it. First, let b agree with a on
the variables x1 , . . . , xn .
Now, consider a clause ct = (l1 ∨ . . . ∨ lk) in ϕ. Since ct is satisfied by a, there exists some i such that li is a literal satisfied by a. The assignment b then assigns the value TRUE to the variables y1^t, . . . , y_{i−1}^t, and FALSE to the variables yi^t, . . . , y_{k−1}^t. We note that this assignment satisfies all of the clauses that our construction derived from ct. Hence, the assignment b satisfies ϕ′.
For the other direction, assume that ⟨ϕ′ ⟩ is satisfiable by some assignment b. Let a be the restriction of b to the
variables of ϕ. We claim that a satisfies ϕ, and therefore ⟨ϕ⟩ ∈ CNF-SAT.
Consider a clause ct = (l1 ∨ . . . ∨ lk) of ϕ. Suppose by contradiction that a does not satisfy ct. Hence, none of the literals l1, . . . , lk are satisfied by b. However, b does satisfy the corresponding clauses in ϕ′:
(l1 ∨ y1^t ∨ y1^t) ∧ (l2 ∨ ¬y1^t ∨ y2^t) ∧ (l3 ∨ ¬y2^t ∨ y3^t) ∧ · · · ∧ (l_{k−1} ∨ ¬y_{k−2}^t ∨ y_{k−1}^t) ∧ (lk ∨ ¬y_{k−1}^t ∨ ¬y_{k−1}^t)
Since l1, . . . , lk are not satisfied by b, this implies that b must satisfy the clauses
(y1^t ∨ y1^t) ∧ (¬y1^t ∨ y2^t) ∧ (¬y2^t ∨ y3^t) ∧ · · · ∧ (¬y_{k−2}^t ∨ y_{k−1}^t) ∧ (¬y_{k−1}^t ∨ ¬y_{k−1}^t)
(why?). Observe that b must assign TRUE to y1^t and FALSE to y_{k−1}^t. Thus, there exists i such that b assigns TRUE to yi^t and FALSE to y_{i+1}^t (why?). Consequently, b does not satisfy the clause (¬yi^t ∨ y_{i+1}^t), resulting in a contradiction. Hence, a satisfies ϕ.
Computability - Recitation 11
January 9, 2023
1 Completeness in co-NP
Because of the asymmetric nature of NTMs, if a language L is in NP, it is not clear whether its complement L̄ is in NP. From the perspective of verifiers, we know that if L ∈ NP, then we can "convince" that x ∈ L by providing a witness, but we don't necessarily know how to convince that x ∉ L (think of SAT, for example: we don't know of a short way to convince that a Boolean formula is not satisfiable). This is the reason we define the class co-NP of languages whose complement is in NP. It is generally believed that NP ≠ co-NP, but proving that would prove that P ≠ NP, as appears in the exercise. Similarly to NP, we can define co-NP-hardness and completeness.
Definition 1.1. A language L is co-NP-hard if for every K ∈ co-NP it holds that K ≤p L. It is co-NP-complete if also
L ∈ co-NP.
Claim 1.2. L is NP-hard iff L̄ is co-NP-hard.
Corollary 1.3. L is NP-complete iff L̄ is co-NP-complete.
Proof. In one direction, if L is NP-hard, then for every K ∈ co-NP we have K̄ ∈ NP, so K̄ ≤p L. The same reduction shows that K ≤p L̄, thus L̄ is co-NP-hard. The other direction is similar.
Example. A boolean formula φ is called a tautology if every assignment of true/false values to variables yields a true
value. It is called a contradiction if every assignment yields a false value. We claim that the languages
CONTRADICTION = {⟨φ⟩ : φ is a contradiction}
TAUTOLOGY = {⟨φ⟩ : φ is a tautology}
are co-NP-complete. First, they are polynomially reducible to each other by f(⟨φ⟩) = ⟨¬φ⟩ (the same reduction works in both directions). Second, we know that SAT is NP-complete. Therefore, the complement of SAT, which is essentially CONTRADICTION, is co-NP-complete¹.
Remark. We can also define (NP ∩ co-NP)-completeness, however, it is not known whether such a language exists.
In Figure 1, the 3 possibilities for the relationship between P, NP and co-NP are depicted.
1 Strictly speaking, this is not an equality, since the complement of SAT also contains all strings that are not a well-formatted encoding of a formula. This, however, is not a big issue, and there are several ways to fix this (think how!).
[Figure 1 contains three diagrams: Option 1: P = NP (so P = NP = coNP and NPC = coNPC); Option 2: P ≠ NP but NP = coNP (so NPC = coNPC); Option 3: NP ≠ coNP (P, NP, coNP, NPC, coNPC are distinct regions).]
Figure 1: The 3 possibilities for the relationship between P, NP and co-NP. Make sure you understand all the relationships
(consult the exercise, for the reason that NPC is disjoint from coNP in Option 3). The reason for the ≈ notation in Option
1 is that NPC does not contain ∅ and Σ∗ . Note that it is not known whether P = NP ∩ co-NP or not.
2 Problems involving Hamiltonian paths
Recall that a Hamiltonian Path in a graph G is a path that passes through each of G’s vertices exactly once. Similarly,
a Hamiltonian Cycle is a cycle that passes through each vertex exactly once.
We define six languages related to the notion of Hamiltonian paths and cycles. These are:
• D-ST-HAMPATH = {⟨G, s, t⟩ : G is a directed graph that has a Hamiltonian path from s to t}
• D-HAMPATH = {⟨G⟩ : G is a directed graph that has a Hamiltonian path}
• D-HAMCYCLE = {⟨G⟩ : G is a directed graph that has a Hamiltonian cycle}
• The languages U-ST-HAMPATH, U-HAMPATH and U-HAMCYCLE, defined analogously for undirected graphs.
It turns out that all six of these languages are NP-complete. In this recitation, we will prove that D-ST-HAMPATH is
NP-complete. In the exercise, you will use this fact in order to show that a few other of these languages are NP-complete
as well.
To show that D-ST-HAMPATH is in NP, it is enough to show that there exists a polynomial-time verifier for it. This
is easy: a verifier gets as input ⟨G, s, t⟩, and a sequence of n vertices (where n is the number of vertices in G). It then
checks that the sequence is a Hamiltonian path in G. Clearly this can be done in polynomial time.
We now show that D-ST-HAMPATH is NP-hard, by showing that 3-SAT ≤p D-ST-HAMPATH. The notes are taken
from M. Sipser’s “Theory of Computation”. Sipser calls the D-ST-HAMPATH language by the name ”HAMPATH”.
We will now use the fact that D-ST-HAMPATH is NP-hard to prove that U-ST-HAMPATH is NP-hard as well.
Theorem 2.1. U-ST-HAMPATH is NP-complete.
Proof. We need to show two things. First, that U-ST-HAMPATH ∈ NP, and second, that it is NP-hard.
To show that U-ST-HAMPATH is in NP, it is enough to show that there exists a polynomial-time verifier for it. This
is easy: a verifier gets as input ⟨G, s, t⟩, and a sequence of n vertices (where n is the number of vertices in G). It then
checks that the sequence is a Hamiltonian path in G. Clearly this can be done in polynomial time.
We now show that U-ST-HAMPATH is NP-hard, by showing that D-ST-HAMPATH ≤p U-ST-HAMPATH
Construction: Let ⟨G, s, t⟩ be an input for D-ST-HAMPATH, with G = ⟨V, E⟩. The reduction T constructs the
input ⟨G′ , sin , tout ⟩ to U-ST-HAMPATH, where G′ is defined as follows.
1. For every vertex v ∈ V , T introduces the vertices vin , vmid and vout , and the edges {vin , vmid } and {vmid , vout }.
2. For every edge (u, v) ∈ E we define the edge {uout , vin }.
Thus, G′ = ⟨V ′ , E ′ ⟩ where V ′ = {vin , vout , vmid : v ∈ V } and E ′ = {{vin , vmid }, {vmid , vout } : v ∈ V }∪{{uout , vin } : (u, v) ∈ E}.
Runtime: Clearly the reduction is polynomial, since the size of G′ is 3 times the size of G, and computing it is
straightforward.
Correctness: For the easy direction, assume ⟨G, s, t⟩ ∈ D-ST-HAMPATH. Then, let s, u1 , u2 , ..., uk , t be a directed
Hamiltonian path in G. The path induces the following path in G′ :
sin , smid , sout , u1in , u1mid , u1out , ..., ukin , ukmid , ukout , tin , tmid , tout
Since the original path contained all the vertices, so does the induced path. Thus, ⟨G′ , sin , tout ⟩ ∈ U-ST-HAMPATH.
For the hard direction, assume that ⟨G′ , sin , tout ⟩ ∈ U-ST-HAMPATH. Thus, there is a Hamiltonian path in G′ , but
we do not know what this path “looks like”. We proceed with the following claim. A Hamiltonian path that starts at sin
and ends at tout does not contain a directed traversal of the form (vin , uout ). That is, we never go “backward” on edges.
The proof of this claim is by contradiction. Assume that there is such a directed traversal, and let (vin , uout ) be the
first one in the path. Since the path is Hamiltonian, we must visit vmid at some point. If we already visited it, then we
must have visited it from vout (since we are just now visiting vin ). But how did we reach vout ? it must have been from
some xin (since these are the only possible edges left). This is a contradiction to the minimality of (vin , uout ).
Thus, vmid must be visited later in the path, but then we can only reach it from vout, at which point we get stuck, and in particular we cannot end at tout.
We conclude that the edges in the path are of the forms (vin , vmid ), (vmid , vout ) and (vout , uin ). Thus, the Hamiltonian
path is of the form
sin , smid , sout , u1in , u1mid , u1out , ..., ukin , ukmid , ukout , tin , tmid , tout
which can be easily mapped to a Hamiltonian path in G.
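The construction is again purely syntactic. A Python sketch added for illustration (directed edges as pairs, undirected edges as frozensets):

    def dhp_to_uhp(V, E, s, t):
        # Split every vertex v into v_in - v_mid - v_out; each directed edge (u, v)
        # becomes the undirected edge {u_out, v_in}.
        V2 = {(v, tag) for v in V for tag in ("in", "mid", "out")}
        E2 = {frozenset({(v, "in"), (v, "mid")}) for v in V}
        E2 |= {frozenset({(v, "mid"), (v, "out")}) for v in V}
        E2 |= {frozenset({(u, "out"), (v, "in")}) for (u, v) in E}
        return V2, E2, (s, "in"), (t, "out")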
Remark: We used the word “we” in this proof a lot. Usually, this is legitimate writing, even in mathematical texts.
However, if you are not careful, it can lead to trouble (from ambiguity to computing uncomputable functions). So use it
wisely.
Computability - Recitation 12
January 17, 2023
1 Savitch's Theorem
Recall that we have seen the following containments: NP ⊆ PSPACE, and PSPACE ⊆ EXPTIME. Similarly, NPSPACE ⊆ NEXPTIME, and clearly PSPACE ⊆ NPSPACE. Similarly to the famous P vs. NP question, we wonder what the relation between PSPACE and NPSPACE is. Quite surprisingly, in the space case we actually have a definite answer, which is obtained via Savitch's theorem.
Savitch’s theorem (due to Walter Savitch, 1970) is one of the earliest results on space complexity, and one of the few
conclusive results we have in complexity theory, and is stated as follows.
Theorem 1.1 (Savitch). For every function f : N → N such that f(n) = Ω(log n), it holds that
NSPACE(f(n)) ⊆ SPACE(f²(n))
We will actually prove the theorem under the condition f (n) ≥ n. This is only to slightly simplify things. Next week
we will define logarithmic space complexity, and see that the proof works for f(n) = Ω(log n).
Another simplifying assumption we make, is that f (n) is constructible in O(f (n)) space. That is, there is a TM M
that takes as input 1n and halts with f (n) on the tape, while using at most O(f (n)) tape cells. This assumption actually
loses generality, but there are ways to overcome this problem (see Sipser for details).
Let f : N → N be a function, and let N be an NTM deciding a language A in NSPACE(f(n)). We construct a TM M that decides A in SPACE(f²(n)). Before we start the proof, let's understand the problem. Given a word w of length
n, every run of N on w terminates after using at most f (n) cells. From configuration-counting arguments, every run of
N on w runs for at most 2O(f (n)) steps. To simulate this naively with a deterministic TM, we need to encode runs, or at
least encode the “address” of a run – the nondeterministic choices that need to be made in order to trace the run. Since
the depth of the run tree may be up to 2O(f (n)) , so is the length of an address in the run tree. Thus, a naive simulation
may take up to 2O(f (n)) space, which is too much.
Instead, we use a different approach. Assume N has a single accepting configuration on w. Then, M accepts w iff
there exists a “path” in the “configuration graph” from the initial configuration c0 , to the accepting one cacc , where this
path is of length at most t. Thus, there must exist a configuration cm such that the run reaches cm from c0 in at most t/2 steps, and reaches cacc from cm in at most t/2 steps. We can now recursively solve these smaller problems. What do we need
to write on the tape? At every level of the recursion, we need to write two configurations, and t. A configuration is of
length O(f (n)), and 1 ≤ t ≤ 2O(f (n)) , so we can encode t in O(f (n)) cells. Thus, at every level we need O(f (n)) cells.
Furthermore, at every level, t is half what it was in the level before. Since initially t = 2O(f (n)) , we need only O(f (n))
levels. We conclude that we need O(f 2 (n)) space.
Proof. Let f : N → N be a function, and let N be an NTM deciding a language A in NSPACE(f(n)). We construct a
TM M that decides A in SPACE(f 2 (n)).
We start by describing the procedure can-yield(c1 , c2 , t), that takes in two configurations c1 , c2 and a number t (in
binary), and outputs whether N can get from configuration c1 to configuration c2 in at most t steps. The procedure
works as follows:
can-yield(c1 , c2 , t)
1. If t = 1, then test directly whether c1 = c2 or whether c1 yields c2 in one step according to the (nondeterministic)
rules of N . Accept if either test succeeds; reject if both fail.
2. If t > 1, then for every configuration cm of N on w using space f (n):
(a) Run can-yield(c1, cm, ⌈t/2⌉).
(b) Run can-yield(cm, c2, ⌈t/2⌉).
(c) If both accept, then accept.
3. Reject.
Assume w.l.o.g that N has a single accepting configuration on every word w. We do not lose generality, since every
machine N has such an equivalent machine (that also works in NSPACE(f (n))) that is obtained by changing the accepting
state to a state that erases all the tape before accepting. Let d ∈ N be such that N has at most 2^{d·f(n)} configurations.
We now construct M as follows. First, M computes f (n) within O(f (n)) space and writes the result. Then, M
simulates can-yield(c0, caccept, 2^{d·f(n)}), where c0 is the initial configuration of N on w.
In order to simulate can-yield, M keeps track of every level of the recursion by holding c1 , c2 , t and whether the first
call had accepted (if it is a simulation of 2.b in the algorithm). In total, we need 3O(f (n)) + 1 = O(f (n)) space at every
level. The recursion depth is log t = O(f (n)), and thus M uses at most O(f 2 (n)) space.
Finally, the procedure can-yield clearly determines whether c2 is reachable from c1 in t steps, and thus our machine
decides the correct language.
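The recursive procedure is perhaps easier to digest as code. The following Python sketch is added for illustration: configs stands for the set of all configurations of N on w that use at most f(n) cells, and succ(c) returns the configurations reachable from c in one nondeterministic step (both are assumptions of this sketch, not part of the original proof). A real implementation would enumerate configurations cell by cell rather than hold them all in memory; the point here is only the divide-in-the-middle recursion, whose depth is O(log t) = O(f(n)).

    def can_yield(configs, succ, c1, c2, t):
        # Can N get from configuration c1 to configuration c2 in at most t steps?
        if t <= 1:
            return c1 == c2 or c2 in succ(c1)
        # Split t into floor(t/2) + ceil(t/2) so the "at most t steps" bound stays exact.
        return any(can_yield(configs, succ, c1, cm, t // 2) and
                   can_yield(configs, succ, cm, c2, t - t // 2)
                   for cm in configs)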
Now assume f is a polynomial. Then, f 2 (n) is also a polynomial. This implies the following very important corollary.
Corollary 1.2. PSPACE = NPSPACE.
Also, since PSPACE is a deterministic class, it is closed under complementation (prove it!), so we have that
NPSPACE = PSPACE = co–PSPACE = co–NPSPACE
2 PSPACE and TQBF
We show that a language called TQBF is PSPACE-complete.
2.1 TQBF in PSPACE
For this section, we identify 0 with False and 1 with True. A quantified Boolean formula is a boolean formula preceded
by ∃ and ∀ quantifiers on the variables. A fully quantified formula is one such that every variable is under the scope of
a quantifier. A fully quantified formula is always either true or false.
Example: The formula ∀x∃y((x ∨ y) ∧ (¬x ∨ ¬y)) is true, since if x = 0 we can take y = 1, and if x = 1 we can take y = 0.
On the other hand, the formula ∃y∀x((x ∨ y) ∧ (¬x ∨ ¬y)) is false, since for both y = 0 and y = 1 we can find an x (namely x = y) such that one of the conjuncts does not hold.
We define the language
TQBF = {⟨ϕ⟩ : ϕ is a true fully quantified Boolean formula}
This language is sometimes known as QSAT (quantified SAT).
We want to show that TQBF ∈ PSPACE. We devise the following recursive algorithm (TM) T .
T : On input ⟨ϕ⟩, a fully quantified Boolean formula:
1. If ϕ contains no quantifiers, then it is an expression with only constants, so evaluate ϕ and accept if it is true;
otherwise, reject.
2. If ϕ equals ∃xψ, recursively call T on ψ, first with 0 substituted for x and then with 1 substituted for x. If either
result is accept, then accept; otherwise, reject.
3. If ϕ equals ∀xψ, recursively call T on ψ, first with 0 substituted for x and then with 1 substituted for x. If both results are accept, then accept; otherwise, reject.
Algorithm T obviously (well, you can prove by induction) decides TQBF. To analyze its space complexity we observe that
the depth of the recursion is at most the number of variables m. Indeed, we make (at most) two recursive calls for every
quantifier. At each level in the recursion we need to store the formula with the replaced variables, so the total space used
is O(n). At the bottom of the recursion, we need to evaluate the formula, which takes another O(n) space, where n is
the length of the formula. Therefore, T runs in space O(m · n) = O(n2 ) (since m = O(n)).
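Here is a sketch of T in Python, added for illustration. Instead of literally substituting constants into the formula, it carries a partial assignment env, which is equivalent and keeps the code short; the nested-tuple representation of formulas is our own choice.

    def evaluate(f, env):
        op = f[0]
        if op == "const":  return f[1]
        if op == "var":    return env[f[1]]
        if op == "not":    return not evaluate(f[1], env)
        if op == "and":    return evaluate(f[1], env) and evaluate(f[2], env)
        if op == "or":     return evaluate(f[1], env) or evaluate(f[2], env)
        if op == "exists": return any(evaluate(f[2], {**env, f[1]: b}) for b in (False, True))
        if op == "forall": return all(evaluate(f[2], {**env, f[1]: b}) for b in (False, True))

    # forall x exists y ((x or y) and (not x or not y)) -- true, as in the example above.
    phi = ("forall", "x", ("exists", "y",
          ("and", ("or", ("var", "x"), ("var", "y")),
                  ("or", ("not", ("var", "x")), ("not", ("var", "y"))))))
    print(evaluate(phi, {}))   # True

The recursion depth equals the number of quantifiers, matching the space analysis above.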
2.2 Encoding configurations using Boolean formulas
Towards proving that TQBF is PSPACE-complete, we need to give some background on how to encode configurations of a
Turing-machine using Boolean formulas. This section might be a bit cumbersome, but after that we will have great tools
in our hands!
Let M be a TM, and let s be a tape size. We encode a configuration that uses at most s tape cells using Boolean
variables as follows:
• For each i ∈ [s] and a ∈ Γ, we have a variable xi,a , such that xi,a = 1 iff a is written in the i-th cell.
• For each i ∈ [s], we have variable yi , such that yi = 1 iff the head of the machine is over the i-th cell.
• For each q ∈ Q, we have a variable zq such that zq = 1 iff the configuration is in state q.
We denote the tuple of all those variables by c. We claim the following “encoding” theorem.
Theorem 2.1.
1. There exists a Boolean formula ϕvalid (c), that evaluates to 1 iff c is a valid encoding of a configuration.
2. There exists a Boolean formula ϕ(c1 , c2 ), that evaluates to 1 iff c2 is a consecutive configuration to c1 .
Moreover, both formulas can be calculated in time polynomial in s.
Proof.
1. We define
ϕvalid(c) = ⋀_{i∈[s]} ⋁_{a∈Γ} (x_{i,a} ∧ ⋀_{b∈Γ\{a}} ¬x_{i,b}) ∧ ⋁_{i∈[s]} (y_i ∧ ⋀_{j∈[s]\{i}} ¬y_j) ∧ ⋁_{q∈Q} (z_q ∧ ⋀_{r∈Q\{q}} ¬z_r)
and one should note that the length of the formula is O(s²) (Q and Γ are both constants).
2. For each i ∈ [s], a ∈ Γ, q ∈ Q, if δ(q, a) = (r, b, R) then we define
ψ_{i,a,q}(c1, c2) = (x^1_{i,a} ∧ y^1_i ∧ z^1_q) → (x^2_{i,b} ∧ y^2_{i+1} ∧ z^2_r ∧ ⋀_{j∈[s]\{i}} ⋀_{d∈Γ} (x^1_{j,d} ↔ x^2_{j,d}))
and if δ(q, a) = (r, b, L), we do the same thing with i − 1. We remark several exceptions in the definition of ψ_{i,a,q}:
(a) If the head is on the first place, and the move is L, then the head stays.
(b) If the head is on the s-th place, and the move is R, then there is no valid consecutive configuration.
(c) If the configuration is accepting, then we require that the configuration stays the same.
Finally, we define
ϕ(c1, c2) = ϕvalid(c1) ∧ ϕvalid(c2) ∧ ⋀_{i∈[s]} ⋀_{a∈Γ} ⋀_{q∈Q} ψ_{i,a,q}(c1, c2)
and one should note that the length of the formula is O(s²).
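To see that ϕvalid really is just a collection of "exactly one" constraints of total size O(s²), here is a small Python generator added for illustration (the variable names x[i,a], y[i], z[q] and the symbols & | ~ are our own ad-hoc notation):

    def exactly_one(names):
        # A formula (as a string) saying exactly one of the given variables is true.
        return " | ".join(
            "(" + " & ".join([x] + ["~" + y for y in names if y != x]) + ")"
            for x in names)

    def phi_valid(s, Gamma, Q):
        parts = [exactly_one([f"x[{i},{a}]" for a in Gamma]) for i in range(s)]   # one letter per cell
        parts.append(exactly_one([f"y[{i}]" for i in range(s)]))                  # one head position
        parts.append(exactly_one([f"z[{q}]" for q in Q]))                         # one state
        return " & ".join(f"({p})" for p in parts)

    print(phi_valid(2, ["0", "1", "_"], ["q0", "qacc"]))

The head-position constraint alone already has Θ(s²) literals, which matches the stated bound.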
Remark. A similar construction also works for an NTM.
2.3 TQBF is PSPACE-hard
So now we know that TQBF is in PSPACE. In fact, TQBF is PSPACE-complete, but we have not yet defined what it means to be PSPACE-complete. Well, PSPACE-hardness is defined with respect to polynomial time reductions. The reason is that we want to study the relation between P and PSPACE, so any reduction stronger than P (such as PSPACE reductions) will not give us any information regarding problems in P, since we can solve the problems mid-reduction. This is a delicate point, and we suggest you come back to it after you feel comfortable with the rest of the material.
Definition 2.2. A language K is PSPACE-hard if for every L ∈ PSPACE it holds that L ≤p K.
Theorem 2.3. TQBF is PSPACE-hard.
Proof. We show that L ≤p TQBF for every L ∈ PSPACE. Let L ∈ PSPACE, and let M be a TM that decides L using at
most s(n) space, where s is some polynomial. Assume w.l.o.g that M has a single accepting configuration on every word
w. We do not lose generality, since every machine M has such an equivalent machine (that is also in PSPACE) that is
obtained by changing the accepting state to a state that erases all the tape before accepting.
Given a word w, our reduction will output a formula η, such that η has value true iff M accepts w. How is this done?
Well, using the tools that we developed in the last section.
Let n = |w|, s = s(n), c0 the start configuration, and cacc the accepting configuration. A very naïve attempt would be to define
∃c1, c2, . . . , ct ((c1 = c0) ∧ (ct = cacc) ∧ ⋀_{i=1}^{t−1} ϕ(ci, ci+1)).
However, the run time t can be exponential in s (thus, in n), and this would yield a formula of exponential length. We
can use recursion to solve this issue.
M accepts a word w iff there is a path of length at most t from c0 to cacc , where t is the runtime of M , and is
exponential in s(n). This happens iff there is a path from c0 to some cm of length t/2, and a path from cm to cacc of length t/2 (we may assume w.l.o.g. that t is a power of two).
First attempt: The first attempt at a reduction is a wrong one, but is in the right direction. We construct, inductively,
a formula ϕt (c0 , cacc ) which states that cacc is reachable from c0 within at most t steps. More generally, we construct the
formula ϕk (c1 , c2 ) which states that c2 is reachable from c1 within at most k steps. The formula is constructed as follows.
First, if k = 1, then ϕ1 (c1 , c2 ) is simply the formula ϕ(c1 , c2 ) that we constructed earlier, stating that c2 is consecutive
to c1 . Then, we define:
ϕk (c1 , c2 ) = ∃cm ϕk/2 (c1 , cm ) ∧ ϕk/2 (cm , c2 )
Brilliant, is it not? The formula simply asks if there exists a configuration that functions as the middle configuration of
the run. Clearly η = ϕt(c0, cacc) is true iff M accepts w. Also, since t is single-exponential in s(n) (that is, t = 2^{O(s(n))}),
then the recursion depth in constructing the formula is polynomial. So what’s wrong?
Well, while the recursion depth is polynomial, the construction tree of the formula has degree 2, which means that it’s
a binary tree of polynomial depth, so it has an exponential number of nodes. That is, the formula is still too big. How
can we overcome this?
Correct Solution: Here comes a clever trick, which is the crux of the proof. Instead of asking whether there is a cm
that works with c1 and with c2 , we combine these two cases into one, as follows:
ϕk(c1, c2) = ∃cm ∀c3, c4 [(((c3 = c1) ∧ (c4 = cm)) ∨ ((c3 = cm) ∧ (c4 = c2))) → ϕ_{k/2}(c3, c4)]
Now the degree of the tree is 1, and it is still of polynomial depth, so the length of ϕ is polynomial, and we are done.
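To see the difference in formula length concretely, here is a toy Python sketch added for illustration. It only tracks the length of the generated text: configurations are placeholder names, and we ignore the renaming of quantified variables between levels, which would not change the asymptotics.

    def phi_naive(k, c1, c2):
        # Two recursive occurrences per level: length exponential in log k.
        if k == 1:
            return f"step({c1},{c2})"
        return f"(exists m: {phi_naive(k // 2, c1, 'm')} and {phi_naive(k // 2, 'm', c2)})"

    def phi_clever(k, c1, c2):
        # One recursive occurrence per level: length linear in log k.
        if k == 1:
            return f"step({c1},{c2})"
        inner = phi_clever(k // 2, "c3", "c4")
        return (f"(exists m: forall c3,c4: (((c3={c1}) and (c4=m)) or "
                f"((c3=m) and (c4={c2}))) -> {inner})")

    for k in (2 ** 4, 2 ** 8, 2 ** 12):
        print(k, len(phi_naive(k, "c0", "cacc")), len(phi_clever(k, "c0", "cacc")))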
Remark. Note that we required that a TQBF have all its quantifiers at the beginning (this is sometimes called prenex normal form). However, our construction is not in such a form, because of the condition on c3 and c4. This, however, is not a concern, because we can push the quantifiers out using the rules α → ∃xβ ≡ ∃x(α → β) and α → ∀xβ ≡ ∀x(α → β).
Computability - Recitation 13
21/1/23 - 28/1/23
1 The Time Hierarchy Theorem
Let’s review this great theorem.
Definition 1.1. A function t : N → N, where t(n) = Ω(n log n), is called time-constructible if the function that maps 1^n (n in unary form) to the binary representation of t(n) is computable in time O(t(n)).
Our goal is to prove the following theorem:
Theorem 1.2. Let t : N → N be a time-constructible function. Then there exists a language L that is decidable in O(t(n)) time, but not decidable in o(t(n)/log t(n)) time.
The proof consists of two parts. The first is to show that we can perform a simulation of a given TM “efficiently
enough”. Then, we use a standard diagonalization argument to prove the theorem. Both parts are not easy to comprehend.
Let’s start with the simulation.
Claim 1.3. There exists a TM S such that, given ⟨M, t, w⟩ as input, S computes the configuration of M when run on w for t steps, and it does so in time (t log t) · p(|⟨M⟩|), where p is some fixed polynomial.
Let’s understand the theorem first, before going over the proof. Should we be surprised by this theorem? At first
glance - not really. We need to simulate M on w for t steps. Simulating one step shouldn’t be really expensive, but it
does depend on the length of ⟨M⟩, since we need to scan the encoding. This is the p(|⟨M⟩|) part. Second, we need to
keep a counter, and update it every step. This is the log t part. Finally, we do it for t steps, which is the t factor.
If we had 3 tapes, we would be done. Just keep ⟨M⟩ on one tape, as well as the current state. Keep the counter on a
second tape, and the simulation tape on a third. But we only have one tape. So we need much more careful “accounting”.
The problem with simply keeping all this information consecutively, is that if the simulation tape is really long, then
in every iteration we need to go very far to find the encoding of M and the counter.
We solve this by increasing the alphabet, so that every letter represents 3 letters. So in a way, we now have 3 tapes.
But remember we only have 1 head! So it’s essentially 3 tapes, but all the heads act in sync. So all we need to do is
perform the simulation on the third tape, but every time the head moves, we move the encoding of M and the counter to stay "close by". So in every simulation step, we really only need to scan the encoding of ⟨M⟩ to figure out where to move, and to decrease the counter, which is what we wanted.
Good, so now we want to prove Theorem 1.2. We start by defining the language we want to use. This is a very ad-hoc
language.
L = {⟨M⟩#0^k : M does not accept ⟨M⟩#0^k within t′(n, m) steps, where n = |⟨M⟩#0^k|, m = |⟨M⟩| and t′(n, m) = t(n)/(p(m)·log t(n))}
You can already sense the diagonalization here - we want machines that reject themselves. This always leads to trouble...
We start by showing that L can be decided in time O(t(n)). We define a Turing machine T . On input x, T acts as
follows:
1. Parse x as ⟨M⟩#0^k. If the parsing fails, reject.
2. Compute the binary representation of n = |x| and m = |⟨M⟩|.
3. Compute t = t(n).
4. Compute t′ = t′ (n, m) = t/(p(m) · log t).
5. Simulate M on x for t′ steps.
6. If the M accepted x, then reject. Otherwise, accept.
It is clear that T decides L. It remains to analyze its running time. We do it for each step separately. Step 1 takes O(n) time. Step 2 takes O(n log n) time. Step 3 takes O(t(n)) time. Step 4 takes time polynomial in log(t(n)) (note that m ≤ n ≤ t(n)). As for step 5, it takes (t′ log t′) · p(m) time to simulate. It holds that
(t′ log t′) · p(m) ≤ (t′ log t) · p(m) = t(n)
In total, since t(n) = Ω(n log n), we get a total time of O(t(n)), as wanted.
Now for the real point of the proof - assume by way of contradiction that L is decidable in o(t(n)/log t(n)) time. Let M be a machine that decides L in r(n) = o(t(n)/log t(n)) time. Let n be large enough such that n ≥ m + 1 and r(n) < (1/p(m)) · t(n)/log t(n), where m = |⟨M⟩|. There exists such an n since M is now fixed. Let k ≥ 0 be such that |⟨M⟩#0^k| = n, and consider the behavior of M on ⟨M⟩#0^k. If M accepts ⟨M⟩#0^k within r(n) steps, then ⟨M⟩#0^k ∉ L, so M should not accept it - contradiction. If M rejects ⟨M⟩#0^k within r(n) steps, then ⟨M⟩#0^k ∈ L, but this is again a contradiction, and we are done!
Corollary 1.4. For any two real numbers 1 ≤ ε1 < ε2, we have TIME(n^{ε1}) ⊊ TIME(n^{ε2}).
Corollary 1.5. P ⊊ EXPTIME.
Proof. For every k, it holds that n^k = O(2^n), so TIME(n^k) ⊆ TIME(2^n), and therefore P ⊆ TIME(2^n). By Theorem 1.2, we know that TIME(2^n) ⊊ TIME(2^{n²}) ⊆ EXPTIME.
2 The Space Hierarchy Theorem
Since we understand space better than time, the analogous hierarchy theorem is cleaner, and provides a tighter bound.
Definition 2.1. A function s : N → N, where s(n) = Ω(log n), is called space-constructible, if the function that maps 1^n (n in unary form) to the binary representation of s(n) is computable in space O(s(n)).
Theorem 2.2. Let s : N → N be a space-constructible function. Then there exists a language L that is decidable in O(s(n)) space, but not decidable in o(s(n)) space.
In the exercise you are asked to prove this theorem. To help you in your task, we now discuss space-efficient simulation.
Simulation: As before, we need to carefully discuss simulation complexity, this time with respect to space complexity.
Recall that we consider a two-tape model, in which we have a read-only input tape and a read-and-write work tape.
Now, suppose we have ⟨M, w, s, t⟩ as input. We would like to efficiently simulate the run of M on w for t steps, using no
more than s tape cells. The simulator allocates the work space for the following purposes:
1. The current state of the simulated machine. Bounded by |⟨M⟩|.
2. s simulated tape cells, each bounded by |⟨M⟩|, so at most s · |⟨M⟩| space in total.
3. A pointer for keeping track of the current position of the input tape head (remember that we cannot modify the
input tape, and copying w onto the work tape requires too much space). This requires log |w| space.
4. A counter for t. Requires log t space.
So in total we have O(log |w| + s · |⟨M⟩| + log t) space. We omit the details of how the simulator works, but you should figure them out for yourself. To summarize:
Claim 2.3. There exists a TM S such that, given ⟨M, w, s, t⟩ as input,
• If the run of M on w for t steps does not use more than s tape cells, then S computes the configuration of M when run on w for t steps.
• Otherwise, S outputs “fail”.
Moreover, this is done in space O(log |w| + s · |⟨M⟩| + log t).
Remark. Even though we assumed that the input for S is written on the input tape, it can also be the case that some
of the inputs are written on the work tape, and then the claim states how much additional space we need.
3 SCC is NL-Complete
We define SCC := {⟨G⟩ : G is a strongly connected directed graph}. We prove that SCC is NL-complete. Recall that we have seen in class that PATH is NL-complete. Let's show first that SCC ∈ NL. There are two ways:
• By Immerman's theorem (NL = coNL), it is sufficient to show that the complement of SCC is in NL: nondeterministically choose a pair of vertices s, t ∈ V, and run a machine that solves the complement of PATH (which is also in NL, again by Immerman's theorem) on ⟨G, s, t⟩.
• Alternatively, iterate over all pairs u, v ∈ V. For each pair, check whether ⟨G, u, v⟩ ∈ PATH by guessing a path; if the guess fails, reject, and if there is a path, proceed to the next pair. If we succeed in guessing a path for every pair, accept.
Next, we want to show that SCC is NL-Hard. We do this by showing PATH ≤L SCC. Let ⟨G, s, t⟩ be an instance of the PATH problem. We construct a graph G′ by adding edges so that s has an incoming edge from every other vertex and t has an outgoing edge to every other vertex. We claim that ⟨G′⟩ ∈ SCC ⇐⇒ ⟨G, s, t⟩ ∈ PATH. Assume there is a path π from s to t in G and consider any two vertices u, v in G′. Then there is a path u → s ⇝ t → v, so G′ is strongly connected. Conversely, assume that there is no path from s to t in G. Then there is also no path from s to t in G′, since all the added edges go into s or out of t, hence G′ is not strongly connected. Complexity: to add the edges, our reduction loops over all pairs of nodes, and so it uses only logarithmic space to keep track of the pair of nodes it is currently working with.
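The reduction is a simple graph transformation; here is a Python sketch added for illustration (directed edges as pairs):

    def path_to_scc(V, E, s, t):
        # Add an edge from every other vertex into s, and from t to every other vertex.
        E2 = set(E)
        E2 |= {(v, s) for v in V if v != s}
        E2 |= {(t, v) for v in V if v != t}
        return V, E2

    # The output graph is strongly connected iff the input graph has a path from s to t.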
4 2SCC is NL-Complete
We define 2SCC = {⟨G⟩ : G is a directed graph with exactly two strongly connected components}. We prove that 2SCC is NL-complete. We first show that 2SCC ∈ NL. A TM that decides 2SCC using at most logarithmic space on the work tape works as follows:
• Nondeterministically choose two vertices u, v ∈ V; they will be used as representatives of the two components. Run a machine that checks whether ⟨G, u, v⟩ is in the complement of PATH (such a machine exists in NL by Immerman's theorem). If it accepts (there is no path from u to v, so u and v lie in different components), proceed. Otherwise reject.
• It is left to verify that there are at most two components. For every w ∈ V, check whether there is a path from w to u and from u to w. If so, continue to the next w. Otherwise, check the same for v. If both checks fail, reject (this means w may belong to a third component).
To show that 2SCC is NL-Hard, we show that SCC ≤L 2SCC. The reduction works as follows: on input ⟨G⟩, return ⟨G′⟩, where G′ is constructed from G by adding a new isolated vertex. This way, if G is strongly connected, G′ has exactly two strongly connected components. Otherwise, if G is not strongly connected, it has k ≥ 2 strongly connected components, which means that G′ has k + 1 ≥ 3 strongly connected components and is therefore not in 2SCC. The reduction requires only logarithmic space on the work tape, since ⟨G⟩ can be copied to the output tape one vertex/edge at a time, and there is only one extra vertex to add.