CSE596 Problem Set 2 Answer Key Fall 2015

(A) Discussion problem: Suppose you have a Boolean formula ψ in disjunctive normal
form. That is, ψ = T1 ∨ T2 ∨ · · · ∨ Tm , where each term Tj is a conjunction of literals—that is,
an AND drawn from the variables x1 , . . . , xn and their negations x̄1 , . . . , x̄n . For one example
(with n = 3 and m = 2),
ψ = (x1 ∧ x̄3 ) ∨ (x̄1 ∧ x2 ∧ x3 ).
Create a regular expression Rψ over the binary alphabet Σ = {0, 1} as a union of terms, one
for each term of ψ. Each corresponding term of Rψ considers the variable indices 1, . . . , n in
order. For each index i, the term of Rψ uses a ‘1’ if the positive literal xi is in the term, a ‘0’ if x̄i is
present, and (0 + 1) if neither is present (i.e., the variable is absent from the term). (Note
that if both xi and x̄i are present the term is a contradiction and we can just delete it.) In
the above example,
Rψ = 1(0 + 1)0 + 011.
Note that the feature of using (0+1) makes Rψ not quite in the normal form of problem (2) on
problem set (1). Show that ψ is a tautology if and only if Rψ is equivalent to (0 + 1)^n, that
is, it matches all binary strings of length n. [No hardcopy submission]
Answer: The main idea is that a binary string x of length n—regarded as a truth
assignment to the variables x1 , . . . , xn —satisfies the formula ψ if and only if it matches the
regular expression. Then the conclusion on ψ being a tautology follows. To see this, suppose
x satisfies ψ. Since ψ is a disjunction of terms, this means x makes some term Tj true. Since
Tj is a conjunction, this means x makes every literal in the term true. For i = 1 to n, this
means one of three things:
1. The literal xi appears in the term. Then the i-th bit of x must be 1. The corresponding
term of the regular expression Rψ has just a ‘1’ but that’s OK—it matches that ‘1’ in x.
2. The negated literal x̄i appears in the term. Then the i-th bit of x must be 0. The
corresponding term of Rψ has just a ‘0’ but again that’s fine for matching.
3. Neither xi nor x̄i appears in the term—that is, the term doesn’t depend on the variable
xi at all. We can’t say anything about the bit xi but that’s “hunky-dory” (which means
not just OK but “free of trouble or problems”): the corresponding term in Rψ has (0+1)
in the i-th place so anything matches it there.
Conversely, if x matches Rψ which is a union of terms, then x must match one of its
terms. Working the same reasoning in reverse, this means x must satisfy every literal that
actually appears in the corresponding term of ψ, so it satisfies ψ.
The takeaway from this problem is that even given something as simple as a regexp
without stars, it can be hard to determine whether the expression is correct, even when
“correct” means giving a simple language like {0, 1}^n . Here hard means as hard as determining
whether a given Boolean formula is a tautology, which in November we’ll see means NP-hard.
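The correspondence between satisfying assignments and matching strings can be checked by brute force on small formulas. The following is a minimal sketch, not part of the original key: the dictionary encoding of terms and the function names are assumptions made for illustration, and Python's `[01]` stands in for (0 + 1).

```python
import re
from itertools import product

def term_to_regex(term, n):
    """term maps a variable index (1..n) to True for x_i or False for its negation."""
    parts = []
    for i in range(1, n + 1):
        if i not in term:
            parts.append("[01]")   # variable absent: plays the role of (0 + 1)
        elif term[i]:
            parts.append("1")      # positive literal x_i
        else:
            parts.append("0")      # negated literal
    return "".join(parts)

def dnf_to_regex(terms, n):
    # One alternative per term of the formula.
    return "|".join(term_to_regex(t, n) for t in terms)

def satisfies(terms, bits):
    # bits is a string of n characters; bit i is the truth value of x_i.
    return any(all((bits[i - 1] == "1") == val for i, val in t.items())
               for t in terms)

def agrees_everywhere(terms, n):
    # Checks: x satisfies the formula iff x matches the regular expression.
    pattern = re.compile(dnf_to_regex(terms, n))
    for tup in product("01", repeat=n):
        x = "".join(tup)
        if satisfies(terms, x) != (pattern.fullmatch(x) is not None):
            return False
    return True

def is_tautology(terms, n):
    # Tautology iff the expression matches every string in {0,1}^n.
    pattern = re.compile(dnf_to_regex(terms, n))
    return all(pattern.fullmatch("".join(t)) for t in product("01", repeat=n))

# The example from the problem statement, with n = 3 and m = 2.
psi = [{1: True, 3: False}, {1: False, 2: True, 3: True}]
psi_regex = dnf_to_regex(psi, 3)
```

The dictionary encoding cannot hold both xi and x̄i in one term, which mirrors the remark that contradictory terms are simply deleted. The loop over all 2^n strings is exponential, as one would expect for a problem this hard.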
(1) Convert the following NFA into an equivalent DFA. Note that of the sixteen possible
subset-states, the eight that have 1 but not 3 are impossible, owing to the λ-transition. Do
you get all 8 of the other possible states? Then answer: can you combine two or more of the
DFA’s states without changing the language it accepts? Convert either the NFA or some form
of your DFA into a regular expression for this language. [The diagram had s = 1, F = {3},
and instructions (1, a, 2), (1, λ, 3), (2, b, 4), (3, b, 1), (3, a, 4), (4, a, 2), (4, b, 3).]
Answer in words: The DFA that you get from the construction has 7 of the 8 possible
states. The start state is S = {1, 3}, not just {1}, because of the λ-transition from 1 to 3, and since 3 is an
accepting state of the NFA, S is accepting for the DFA. On a the options become 2 and 4
(note the initial λ was needed for the latter), so ∆(S, a) = {2, 4}. On b we get ∆(S, b) = {1, 3}, not
just {1}, because again you must not forget the trailing λ-arc(s). Thus the first round of breadth-first
search produced one new state, {2, 4}, which is rejecting, so we expand it:
• ∆({2, 4}, a) = just {2} which is likewise rejecting;
• ∆({2, 4}, b) = {3, 4} which is accepting since it includes 3; note also it doesn’t include 1
since you don’t follow λ-arcs backwards.
We got two new states—yecch—so on we go: State {2} goes nowhere on a which means
∆({2}, a) = ∅ which is always a dead state for the DFA if it comes up. We know ∆(∅, any) = ∅
so no need to worry further there. And ∆({2}, b) = {4}, which still is not accepting but at
least keeps things alive. Expanding {3, 4} before the next round, we get {2, 4} back again
on a and {1, 3} back again on b. So the only new state discovered in that round was {4}.
Expanding it gives ∆({4}, a) = {2} OK, but ∆({4}, b) = {3} which is new so crikey we’re
still not done. Last round: ∆({3}, a) = {4} back again, and ∆({3}, b) = {1, 3} remembering
the trailing λ which in fact cycles back to start. We never got the possible state {2, 3, 4} in
the search, and the states with 1 but not 3 are impossible, so 7 is enough.
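The breadth-first search traced above can be replayed mechanically. This is a sketch under stated assumptions: the NFA is hard-coded from the instruction list in the problem, the empty string is used as the label for λ-arcs, and all function names are hypothetical.

```python
from collections import deque

LAMBDA = ""  # label chosen in this sketch for λ-arcs
NFA_ARCS = [(1, "a", 2), (1, LAMBDA, 3), (2, "b", 4), (3, "b", 1),
            (3, "a", 4), (4, "a", 2), (4, "b", 3)]
START, FINAL = 1, {3}

def closure(states):
    """Add every state reachable by following λ-arcs forward."""
    result = set(states)
    stack = list(states)
    while stack:
        p = stack.pop()
        for (q, c, r) in NFA_ARCS:
            if q == p and c == LAMBDA and r not in result:
                result.add(r)
                stack.append(r)
    return frozenset(result)

def step(states, c):
    """Delta(P, c): follow arcs labeled c, then take trailing λ-arcs."""
    moved = {r for (q, ch, r) in NFA_ARCS if q in states and ch == c}
    return closure(moved)

def subset_construction():
    start = closure({START})
    delta = {}
    seen = {start}
    queue = deque([start])
    while queue:                      # breadth-first over subset-states
        P = queue.popleft()
        for c in "ab":
            Q = step(P, c)
            delta[(P, c)] = Q
            if Q not in seen:
                seen.add(Q)
                queue.append(Q)
    accepting = {P for P in seen if P & FINAL}
    return start, delta, seen, accepting

start, delta, seen, accepting = subset_construction()
```

Here `seen` holds the reachable subset-states, including the empty set as the dead state, and `delta` is the DFA's transition table keyed by (state, character).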
Actually, 7 is more than enough: Note that the state {3, 4} imitates the start state
perfectly: it is likewise accepting, goes to the same state {2, 4} on a, and both states go to
start on b. So we can condense them into the same state.
This still leaves 6 states in the DFA—well 5 if you ignore the dead state which never
figures into the regular expression for the language. But the 4 states of the original NFA are
fewer so you could start with that—notice that since it has only one accepting state which is
different from start there is no reason to add any new start or final state. If you like tracing
out algorithms in print then you can re-number the accepting state from 3 to 2 and use my
version; if you prefer visual then the idea is you eliminate all the non-accepting states other
than start regardless of their numbers. It is simplest to eliminate state 2 first: you get a new
arc from 1 directly to 4 on the string ab, and an arc from 4 back to itself on ab. This makes 4
easier to eliminate: the arc from 1 to 3 is enriched from T1,3 = λ to T1,3 = λ + ab(ab)∗ b (don’t
forget the λ stays here) and from 3 back to itself is now T3,3 = a(ab)∗ b. We still have T3,1 = b;
that never changed. This gives a 2-state machine, whereupon you can apply one of the two
formulas I gave for Ls,f which here is L1,3 :
L1,3 = (T1,1 + T1,3 (T3,3)∗ T3,1)∗ T1,3 (T3,3)∗
or = (T1,1)∗ T1,3 (T3,3 + T3,1 (T1,1)∗ T1,3)∗ .
Here (T1,1)∗ = λ, and so regardless of whether you regard T1,1 = ∅ or T1,1 = λ when there is no
self-arc, these simplify to:
L1,3 = (T1,3 (T3,3)∗ T3,1)∗ T1,3 (T3,3)∗
or = T1,3 (T3,3 + T3,1 T1,3)∗ .
Then you get the two “mechanical” final answers, equally valid:
L(N ) = ((λ + ab(ab)∗ b)(a(ab)∗ b)∗ b)∗ (λ + ab(ab)∗ b)(a(ab)∗ b)∗
or = (λ + ab(ab)∗ b)(a(ab)∗ b + b(λ + ab(ab)∗ b))∗ .
Yuck! Is it obvious that these two regular expressions are equivalent? At least the second
one is a little shorter. But the burning question already is: Is there a simpler way? It wasn’t
needed for full credit, but the answer is illuminating.
In this case there is: It so happens that two more pairs of states can be condensed in
the DFA. We can in fact define a general Myhill-Nerode type equivalence relation on states
of a DFA M : p ≡ q iff for all z ∈ Σ∗ , M processes z to an accepting state from p iff it
processes z to an accepting state from q. Then {3} joins the same equivalence class as the other
two accepting states, and {4} ≡ {2, 4} even though this is not so immediate to see (there is
a dynamic-programming algorithm for it that some of you may have seen). The other nonaccepting state, {2}, is not equivalent because it goes dead on a whereas the others do not.
So you get 4 states left over, or just 3 if you ignore the dead state. You can re-number them
1, 2, 3 (plus dead = 4) to get the minimum DFA M′ with F = {1}:
(1, b, 1), (1, a, 2), (2, b, 1), (2, a, 3), (3, b, 2) (plus (3,a,dead) etc.)
You can instantly eliminate state 3 by replacing the last two instructions with (2, ab, 2), so
T2,2 = ab to go with T1,1 = b, T1,2 = a, and T2,1 = b. Since we now want L1,1 not L1,2 there
is no distinction in the formulas: it’s
L(N ) = L(M′) = L1,1 = (T1,1 + T1,2 (T2,2)∗ T2,1)∗
= (b + a(ab)∗ b)∗ .
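The condensing of DFA states used to reach this short expression can be reproduced by partition refinement. The sketch below is one simple way to compute the equivalence classes and is not necessarily the dynamic-programming algorithm mentioned in the text. The string names for subset-states (for example "13" for {1, 3}) and the function name are assumptions for illustration.

```python
# The 7-state DFA from the subset construction: state -> (target on a, target on b).
DFA = {
    "13": ("24", "13"),
    "24": ("2", "34"),
    "2": ("dead", "4"),
    "34": ("24", "13"),
    "4": ("2", "3"),
    "3": ("4", "13"),
    "dead": ("dead", "dead"),
}
ACCEPTING = {"13", "34", "3"}

def equivalence_classes(dfa, accepting):
    # Start from the accepting / non-accepting split.
    block = {q: int(q in accepting) for q in dfa}
    while True:
        # Two states stay together only if they share a block and their
        # successors on each character share blocks too.
        signature = {q: (block[q],) + tuple(block[t] for t in dfa[q])
                     for q in dfa}
        ids = {sig: i for i, sig in enumerate(sorted(set(signature.values())))}
        refined = {q: ids[signature[q]] for q in dfa}
        if len(ids) == len(set(block.values())):
            break                      # no block was split: partition is stable
        block = refined
    return {frozenset(q for q in dfa if refined[q] == b)
            for b in set(refined.values())}

classes = equivalence_classes(DFA, ACCEPTING)
```

On this DFA the refinement stabilizes with four classes: the three accepting states together, {2, 4} with {4}, and {2} and the dead state each alone.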
Is this expression obviously equivalent to the above? Can you even possibly tell? Well, the
discussion problem (A) says that even without *s the problem is NP-hard. With the *s (and
with powering like (0 + 1)^(k−1) in lecture notes) the problem is known to require exponential
memory not just time, which is totally wicked. Practically what this means for you is, don’t
expect the instructors to be able to tell your answer is correct automatically—you have to
show the strategy by which you got it as a modicum of proof.
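Although equivalence cannot be read off by eye, it can at least be spot-checked on short strings. The sketch below is evidence rather than proof: it translates the three expressions into Python `re` syntax (λ becomes an empty alternative, + becomes |) and compares them with a direct simulation of the minimum DFA. The constant and function names are assumptions for illustration.

```python
import re
from itertools import product

T13 = "(?:|ab(?:ab)*b)"   # λ + ab(ab)*b, with λ written as an empty alternative
T33 = "(?:a(?:ab)*b)"     # a(ab)*b
FIRST = re.compile(f"(?:{T13}{T33}*b)*{T13}{T33}*")   # first mechanical answer
SECOND = re.compile(f"{T13}(?:{T33}|b{T13})*")        # second mechanical answer
SHORT = re.compile("(?:b|a(?:ab)*b)*")                # (b + a(ab)*b)*

# The minimum DFA from the text, with the dead state left implicit.
DELTA = {(1, "b"): 1, (1, "a"): 2, (2, "b"): 1, (2, "a"): 3, (3, "b"): 2}

def dfa_accepts(w):
    q = 1
    for c in w:
        q = DELTA.get((q, c))
        if q is None:          # missing arc: the run fell into the dead state
            return False
    return q == 1              # F = {1}

def disagreements(max_len):
    """Strings over {a, b} up to max_len on which the four acceptors differ."""
    bad = []
    for n in range(max_len + 1):
        for tup in product("ab", repeat=n):
            w = "".join(tup)
            answers = {dfa_accepts(w)}
            answers |= {r.fullmatch(w) is not None
                        for r in (FIRST, SECOND, SHORT)}
            if len(answers) > 1:
                bad.append(w)
    return bad
```

An empty result from `disagreements(k)` only says that the expressions agree on strings of length at most k, which is why the derivation still has to be shown.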
(2) Prove using the Myhill-Nerode technique that the language L of binary strings with
the same number of leading and trailing 0’s is non-regular. Note that L′ = {0^n 1 0^n : n ≥ 0}
is a subset of L. Does your proof need to care about the difference between L and L′?
Answer: Take S = 0∗ . Clearly S is infinite. Let any x, y ∈ S (x ≠ y) be given. Then
there are natural numbers m, n with m ≠ n (without loss of generality we could say m < n
but here we don’t care) such that x = 0^m and y = 0^n . Take z = 10^m . Then xz ∈ L because
xz has m leading and m trailing 0s, but yz ∉ L because it has n leading and m trailing 0s and
m ≠ n. So L(xz) ≠ L(yz), and since x, y ∈ S are arbitrary, S is pairwise distinguishable (PD) for L.
Thus L has an infinite PD set, so it is non-regular by the Myhill-Nerode Theorem.
Note that the same proof makes L′(xz) ≠ L′(yz), so it works for L′ too. Indeed, it
doesn’t care at all about the status of strings with more than one 1. You can fiddle with those
strings all you want to make languages L′′ at will, and they will all be non-regular by exactly
the same proof.
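The distinguishing argument can be exercised on concrete pairs. This is a small sketch with hypothetical function names. It assumes that an all-0 string counts all of its 0s as both leading and trailing, a convention the proof never relies on.

```python
def leading_zeros(w):
    return len(w) - len(w.lstrip("0"))

def trailing_zeros(w):
    return len(w) - len(w.rstrip("0"))

def in_L(w):
    # Assumed convention: an all-0 string counts its 0s on both sides.
    return leading_zeros(w) == trailing_zeros(w)

def in_L_prime(w):
    # L′ = {0^n 1 0^n : n >= 0}: exactly one 1, flanked by equally many 0s.
    return w.count("1") == 1 and in_L(w)

def distinguishes(m, n, member):
    """With x = 0^m, y = 0^n, z = 1 0^m: is xz in the language and yz not?"""
    x, y, z = "0" * m, "0" * n, "1" + "0" * m
    return member(x + z) and not member(y + z)

def check_pairs(bound, member):
    # Every pair of distinct exponents below bound must be distinguished.
    return all(distinguishes(m, n, member)
               for m in range(bound) for n in range(bound) if m != n)

def in_L_fiddled(w):
    # One arbitrary way to "fiddle": flip membership of strings with two or more 1s.
    return in_L(w) != (w.count("1") > 1)
```

Because xz and yz each contain exactly one 1, `check_pairs` gives the same verdict for `in_L`, `in_L_prime`, and `in_L_fiddled`, which is the point of the closing remark about L′′.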