CSE596 Problem Set 2 Answer Key Fall 2015

(A) Discussion problem: Suppose you have a Boolean formula $\psi$ in disjunctive normal form. That is, $\psi = T_1 \vee T_2 \vee \cdots \vee T_m$, where each term $T_j$ is a conjunction of literals, that is, an AND drawn from the variables $x_1, \ldots, x_n$ and their negations $\bar{x}_1, \ldots, \bar{x}_n$. For one example (with $n = 3$ and $m = 2$), $\psi = (x_1 \wedge \bar{x}_3) \vee (\bar{x}_1 \wedge x_2 \wedge x_3)$.

Create a regular expression $R_\psi$ over the binary alphabet $\Sigma = \{0, 1\}$ as a union of terms, one for each term of $\psi$. Each corresponding term of $R_\psi$ considers the variable indices $1, \ldots, n$ in order. For each index $i$, if the positive literal $x_i$ is in the term, $R_\psi$ uses a '1'; if $\bar{x}_i$ is present, a '0'; and if neither is present (i.e., the variable is absent from the term), it uses $(0 + 1)$. (Note that if both $x_i$ and $\bar{x}_i$ are present, the term is a contradiction and we can just delete it.) In the above example, $R_\psi = 1(0 + 1)0 + 011$. Note that the use of $(0 + 1)$ makes $R_\psi$ not quite in the normal form of problem (2) on Problem Set 1. Show that $\psi$ is a tautology if and only if $R_\psi$ is equivalent to $(0 + 1)^n$, that is, it matches all binary strings of length $n$. [No hardcopy submission]

Answer: The main idea is that a binary string $x$ of length $n$, regarded as a truth assignment to the variables $x_1, \ldots, x_n$, satisfies the formula $\psi$ if and only if it matches the regular expression $R_\psi$. The conclusion about $\psi$ being a tautology then follows: every term of $R_\psi$ matches only strings of length $n$, so $\psi$ is satisfied by all $2^n$ assignments exactly when $R_\psi$ matches all $2^n$ strings of length $n$.

To see the main idea, suppose $x$ satisfies $\psi$. Since $\psi$ is a disjunction of terms, this means $x$ makes some term $T_j$ true. Since $T_j$ is a conjunction, this means $x$ makes every literal in the term true. For $i = 1$ to $n$, this means one of three things:

1. The literal $x_i$ appears in the term. Then the $i$-th bit of $x$ must be 1. The corresponding term of the regular expression $R_\psi$ has just a '1', but that's OK: it matches that '1' in $x$.
2. The negated literal $\bar{x}_i$ appears in the term. Then the $i$-th bit of $x$ must be 0.
The corresponding term of $R_\psi$ has just a '0', but again that's fine for matching.
3. Neither $x_i$ nor $\bar{x}_i$ appears in the term; that is, the term doesn't depend on the variable $x_i$ at all. We can't say anything about the bit $x_i$, but that's "hunky-dory" (which means not just OK but "free of trouble or problems"): the corresponding term in $R_\psi$ has $(0 + 1)$ in the $i$-th place, so anything matches it there.

Conversely, if $x$ matches $R_\psi$, which is a union of terms, then $x$ must match one of its terms. Working the same reasoning in reverse, this means $x$ must satisfy every literal that actually appears in the corresponding term of $\psi$, so it satisfies $\psi$.

The takeaway from this problem is that even given something as simple as a regexp without stars, it can be hard to determine whether the expression is correct, even when "correct" means giving a simple language like $\{0, 1\}^n$. Here "hard" means as hard as determining whether a given Boolean formula is a tautology, which in November we'll see means NP-hard.

(1) Convert the following NFA into an equivalent DFA. Note that of the sixteen possible subset-states, the four that have 1 but not 3 are impossible, owing to the λ-transition. Do you get all 12 of the other possible states? Then answer: can you combine two or more of the DFA's states without changing the language it accepts? Convert either the NFA or some form of your DFA into a regular expression for this language. [The diagram had $s = 1$, $F = \{3\}$, and instructions $(1, a, 2)$, $(1, \lambda, 3)$, $(2, b, 4)$, $(3, b, 1)$, $(3, a, 4)$, $(4, a, 2)$, $(4, b, 3)$.]

Answer in words: The DFA that you get from the construction has 7 of the 12 possible states. The start state is $S = \{1, 3\}$, not just $\{1\}$, because of the λ-transition from 1 to 3, and since 3 is an accepting state of the NFA, $S$ is accepting for the DFA. On $a$ the options become 2 and 4 (note the initial λ was needed for the latter), so $\Delta(S, a) = \{2, 4\}$. Whereas $\Delta(S, b) = \{1, 3\}$, not just $\{1\}$, because again you must not forget the trailing λ-arc(s).
Thus the first round of breadth-first search produced the new state $\{2, 4\}$, so we expand it:

• $\Delta(\{2, 4\}, a) = \{2\}$, which is likewise rejecting;
• $\Delta(\{2, 4\}, b) = \{3, 4\}$, which is accepting since it includes 3; note also that it doesn't include 1, since you don't follow λ-arcs backwards.

We got two new states (yecch), so on we go: State $\{2\}$ goes nowhere on $a$, which means $\Delta(\{2\}, a) = \emptyset$, which is always a dead state for the DFA if it comes up. We know $\Delta(\emptyset, c) = \emptyset$ for any character $c$, so no need to worry further there. And $\Delta(\{2\}, b) = \{4\}$, which still is not accepting but at least keeps things alive. Expanding $\{3, 4\}$ before the next round, we get $\{2, 4\}$ back again on $a$ and $\{1, 3\}$ back again on $b$. So the only new state discovered in that round was $\{4\}$. Expanding it gives $\Delta(\{4\}, a) = \{2\}$, OK, but $\Delta(\{4\}, b) = \{3\}$, which is new, so crikey, we're still not done. Last round: $\Delta(\{3\}, a) = \{4\}$ back again, and $\Delta(\{3\}, b) = \{1, 3\}$, remembering the trailing λ, which in fact cycles back to start. We never got possible states such as $\{2, 3, 4\}$ in the search, and the states with 1 but not 3 are impossible, so 7 is enough.

Actually, 7 is more than enough: Note that the state $\{3, 4\}$ imitates the start state perfectly: it is likewise accepting, goes to the same state $\{2, 4\}$ on $a$, and both states go to start on $b$. So we can condense them into the same state. This still leaves 6 states in the DFA (well, 5 if you ignore the dead state, which never figures into the regular expression for the language). But the 4 states of the original NFA are fewer, so you could start with that. Notice that since it has only one accepting state, which is different from start, there is no reason to add any new start or final state. If you like tracing out algorithms in print, then you can re-number the accepting state from 3 to 2 and use my version; if you prefer visual, then the idea is to eliminate all the non-accepting states other than start, regardless of their numbers.
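The breadth-first search traced above can be replayed mechanically. Here is a minimal Python sketch, assuming the NFA is stored as a dictionary keyed by (state, character), with the empty string standing in for λ. The names `NFA`, `closure`, and `subset_construction` are made up for this illustration and are not from the course notes.

```python
from collections import deque

# NFA from problem (1): start 1, accepting {3}; "" stands for a lambda-arc.
NFA = {
    (1, "a"): {2}, (1, ""): {3},
    (2, "b"): {4},
    (3, "b"): {1}, (3, "a"): {4},
    (4, "a"): {2}, (4, "b"): {3},
}
START, FINAL, SIGMA = 1, {3}, "ab"

def closure(states):
    """Add every state reachable by following lambda-arcs forwards."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in NFA.get((q, ""), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def subset_construction():
    """Breadth-first search over the reachable subset-states."""
    start = closure({START})
    delta, queue, seen = {}, deque([start]), {start}
    while queue:
        S = queue.popleft()
        for c in SIGMA:
            moved = set()
            for q in S:
                moved |= NFA.get((q, c), set())
            T = closure(moved)  # trailing lambda-arcs
            delta[(S, c)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    return start, seen, delta

start, states, delta = subset_construction()
for S in sorted(states, key=lambda s: (len(s), sorted(s))):
    tag = "accepting" if S & FINAL else "rejecting"
    print(sorted(S), tag, {c: sorted(delta[(S, c)]) for c in SIGMA})
```

Running it should list the same seven subset-states found by hand, with the empty list standing for the dead state.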
It is simplest to eliminate state 2 first: you get a new arc from 1 directly to 4 on the string $ab$, and an arc from 4 back to itself on $ab$. This makes 4 easier to eliminate: the arc from 1 to 3 is enriched from $T_{1,3} = \lambda$ to $T_{1,3} = \lambda + ab(ab)^* b$ (don't forget the $\lambda$ stays here), and the arc from 3 back to itself is now $T_{3,3} = a(ab)^* b$. We still have $T_{3,1} = b$; that never changed. This gives a 2-state machine, whereupon you can apply one of the two formulas I gave for $L_{s,f}$, which here is $L_{1,3}$:

$$L_{1,3} = (T_{1,1} + T_{1,3} T_{3,3}^* T_{3,1})^* \, T_{1,3} T_{3,3}^*$$
or
$$L_{1,3} = T_{1,1}^* \, T_{1,3} \, (T_{3,3} + T_{3,1} T_{1,1}^* T_{1,3})^*.$$

Here $T_{1,1}^* = \lambda$, and so regardless of whether you regard $T_{1,1} = \emptyset$ or $T_{1,1} = \lambda$ when there is no self-arc, these simplify to:

$$L_{1,3} = (T_{1,3} T_{3,3}^* T_{3,1})^* \, T_{1,3} T_{3,3}^*$$
or
$$L_{1,3} = T_{1,3} \, (T_{3,3} + T_{3,1} T_{1,3})^*.$$

Then you get the two "mechanical" final answers, equally valid:

$$L(N) = ((\lambda + ab(ab)^* b)(a(ab)^* b)^* b)^* \, (\lambda + ab(ab)^* b)(a(ab)^* b)^*$$
or
$$L(N) = (\lambda + ab(ab)^* b)\,(a(ab)^* b + b(\lambda + ab(ab)^* b))^*.$$

Yuck! Is it obvious that these two regular expressions are equivalent? At least the second one is a little shorter. But the burning question already is: Is there a simpler way? It wasn't needed for full credit, but the answer is illuminating.

In this case there is: It so happens that two more pairs of states can be condensed in the DFA. We can in fact define a general Myhill-Nerode type equivalence relation on states of a DFA $M$: $p \equiv q$ iff for all $z \in \Sigma^*$, $M$ processes $z$ to an accepting state from $p$ iff it processes $z$ to an accepting state from $q$. Then $\{3\}$ joins the same equivalence class as the other two accepting states, and $\{4\} \equiv \{2, 4\}$, even though this is not so immediate to see (there is a dynamic-programming algorithm for it that some of you may have seen). The other nonaccepting live state, $\{2\}$, is not equivalent because it goes dead on $a$ whereas the others do not. So you get 4 states left over, or just 3 if you ignore the dead state.
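The state equivalence just defined can be computed by the standard table-filling method, which marks pairs of states that some string tells apart. Whether this is the exact dynamic-programming algorithm alluded to above is an assumption. The sketch below hard-codes the 7-state DFA from the subset construction, naming each subset-state by its digits and using "0" for the dead state; `DELTA`, `distinguishable_pairs`, and `equivalence_classes` are illustrative names only.

```python
from itertools import combinations

# The 7-state DFA from the subset construction; "0" names the dead state.
DELTA = {
    "13": {"a": "24", "b": "13"},
    "24": {"a": "2",  "b": "34"},
    "34": {"a": "24", "b": "13"},
    "2":  {"a": "0",  "b": "4"},
    "4":  {"a": "2",  "b": "3"},
    "3":  {"a": "4",  "b": "13"},
    "0":  {"a": "0",  "b": "0"},
}
ACCEPTING = {"13", "34", "3"}

def distinguishable_pairs():
    """Mark pairs {p, q} that some string z sends to different verdicts."""
    # Base case: the empty string separates accepting from rejecting states.
    marked = {frozenset(pq) for pq in combinations(DELTA, 2)
              if (pq[0] in ACCEPTING) != (pq[1] in ACCEPTING)}
    changed = True
    while changed:
        changed = False
        for p, q in combinations(DELTA, 2):
            if frozenset((p, q)) in marked:
                continue
            for c in "ab":
                succ = frozenset((DELTA[p][c], DELTA[q][c]))
                # If the successors are already separated, so are p and q.
                if len(succ) == 2 and succ in marked:
                    marked.add(frozenset((p, q)))
                    changed = True
                    break
    return marked

def equivalence_classes():
    """Group states that were never marked as distinguishable."""
    marked = distinguishable_pairs()
    classes = []
    for p in DELTA:
        for cls in classes:
            if frozenset((p, cls[0])) not in marked:
                cls.append(p)
                break
        else:
            classes.append([p])
    return classes

classes = equivalence_classes()
print(classes)
```

Comparing each state only against the first member of a class is enough here because the unmarked relation is an equivalence, hence transitive.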
You can re-number them 1, 2, 3 (plus dead = 4) to get the minimum DFA $M'$ with $F = \{1\}$: $(1, b, 1)$, $(1, a, 2)$, $(2, b, 1)$, $(2, a, 3)$, $(3, b, 2)$ (plus $(3, a, \text{dead})$ etc.). You can instantly eliminate state 3 by replacing the last two instructions with $(2, ab, 2)$, so $T_{2,2} = ab$ to go with $T_{1,1} = b$, $T_{1,2} = a$, and $T_{2,1} = b$. Since we now want $L_{1,1}$, not $L_{1,2}$, there is no distinction in the formulas: it's

$$L(N) = L(M') = L_{1,1} = (T_{1,1} + T_{1,2} T_{2,2}^* T_{2,1})^* = (b + a(ab)^* b)^*.$$

Is this expression obviously equivalent to the above? Can you even possibly tell? Well, the discussion problem (A) says that even without *s the problem is NP-hard. With the *s, and with powering like $(0 + 1)^{k-1}$ in lecture notes, the problem is known to require exponential memory, not just time, which is totally wicked. Practically, what this means for you is: don't expect the instructors to be able to tell your answer is correct automatically. You have to show the strategy by which you got it as a modicum of proof.

(2) Prove using the Myhill-Nerode technique that the language $L$ of binary strings with the same number of leading and trailing 0's is non-regular. Note that $L' = \{0^n 1 0^n : n \ge 0\}$ is a subset of $L$. Does your proof need to care about the difference between $L$ and $L'$?

Answer: Take $S = 0^*$. Clearly $S$ is infinite. Let any $x, y \in S$ ($x \ne y$) be given. Then there are natural numbers $m, n$ with $m \ne n$ (without loss of generality we could say $m < n$, but here we don't care) such that $x = 0^m$ and $y = 0^n$. Take $z = 1 0^m$. Then $xz \in L$ because $xz$ has $m$ leading and $m$ trailing 0s, but $yz \notin L$ because it has $n$ leading and $m$ trailing 0s and $m \ne n$. So $L(xz) \ne L(yz)$, and since $x, y \in S$ are arbitrary, $S$ is PD for $L$. Thus $L$ has an infinite PD set, so it is non-regular by the Myhill-Nerode Theorem. Note that the same proof makes $L'(xz) \ne L'(yz)$, so it works for $L'$ too. Indeed, it doesn't care at all about the status of strings with more than one 1.
You can fiddle with those strings all you want to make languages $L''$ at will, and they will all be non-regular by exactly the same proof.
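As a sanity check on the choice of $z$ in the proof, here is a small Python sketch that tests, for all $m \ne n$ below an arbitrary cutoff, that $z = 1 0^m$ separates $0^m$ from $0^n$ for both languages. It is only a finite spot-check and not a substitute for the proof; the function names and the cutoff `BOUND` are assumptions made for illustration.

```python
def in_L(w):
    """Same number of leading and trailing 0s (w is a binary string)."""
    leading = len(w) - len(w.lstrip("0"))
    trailing = len(w) - len(w.rstrip("0"))
    return leading == trailing

def in_L_prime(w):
    """w has the form 0^n 1 0^n for some n >= 0."""
    n = (len(w) - 1) // 2
    return w == "0" * n + "1" + "0" * n

def distinguishes(member, m, n):
    """Does z = 1 0^m put xz in the language but yz outside it?"""
    x, y, z = "0" * m, "0" * n, "1" + "0" * m
    return member(x + z) and not member(y + z)

BOUND = 12  # arbitrary cutoff for this finite spot-check
ok = all(distinguishes(lang, m, n)
         for lang in (in_L, in_L_prime)
         for m in range(BOUND)
         for n in range(BOUND)
         if m != n)
print(ok)  # True
```

The same `distinguishes` call works unchanged for any membership test that agrees with `in_L` on strings containing exactly one 1, which mirrors the remark that the proof ignores strings with more than one 1.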