CSE596 Problem Set 3 Answer Key Fall 2015

(A) Verify that for any language L, the relation x ∼L y meaning (∀z ∈ Σ∗)[L(xz) = L(yz)] is reflexive, symmetric, and transitive, making it an equivalence relation. Letting [x] denote the equivalence class of any string x, then say why the DFA’s transition function δ([x], c) = [xc] is the same if some other string y ∈ [x] (that is, y such that y ∼L x) is used in place of x. This fact makes M = (Q, Σ, δ, s, F) “well-defined” when Q = {[x] : x ∈ Σ∗}, F = {[x] : x ∈ L}, and s = [λ].

Answer: The relation x ∼L y is reflexive because x ∼L x comes down to the trivial equality (∀z)[L(xz) = L(xz)]. It is symmetric because interchanging y and x makes no difference: L(xz) = L(yz) says the same thing as L(yz) = L(xz). It is transitive because if w ∼L x and x ∼L y then we know that for all z, L(wz) = L(xz), and we know that for all z, L(xz) = L(yz). Hence for all z we have L(wz) = L(yz), so w ∼L y follows.

Now using some other string y such that y ∼L x means defining δ([y], c) = [yc] instead of δ([x], c) = [xc]. The left-hand sides are the same since [y] = [x], but the right-hand sides are not immediately the same: we need to show that [xc] = [yc]. To show that yc ∼L xc (which is the same thing as [yc] = [xc]) we need to let any z be given and show that L(xcz) = L(ycz). We are armed (only) with the fact from x ∼L y that for all z, L(xz) = L(yz). Now we have a “Symbol Clash,” so to think more clearly, let us rename the “z” used for x ∼L y to “w.” So: for all w ∈ Σ∗, L(xw) = L(yw). Now we can see that this quantification over all strings w includes the cases w = cz that we need. Hence we can rigorously say that for all z ∈ Σ∗, L(xcz) = L(ycz), so xc ∼L yc, so [xc] = [yc], so the definition δ([x], c) = [xc] is unambiguous.

(1) Consider the following three languages:

1. L1 = { xay : #a(x) = #a(y) },
2. L2 = { xby : #a(x) = #a(y) },
3. L3 = { xy : #a(x) = #b(y) }.

For each Li, say whether it is regular or not.
If you say “regular,” design a DFA Mi and prove (informally) that L(Mi) = Li. Also give a regular expression ri such that L(ri) = Li. If you say “non-regular,” prove that using the Myhill-Nerode Theorem. (36 pts. in all)

Answer: L1 equals the set of strings that have an odd number of a’s, since the ‘a’ in ‘xay’ can be taken as the middle a in such a string. This language is regular, with a 2-state DFA that tracks the parity of the number of a’s encountered so far. A regular expression is b∗a(b∗ab∗a)∗b∗.

L3 equals the set of all strings over { a, b }! To see this, a string z of length n has (n + 1)-many prefixes, ranging from the empty string to z itself. For 0 ≤ i ≤ n, define f(i) = #a(x) − #b(y) where x = z(1) · · · z(i) is the length-i prefix and y = z(i + 1) · · · z(n) is the rest of z. Then f(0) ≤ 0, f(n) ≥ 0, and f changes by at most 1 when going from i to i + 1, i.e. |f(i + 1) − f(i)| ≤ 1. (In fact each step raises f by exactly 1: moving z(i + 1) from y to x either adds an ‘a’ to x or removes a ‘b’ from y. So the zero occurs at i = #b(z).) Hence there is some i such that f(i) = 0 (kind of like the “Intermediate Value Theorem” in calculus), and this i makes #a(x) = #b(y) for the corresponding x-y break, so z ∈ L3. Since z is arbitrary, L3 = Σ∗, which is the language of a 1-state DFA, and a regular expression is (a + b)∗. (Another way to see this is by induction on strings: λ ∈ L3 with i = 0, and for all strings z ∈ L3 with breakpoint i, za ∈ L3 with the same breakpoint, while zb ∈ L3 with breakpoint i + 1.)

L2 is the nonregular language. Take S = a∗b, clearly infinite. Let any x, y ∈ S, x ≠ y be given. Then there are natural numbers m ≠ n such that x = a^m b and y = a^n b. Take z = a^m. Then xz = a^m b a^m ∈ L2, but yz = a^n b a^m ∉ L2 since there is only one possible “b” breakpoint and it doesn’t balance. So L2(xz) ≠ L2(yz), so the infinite S is PD for L2, so L2 is not regular.

(2) Design a deterministic 2-tape Turing machine M such that L(M) = L2 from problem (1).
Be sure to give strategy and analysis comments explaining the design of your machine and making clear that it runs in linear time. Is it a pushdown automaton (DPDA)?

Answer in words: Since L2 = { xby : #a(x) = #a(y) } it follows that if the number of a’s in the input string w is odd then we reject. If it is even, say #a(w) = 2m, then we want to find the m-th ‘a’ in w. We will accept if and only if the immediately following character is a ‘b’; if it is another ‘a’ then we cannot possibly find a ‘b’ that equalizes the ‘a’s to its left and right.

To give some more machine details, we begin with states s, t like those of the DFA shown in class that kept track of the even/odd parity of the number of 1’s read so far. We push an ‘X’ for every second ‘a’ read, while just skipping over each ‘b’ read. Instructions to do this can include (s, aB/aB, RS, t), (t, aB/aX, RR, s) and (s, bB/bB, RS, s), (t, bB/bB, RS, t) for the no-ops. If a blank B is encountered on Tape 1 in state t this is the “odd a’s case,” so we reject. If in state s then we execute (s, BB/BB, LL, u) going to a new state u that starts popping one ‘X’ for every single ‘a’ read. We could first rewind to the left end in order to do the above strategy literally, but it’s OK to count R-to-L in state u too: (u, aX/aB, LL, u), (u, bX/bX, LS, u) with the latter being a no-op. When we exhaust the stack we will see B on Tape 2, and we will have just moved left from the m-th ‘a’ going right-to-left. We should hence accept if and only if the char on Tape 1 that we are scanning is a ‘b’. So we finish with (u, bB/bB, SS, qacc) and (u, aB/aB, SS, qrej).

It remains to check the “edge case” of #a(w) = 0 (this was needed for full credit). The empty string λ does not belong to L2 because it has no ‘b’ in it. However, every non-empty string consisting only of b’s does belong to L2. Let’s see what happens with the above program on these inputs. On λ it first sees BB, so it executes (s, BB/BB, LL, u) and again sees BB.
This isn’t provided for, but we can add the instruction (u, BB/BB, SS, qrej) to cover it unless it interferes with accepting the non-empty strings of b’s. On such a string, we stay executing (s, bB/bB, RS, s) until we see BB, whereupon we do (s, BB/BB, LL, u) and then what do we see? We see the rightmost ‘b’ on Tape 1 and the blank on Tape 2. This triggers (u, bB/bB, SS, qacc), which is exactly what we want. So the above design automatically handles the “edge case” gracefully and we’re done.

Since the whole computation involves just one L-to-R pass and one R-to-L pass, it runs in linear time. Our program does treat Tape 2 as a stack. However, it makes left-moves on Tape 1, so it is not a PDA.

Extra note: There does exist an NPDA for this language. The NPDA N pushes an ‘X’ for each ‘a’ (always staying in state s this time) and on reading some ‘b’ guesses whether to keep pushing or go to another state v and start popping. If it guesses correctly when w ∈ L2 it will empty its stack exactly when it has read the last ‘a’ in w and the instruction (v, BB/BB, SS, qacc) will make it accept. I believe there does not exist any DPDA, but I do not know how to prove it; at least I do not think this is covered even in the most rigorous texts from the 1970s and 1980s such as Hopcroft and Ullman.
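
Sanity check (a sketch, not part of the required answer): the two-tape program from problem (2) can be simulated directly and compared against a brute-force test of the definition of L2. The names DELTA, run and in_L2 are made up for this illustration. Two assumptions are built in: a configuration with no listed instruction (such as state t reading a blank on Tape 1) halts and rejects, and both tapes extend to the left of the start cell, so the LL move on an all-b input is legal.

```python
from itertools import product

BLANK = "B"
MOVE = {"L": -1, "R": 1, "S": 0}

# (state, Tape 1 symbol, Tape 2 symbol) -> (write 1, write 2, move 1, move 2, next state)
DELTA = {
    ("s", "a", "B"): ("a", "B", "R", "S", "t"),     # odd number of a's so far
    ("t", "a", "B"): ("a", "X", "R", "R", "s"),     # every second a pushes an X
    ("s", "b", "B"): ("b", "B", "R", "S", "s"),     # b is a no-op
    ("t", "b", "B"): ("b", "B", "R", "S", "t"),
    ("s", "B", "B"): ("B", "B", "L", "L", "u"),     # even a's: start the R-to-L pass
    ("u", "a", "X"): ("a", "B", "L", "L", "u"),     # each a pops one X
    ("u", "b", "X"): ("b", "X", "L", "S", "u"),     # b is a no-op
    ("u", "b", "B"): ("b", "B", "S", "S", "qacc"),  # stack empty, scanning a b
    ("u", "a", "B"): ("a", "B", "S", "S", "qrej"),  # stack empty, scanning an a
    ("u", "B", "B"): ("B", "B", "S", "S", "qrej"),  # added instruction for input lambda
}


def run(w):
    """Simulate the machine on w; return (accepted, number of steps executed)."""
    tape1 = dict(enumerate(w))  # unwritten cells read as BLANK
    tape2 = {}
    h1 = h2 = 0
    state, steps = "s", 0
    while state not in ("qacc", "qrej"):
        key = (state, tape1.get(h1, BLANK), tape2.get(h2, BLANK))
        if key not in DELTA:
            state = "qrej"  # assumption: no applicable instruction means reject
            break
        write1, write2, move1, move2, state = DELTA[key]
        tape1[h1], tape2[h2] = write1, write2
        h1 += MOVE[move1]
        h2 += MOVE[move2]
        steps += 1
    return state == "qacc", steps


def in_L2(w):
    """Brute force: try every b as the breakpoint in w = x b y."""
    return any(
        c == "b" and w[:i].count("a") == w[i + 1:].count("a")
        for i, c in enumerate(w)
    )


if __name__ == "__main__":
    for n in range(9):
        for chars in product("ab", repeat=n):
            w = "".join(chars)
            accepted, steps = run(w)
            assert accepted == in_L2(w), w
            assert steps <= 2 * len(w) + 2, w  # one pass each way plus two extra steps
    print("machine agrees with the definition of L2 on all strings of length <= 8")
```

The step bound checked in the loop reflects the linear-time claim: at most n steps for the L-to-R pass, one step to turn around, fewer than n steps in state u, and one halting step. Agreement on short strings is evidence for the design, not a proof of correctness.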