Chapter 4: Properties of Regular Languages

We begin by looking at closure properties. A set is closed under an operation if, whenever elements of the set are combined using that operation, the result is also in the set. For example, if two integers are added, subtracted or multiplied then the sum, difference or product is also an integer. However, when one integer is divided by another we have no such guarantee: the result might be a fraction.

Keep in mind that a language is just a set of strings over an alphabet. As you might suspect from the proof that an NFA can be built from a regular expression, the regular languages are closed under union. That is, if we take the union of two regular languages, the resulting language is also regular. This can be shown by connecting the machines for the two languages using the same construction as in case 1 of the proof of Theorem 3.1.

Some closure properties will be proven constructively and others will be derived from closure properties that have already been established. The constructive proofs basically involve combining DFAs for the languages in some way (as with the union of two languages above). Although we may choose any of the equivalent representations of a regular language, the DFA is usually the most straightforward one to use in a construction because it has exactly one move from every state on every input symbol and has no ε-moves.

The other thing we’ll look at in this chapter is what questions about regular languages can be answered.

Theorem 4.1: If L1 and L2 are regular languages, then so are L1 ∪ L2, L1 ∩ L2, L1•L2, L̄1 (the complement of L1) and L1*. Thus, the family of regular languages is closed under union, intersection, concatenation, complementation and star-closure.

Proof: If L1 and L2 are regular, then there are DFAs M1 and M2 such that L1 = L(M1) and L2 = L(M2). As in the proof of Theorem 3.1, we can build NFAs that recognize L1 ∪ L2, L1•L2, and L1*.
We have already discussed how to find a machine for the complement of a language accepted by a DFA, and we use that to show closure under complement. Specifically, let M = (Q, Σ, δ, q0, F) be a DFA that accepts L1. Then the DFA M̄ = (Q, Σ, δ, q0, Q - F) accepts the complement of L1. To see this, observe that it is important that we defined the DFA the way we did, namely that δ is a total function, so there is a transition from every state on every input symbol. Otherwise, if a transition were missing, we could have a string that was in neither L(M) nor L(M̄). Because δ is total, for every string w either δ*(q0, w) ∈ F, in which case w ∈ L(M), or δ*(q0, w) ∈ Q - F, in which case w ∈ L(M̄), the complement of L1.

Note: We can’t do the proof in this manner with an NFA. Suppose w ∈ L(M), where M is an NFA. One walk labeled w starting at q0 in the transition graph GM might end in a nonaccept state, while another walk labeled w starting at q0 leads to an accept state (which is why w ∈ L(M)). After swapping accept and nonaccept states, the first walk ends in an accept state, so the new machine accepts w as well and therefore does not recognize the complement.

For closure under intersection, let’s use DeMorgan’s law: the complement of an intersection of two languages is the union of the complements of the languages.

    complement of (L1 ∩ L2) = L̄1 ∪ L̄2

That is, since L1 and L2 are regular so are L̄1 and L̄2, and the union of those complements is also regular by closure under union. Then the complement of that union, which is L1 ∩ L2, must also be regular.

Note that the book does this constructively. Let’s look briefly at that just to see a more sophisticated construction proof. Let L1 = L(M1) and L2 = L(M2) where M1 = (Q, Σ, δ1, q0, F1) and M2 = (P, Σ, δ2, p0, F2). Then we construct an automaton M3 = (Q × P, Σ, δ3, (q0, p0), F3). Basically, we form a new automaton whose set of states is the Cartesian product of the states of the other two automata, i.e. each state in the new machine is an ordered pair of states from the original machines. As you might expect, the new transition function δ3 acts like δ1 on the first component of an ordered pair and like δ2 on the second component.
That is, δ3((qi, pj), a) = (δ1(qi, a), δ2(pj, a)). The set of final states is F3 = {(qf, pf) | qf ∈ F1 and pf ∈ F2}. It is straightforward to show that w ∈ L1 ∩ L2 if and only if w ∈ L(M3).

There are two more closure properties that are easy to establish. Regular languages are closed under difference: L1 - L2 = L1 ∩ L̄2, which is regular because of closure under intersection and complement. Similarly, Theorem 4.2 states that the regular languages are closed under reversal, which is easy to show constructively. Given a language L, L^R is obtained by reversing each string in L.

Here’s how to prove closure under reversal using an automaton construction. Basically we start with an automaton M for L and introduce a new start state q0′. We then put in ε-moves from the new start state to each of the original final states. Then we reverse all of the arrows in the underlying graph. The new final state is the original start state q0. Mathematically we have:

Proof. Let L = L(M) for a DFA M = (Q, Σ, δ, q0, F). Construct an NFA with ε-moves M′ = (Q ∪ {q0′}, Σ, δ′, q0′, {q0}) for L^R, where p ∈ δ′(q, a) iff δ(p, a) = q. For the new start state q0′, δ′(q0′, ε) = F, i.e. there is an ε-move from q0′ to every p ∈ F.

Summarizing, we have shown that the regular languages are closed under union, concatenation, star-closure, complement, difference, intersection and reversal.

Here are some examples of ways in which closure properties can be used to prove or disprove whether a language is regular.

Example 1: Prove or disprove: If L is not regular then L̄ is not regular.

Proof by contradiction. Suppose L is a language that is not regular but L̄ is regular. Since regular languages are closed under complement, the complement of L̄, which is L itself, must be regular. This is a contradiction. Therefore if L is not regular then its complement is not regular either. (Note: the complement of L̄ is denoted by L with two bars on top of it.)

Example 2: If L is regular then L′ = {xy | x ∈ L and y ∉ L} is also regular.

Since L is regular, so is L̄.
Then L′ = L•L̄, which is regular since the regular languages are closed under concatenation.

Example 3: (#8, p. 109) Define the complementary or (cor) of two languages by

    cor(L1, L2) = {w | w ∈ L̄1 or w ∈ L̄2}.

Show that the family of regular languages is closed under the cor operation.

Class exercise: fill in the proof below.

Proof: Let L1 and L2 be regular languages.

Closure under other operations

Before we get into additional operations under which the regular languages are closed, let’s take a look at the language L = {0^n1^n | n ≥ 0}. This is a context-free language that is not regular. We’ll prove this later, but intuitively we can justify that it is not regular by observing that finite automata cannot count. If a string in this language were input to an automaton, the machine would have to “count” the number of 0’s so that when the 1’s are encountered it could determine whether there are exactly as many 1’s as 0’s. A machine cannot add states in order to count the 0’s. Once the machine is built, the number of states is fixed.

Definition 4.1: Let Σ and Γ be alphabets. Then a function h: Σ → Γ* is called a homomorphism.

That is, a homomorphism is a substitution in which a single letter is replaced by a string. Although this is defined for alphabet symbols, a homomorphism can be extended to strings and languages in the obvious way. If h is a homomorphism and w = a1a2…ak is a string, then h(w) = h(a1)h(a2)…h(ak). That is, the homomorphism is applied to every symbol in the string. Then if L is a language, h(L) = {h(w) | w ∈ L}. h(L) is called the homomorphic image of L. If we start with a regular expression r for a language L, then a regular expression for h(L) can be obtained by applying the homomorphism to each alphabet symbol in r.

For example, consider the language L = {01, 10}* over Σ = {0, 1}. Let h be the homomorphism defined by h(0) = ab and h(1) = c. Then h(01) = abc and h(10) = cab. In this case h(L) = {abc, cab}*. If r = (0 + 1)*11, then h(r) = (ab + c)*cc.
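The extension of h from symbols to strings and languages can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names apply_homomorphism and image_of_language are hypothetical, a homomorphism is assumed to be stored as a dict from single symbols to strings, and only finite sample languages are handled (not the full infinite language {01, 10}*).

```python
def apply_homomorphism(h, w):
    """Extend the symbol-to-string map h to a whole string w, symbol by symbol."""
    return "".join(h[symbol] for symbol in w)


def image_of_language(h, language):
    """Homomorphic image h(L) of a finite set of strings L."""
    return {apply_homomorphism(h, w) for w in language}


# The homomorphism from the example above: h(0) = ab, h(1) = c.
h = {"0": "ab", "1": "c"}

image_01 = apply_homomorphism(h, "01")   # h(01) = h(0)h(1)
image_10 = apply_homomorphism(h, "10")   # h(10) = h(1)h(0)

# A finite sample of strings from {01, 10}* (hypothetical choice of sample).
sample_image = image_of_language(h, {"", "01", "10", "0110"})
```

Note that the empty string maps to the empty string, since there are no symbols to replace.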
Theorem 4.3: Let h be a homomorphism. If L is a regular language, then its homomorphic image h(L) is also regular. The family of regular languages is therefore closed under arbitrary homomorphisms.

Proof method: Let L be a regular language and let r be a regular expression such that L = L(r). We need to show that h(r) defines the language h(L). Specifically, it must be shown that for every w ∈ L(r), h(w) ∈ L(h(r)), and conversely that for every y ∈ L(h(r)) there is a w in L such that y = h(w). Induction can be used to prove this. See the text for a complete proof.

How not to use a homomorphism: An important thing to keep in mind is that the homomorphism is defined on single alphabet symbols, not strings. That is, a single alphabet symbol is replaced by a string, not the other way around. If L = {(ab)^n c^n | n ≥ 0}, we cannot show that L is not regular by replacing ab by 0 and c by 1 to get {0^n1^n}. We’ll have to find other methods to do that.

Another example of using closure properties

Example 4: Let L = {0^i 1^n 2^n | i ≥ 1, n ≥ 0}. Suppose L is regular and consider the following homomorphism:

    h(0) = ε
    h(1) = 0
    h(2) = 1

Then h(L) = {0^n1^n | n ≥ 0}. Since the regular languages are closed under homomorphism, if L is regular this implies {0^n1^n} is regular. But, as we’ll show later, {0^n1^n} is not regular. Therefore we have a contradiction, so L is not regular.

Note that the choice of homomorphism matters. For example, consider this homomorphism h′ instead: h′(0) = ε, h′(1) = 0, h′(2) = 0. Then h′(L) = {0^(2n) | n ≥ 0}, which is a regular language, so h′ tells us nothing about L.

Caution: This brings up an important point about the use of closure properties. If two regular languages are combined under an operation for which the regular languages are closed, then we are guaranteed that the resulting language is regular. However, if languages L1 and L2 are combined using an operation under which regular languages are closed to create a new language L3, and both L1 and L3 are regular, it does NOT mean that L2 is regular.
For example, let L1 = ∅ and L2 = {0^n1^n}. Then L1 ∩ L2 = ∅, but L2 is not a regular language even though ∅ is a regular language. Thus, just because the result of combining two languages is regular, it does not mean that both of the original languages that were combined are regular.

Since a homomorphism is a function, we can talk about inverse homomorphisms. (Actually, it's more like a preimage than an inverse, since the homomorphism is not necessarily one-to-one or onto.) So, if h is a homomorphism and L is a language, h^-1(L) = {w | h(w) is in L}. As one might expect, the regular languages are closed under inverse homomorphism. Specifically, if h is a homomorphism from alphabet Σ to alphabet Γ, and L is a regular language over Γ, then h^-1(L) is also a regular language.

You have to be a bit careful applying inverse homomorphisms because the answer isn’t always what you might expect it to be. For example, let L = {aba, aabb} and define the homomorphism h(0) = ab, h(1) = a. Then h^-1(L) = {01}, since h(01) = aba and aabb has no preimage under the given homomorphism. Note that this implies that in general h(h^-1(L)) ⊆ L, and the containment can be proper: here h(h^-1(L)) = {aba}.

Now, let’s look again at one of the examples above: L = {(ab)^n c^n | n ≥ 0}. We can use an inverse homomorphism to show this is not regular. Consider the homomorphism h(0) = ab and h(1) = c. Then h^-1(L) = {0^n1^n | n ≥ 0}, which we have already noted is not a regular language. If L were regular, h^-1(L) would be regular too, so L cannot be regular.

A concept closely related to homomorphism is that of substitution. In this case, rather than replacing each alphabet symbol by a string, it is replaced by a language. Let’s go back to the example L = {01, 10}*. Define a substitution s by s(0) = a* and s(1) = (a + b). Then s(L) is the language of the regular expression (a*(a + b) + (a + b)a*)*.

Definition 4.2: Let L1 and L2 be languages over the same alphabet. Then the right quotient of L1 with L2 is defined as

    L1/L2 = {x | xy ∈ L1 for some y ∈ L2}.

(Play with this a bit to make sure you understand how this works.) To form the right quotient we take all strings in L1 that have a suffix belonging to L2.
Every such string, after removal of this suffix, belongs to L1/L2.

Theorem 4.4: If L1 and L2 are regular languages, then L1/L2 is also regular. Thus, we say that the family of regular languages is closed under right quotient with a regular language.

See the book for the proof; we’re not going to do it.

Example 5: (Example 4.5 in Linz) Let L1 = L(a*baa*) and L2 = L(ab*). Find L1/L2.

The answer is L1/L2 = L(a*ba*). Here’s the reasoning behind the answer: Look at L2 = {a, ab, abb, …}. Since none of the strings in L1 end in b, the only string in L2 that we need to worry about is a. Thus, the problem becomes L1/{a}, which can easily be seen to be L(a*ba*) since it just removes the mandatory a from the end of each string in L1.

Example 6: Another constructive example. Prove that if L is regular then so is drop(L) = {x | x is formed by dropping one letter from a word in L}. Note: this will not be discussed in class unless you have questions about it.

Proof sketch: Let L be a regular language and let M = (Q, Σ, δ, q0, F) be a DFA such that L(M) = L. Construct an NFA M′ that consists of two copies of M. Stay in the first copy until a letter is dropped. Then use an ε-move to go to the second copy and continue in the second copy until the string has been read. We’ll denote states in the second copy of M by q′ to distinguish them from the states in the original machine.

Construct M′ = (Q ∪ Q′, Σ, δ′, q0, F′).

States of M′: Q ∪ Q′, where qi′ ∈ Q′ iff qi ∈ Q.

Final states: F′ = {p′ | p ∈ F}.

The transition function:

    δ′(qi, a) = {δ(qi, a)} for all qi ∈ Q
    δ′(qi′, a) = {qj′} iff δ(qi, a) = qj
    δ′(qi, ε) = {qj′ | there is an a ∈ Σ such that δ(qi, a) = qj}

It is moves of this last type that allow us to drop the letter. For example, the construction looks something like this:

[Figure: two copies of M, with ε-moves leading from the first copy to the second.]

The next couple of examples rely on the fact that {a^n b^n | n ≥ 0} is not regular. Again, we’ll prove that fact shortly.

Example 7: Let L = {a^i b^j | i, j ≥ 0, i ≠ j}. What is L̄?
Your first guess is probably that the complement is {a^i b^i}, but that isn’t right since Σ* - L also contains strings like abab and baaab.

Is L regular? Assume that L is regular. Then L̄ is also regular. The language L(a*b*) is regular. Since regular languages are closed under intersection, this implies that L̄ ∩ L(a*b*) = {a^n b^n} is regular, giving us a contradiction, so L is not regular.

So what have we learned here? Even though {a^n b^n} is not the complement of L, strings of that type are in L’s complement. So, if we can use another operation under which the regular languages are closed to “extract” strings of that form from the complement, then we will get our contradiction. Since a*b* is a regular expression, its language is regular, and we can intersect it with the complement to isolate the strings with the right pattern.

Example 8: Let L = {a^i b^j c^k | i = 0 or j = k}. That is, if there is at least one a then the number of b’s equals the number of c’s.

Consider the following homomorphism h: h(a) = ε, h(b) = 0, h(c) = 1. Then h(L) = {0^j 1^k | j, k ≥ 0}, with no requirement that j = k. In this case h(L) is regular, but this doesn’t give us any information about whether or not L is regular.

Instead, assume L is regular. Since L(ab*c*) is regular, let L′ = L ∩ L(ab*c*) = {a b^n c^n | n ≥ 0}. By closure under intersection, L′ is regular. Then consider the homomorphism h we tried to use before (h(a) = ε, h(b) = 0, h(c) = 1). This time h(L′) = {0^n1^n}, which is not regular. Therefore L could not have been regular either.

Example 9: (problem 4.1.16) Consider the statement: If L1 is regular and L1 ∪ L2 is also regular, then L2 is regular. Prove or disprove this.

Remember the caution paragraph above. This is an example of how you must be careful about how you use the closure properties. Suppose L1 = Σ*. Then L1 ∪ L2 is also Σ*, which is a regular set, but L2 could be any language, such as {0^n1^n}, so this choice of L1 and L2 gives us a counterexample to the claim.
Since L2 can be arbitrary, if the claim were true it would imply that every language is regular.

Example 10: Consider the related problem: If L1 is finite and L1 ∪ L2 is regular, then L2 is regular.

This statement is true. To see this, let L3 = (L1 ∪ L2) ∩ L̄1. L̄1 is regular since regular languages are closed under complement and every finite language is regular. Therefore L3 is regular, since it is the intersection of two regular languages. Since L1 ∩ L2 is finite (it is a subset of L1), it too is regular. Now we can write L2 = L3 ∪ (L1 ∩ L2), which is a regular language since the regular languages are closed under union. (Suggestion: convince yourself this works by using a Venn diagram.)
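As an alternative to the Venn diagram, the set identity used in Example 10 can be spot-checked in Python. This is a sanity check on one sample, not a proof. It rests on several assumptions that are not part of the notes: Σ* is replaced by a bounded universe of all strings over {0, 1} up to length 5, the helper all_strings is a hypothetical name, and the particular L1 and L2 are arbitrary sample choices.

```python
from itertools import product


def all_strings(alphabet, max_len):
    """Every string over `alphabet` of length 0..max_len (a finite stand-in for Σ*)."""
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
    }


universe = all_strings("01", 5)

# Hypothetical sample languages, restricted to the bounded universe.
L1 = {"", "0", "01", "110"}                     # a finite language
L2 = {w for w in universe if w.endswith("1")}   # strings ending in 1

union = L1 | L2
L3 = union & (universe - L1)    # (L1 ∪ L2) ∩ complement of L1
rebuilt = L3 | (L1 & L2)        # L3 ∪ (L1 ∩ L2)

identity_holds = rebuilt == L2
```

In this sketch L3 is exactly L2 with the strings of L1 removed, and adding back L1 ∩ L2 restores L2, which mirrors the argument in Example 10.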