Closure properties of regular languages

- What happens if we perform operations on regular languages?
- Is the resulting language still regular?
- We refer to this as a closure question.
- Given two regular languages $L_1$ and $L_2$, is their union also regular?
- We can ask similar questions about other types of operations on languages.
- This leads us to the study of the closure properties of languages in general.

Theorem. If $L_1$ and $L_2$ are regular languages, then so are $L_1 \cup L_2$, $L_1 L_2$, and $L_1^*$.

If $L_1$ and $L_2$ are regular, then there exist regular expressions $r_1$ and $r_2$ such that $L_1 = L(r_1)$ and $L_2 = L(r_2)$. By definition, $r_1 + r_2$, $r_1 r_2$, and $r_1^*$ are regular expressions denoting the languages $L_1 \cup L_2$, $L_1 L_2$, and $L_1^*$, respectively. Thus, closure under union, concatenation, and star-closure is immediate.

Theorem. If $L_1$ is a regular language, then so is its complement $\overline{L_1}$.

To show closure under complementation, let $M = (Q, \Sigma, \delta, q_0, F)$ be a DFA that accepts $L_1$. Then the DFA $\hat{M} = (Q, \Sigma, \delta, q_0, Q - F)$ accepts $\overline{L_1}$. Remember that $\delta^*(q_0, w)$ is defined for all $w \in \Sigma^*$. Consequently, either $\delta^*(q_0, w)$ is a final state, in which case $w \in L_1$, or $\delta^*(q_0, w) \in Q - F$ and $w \in \overline{L_1}$.

Theorem. If $L_1$ and $L_2$ are regular languages, then so is $L_1 \cap L_2$.

Let $L_1 = L(M_1)$ and $L_2 = L(M_2)$, where $M_1 = (Q, \Sigma, \delta_1, q_0, F_1)$ and $M_2 = (P, \Sigma, \delta_2, p_0, F_2)$ are DFAs. We construct from $M_1$ and $M_2$ a combined automaton $\hat{M} = (\hat{Q}, \Sigma, \hat{\delta}, (q_0, p_0), \hat{F})$, whose state set $\hat{Q} = Q \times P$ consists of pairs $(q_i, p_j)$, and whose transition function $\hat{\delta}$ is such that $\hat{M}$ is in state $(q_i, p_j)$ whenever $M_1$ is in state $q_i$ and $M_2$ is in state $p_j$. This is achieved by taking $\hat{\delta}((q_i, p_j), a) = (q_k, p_l)$ whenever $\delta_1(q_i, a) = q_k$ and $\delta_2(p_j, a) = p_l$. $\hat{F}$ is defined as the set of all $(q_i, p_j)$ such that $q_i \in F_1$ and $p_j \in F_2$. Then $w \in L_1 \cap L_2$ if and only if it is accepted by $\hat{M}$. Consequently, $L_1 \cap L_2$ is regular.
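The complement and product constructions used in the two proofs above can be sketched in Python. This is only an illustration under assumptions that are not in the original: a DFA is encoded as a tuple `(states, alphabet, delta, start, finals)` with `delta` a dictionary keyed by `(state, symbol)`, and the automata `EVEN_A` and `ENDS_B` are hypothetical examples.

```python
from itertools import product

# Assumed encoding (not from the source): a DFA is a tuple
# (states, alphabet, delta, start, finals), where delta maps
# (state, symbol) -> state and is defined for every pair (a complete DFA).

def complement(dfa):
    """Keep everything but swap final and non-final states (F becomes Q - F)."""
    states, alphabet, delta, start, finals = dfa
    return (states, alphabet, delta, start, states - finals)

def intersect(dfa1, dfa2):
    """Product construction over a shared alphabet: track both machines at once."""
    q_states, alphabet, delta1, q0, finals1 = dfa1
    p_states, _, delta2, p0, finals2 = dfa2
    states = set(product(q_states, p_states))
    delta = {
        ((q, p), a): (delta1[(q, a)], delta2[(p, a)])
        for (q, p) in states
        for a in alphabet
    }
    # A pair is final only if both components are final.
    finals = {(q, p) for (q, p) in states if q in finals1 and p in finals2}
    return (states, alphabet, delta, (q0, p0), finals)

# Hypothetical example automata over {a, b}.
EVEN_A = (  # strings with an even number of a's
    {"even", "odd"}, {"a", "b"},
    {("even", "a"): "odd", ("even", "b"): "even",
     ("odd", "a"): "even", ("odd", "b"): "odd"},
    "even", {"even"},
)
ENDS_B = (  # strings whose last symbol is b
    {"no", "yes"}, {"a", "b"},
    {("no", "a"): "no", ("no", "b"): "yes",
     ("yes", "a"): "no", ("yes", "b"): "yes"},
    "no", {"yes"},
)

BOTH = intersect(EVEN_A, ENDS_B)
print(len(BOTH[0]))           # 4 pair states
print(BOTH[4])                # {('even', 'yes')}
print(complement(EVEN_A)[4])  # {'odd'}
```

The product automaton has $|Q| \cdot |P|$ states regardless of how many are reachable; a practical version would probably build only the pairs reachable from $(q_0, p_0)$.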
Theorem. The family of regular languages is closed under reversal.

Suppose that $L$ is a regular language. We then construct an NFA with a single final state for it. In the transition graph for this NFA we make the initial vertex a final vertex, the final vertex the initial vertex, and reverse the direction on all edges. The modified NFA accepts $w^R$ if and only if the original NFA accepts $w$. Therefore, the modified NFA accepts $L^R$, proving closure under reversal.

Definition. Suppose $\Sigma$ and $\Gamma$ are alphabets. Then a function $h : \Sigma \to \Gamma^*$ is called a homomorphism. In words, a homomorphism is a substitution in which a single letter is replaced with a string. The domain of the function $h$ is extended to strings in an obvious fashion: if $w = a_1 a_2 \dots a_n$, then $h(w) = h(a_1) h(a_2) \dots h(a_n)$. If $L$ is a language on $\Sigma$, then its homomorphic image is defined as $h(L) = \{h(w) \mid w \in L\}$.

- Example: Let $\Sigma = \{a, b\}$ and $\Gamma = \{a, b, c\}$ and define $h$ by $h(a) = ab$, $h(b) = bbc$. Then $h(aba) = abbbcab$. The homomorphic image of $L = \{aa, aba\}$ is the language $h(L) = \{abab, abbbcab\}$.

Theorem. Let $h$ be a homomorphism. If $L$ is a regular language, then its homomorphic image $h(L)$ is also regular. The family of regular languages is therefore closed under arbitrary homomorphisms.

- Closure properties of the family of regular languages
- Entry + means the language family is closed under the operation under consideration.

Operation                          Closed
Union ($\cup$)                     +
Concatenation ($\cdot$)            +
Star-closure ($*$)                 +
Complement ($\overline{L}$)        +
Intersection ($\cap$)              +
Reversal ($R$)                     +
Homomorphism ($h$)                 +

Decidability questions

- Decidability questions deal with our ability to decide on certain properties.
- For example, can we tell whether a language is finite or not?
- A very fundamental question is: given a language $L$ and a string $w$, can we determine whether or not $w$ is an element of $L$?

This is the membership question, and a method for answering it is called a membership algorithm.
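As a rough illustration of what a membership algorithm can look like when the language is given by a DFA, the sketch below follows one transition per input symbol and then checks whether the state reached is final. The tuple encoding `(states, alphabet, delta, start, finals)`, the function name `is_member`, and the example automaton `ENDS_AB` are assumptions made for this sketch, not part of the original material.

```python
def is_member(dfa, w):
    """Decide whether w is in L(dfa) by simulating the DFA on w.

    Assumes w is a string over the DFA's alphabet and that delta is total.
    """
    _states, _alphabet, delta, start, finals = dfa
    state = start
    for symbol in w:
        state = delta[(state, symbol)]  # exactly one move per symbol
    return state in finals

# Hypothetical DFA over {a, b} accepting the strings that end in "ab".
ENDS_AB = (
    {0, 1, 2}, {"a", "b"},
    {(0, "a"): 1, (0, "b"): 0,
     (1, "a"): 1, (1, "b"): 2,
     (2, "a"): 1, (2, "b"): 0},
    0, {2},
)

print(is_member(ENDS_AB, "aab"))  # True
print(is_member(ENDS_AB, "aba"))  # False
```

The simulation makes $|w|$ moves, so it always terminates, which is what makes it an algorithm rather than a search.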
We say that a regular language is given in a standard representation if and only if it is described by a finite automaton, a regular expression, or a regular grammar.

Theorem. Given a standard representation of any regular language $L$ on $\Sigma$ and any $w \in \Sigma^*$, there exists an algorithm for determining whether or not $w$ is in $L$.

We represent the language by some DFA, then test $w$ to see if it is accepted by this automaton.

Theorem. There exists an algorithm for determining whether a regular language, given in standard representation, is empty, finite, or infinite.

We represent the language as a transition graph of a DFA. If there is a simple path from the initial vertex to any final vertex, then the language is not empty; otherwise, it is empty. To determine whether or not a language is infinite, find all the vertices that are the base of some cycle. If any of these are on a path from the initial vertex to a final vertex, the language is infinite. Otherwise, it is finite.

Theorem. Given standard representations of two regular languages $L_1$ and $L_2$, there exists an algorithm to determine whether or not $L_1 = L_2$.

Using $L_1$ and $L_2$ we define the language $L_3 = (L_1 \cap \overline{L_2}) \cup (\overline{L_1} \cap L_2)$. By closure, $L_3$ is regular, and we can find a DFA $M$ that accepts $L_3$. Once we have $M$, we can use the algorithm in the previous theorem to determine whether $L_3$ is empty. $L_3 = \emptyset$ if and only if $L_1 = L_2$.

Pumping lemma for regular languages

- How can we tell whether a given language is regular or not?
- If the language is in fact regular, we can show it by giving some DFA, regular expression, or regular grammar. But what if it is not?
- One way to show a language is not regular is to study characteristics that are shared by all regular languages.
- If we know of some such characteristic or property, and we can show that the candidate language does not have it, then we can tell that the language is not regular.
- We prove a basic result, called the pumping lemma for regular languages, which is a powerful tool for proving certain languages non-regular.
- We can make use of an observation about regular languages that is based on the pigeonhole principle.
- The term "pigeonhole principle" is used by mathematicians to refer to the following simple observation: if we put $n$ objects into $m$ boxes (pigeonholes), and if $n > m$, then at least one box must have more than one item in it.

- If a language is regular, it is accepted by a DFA $M = (Q, \Sigma, \delta, q_0, F)$ with some particular number of states, say $n$.
- Consider an input of $n$ or more symbols $a_1 a_2 \dots a_m$, $m \ge n$, and for $i = 1, 2, \dots, m$ let $\delta^*(q_0, a_1 a_2 \dots a_i) = q_i$.
- It is not possible for each of the $n + 1$ states $q_0, q_1, \dots, q_n$ to be distinct, since there are only $n$ different states.
- Thus there are two integers $j$ and $k$, $0 \le j < k \le n$, such that $q_j = q_k$.

- The path labeled $a_1 a_2 \dots a_m$ is illustrated as follows.

[Figure: transition path from the start state $q_0$ to $q_j = q_k$ labeled $a_1 \dots a_j$, a loop at $q_j = q_k$ labeled $a_{j+1} \dots a_k$, and a path from $q_j = q_k$ to $q_m$ labeled $a_{k+1} \dots a_m$.]

- If $a_1 a_2 \dots a_m$ is in $L(M)$, then $a_1 a_2 \dots a_j a_{k+1} a_{k+2} \dots a_m$ is also in $L(M)$, since there is a path from $q_0$ to $q_m$ that goes through $q_j$ but not around the loop labeled $a_{j+1} \dots a_k$.
- Similarly, we could go around the loop more than once, in fact as many times as we like. Thus $a_1 \dots a_j (a_{j+1} \dots a_k)^i a_{k+1} \dots a_m$ is in $L(M)$ for any $i \ge 0$.

- What we have proved is that, given any sufficiently long string accepted by a finite automaton, we can find a substring that may be "pumped", i.e. repeated as many times as we like, and the resulting string will still be accepted by the finite automaton.
- This property of finite state automata, and of the languages that they can accept, is the basis for using the pumping lemma to show that a language is not regular.
- If a language contains even one sufficiently long string that cannot be pumped, then it is not regular.

The formal statement of the pumping lemma follows.

Theorem. Let $L$ be an infinite regular language. Then there exists some positive integer $m$ such that any $w \in L$ with $|w| \ge m$ can be decomposed as $w = xyz$ with $|xy| \le m$ and $|y| \ge 1$, such that $w_i = x y^i z$ is also in $L$ for all $i = 0, 1, 2, \dots$

- The pumping lemma is used to show that certain languages are not regular. The demonstration is always by contradiction.
- We will say: "If $L$ were regular, then it would possess certain properties. But it does not possess those properties. Therefore, it is not regular."
- In applying the pumping lemma, we must keep in mind that we are guaranteed the existence of an $m$ as well as the decomposition $xyz$, but we do not know what they are.
- We cannot claim a contradiction just because the pumping lemma is violated for some specific values of $m$ or $xyz$. We must show that there is no way to carve $w$ into $x$, $y$, $z$ in such a way that all conditions of the theorem are met.
- If the pumping lemma is violated even for one $w$ or $i$, then the language cannot be regular!

- The correct argument can be visualized as a game we play against an opponent. Our goal is to win the game by establishing a contradiction of the pumping lemma, while the opponent is trying to foil us.
- There are four moves in the game:
  1. The opponent picks $m$.
  2. Given $m$, we pick a string $w$ in $L$ of length equal to or greater than $m$. We are free to choose any $w$, subject to $w \in L$ and $|w| \ge m$.
  3. The opponent chooses the decomposition $xyz$, subject to $|xy| \le m$ and $|y| \ge 1$.
  4. We try to pick $i$ such that $x y^i z$ is not in $L$. If we can do so, we win the game.
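The four moves above can be played out mechanically for a concrete language. The sketch below uses $L = \{a^n b^n : n \ge 0\}$, a standard non-regular example that is chosen here for illustration and does not appear in the original slides. For a given $m$ it picks $w = a^m b^m$, enumerates every decomposition the opponent could choose, and looks for an $i$ that pushes $x y^i z$ out of $L$. The function names and the bound `max_i` are assumptions of this sketch.

```python
def in_language(s):
    """Membership test for L = { a^n b^n : n >= 0 }."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n

def decompositions(w, m):
    """Yield every split w = xyz with |xy| <= m and |y| >= 1 (move 3)."""
    for xy_len in range(1, min(m, len(w)) + 1):
        for x_len in range(xy_len):
            yield w[:x_len], w[x_len:xy_len], w[xy_len:]

def we_win(m, max_i=3):
    """Return True if, for this m, we beat every decomposition of our w."""
    w = "a" * m + "b" * m  # move 2: a string in L with |w| >= m
    for x, y, z in decompositions(w, m):
        # Move 4: search i = 0..max_i for a pumped string outside L.
        if all(in_language(x + y * i + z) for i in range(max_i + 1)):
            return False  # this decomposition survived every i we tried
    return True

print(all(we_win(m) for m in range(1, 8)))  # True
```

Running the game for finitely many values of $m$ is only a sanity check, not a proof. The actual argument covers every $m$ at once: since $|xy| \le m$, the string $y$ consists only of $a$'s, so $i = 0$ removes at least one $a$ and leaves fewer $a$'s than $b$'s.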
- There is nothing in the pumping lemma that can be used for proving that a language is regular.
- If the language is regular, then it has some properties. If these properties are missing, we can conclude the language is not regular.
- If a language has these properties, we cannot conclude that the language is regular.

References

LINZ, P. An Introduction to Formal Languages and Automata. Jones and Bartlett Learning, 2012.
HOPCROFT, J. and ULLMAN, J. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, 1979.