TDDD65 Introduction to the Theory of Computation Lecture 2 Gustav Nordh, IDA gustav.nordh@liu.se 2012-08-31 Outline - Lecture 2 Closure properties of regular languages Regular expressions Equivalence of regular expressions and finite automata Pumping lemma for regular languages Closure properties of regular languages The natural numbers N = {0, 1, 2, 3, . . . } are closed under multiplication in the sense that for any natural numbers x and y, x · y is again a natural number The natural numbers are not closed under subtraction (3 − 5 = −2 which is not a natural number) Definition We say that a class of languages C is closed under an operation op if applying op to any languages from C results in a language in C Closure properties of regular languages Understanding under which operations a class of languages C is closed is important! Closure properties of regular languages Theorem The class of regular languages is closed under union (if L1 and L2 are regular languages, then so is L1 ∪ L2 ) Proof. ε ε Closure properties of regular languages Theorem The class of regular languages is closed under concatenation (if L1 and L2 are regular languages, then so is L1 L2 ) Proof. ε ε Closure properties of regular languages Theorem The class of regular languages is closed under star (if L1 is a regular language, then so is L∗1 ) Proof. ε ε ε Regular expressions Regular expressions Regular expressions Definition of regular expressions Definition L(R) denotes the language described by the regular expression R. R is a regular expression if R is ∗ 1 a for a ∈ Σ, L(a) = {a} 2 ε, L(ε) = {ε} 3 ∅, L(∅) = ∅ 4 R1 + R2 where R1 and R2 are regular expressions, L(R1 + R2 ) = L(R1 ) ∪ L(R2 ) 5 R1 R2 where R1 and R2 are regular expressions, L(R1 R2 ) = L(R1 )L(R2 ) 6 R1∗ where R1 is a regular expression, L(R1∗ ) = L(R1 )∗ has higher precedence than concatenation and +, concatenation has higher precedence than + Examples of regular expressions Example (0 + 1)∗ 0 binary strings ending with 0 (0 + 1)∗ 00(0 + 1)∗ binary strings with at least two consecutive 0’s (0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9)∗ 1234(0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9)∗ Equivalence with finite automata Theorem A language is regular if and only if some regular expression describes it Equivalence with finite automata Lemma If a language is described by a regular expression then it is recognized by a NFA Equivalence with finite automata Lemma If a language is described by a regular expression then it is recognized by a NFA Proof. R = a for a ∈ Σ, L(R) = {a} a R = ε, L(R) = {ε} R = ∅, L(R) = ∅ Equivalence with finite automata Lemma If a language is described by a regular expression then it is recognized by a NFA Proof. R = R1 + R2 , L(R) = L(R1 ) ∪ L(R2 ) ε ε Equivalence with finite automata Lemma If a language is described by a regular expression then it is recognized by a NFA Proof. R = R1 R2 , L(R) = L(R1 )L(R2 ) ε ε Equivalence with finite automata Lemma If a language is described by a regular expression then it is recognized by a NFA Proof. R = R1∗ , L(R) = L(R1 )∗ ε ε ε Equivalence with finite automata: Example Example (0 + 1)∗ to NFA 0 0 1 1 0 0+1 ε ε 1 0 (0 + 1)∗ ε ε ε ε ε 1 Equivalence with finite automata RegExp Closure Properties NFA Subset Construction DFA Equivalence with finite automata RegExp Closure Properties NFA GNFA Construction Subset Construction DFA Equivalence with finite automata Lemma If a language is recognized by a DFA then it is described by a regular expression Idea: Use a generalized NFA (GNFA) where the transition arrows can be labeled by regular expressions (aa)∗ a∗ ab ∗ q1 q2 ab + ba Equivalence with finite automata Lemma If a language is recognized by a DFA then it is described by a regular expression Idea: Use a generalized NFA (GNFA) where the transition arrows can be labeled by regular expressions Given a DFA Add a new start state with an ε transition to the old start state Add a new accept state with ε transitions from all old accept states Replace transitions of the form a, b, c by a + b + c Equivalence with finite automata Lemma If a language is recognized by a DFA then it is described by a regular expression Add a new start state with an ε transition to the old start state Add a new accept state with ε transitions from all old accept states Replace transitions of the form a, b, c by a + b + c 1 0 1 q1 q2 0 Equivalence with finite automata Lemma If a language is recognized by a DFA then it is described by a regular expression Add a new start state with an ε transition to the old start state Add a new accept state with ε transitions from all old accept states Replace transitions of the form a, b, c by a + b + c 0 1 1 qs ε q1 q2 0 ε qF Equivalence with finite automata Lemma If a language is recognized by a DFA then it is described by a regular expression Add a new start state with an ε transition to the old start state Add a new accept state with ε transitions from all old accept states Replace transitions of the form a, b, c by a + b + c Eliminate a state different from the start and accept state (reducing the number of states by 1) Equivalence with finite automata Lemma If a language is recognized by a DFA then it is described by a regular expression Eliminate a state different from the start and accept state (reducing the number of states by 1) R2 q1 R1 qe R4 R3 q2 Equivalence with finite automata Lemma If a language is recognized by a DFA then it is described by a regular expression Eliminate a state different from the start and accept state (reducing the number of states by 1) R2 q1 R1 qe R3 q2 R4 q1 R1 R2∗ R3 + R4 q2 Equivalence with finite automata Lemma If a language is recognized by a DFA then it is described by a regular expression Eliminate a state different from the start and accept state (reducing the number of states by 1) 0 1 1 qs ε q1 q2 0 ε qF Equivalence with finite automata Lemma If a language is recognized by a DFA then it is described by a regular expression Eliminate q2 0 1 1 qs ε q1 q2 0 ε qF Equivalence with finite automata Lemma If a language is recognized by a DFA then it is described by a regular expression Eliminate q2 0 1 1 qs ε q1 q2 ε qF 0 Using the rule R1 R2∗ R3 + R4 the new transition from q1 to qF is labeled 11∗ ε + ∅ Equivalence with finite automata Lemma If a language is recognized by a DFA then it is described by a regular expression Eliminate q2 0 1 1 qs ε q1 q2 ε qF 0 Using the rule R1 R2∗ R3 + R4 the new transition from q1 to qF is labeled 11∗ ε + ∅ Using the rule R1 R2∗ R3 + R4 the new transition from q1 to q1 is labeled 11∗ 0 + 0 Equivalence with finite automata Lemma If a language is recognized by a DFA then it is described by a regular expression Eliminate q2 0 1 1 qs ε q1 q2 ε qF 0 11∗ 0 + 0 qs ε q1 11∗ ε + ∅ qF Equivalence with finite automata Lemma If a language is recognized by a DFA then it is described by a regular expression Eliminate q1 11∗ 0 + 0 qs ε q1 11∗ ε + ∅ qF Equivalence with finite automata Lemma If a language is recognized by a DFA then it is described by a regular expression Eliminate q1 11∗ 0 + 0 qs ε q1 11∗ ε + ∅ qF Using the rule R1 R2∗ R3 + R4 the new transition from qs to qF is labeled ε(11∗ 0 + 0)∗ (11∗ ε + ∅) + ∅ Equivalence with finite automata Lemma If a language is recognized by a DFA then it is described by a regular expression Eliminate q1 11∗ 0 + 0 qs qs ε q1 11∗ ε + ∅ ε(11∗ 0 + 0)∗ (11∗ ε + ∅) + ∅ qF qF Equivalence with finite automata RegExp Closure Properties NFA GNFA Construction Subset Construction DFA Nonregular languages Nonregular languages Example L = {an b n | n ≥ 0} is not a regular language Why? Nonregular languages Example L = {an b n | n ≥ 0} is not a regular language Why? Imagine a DFA that recognize L When reading a string the DFA needs to keep track of the number of a’s seen so far so that it can check that the same number of b’s follow The number of a’s is unbounded so the DFA needs to remember an unbounded number This is not possible since the memory of the DFA is bounded (finite number of states) Nonregular languages, pumping Given a sufficiently long string s in a regular language L Then the DFA that recognizes L will be in the same state qt several times s = s1 · · · sj−1 sj · · · sk −1 sk · · · sn ∈ L | {z } | {z } | {z } qt qt qaccept Nonregular languages, pumping Given a sufficiently long string s in a regular language L Then the DFA that recognizes L will be in the same state qt several times s = s1 · · · sj−1 sj · · · sk −1 sk · · · sn ∈ L | {z } | {z } | {z } qt qt qaccept So s′ = s1 · · · sj−1 (sj · · · sk −1 )i sk · · · sn ∈ L | {z } | {z } | {z } qt for any i ≥ 0 qt qaccept The Pumping Lemma for Regular Languages Lemma If L is a regular language, then there exists a positive integer p (the pumping length) such that every string s ∈ L, |s| ≥ p, can be partitioned into three pieces, s = xyz, such that the following conditions hold: |y| > 0, |xy| ≤ p, and for each i ≥ 0, xy i z ∈ L Proof of the Pumping Lemma Lemma If L is a regular language, then there exists a positive integer p (the pumping length) such that every string s ∈ L, |s| ≥ p, can be partitioned into three pieces, s = xyz, such that the following conditions hold: |y | > 0, |xy | ≤ p, and for each i ≥ 0, xy i z ∈ L Proof. If L is regular, let M be a DFA recognizing L. Let p be the number of states of M and let s = s1 s2 · · · sn ∈ L with n ≥ p. Let r1 · · · rn+1 be the sequence of states that M enters while reading s. Among the first p + 1 states in this sequence, two must be the same. Let rj = rk (j < k) be such two. Let x = s1 · · · sj−1 , y = sj · · · sk −1 , and z = sk · · · sn . Proof of the Pumping Lemma Lemma If L is a regular language, then there exists a positive integer p (the pumping length) such that every string s ∈ L, |s| ≥ p, can be partitioned into three pieces, s = xyz, such that the following conditions hold: |y | > 0, |xy | ≤ p, and for each i ≥ 0, xy i z ∈ L Proof. If L is regular, let M be a DFA recognizing L. Let p be the number of states of M and let s = s1 s2 · · · sn ∈ L with n ≥ p. Let r1 · · · rn+1 be the sequence of states that M enters while reading s. Among the first p + 1 states in this sequence, two must be the same. Let rj = rk (j < k) be such two. Let x = s1 · · · sj−1 , y = sj · · · sk −1 , and z = sk · · · sn . y = sj · · · sk −1 x = s1 · · · sj−1 r1 z = sk · · · sn rj = rk rn+1 Proof of the Pumping Lemma Lemma If L is a regular language, then there exists a positive integer p (the pumping length) such that every string s ∈ L, |s| ≥ p, can be partitioned into three pieces, s = xyz, such that the following conditions hold: |y | > 0, |xy | ≤ p, and for each i ≥ 0, xy i z ∈ L Proof. If L is regular, let M be a DFA recognizing L. Let p be the number of states of M and let s = s1 s2 · · · sn ∈ L with n ≥ p. Let r1 · · · rn+1 be the sequence of states that M enters while reading s. Among the first p + 1 states in this sequence, two must be the same. Let rj = rk (j < k) be such two. Let x = s1 · · · sj−1 , y = sj · · · sk −1 , and z = sk · · · sn . yi , i ≥ 0 x = s1 · · · sj−1 r1 z = sk · · · sn rj = rk rn+1 Proof of the Pumping Lemma Lemma If L is a regular language, then there exists a positive integer p (the pumping length) such that every string s ∈ L, |s| ≥ p, can be partitioned into three pieces, s = xyz, such that the following conditions hold: |y | > 0, |xy | ≤ p, and for each i ≥ 0, xy i z ∈ L Proof. If L is regular, let M be a DFA recognizing L. Let p be the number of states of M and let s = s1 s2 · · · sn ∈ L with n ≥ p. Let r1 · · · rn+1 be the sequence of states that M enters while reading s. Among the first p + 1 states in this sequence, two must be the same. Let rj = rk (j < k) be such two. Let x = s1 · · · sj−1 , y = sj · · · sk −1 , and z = sk · · · sn . We have xy i z ∈ L for each i ≥ 0. Proof of the Pumping Lemma Lemma If L is a regular language, then there exists a positive integer p (the pumping length) such that every string s ∈ L, |s| ≥ p, can be partitioned into three pieces, s = xyz, such that the following conditions hold: |y | > 0, |xy | ≤ p, and for each i ≥ 0, xy i z ∈ L Proof. If L is regular, let M be a DFA recognizing L. Let p be the number of states of M and let s = s1 s2 · · · sn ∈ L with n ≥ p. Let r1 · · · rn+1 be the sequence of states that M enters while reading s. Among the first p + 1 states in this sequence, two must be the same. Let rj = rk (j < k) be such two. Let x = s1 · · · sj−1 , y = sj · · · sk −1 , and z = sk · · · sn . We have |y| > 0 (since j 6= k). We have |xy| = |s1 · · · sk −1 | ≤ p since rk occurs among the first p + 1 places in the sequence of states (so k ≤ p + 1). Strategy for showing that a language L is not regular 1 2 Assume that L is regular (with the aim of reaching a contradiction). Choose a string s ∈ L, such that |s| ≥ p where p is the pumping length given by the Pumping Lemma. The Pumping Lemma then says that s can be partitioned into three pieces s = xyz where |y | > 0 and |xy | ≤ p, such that xy i z ∈ L for all i ≥ 0. 3 Show that for ALL POSSIBLE partitions of s = xyz satisfying |y| > 0 and |xy| ≤ p there exists an i such that xy i z ∈ / L. If we succeed to show this, then we have a contradiction with the Pumping Lemma and our assumption that L is regular is wrong. The standard example Example L = {an b n | n ≥ 0} is not regular. Proof. 1 2 Assume that L is regular Choose s = ap b p where p is the pumping length given by the Pumping Lemma. s ∈ L and |s| ≥ p, so the Pumping Lemma says that s can be partitioned into three pieces s = xyz where |y | > 0 and |xy | ≤ p, such that xy i z ∈ L for all i ≥ 0. 3 s = ap b p = xyz where |y| > 0 and |xy| ≤ p. For all such partitions of s, y is a string of a’s of length at least 1. Choose i = 2, xy 2 z contains more a’s than b’s, thus, xy 2 z ∈ / L. This contradicts the Pumping Lemma, hence, our assumption that L is regular is wrong. Regular languages: Summary Regular languages are the languages recognized by DFAs For every NFA there is an equivalent DFA The subset construction A language can be described by a regular expression if and only if it can be recognized by a DFA GNFA construction, closure properties There are simple non-regular languages Pumping lemma