TDDD65 Introduction to the Theory of Computation Lecture 2 Gustav Nordh, IDA

advertisement
TDDD65
Introduction to the Theory of Computation
Lecture 2
Gustav Nordh, IDA
gustav.nordh@liu.se
2012-08-31
Outline - Lecture 2
Closure properties of regular languages
Regular expressions
Equivalence of regular expressions and finite automata
Pumping lemma for regular languages
Closure properties of regular languages
The natural numbers N = {0, 1, 2, 3, . . . } are closed under
multiplication in the sense that for any natural numbers x and y,
x · y is again a natural number
The natural numbers are not closed under subtraction
(3 − 5 = −2 which is not a natural number)
Definition
We say that a class of languages C is closed under an
operation op if applying op to any languages from C results in a
language in C
Closure properties of regular languages
Understanding under which operations a class of languages C
is closed is important!
Closure properties of regular languages
Theorem
The class of regular languages is closed under union (if L1 and
L2 are regular languages, then so is L1 ∪ L2 )
Proof.
ε
ε
Closure properties of regular languages
Theorem
The class of regular languages is closed under concatenation
(if L1 and L2 are regular languages, then so is L1 L2 )
Proof.
ε
ε
Closure properties of regular languages
Theorem
The class of regular languages is closed under star (if L1 is a
regular language, then so is L∗1 )
Proof.
ε
ε
ε
Regular expressions
Regular expressions
Regular expressions
Definition of regular expressions
Definition
L(R) denotes the language described by the regular expression
R.
R is a regular expression if R is
∗
1
a for a ∈ Σ, L(a) = {a}
2
ε, L(ε) = {ε}
3
∅, L(∅) = ∅
4
R1 + R2 where R1 and R2 are regular expressions,
L(R1 + R2 ) = L(R1 ) ∪ L(R2 )
5
R1 R2 where R1 and R2 are regular expressions,
L(R1 R2 ) = L(R1 )L(R2 )
6
R1∗ where R1 is a regular expression, L(R1∗ ) = L(R1 )∗
has higher precedence than concatenation and +,
concatenation has higher precedence than +
Examples of regular expressions
Example
(0 + 1)∗ 0 binary strings ending with 0
(0 + 1)∗ 00(0 + 1)∗ binary strings with at least two
consecutive 0’s
(0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9)∗ 1234(0 + 1 + 2 +
3 + 4 + 5 + 6 + 7 + 8 + 9)∗
Equivalence with finite automata
Theorem
A language is regular if and only if some regular expression
describes it
Equivalence with finite automata
Lemma
If a language is described by a regular expression then it is
recognized by a NFA
Equivalence with finite automata
Lemma
If a language is described by a regular expression then it is
recognized by a NFA
Proof.
R = a for a ∈ Σ, L(R) = {a}
a
R = ε, L(R) = {ε}
R = ∅, L(R) = ∅
Equivalence with finite automata
Lemma
If a language is described by a regular expression then it is
recognized by a NFA
Proof.
R = R1 + R2 , L(R) = L(R1 ) ∪ L(R2 )
ε
ε
Equivalence with finite automata
Lemma
If a language is described by a regular expression then it is
recognized by a NFA
Proof.
R = R1 R2 , L(R) = L(R1 )L(R2 )
ε
ε
Equivalence with finite automata
Lemma
If a language is described by a regular expression then it is
recognized by a NFA
Proof.
R = R1∗ , L(R) = L(R1 )∗
ε
ε
ε
Equivalence with finite automata: Example
Example
(0 + 1)∗ to NFA
0
0
1
1
0
0+1
ε
ε
1
0
(0 + 1)∗
ε
ε
ε
ε
ε
1
Equivalence with finite automata
RegExp
Closure
Properties
NFA
Subset
Construction
DFA
Equivalence with finite automata
RegExp
Closure
Properties
NFA
GNFA
Construction
Subset
Construction
DFA
Equivalence with finite automata
Lemma
If a language is recognized by a DFA then it is described by a
regular expression
Idea: Use a generalized NFA (GNFA) where the transition
arrows can be labeled by regular expressions
(aa)∗
a∗
ab ∗
q1
q2
ab + ba
Equivalence with finite automata
Lemma
If a language is recognized by a DFA then it is described by a
regular expression
Idea: Use a generalized NFA (GNFA) where the transition
arrows can be labeled by regular expressions
Given a DFA
Add a new start state with an ε transition to the old start
state
Add a new accept state with ε transitions from all old
accept states
Replace transitions of the form a, b, c by a + b + c
Equivalence with finite automata
Lemma
If a language is recognized by a DFA then it is described by a
regular expression
Add a new start state with an ε transition to the old start
state
Add a new accept state with ε transitions from all old
accept states
Replace transitions of the form a, b, c by a + b + c
1
0
1
q1
q2
0
Equivalence with finite automata
Lemma
If a language is recognized by a DFA then it is described by a
regular expression
Add a new start state with an ε transition to the old start
state
Add a new accept state with ε transitions from all old
accept states
Replace transitions of the form a, b, c by a + b + c
0
1
1
qs
ε
q1
q2
0
ε
qF
Equivalence with finite automata
Lemma
If a language is recognized by a DFA then it is described by a
regular expression
Add a new start state with an ε transition to the old start
state
Add a new accept state with ε transitions from all old
accept states
Replace transitions of the form a, b, c by a + b + c
Eliminate a state different from the start and accept state
(reducing the number of states by 1)
Equivalence with finite automata
Lemma
If a language is recognized by a DFA then it is described by a
regular expression
Eliminate a state different from the start and accept state
(reducing the number of states by 1)
R2
q1
R1
qe
R4
R3
q2
Equivalence with finite automata
Lemma
If a language is recognized by a DFA then it is described by a
regular expression
Eliminate a state different from the start and accept state
(reducing the number of states by 1)
R2
q1
R1
qe
R3
q2
R4
q1
R1 R2∗ R3 + R4
q2
Equivalence with finite automata
Lemma
If a language is recognized by a DFA then it is described by a
regular expression
Eliminate a state different from the start and accept state
(reducing the number of states by 1)
0
1
1
qs
ε
q1
q2
0
ε
qF
Equivalence with finite automata
Lemma
If a language is recognized by a DFA then it is described by a
regular expression
Eliminate q2
0
1
1
qs
ε
q1
q2
0
ε
qF
Equivalence with finite automata
Lemma
If a language is recognized by a DFA then it is described by a
regular expression
Eliminate q2
0
1
1
qs
ε
q1
q2
ε
qF
0
Using the rule R1 R2∗ R3 + R4 the new transition from q1 to qF is
labeled 11∗ ε + ∅
Equivalence with finite automata
Lemma
If a language is recognized by a DFA then it is described by a
regular expression
Eliminate q2
0
1
1
qs
ε
q1
q2
ε
qF
0
Using the rule R1 R2∗ R3 + R4 the new transition from q1 to qF is
labeled 11∗ ε + ∅
Using the rule R1 R2∗ R3 + R4 the new transition from q1 to q1 is
labeled 11∗ 0 + 0
Equivalence with finite automata
Lemma
If a language is recognized by a DFA then it is described by a
regular expression
Eliminate q2
0
1
1
qs
ε
q1
q2
ε
qF
0
11∗ 0 + 0
qs
ε
q1
11∗ ε + ∅
qF
Equivalence with finite automata
Lemma
If a language is recognized by a DFA then it is described by a
regular expression
Eliminate q1
11∗ 0 + 0
qs
ε
q1
11∗ ε + ∅
qF
Equivalence with finite automata
Lemma
If a language is recognized by a DFA then it is described by a
regular expression
Eliminate q1
11∗ 0 + 0
qs
ε
q1
11∗ ε + ∅
qF
Using the rule R1 R2∗ R3 + R4 the new transition from qs to qF is
labeled ε(11∗ 0 + 0)∗ (11∗ ε + ∅) + ∅
Equivalence with finite automata
Lemma
If a language is recognized by a DFA then it is described by a
regular expression
Eliminate q1
11∗ 0 + 0
qs
qs
ε
q1
11∗ ε + ∅
ε(11∗ 0 + 0)∗ (11∗ ε + ∅) + ∅
qF
qF
Equivalence with finite automata
RegExp
Closure
Properties
NFA
GNFA
Construction
Subset
Construction
DFA
Nonregular languages
Nonregular languages
Example
L = {an b n | n ≥ 0} is not a regular language
Why?
Nonregular languages
Example
L = {an b n | n ≥ 0} is not a regular language
Why?
Imagine a DFA that recognize L
When reading a string the DFA needs to keep track of the
number of a’s seen so far so that it can check that the
same number of b’s follow
The number of a’s is unbounded so the DFA needs to
remember an unbounded number
This is not possible since the memory of the DFA is
bounded (finite number of states)
Nonregular languages, pumping
Given a sufficiently long string s in a regular language L
Then the DFA that recognizes L will be in the same state qt
several times
s = s1 · · · sj−1 sj · · · sk −1 sk · · · sn ∈ L
| {z } | {z } | {z }
qt
qt
qaccept
Nonregular languages, pumping
Given a sufficiently long string s in a regular language L
Then the DFA that recognizes L will be in the same state qt
several times
s = s1 · · · sj−1 sj · · · sk −1 sk · · · sn ∈ L
| {z } | {z } | {z }
qt
qt
qaccept
So
s′ = s1 · · · sj−1 (sj · · · sk −1 )i sk · · · sn ∈ L
| {z } | {z } | {z }
qt
for any i ≥ 0
qt
qaccept
The Pumping Lemma for Regular Languages
Lemma
If L is a regular language, then there exists a positive integer p
(the pumping length) such that every string s ∈ L, |s| ≥ p, can
be partitioned into three pieces, s = xyz, such that the following
conditions hold:
|y| > 0,
|xy| ≤ p, and
for each i ≥ 0, xy i z ∈ L
Proof of the Pumping Lemma
Lemma
If L is a regular language, then there exists a positive integer p (the pumping length) such that every string s ∈ L,
|s| ≥ p, can be partitioned into three pieces, s = xyz, such that the following conditions hold: |y | > 0, |xy | ≤ p,
and for each i ≥ 0, xy i z ∈ L
Proof.
If L is regular, let M be a DFA recognizing L. Let p be the
number of states of M and let s = s1 s2 · · · sn ∈ L with n ≥ p. Let
r1 · · · rn+1 be the sequence of states that M enters while
reading s. Among the first p + 1 states in this sequence, two
must be the same. Let rj = rk (j < k) be such two. Let
x = s1 · · · sj−1 , y = sj · · · sk −1 , and z = sk · · · sn .
Proof of the Pumping Lemma
Lemma
If L is a regular language, then there exists a positive integer p (the pumping length) such that every string s ∈ L,
|s| ≥ p, can be partitioned into three pieces, s = xyz, such that the following conditions hold: |y | > 0, |xy | ≤ p,
and for each i ≥ 0, xy i z ∈ L
Proof.
If L is regular, let M be a DFA recognizing L. Let p be the
number of states of M and let s = s1 s2 · · · sn ∈ L with n ≥ p. Let
r1 · · · rn+1 be the sequence of states that M enters while
reading s. Among the first p + 1 states in this sequence, two
must be the same. Let rj = rk (j < k) be such two. Let
x = s1 · · · sj−1 , y = sj · · · sk −1 , and z = sk · · · sn .
y = sj · · · sk −1
x = s1 · · · sj−1
r1
z = sk · · · sn
rj = rk
rn+1
Proof of the Pumping Lemma
Lemma
If L is a regular language, then there exists a positive integer p (the pumping length) such that every string s ∈ L,
|s| ≥ p, can be partitioned into three pieces, s = xyz, such that the following conditions hold: |y | > 0, |xy | ≤ p,
and for each i ≥ 0, xy i z ∈ L
Proof.
If L is regular, let M be a DFA recognizing L. Let p be the
number of states of M and let s = s1 s2 · · · sn ∈ L with n ≥ p. Let
r1 · · · rn+1 be the sequence of states that M enters while
reading s. Among the first p + 1 states in this sequence, two
must be the same. Let rj = rk (j < k) be such two. Let
x = s1 · · · sj−1 , y = sj · · · sk −1 , and z = sk · · · sn .
yi , i ≥ 0
x = s1 · · · sj−1
r1
z = sk · · · sn
rj = rk
rn+1
Proof of the Pumping Lemma
Lemma
If L is a regular language, then there exists a positive integer p (the pumping length) such that every string s ∈ L,
|s| ≥ p, can be partitioned into three pieces, s = xyz, such that the following conditions hold: |y | > 0, |xy | ≤ p,
and for each i ≥ 0, xy i z ∈ L
Proof.
If L is regular, let M be a DFA recognizing L. Let p be the
number of states of M and let s = s1 s2 · · · sn ∈ L with n ≥ p. Let
r1 · · · rn+1 be the sequence of states that M enters while
reading s. Among the first p + 1 states in this sequence, two
must be the same. Let rj = rk (j < k) be such two. Let
x = s1 · · · sj−1 , y = sj · · · sk −1 , and z = sk · · · sn .
We have xy i z ∈ L for each i ≥ 0.
Proof of the Pumping Lemma
Lemma
If L is a regular language, then there exists a positive integer p (the pumping length) such that every string s ∈ L,
|s| ≥ p, can be partitioned into three pieces, s = xyz, such that the following conditions hold: |y | > 0, |xy | ≤ p,
and for each i ≥ 0, xy i z ∈ L
Proof.
If L is regular, let M be a DFA recognizing L. Let p be the
number of states of M and let s = s1 s2 · · · sn ∈ L with n ≥ p. Let
r1 · · · rn+1 be the sequence of states that M enters while
reading s. Among the first p + 1 states in this sequence, two
must be the same. Let rj = rk (j < k) be such two. Let
x = s1 · · · sj−1 , y = sj · · · sk −1 , and z = sk · · · sn .
We have |y| > 0 (since j 6= k).
We have |xy| = |s1 · · · sk −1 | ≤ p since rk occurs among the first
p + 1 places in the sequence of states (so k ≤ p + 1).
Strategy for showing that a language L is not regular
1
2
Assume that L is regular (with the aim of reaching a
contradiction).
Choose a string s ∈ L, such that |s| ≥ p where p is the
pumping length given by the Pumping Lemma.
The Pumping Lemma then says that s can be partitioned
into three pieces s = xyz where |y | > 0 and |xy | ≤ p, such
that xy i z ∈ L for all i ≥ 0.
3
Show that for ALL POSSIBLE partitions of s = xyz
satisfying |y| > 0 and |xy| ≤ p there exists an i such that
xy i z ∈
/ L.
If we succeed to show this, then we have a contradiction
with the Pumping Lemma and our assumption that L is
regular is wrong.
The standard example
Example
L = {an b n | n ≥ 0} is not regular.
Proof.
1
2
Assume that L is regular
Choose s = ap b p where p is the pumping length given by
the Pumping Lemma.
s ∈ L and |s| ≥ p, so the Pumping Lemma says that s can
be partitioned into three pieces s = xyz where |y | > 0 and
|xy | ≤ p, such that xy i z ∈ L for all i ≥ 0.
3
s = ap b p = xyz where |y| > 0 and |xy| ≤ p. For all such
partitions of s, y is a string of a’s of length at least 1.
Choose i = 2, xy 2 z contains more a’s than b’s, thus,
xy 2 z ∈
/ L.
This contradicts the Pumping Lemma, hence, our
assumption that L is regular is wrong.
Regular languages: Summary
Regular languages are the languages recognized by DFAs
For every NFA there is an equivalent DFA
The subset construction
A language can be described by a regular expression if
and only if it can be recognized by a DFA
GNFA construction, closure properties
There are simple non-regular languages
Pumping lemma
Download