Chapter 6
Context-Free and Non-Context-Free Languages
Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
The Pumping Lemma for Context-Free Languages
• It’s easy to find a language that cannot be accepted by
a finite automaton, even if proving it is a little harder
– For example, AnBn cannot be accepted by an FA,
because with only a finite number of states, we can't
keep track of how many a's we've seen
– It might be argued, in a similar way, that neither
AnBnCn = $\{a^nb^nc^n \mid n \ge 0\}$ nor XX = $\{xx \mid x \in \{a,b\}^*\}$ can
be accepted by a PDA
Introduction to Computation
• The way a PDA processes $a^ib^jc^k$ allows it to confirm
that $i = j$ but not to remember that number long
enough to compare it to k
• One way to prove AnBn is not regular is to use the
pumping lemma for regular languages
• Now we’ll establish a result for CFLs that is similar to
the pumping lemma, but a little more complicated
• The basic idea is that a sufficiently long derivation in
a grammar G will have to contain a self-embedded
variable
• For instance, in $S \Rightarrow^* vAz \Rightarrow^* vwAyz \Rightarrow^* vwxyz$, the
string derived from the first occurrence of A also
includes an occurrence of A
– $S \Rightarrow^* vAz \Rightarrow^* vwAyz \Rightarrow^* vw^kAy^kz \Rightarrow^* vw^kxy^kz$
must also be a valid derivation, for every $k \ge 1$, and
$S \Rightarrow^* vAz \Rightarrow^* vxz = vw^0xy^0z$ is also valid
– This observation will be useful if we can guarantee
that the strings w and y are not both null, and even
more useful if we can impose some other restrictions
on the five strings v, w, x, y, and z of terminals
– We do this by requiring that the grammar be in
Chomsky normal form (see Chapter 4)
• Theorem 6.1: Suppose L is a CFL
– Then there is an integer n so that for every $u \in L$ with
$|u| \ge n$, u can be written as $u = vwxyz$ so that:
• $|wy| > 0$
• $|wxy| \le n$
• For every $m \ge 0$, $vw^mxy^mz \in L$
• Proof:
– We can find a CFG G so that $L(G) = L - \{\Lambda\}$ and G is in
Chomsky normal form, so that the right side of every
production is either a single terminal or a string of two
variables
• Every derivation tree in this grammar is then a
binary tree
• A binary tree of height h has no more than $2^h$ leaf
nodes
– Therefore if $u \in L(G)$ and h is the height of the
derivation tree for u, then $|u| \le 2^h$
• Let $n = 2^{p+1}$, where p is the number of distinct
variables in G, and suppose that u is a string in L(G) of
length at least n
– Then it follows that every derivation tree for u must
have height greater than p, because $2^h \ge |u| \ge 2^{p+1}$ forces $h \ge p+1$
• Thus, in a derivation tree for u, there must be a path
from the root to a leaf node with at least p+1 interior
nodes
– Because G has only p distinct variables, some variable
must appear twice among the lowest p+1 interior nodes of
that path; call it A
– Let x be the substring of u derived from the lowest
occurrence of A in the path, and let w and y be the
strings of terminals such that the substring of u
derived from the occurrence of A farther from the leaf
is wxy
– Finally, let v and z be the prefix and suffix of u so that
u = vwxyz
• The subtree starting at the higher occurrence of A has
height at most p+1, thus $|wxy| \le 2^{p+1} = n$
• The leaf nodes corresponding to the symbols of x are
descendants of only one of the two children of the
higher occurrence of A
• Because G is in Chomsky normal form, the other child
is also a variable, and because G has no Λ-productions
it derives at least one terminal symbol
– Therefore, w and y can't both be Λ. Finally, we have
$S \Rightarrow^* vAz \Rightarrow^* vwAyz \Rightarrow^* vwxyz$, and we've already seen
how this establishes the third part of the theorem
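The proof can be traced on a concrete tree. The following is a minimal sketch, assuming a hypothetical CNF grammar for $\{a^nb^n \mid n \ge 1\}$ (S → AT | AB, T → SB, A → a, B → b, so p = 4 and $n = 2^5 = 32$) and a tree encoding of our own choosing: each node is a pair (label, children). The helper names `build`, `yield_of`, `longest_path`, and `pump_decomposition` are illustrative, not from the book. It takes a longest root-to-leaf path, scans the lowest p+1 interior nodes upward for a repeated variable A, and reads off v, w, x, y, z from the yields of the two occurrences of A.

```python
# Hypothetical CNF grammar for {a^n b^n | n >= 1}:
#   S -> AT | AB,  T -> SB,  A -> a,  B -> b     (p = 4 variables)
# A derivation tree is a pair (label, children); a leaf has no children.

def build(k):
    """Derivation tree for a^k b^k (k >= 1) in the grammar above."""
    a_leaf, b_leaf = ("A", [("a", [])]), ("B", [("b", [])])
    if k == 1:
        return ("S", [a_leaf, b_leaf])
    return ("S", [a_leaf, ("T", [build(k - 1), b_leaf])])

def yield_of(node):
    """The string of terminals at the leaves of a (sub)tree."""
    label, kids = node
    return label if not kids else "".join(yield_of(c) for c in kids)

def longest_path(node, offset=0):
    """A longest root-to-leaf path, as (node, start position of its yield) pairs."""
    label, kids = node
    if not kids:
        return [(node, offset)]
    best, pos = None, offset
    for child in kids:
        cand = longest_path(child, pos)
        if best is None or len(cand) > len(best):
            best = cand
        pos += len(yield_of(child))
    return [(node, offset)] + best

def pump_decomposition(tree, p):
    """Split u = yield_of(tree) into (v, w, x, y, z) as in the proof, or None."""
    u = yield_of(tree)
    interior = longest_path(tree)[:-1]          # drop the terminal leaf
    seen = {}
    for node, start in reversed(interior[-(p + 1):]):  # lowest p+1 nodes, bottom-up
        if node[0] in seen:                     # repeated variable A found
            low, low_start = seen[node[0]]
            s1, e1 = start, start + len(yield_of(node))         # yield is wxy
            s2, e2 = low_start, low_start + len(yield_of(low))  # yield is x
            return u[:s1], u[s1:s2], u[s2:e2], u[e2:e1], u[e1:]
        seen[node[0]] = (node, start)
    return None  # no repetition among the lowest p+1 interior nodes

v, w, x, y, z = pump_decomposition(build(16), p=4)   # |u| = 32 = 2^(p+1)
for m in range(4):
    print(m, v + w * m + x + y * m + z)
```

Here the repeated variable is S, with w = a and y = b, so every pumped string is again of the form $a^jb^j$.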
• Applying the pumping lemma to AnBnCn
– Suppose, for the sake of contradiction, that AnBnCn is a
context-free language, and let n be the integer in the
pumping lemma
• Let u be the string $a^nb^nc^n$
– Then $u \in$ AnBnCn and $|u| \ge n$
– Therefore, according to the pumping lemma, $u = vwxyz$
for some strings satisfying the three conditions
– The first condition, |wy| > 0, implies that the string wy
contains at least one symbol
– The second, $|wxy| \le n$, implies that wxy contains no
more than two distinct symbols, since a substring of u that
contains both an a and a c must contain all n b's and so
has length at least n+2
– If $\sigma_1$ is one of the three symbols that occurs in wy and
$\sigma_2$ is one that doesn't, then the string $vw^0xy^0z$ obtained
from u by deleting w and y contains fewer than n
occurrences of $\sigma_1$ and exactly n occurrences of $\sigma_2$
– This is a contradiction because the third condition
implies that vw0xy0z is in AnBnCn and so must have
equal numbers of all three symbols
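The case analysis can be sanity-checked by brute force for small n. This sketch (function names are hypothetical) enumerates every split $u = vwxyz$ of $u = a^nb^nc^n$ with $|wy| > 0$ and $|wxy| \le n$ and confirms that pumping down (m = 0) always leaves AnBnCn. It checks only the values of n tried and is not a substitute for the proof, which covers every n.

```python
def in_anbncn(s):
    """Membership in AnBnCn = {a^n b^n c^n | n >= 0}."""
    k = len(s) // 3
    return len(s) % 3 == 0 and s == "a" * k + "b" * k + "c" * k

def decompositions(u, n):
    """Every (v, w, x, y, z) with u = vwxyz, |wy| > 0 and |wxy| <= n."""
    for i in range(len(u) + 1):
        end = min(len(u), i + n)          # wxy must fit inside u[i:i+n]
        for j in range(i, end + 1):
            for k in range(j, end + 1):
                for l in range(k, end + 1):
                    if (j - i) + (l - k) > 0:
                        yield u[:i], u[i:j], u[j:k], u[k:l], u[l:]

def every_split_fails(n):
    """True if pumping down (m = 0) leaves AnBnCn for every allowed split."""
    u = "a" * n + "b" * n + "c" * n
    return all(not in_anbncn(v + x + z) for v, w, x, y, z in decompositions(u, n))

print([every_split_fails(n) for n in range(1, 6)])
```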
• Theorem 6.7, Ogden’s Lemma (a stronger version of the
pumping lemma):
– Suppose L is a CFL. Then there is an integer n so that for
every $u \in L$ with $|u| \ge n$, and every choice of n or more
“distinguished” positions in the string u, there are strings
v, w, x, y, and z so that $u = vwxyz$ and the following
conditions are satisfied:
• wy contains at least one symbol in a distinguished
position
• wxy contains n or fewer symbols in distinguished
positions
• For every $m \ge 0$, $vw^mxy^mz \in L$
• Proof: see the book
Intersections and Complements of CFLs
• The set of CFLs, like the set of regular languages, is
closed under the operations of union, concatenation,
and Kleene *
• Unlike the set of regular languages, it is not closed
under intersection or difference
• Consider AnBnCn = $\{a^nb^nc^n \mid n \ge 0\}$
– This set is $\{a^ib^ic^k \mid i, k \ge 0\} \cap \{a^ib^jc^j \mid i, j \ge 0\}$
– The two simpler languages are CFLs but their
intersection is not
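The set identity can be checked by brute force on short strings. In this minimal sketch the membership tests are written directly from the set definitions (they are not PDAs), and the length bound 7 is an arbitrary assumption; a finite check illustrates the identity but does not prove it.

```python
import re
from itertools import product

def exponents(s):
    """Return (i, j, k) if s = a^i b^j c^k, otherwise None."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return None if m is None else tuple(len(g) for g in m.groups())

def in_first(s):    # {a^i b^i c^k | i, k >= 0}
    e = exponents(s)
    return e is not None and e[0] == e[1]

def in_second(s):   # {a^i b^j c^j | i, j >= 0}
    e = exponents(s)
    return e is not None and e[1] == e[2]

def in_anbncn(s):   # {a^n b^n c^n | n >= 0}
    e = exponents(s)
    return e is not None and e[0] == e[1] == e[2]

def intersection_matches(max_len):
    """Check the identity on every string over {a, b, c} of length <= max_len."""
    return all(
        (in_first(s) and in_second(s)) == in_anbncn(s)
        for length in range(max_len + 1)
        for s in ("".join(t) for t in product("abc", repeat=length))
    )

print(intersection_matches(7))
```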
• We know that $XX = \{xx \mid x \in \{a,b\}^*\}$ is not a CFL
– Surprisingly, its complement is
• Let L be the complement of XX, i.e., $L = \{a,b\}^* - XX$
– All odd-length strings are in L
– If $x \in L$ and $|x| = 2n$ for some $n \ge 1$, then for some k
with $1 \le k \le n$, the kth and (n+k)th symbols are different
(say a and b, respectively)
– There are $k-1$ symbols before the a, $n-1$ symbols
between them, and $n-k$ symbols after the b
– Think of the $n-1$ symbols between the two as $k-1$
symbols followed by $n-k$ symbols.
– This means that x is the concatenation of two odd-length
strings, one with a in the middle and $k-1$
symbols on either side, and one with b in the middle
and $n-k$ symbols on either side.
– Furthermore, every such concatenation, in either order, is in L
– Let G be the context-free grammar with productions
S → A | B | AB | BA
A → EAE | a
B → EBE | b
E → a | b
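The construction can be sketched in code and checked on short strings. `split_non_square` follows the slides for an even-length x not in XX (it picks the first k that works), and `in_L_of_G` is a membership test read off the grammar G (an odd-length string, or two odd-length factors whose middle symbols differ), not a general parser. All names and the length bound 10 are illustrative assumptions; the finite check supports, but does not replace, the argument.

```python
from itertools import product

def in_xx(s):
    """Membership in XX = {xx | x in {a,b}*}."""
    h = len(s) // 2
    return len(s) % 2 == 0 and s[:h] == s[h:]

def odd_with_middle(s, sym):
    """True if s has odd length and middle symbol sym (what A or B generates)."""
    return len(s) % 2 == 1 and s[len(s) // 2] == sym

def split_non_square(x):
    """For even-length x not in XX, return (p, q) with x = pq: p has length
    2k-1 with x's kth symbol in the middle, q has length 2(n-k)+1 with x's
    (n+k)th symbol in the middle."""
    n = len(x) // 2
    k = next(k for k in range(1, n + 1) if x[k - 1] != x[n + k - 1])
    return x[:2 * k - 1], x[2 * k - 1:]

def in_L_of_G(s):
    """Membership read off S -> A | B | AB | BA."""
    if len(s) % 2 == 1:
        return True  # L(A) and L(B) together cover every odd-length string
    return any(
        odd_with_middle(s[:c], m1) and odd_with_middle(s[c:], m2)
        for c in range(1, len(s), 2)
        for m1, m2 in (("a", "b"), ("b", "a"))
    )

def check(max_len):
    for length in range(max_len + 1):
        for s in ("".join(t) for t in product("ab", repeat=length)):
            if in_L_of_G(s) == in_xx(s):
                return False  # L(G) must be exactly the complement of XX
            if length % 2 == 0 and not in_xx(s):
                p, q = split_non_square(s)
                if p + q != s or odd_with_middle(q, p[len(p) // 2]):
                    return False  # the two middle symbols must differ
    return True

print(check(10))
```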
– The variables A and B generate odd-length strings with
middle symbol a and b, respectively, and together
generate all odd-length strings
– From AB and BA we can derive all the even-length
elements of L
– Therefore L = L(G), and L is a CFL whose complement
(i.e., XX) is not a CFL
• Theorem 6.13: If L1 is a CFL and L2 is a regular
language, then L1 ∩ L2 is a CFL.
• Proof: Let $M_1 = (Q_1, \Sigma, \Gamma, q_1, Z_0, A_1, \delta_1)$ be a PDA accepting
L1 and $M_2 = (Q_2, \Sigma, q_2, A_2, \delta_2)$ an FA accepting L2. The
intuitive idea is that because the two involve only one
stack between them, we can use the same construction
involving the Cartesian product as in Theorem 2.15
• Define $M = (Q_1 \times Q_2, \Sigma, \Gamma, (q_1, q_2), Z_0, A_1 \times A_2, \delta)$ as follows:
– For $\sigma \in \Sigma$, $\delta((p, q), \sigma, Z)$ is the set of pairs $((p', q'), \alpha)$
for which $(p', \alpha) \in \delta_1(p, \sigma, Z)$ and $\delta_2(q, \sigma) = q'$
– $\delta((p, q), \Lambda, Z)$ is the set of pairs $((p', q), \alpha)$ for which
$(p', \alpha) \in \delta_1(p, \Lambda, Z)$
– This allows M to simulate the computation of M1
because for each move, M consults the state of M1, the
input, and the stack
– It also allows M to simulate the computation of M2,
which requires only the state of M2 and the input
• M is nondeterministic if M1 is, but this does not affect
the second part of the state-pair
– If M1 makes a Λ-transition, so does M, but the second
component of the state-pair is unchanged
– The stack is used as if it were the stack of M1
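The product construction can be sketched as follows, under several assumptions of our own: transition tables are Python dicts, stack strings are written top symbol first, every stack and input symbol is one character, "" stands for Λ, and acceptance is by final state. `product_delta` builds δ exactly as in the two cases of the definition; `accepts` is a hypothetical simulator that explores configurations (state, unread input, stack) breadth first, with an assumed cap on the number of configurations so the search always stops. The example machines are hypothetical: M1 accepts AnBn and M2 accepts strings with an even number of a's.

```python
from collections import deque

LAMBDA = ""  # stands for the null string Λ in Λ-transitions

def product_delta(delta1, delta2):
    """Transition table of M = M1 x M2 as defined on the slide.
    delta1: (p, sigma or LAMBDA, Z) -> set of (p2, alpha)   (PDA M1)
    delta2: (q, sigma) -> q2                                (complete DFA M2)"""
    q_states = {q for (q, _) in delta2} | set(delta2.values())
    delta = {}
    for (p, sym, top), moves in delta1.items():
        for q in q_states:
            if sym == LAMBDA:   # Λ-move of M1: the FA component is unchanged
                new = {((p2, q), alpha) for (p2, alpha) in moves}
            else:               # M1 and M2 both read sigma
                new = {((p2, delta2[(q, sym)]), alpha) for (p2, alpha) in moves}
            delta[((p, q), sym, top)] = new
    return delta

def accepts(delta, start, z0, accepting, x, max_configs=100_000):
    """Acceptance by final state for a possibly nondeterministic PDA (BFS)."""
    first = (start, x, z0)
    seen, queue = {first}, deque([first])
    while queue:
        state, rest, stack = queue.popleft()
        if rest == "" and state in accepting:
            return True
        if not stack:
            continue  # every move needs a top-of-stack symbol
        top, below = stack[0], stack[1:]
        choices = [(LAMBDA, rest)] + ([(rest[0], rest[1:])] if rest else [])
        for sym, new_rest in choices:
            for new_state, alpha in delta.get((state, sym, top), ()):
                cfg = (new_state, new_rest, alpha + below)
                if cfg not in seen and len(seen) < max_configs:
                    seen.add(cfg)
                    queue.append(cfg)
    return False

# Hypothetical M1: accepts AnBn = {a^n b^n | n >= 0} in final state q2.
delta1 = {
    ("q0", "a", "Z"): {("q0", "aZ")}, ("q0", "a", "a"): {("q0", "aa")},
    ("q0", "b", "a"): {("q1", "")},   ("q1", "b", "a"): {("q1", "")},
    ("q0", LAMBDA, "Z"): {("q2", "Z")}, ("q1", LAMBDA, "Z"): {("q2", "Z")},
}
# Hypothetical M2: accepts strings with an even number of a's.
delta2 = {("even", "a"): "odd", ("odd", "a"): "even",
          ("even", "b"): "even", ("odd", "b"): "odd"}

delta = product_delta(delta1, delta2)
start, accepting = ("q0", "even"), {("q2", "even")}
print([s for s in ["", "ab", "aabb", "aab"] if accepts(delta, start, "Z", accepting, s)])
```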
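• The rest of the proof depends on the following fact:
for every state-pair (p, q), every string α of stack
symbols, and every integer $n \ge 0$, these two
statements are equivalent:
– $(q_1, yz, Z_0) \vdash^n_{M_1} (p, z, \alpha)$ and $\delta_2^*(q_2, y) = q$
– $((q_1, q_2), yz, Z_0) \vdash^n_M ((p, q), z, \alpha)$
• Both directions can be proved by a straightforward
induction on n; taking $z = \Lambda$ then shows that M accepts y
exactly when y is in L1 ∩ L2, because $(p, q) \in A_1 \times A_2$
if and only if $p \in A_1$ and $q \in A_2$
• See the book for details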
• Thinking about nondeterminism helps to understand
how it might happen that no PDA can accept
precisely the strings in L' (the complement of L), even if there is a PDA that
accepts precisely the strings in L
• Example:
– A PDA M might be able to choose between two
sequences of moves on an input string x, so that both
choices read all the symbols of x but only one causes M
to end up in an accepting state
– In this case, the PDA obtained from M by reversing the
accepting and nonaccepting states will still accept x
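The example can be made concrete with a tiny hypothetical machine. To keep the sketch short, the machine never uses its stack, so it behaves like an NFA; its states s, t, u are our own illustration. On input a it may move either to t (accepting) or to u (nonaccepting), so swapping the accepting and nonaccepting states still leaves a path to an accepting state.

```python
def accepts(delta, start, accepting, x):
    """Nondeterministic acceptance: True if SOME sequence of moves on x ends
    in an accepting state (tracked as the set of reachable states)."""
    current = {start}
    for sym in x:
        current = {r for q in current for r in delta.get((q, sym), ())}
    return bool(current & accepting)

# Hypothetical machine M: on "a" it may move to t (accepting) or u (not).
delta = {("s", "a"): {"t", "u"}}
states, accepting = {"s", "t", "u"}, {"t"}
swapped = states - accepting      # M': accepting and nonaccepting reversed

print(accepts(delta, "s", accepting, "a"), accepts(delta, "s", swapped, "a"))
```

Both M and M' accept a, so M' does not accept the complement of L(M).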
• Even if M is a deterministic PDA that accepts L, the
presence of Λ-transitions might prevent the PDA M'
obtained from M by reversing the accepting and
nonaccepting states from accepting L'
• For a DPDA M without Λ-transitions that never stops before
reading its entire input, the machine M' obtained by reversing
the accepting and nonaccepting states of M accepts the
complement of L(M)
• The complement of an arbitrary language accepted
by a DPDA can be accepted by a DPDA, though the
proof is not quite as obvious
Decision Problems Involving Context-Free Languages
• The membership problem for CFLs is the decision
problem:
– Given a CFG G and a string x, is $x \in L(G)$?
• For regular languages we would have either an FA to
start with or a regular expression from which we could
obtain one, and the question would be easy to answer:
Just run the FA on the string x
• Trying to use this approach for a CFL or a PDA would be
more complicated, because a PDA may have
nondeterminism that cannot be eliminated
• There is an algorithm to solve the membership
problem starting with a CFG G that generates L
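– If $x = \Lambda$, just see whether the start variable is nullable
– Otherwise, we have an algorithm to find a CFG G1 with
no Λ-productions or unit productions so that
$L(G_1) = L(G) - \{\Lambda\}$, and we can decide whether
$x \in L(G_1)$ by trying all possible derivations in G1 with
$2|x| - 1$ or fewer steps: each step either lengthens the
current string or replaces a variable by a terminal, so a
derivation of x needs at most $|x| - 1$ steps of the first
kind and $|x|$ of the second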
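The brute-force membership test can be sketched as follows. This is a minimal sketch, assuming G1 is given as a dict from each variable to its right-hand sides, every symbol is a single character, x is nonempty, and G1 has no Λ-productions and no unit productions; the function name `derives` and the example grammar are hypothetical. It explores leftmost derivations step by step for at most $2|x| - 1$ steps, pruning sentential forms that are already longer than x or whose terminal prefix disagrees with x. Like the method on the slide, it can take exponential time in the worst case.

```python
def derives(grammar, start, x):
    """Decide whether x is in L(G1) by trying every leftmost derivation of at
    most 2|x| - 1 steps. Assumes no Λ-productions, no unit productions,
    nonempty x, variables = keys of `grammar`, one character per symbol."""
    frontier = {start}                        # forms still containing a variable
    for _ in range(2 * len(x) - 1):
        next_frontier = set()
        for form in frontier:
            i = next(k for k, s in enumerate(form) if s in grammar)  # leftmost variable
            for rhs in grammar[form[i]]:
                new = form[:i] + rhs + form[i + 1:]
                if len(new) > len(x):
                    continue  # no Λ-productions, so forms never get shorter
                j = next((k for k, s in enumerate(new) if s in grammar), len(new))
                if new[:j] != x[:j]:
                    continue  # the terminal prefix already disagrees with x
                if j == len(new):
                    if new == x:
                        return True
                else:
                    next_frontier.add(new)
        frontier = next_frontier
    return False

# Hypothetical G1 in Chomsky normal form for {a^n b^n | n >= 1}
g1 = {"S": ["AT", "AB"], "T": ["SB"], "A": ["a"], "B": ["b"]}
print([w for w in ["ab", "aabb", "aab", "ba", "abab"] if derives(g1, "S", w)])
```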
• Other interesting decision problems include these:
– Given a CFL L, is L nonempty?
– Given a CFL L, is L infinite?
• We can use the pumping lemma for CFLs to solve these
problems, just as we used the pumping lemma for regular
languages to solve the corresponding problems for finite
automata
• We will see that some easy-to-state problems involving CFGs,
such as
– Given CFGs G1 and G2, is $L(G_1) \cap L(G_2)$ nonempty?
– Given CFGs G1 and G2, is $L(G_1) \subseteq L(G_2)$?
turn out to be undecidable