Equivalence of PDA and CFG

Lecture 17
Oct 25, 2011
• Section 2.1 (push-down automata)
• Section 2.2 (pumping lemma for context-free
languages)
Pushdown Automata
Pushdown automata are for context-free languages
what finite automata are for regular languages.
PDAs are recognizing automata that have a
single stack (= memory):
Last-In First-Out pushing and popping
Note: PDAs are nondeterministic.
Informal Description PDA (1)
[Figure: input tape w = 00100100111100101, internal state set Q, and a stack holding x, y, y, z, x]
The PDA M reads w and a stack element.
Depending on
- input w_i ∈ Σ ∪ {ε},
- stack s_j ∈ Γ ∪ {ε}, and
- state q_k ∈ Q
the PDA M:
- jumps to a new state,
- pushes an element onto the stack
(nondeterministically)
Informal Description PDA (2)
[Figure: input tape w = 00100100111100101, internal state set Q, and a stack holding x, y, y, z, x]
After the PDA has read the complete input, M will be in some state ∈ Q.
If it is possible to end in an accepting state ∈ F ⊆ Q, then M accepts w.
Formal Description PDA
A pushdown automaton M is defined by a
six-tuple (Q, Σ, Γ, δ, q_0, F), with
• Q finite set of states
• Σ finite input alphabet
• Γ finite stack alphabet
• q_0 start state ∈ Q
• F set of accepting states ⊆ Q
• δ transition function
δ: Q × Σ_ε × Γ_ε → P(Q × Γ_ε), where Σ_ε = Σ ∪ {ε} and Γ_ε = Γ ∪ {ε}
PDA for L = { 0^n 1^n | n ≥ 0 }
Example 2.14:
The PDA first pushes “$ 0^n” on the stack.
Then, while reading the 1^n string, the
zeros are popped again.
If, in the end, $ is left on the stack, then “accept”.
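[State diagram; each transition is written as "input, pop → push"]
q1 → q2: ε, ε → $
q2 → q2: 0, ε → 0
q2 → q3: 1, 0 → ε
q3 → q3: 1, 0 → ε
q3 → q4: ε, $ → ε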
Machine Diagram for 0^n 1^n
[Figure: state diagram of the PDA for 0^n 1^n with states q1, q2, q3, q4]
On w = 000111 (state; stack) evolution:
(q1; ε) → (q2; $) → (q2; 0$) → (q2; 00$)
→ (q2; 000$) → (q3; 00$) → (q3; 0$) → (q3; $)
→ (q4; ε). This final q4 is an accepting state.
Machine Diagram for 0^n 1^n
[Figure: state diagram of the PDA for 0^n 1^n with states q1, q2, q3, q4]
On w = 0101 (state; stack) evolution:
(q1; ε) → (q2; $) → (q2; 0$) → (q3; $) → (q4; ε) …
But we still have part of the input, “01”, left to read.
There is no accepting path.
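The two runs on 000111 and 0101 can be reproduced with a small simulator. What follows is a minimal sketch in Python, not part of the lecture: it explores all configurations (state, input position, stack) breadth-first. The transition table follows the 0^n 1^n machine; treating q1 as accepting (so that the empty string, n = 0, is accepted) is an assumption, and the names EPS, DELTA and accepts are made up for illustration. The search is only guaranteed to stop for machines like this one, which cannot keep pushing on ε-moves forever.

```python
from collections import deque

EPS = ""  # stands for ε

# (state, input symbol or ε, popped symbol or ε) -> set of (new state, pushed symbol or ε)
DELTA = {
    ("q1", EPS, EPS): {("q2", "$")},
    ("q2", "0", EPS): {("q2", "0")},
    ("q2", "1", "0"): {("q3", EPS)},
    ("q3", "1", "0"): {("q3", EPS)},
    ("q3", EPS, "$"): {("q4", EPS)},
}
START = "q1"
ACCEPT = {"q1", "q4"}  # assumption: q1 accepts so that ε (n = 0) is in the language


def accepts(w, delta=DELTA, start=START, accept=ACCEPT):
    """Breadth-first search over configurations (state, input position, stack)."""
    first = (start, 0, "")  # the stack is a string with its top at index 0
    seen, queue = {first}, deque([first])
    while queue:
        state, pos, stack = queue.popleft()
        if pos == len(w) and state in accept:
            return True
        for (p, a, pop), targets in delta.items():
            if p != state:
                continue
            if a != EPS and (pos >= len(w) or w[pos] != a):
                continue  # the required input symbol is not next
            if pop != EPS and not stack.startswith(pop):
                continue  # the required symbol is not on top of the stack
            rest = stack[len(pop):]
            for q, push in targets:
                nxt = (q, pos + len(a), push + rest)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


print(accepts("000111"), accepts("0101"))
```

Because every branch is explored, a string is accepted exactly when some sequence of moves ends in an accepting state with the whole input read, which is the nondeterministic acceptance condition described earlier.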
Another Example of a PDA
Consider the language over the alphabet {a, b}:
L = { w | #_a(w) = #_b(w) }
(#_a(w) stands for the number of a’s in w.)
PDA design intuition: push a symbol 1 on seeing a’s, pop
on seeing b’s.
Problem: what if we see a lot of b’s at the start, and the a’s
come later?
Solution: swap the roles. Push on b, pop on a.
The PDA needs to know which role is active; it remembers this using two different states.
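The push/pop intuition with role switching can be tried out directly. The sketch below is an illustration only, written under the assumption that the machine can tell when its stack is empty (a real PDA would use a bottom marker such as $ for this). The function name and the two mode labels are hypothetical; the two modes play the part of the two states.

```python
def equal_as_and_bs(w):
    """Decide #_a(w) = #_b(w) with a single stack of marker symbols "1"."""
    stack = []
    mode = "more_a"  # which letter currently has a surplus: "more_a" or "more_b"
    for ch in w:
        if ch not in ("a", "b"):
            return False  # symbol outside the alphabet {a, b}
        if not stack:
            # Counts are balanced so far, so the next letter decides the role.
            mode = "more_a" if ch == "a" else "more_b"
            stack.append("1")
        elif (mode == "more_a") == (ch == "a"):
            stack.append("1")  # same letter as the current surplus: push
        else:
            stack.pop()  # opposite letter: pop
    return not stack  # accept exactly when the surplus is zero


print(equal_as_and_bs("bbaaab"), equal_as_and_bs("aab"))
```

At every point the stack height equals the absolute difference between the number of a’s and b’s read so far, and the mode records which of the two is ahead.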
One more PDA – for even length palindromes
L = { w w^R | w is in {0, 1}* }
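A PDA for this language has to guess where the middle of the input is, which is where nondeterminism is used. The sketch below imitates that guess by trying every possible midpoint in turn; it is an illustration under that assumption rather than a transcription of a particular machine, and the function name is made up.

```python
def accepts_even_palindrome(w):
    """Accept w iff w = v v^R for some v over {0, 1}."""
    if any(ch not in "01" for ch in w):
        return False
    # Each value of `mid` is one nondeterministic guess for the middle.
    for mid in range(len(w) + 1):
        stack = list(w[:mid])  # phase 1: push the first `mid` symbols
        ok = True
        for ch in w[mid:]:  # phase 2: pop and compare with the input
            if not stack or stack.pop() != ch:
                ok = False
                break
        if ok and not stack:
            return True  # some guess leads to acceptance
    return False


print(accepts_even_palindrome("0110"), accepts_even_palindrome("010"))
```

Only the guess mid = len(w) / 2 can succeed, but the machine has no way to know the length in advance, so it tries all of them.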
PDAs versus CFL
Theorem 2.20: A language L is context-free if and only if
there is a pushdown automaton M that recognizes L.
Two step proof:
1) Given a CFG G, construct a PDA M_G
2) Given a PDA M, make a CFG G_M
Equivalence of PDA and CFG (0)
Part 1: For every CFG, we can build an equivalent PDA.
General construction: each rule A → w of the CFG is
built into the PDA’s moves.
Equivalence of PDA and CFG (1)
Part 1: For every CFG, we can build an equivalent PDA.
Example: (page 115 of text)
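One way to picture Part 1: the PDA keeps the unmatched part of a leftmost derivation on its stack. If a variable is on top, it is replaced by the right-hand side of one of its rules; if a terminal is on top, it is matched against the next input symbol. The sketch below simulates these moves directly instead of building a transition table. The grammar encoding, the function name and the `slack` cap on the stack height are assumptions made for this illustration; the cap only keeps the search finite and could reject valid strings for grammars with many ε-rules.

```python
def cfg_accepts(rules, start, w, slack=2):
    """rules maps each variable to a list of right-hand sides (tuples of symbols)."""
    limit = len(w) + slack  # assumed cap on the stack height
    first = (0, (start,))  # (input position, stack with its top at index 0)
    seen, todo = {first}, [first]
    while todo:
        pos, stack = todo.pop()
        if not stack:
            if pos == len(w):
                return True  # stack empty and input fully read
            continue
        top, rest = stack[0], stack[1:]
        if top in rules:
            # Variable on top: replace it by any of its right-hand sides.
            successors = [(pos, tuple(rhs) + rest) for rhs in rules[top]]
        elif pos < len(w) and w[pos] == top:
            # Terminal on top: match it against the next input symbol.
            successors = [(pos + 1, rest)]
        else:
            successors = []
        for cfg in successors:
            if len(cfg[1]) <= limit and cfg not in seen:
                seen.add(cfg)
                todo.append(cfg)
    return False


# S → 0S1 | ε
G = {"S": [("0", "S", "1"), ()]}
print(cfg_accepts(G, "S", "0011"), cfg_accepts(G, "S", "001"))
```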
NPDA, CFG equivalence
Proof of (⇐): L is recognized by an NPDA
implies L is described by a CFG.
– harder direction
– first step: convert NPDA into “normal form”:
• single accept state
• empties stack before accepting
• each transition either pushes or pops a symbol
2011
NPDA, CFG equivalence
– main idea: non-terminal A_{p,q} generates exactly the
strings that take the NPDA from state p (w/ empty
stack) to state q (w/ empty stack)
– then A_{start,accept} generates all of the strings in the
language recognized by the NPDA.
NPDA, CFG equivalence
• Two possibilities to get from state p to q:
[Figure, first possibility: stack height plotted along the input string taking the NPDA from p to q. The stack is empty again at an intermediate state r; the part read from p to r is generated by A_{p,r}, the part from r to q by A_{r,q}.]
NPDA, CFG equivalence
• NPDA P = (Q, Σ, Γ, δ, start, {accept})
• CFG G:
– non-terminals V = {A_{p,q} : p, q ∈ Q}
– start variable A_{start,accept}
– productions:
for every p, r, q ∈ Q, add the rule
A_{p,q} → A_{p,r}A_{r,q}
NPDA, CFG equivalence
• Two possibilities to get from state p to q:
[Figure, second possibility: stack height plotted along the input string taking the NPDA from p to q. The first step pushes d and leads from p to r, the last step pops d and leads from s to q; the part read from r to s is generated by A_{r,s}.]
NPDA, CFG equivalence
• NPDA P = (Q, Σ, Γ, δ, start, {accept})
• CFG G:
– non-terminals V = {A_{p,q} : p, q ∈ Q}
– start variable A_{start,accept}
– productions:
for every p, r, s, q ∈ Q, d ∈ Γ, and a, b ∈ (Σ ∪ {ε})
if (r, d) ∈ δ(p, a, ε) (from state p, read a, push d, move to state r), and
(q, ε) ∈ δ(s, b, d) (from state s, read b, pop d, move to state q), add the rule
A_{p,q} → aA_{r,s}b
NPDA, CFG equivalence
• NPDA P = (Q, Σ, Γ, δ, start, {accept})
• CFG G:
– non-terminals V = {A_{p,q} : p, q ∈ Q}
– start variable A_{start,accept}
– productions:
for every p ∈ Q, add the rule
A_{p,p} → ε
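The three families of productions can be generated mechanically from a normal-form NPDA. The sketch below is one possible encoding, not the lecture's: push moves and pop moves are given as separate sets of 4-tuples, a variable A_{p,q} is represented by the tuple ("A", p, q), and the empty string stands for ε. The example machine is the 0^n 1^n PDA with q4 as its single accept state, so, as an assumption of this example, it covers n ≥ 1 only.

```python
from itertools import product


def pda_to_cfg(states, pushes, pops):
    """
    pushes: set of (p, a, d, r): from p, read a (or "" for ε), push d, go to r
    pops:   set of (s, b, d, q): from s, read b (or "" for ε), pop d, go to q
    Returns a set of rules (head, body) over variables ("A", p, q).
    """
    rules = set()
    for p in states:
        rules.add((("A", p, p), ()))  # A_{p,p} → ε
    for p, r, q in product(states, repeat=3):
        rules.add((("A", p, q), (("A", p, r), ("A", r, q))))  # A_{p,q} → A_{p,r}A_{r,q}
    for (p, a, d, r), (s, b, e, q) in product(pushes, pops):
        if d == e:  # the symbol pushed first is the one popped last
            body = tuple(x for x in (a, ("A", r, s), b) if x != "")
            rules.add((("A", p, q), body))  # A_{p,q} → aA_{r,s}b
    return rules


states = ["q1", "q2", "q3", "q4"]
pushes = {("q1", "", "$", "q2"), ("q2", "0", "0", "q2")}
pops = {("q2", "1", "0", "q3"), ("q3", "1", "0", "q3"), ("q3", "", "$", "q4")}
rules = pda_to_cfg(states, pushes, pops)
print(len(rules))
```

For this machine the interesting rules are A_{q1,q4} → A_{q2,q3}, A_{q2,q3} → 0A_{q2,q3}1 and A_{q2,q3} → 0A_{q2,q2}1, together with A_{q2,q2} → ε; most of the other rules generated by the construction are never useful in a derivation.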
NPDA, CFG equivalence
• two claims to verify correctness:
1. if A_{p,q} generates string x, then x can take
NPDA P from state p (w/ empty stack) to q
(w/ empty stack)
2. if x can take NPDA P from state p (w/ empty
stack) to q (w/ empty stack), then A_{p,q}
generates string x
NPDA, CFG equivalence
1. if A_{p,q} generates string x, then x can take
NPDA P from state p (w/ empty stack) to q
(w/ empty stack)
– induction on length of derivation of x.
– base case: 1 step derivation. must have only
terminals on rhs. In G, must be production of
form A_{p,p} → ε.
NPDA, CFG equivalence
1. if A_{p,q} generates string x, then x can take
NPDA P from state p (w/ empty stack) to q
(w/ empty stack)
– assume true for derivations of length at most k,
prove for length k+1.
– verify case: A_{p,q} → A_{p,r}A_{r,q} →^k x = yz
– verify case: A_{p,q} → aA_{r,s}b →^k x = ayb
NPDA, CFG equivalence
2. if x can take NPDA P from state p (w/
empty stack) to q (w/ empty stack), then
A_{p,q} generates string x
– induction on # of steps in P’s computation
– base case: 0 steps. starts and ends at same state
p. only has time to read empty string ε.
– G contains A_{p,p} → ε.
NPDA, CFG equivalence
2. if x can take NPDA P from state p (w/
empty stack) to q (w/ empty stack), then
A_{p,q} generates string x
– induction step. assume true for computations of
length at most k, prove for length k+1.
– if stack becomes empty sometime in the middle
of the computation (at state r)
• y is read going from state p to r (A_{p,r} →* y)
• z is read going from state r to q (A_{r,q} →* z)
• conclude: A_{p,q} → A_{p,r}A_{r,q} →* yz = x
NPDA, CFG equivalence
2. if x can take NPDA P from state p (w/
empty stack) to q (w/ empty stack), then
A_{p,q} generates string x
– if stack becomes empty only at beginning and
end of computation.
• first step: state p to r, read a, push d
• go from state r to s, read string y (A_{r,s} →* y)
• last step: state s to q, read b, pop d
• conclude: A_{p,q} → aA_{r,s}b →* ayb = x
PDA → CFG conversion
Summary of the construction:
Non-CF Languages
The language L = { a^n b^n c^n | n ≥ 0 } does not appear to be
context-free.
Informal: A PDA can compare #a’s with #b’s. But by the
time the b’s are processed, the stack is empty. It is not possible to
compare the a’s with the c’s.
The problem of A ⇒* vAy:
If S ⇒* uAz ⇒* uvAyz ⇒* uvxyz ∈ L,
then S ⇒* uAz ⇒* uvAyz ⇒* … ⇒* uv^iAy^iz
⇒* uv^ixy^iz ∈ L as well, for all i=0,1,2,…
Pumping Lemma for CFLs
Idea: If we can prove the existence of derivations
for elements of the CFL L that use the step
A ⇒* vAy, then a new form of ‘v-y pumping’
holds: A ⇒* vAy ⇒* v^2Ay^2 ⇒* v^3Ay^3 ⇒* …
Observation: We can prove this existence if the parse tree is tall enough.
Recall Parse Trees
Parse tree for S ⇒ AbbcBa ⇒* cbbccccaBca
⇒ cbbccccacca
[Figure: parse tree with root S whose yield is cbbccccacca]
Pumping a Parse Tree
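[Figure: parse tree with root S in which a variable A occurs twice on one path; the yield is split into u, v, x, y, z]
If s = uvxyz ∈ L is long, then its parse tree is tall.
Hence, there is a path on which a variable A
repeats itself. We can pump this A–A part.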
A Tree Tall Enough
Let L be a context-free language, and let G be its
grammar with at most b symbols on the right side of
the rules: A → X_1…X_b
A parse tree of depth h produces a string of
length at most b^h. Long strings imply tall trees.
Let |V| be the number of variables of G. If h = |V|+2 or
bigger, then there is a variable on a ‘top-down path’ that
occurs more than once.
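uvxyz ∈ L
[Figure: parse tree with root S; the variable A occurs twice on one path; the yield is u v x y z]
By repeating the A–A part we get…
uv^2xy^2z ∈ L
[Figure: the same tree with the A–A part repeated; the yield is u v v x y y z]
… while removing the A–A part gives…
uxz ∈ L
[Figure: the tree with the A–A part removed; the yield is u x z]
In general uv^ixy^iz ∈ L for all i=0,1,2,…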
Pumping Lemma for CFL
For every context-free language L, there is a pumping
length p, such that for every string s ∈ L with |s| ≥ p, we can
write s = uvxyz with
1) uv^ixy^iz ∈ L for every i ∈ {0,1,2,…}
2) |vy| ≥ 1
3) |vxy| ≤ p
Note that
1) implies that uxz ∈ L
2) says that v and y cannot both be the empty string ε
Condition 3) is not needed in every application, but it
often helps to reduce the number of cases.
Formal Proof of Pumping Lemma
Let G = (V, Σ, R, S) be the grammar of a CFL.
Maximum size of rules is b ≥ 2: A → X_1…X_b
A string s requires a minimum tree-depth ≥ log_b |s|.
If |s| ≥ p = b^(|V|+2), then tree-depth ≥ |V|+2, hence
there is a path and variable A where A repeats
itself: S ⇒* uAz ⇒* uvAyz ⇒* uvxyz
It follows that uv^ixy^iz ∈ L for all i=0,1,2,…
Furthermore:
|vy| ≥ 1 because tree is minimal
|vxy| ≤ p because bottom tree with ≤ p leaves
has a ‘repeating path’
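To get a feel for the size of the bound p = b^(|V|+2) used in this proof, it can be computed for a concrete grammar. The sketch below assumes a grammar encoded as a dictionary from variables to lists of right-hand sides; that encoding and the function name are illustrative choices, not part of the lecture.

```python
def pumping_length(rules):
    """rules maps each variable to a list of right-hand sides (tuples of symbols)."""
    b = max(len(rhs) for bodies in rules.values() for rhs in bodies)
    b = max(b, 2)  # the proof assumes b ≥ 2
    return b ** (len(rules) + 2)  # p = b^(|V|+2)


# S → 0S1 | ε has b = 3 and |V| = 1, so p = 3^3
G = {"S": [("0", "S", "1"), ()]}
print(pumping_length(G))
```

The bound is far from tight; it only has to be large enough to force a repeated variable on some path of the parse tree.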
Pumping lemma for {a^n b^n c^n | n >= 0}
Assume that B = {a^n b^n c^n | n ≥ 0} is a CFL
Let p be the pumping length, and s = a^p b^p c^p ∈ B
P.L.: s = uvxyz = a^p b^p c^p, with uv^ixy^iz ∈ B for all i ≥ 0
Options for vxy:
1) The strings v and y are uniform
(v=a…a and y=c…c, for example).
Then uv^2xy^2z will not contain the same number
of a’s, b’s and c’s, hence uv^2xy^2z ∉ B
2) At least one of v or y is not uniform (i.e., it has at
least two different symbols occurring in it).
Then uv^2xy^2z will not be of the form a…ab…bc…c
Hence uv^2xy^2z ∉ B
Pumping lemma applied to {a^n b^n c^n} continued
Assume that B = {a^n b^n c^n | n ≥ 0} is a CFL
Let p be the pumping length, and s = a^p b^p c^p ∈ B
P.L.: s = uvxyz = a^p b^p c^p, with uv^ixy^iz ∈ B for all i ≥ 0
We showed: For every way of partitioning s into uvxyz,
there is an i such that uvixyiz is not in B. Contradiction.
B is not a context-free language.
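For a small value of p the case analysis can be checked by brute force. The sketch below enumerates every split s = uvxyz with |vy| ≥ 1 and |vxy| ≤ p and keeps those that stay in the language for the tested exponents. The choice p = 4, the tested exponents 0 and 2, and all function names are assumptions made for this illustration; it checks one string and does not replace the proof.

```python
import re


def in_B(s):
    """Membership in B = { a^n b^n c^n | n >= 0 }."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return bool(m) and len(m.group(1)) == len(m.group(2)) == len(m.group(3))


def surviving_splits(s, p, member, exponents=(0, 2)):
    """All (u, v, x, y, z) with |vy| >= 1, |vxy| <= p that pump for every tested exponent."""
    n, good = len(s), []
    for i in range(n + 1):  # u = s[:i]
        for end in range(i, min(i + p, n) + 1):  # vxy = s[i:end], so |vxy| <= p
            for j in range(i, end + 1):  # v = s[i:j]
                for k in range(j, end + 1):  # x = s[j:k], y = s[k:end]
                    u, v, x, y, z = s[:i], s[i:j], s[j:k], s[k:end], s[end:]
                    if len(v) + len(y) == 0:
                        continue  # condition 2) requires |vy| >= 1
                    if all(member(u + v * e + x + y * e + z) for e in exponents):
                        good.append((u, v, x, y, z))
    return good


p = 4
s = "a" * p + "b" * p + "c" * p
print(surviving_splits(s, p, in_B))
```

An empty result means that no split of this particular s survives pumping, which is the situation the contradiction argument relies on.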
Another example
Proof that C = {a^i b^j c^k | 0 ≤ i ≤ j ≤ k} is not context-free.
Let p be the pumping length, and s = a^p b^p c^p ∈ C
P.L.: s = uvxyz, such that uv^ixy^iz ∈ C for every i ≥ 0
vxy can’t contain both a’s and c’s. Why?
So only two options for vxy:
1) vxy belongs to a*b*, then the string uv^2xy^2z has
not enough c’s, hence uv^2xy^2z ∉ C
2) vxy belongs to b*c*, then the string uv^0xy^0z = uxz
has too many a’s, hence uv^0xy^0z ∉ C
Contradiction: C is not a context-free language.
D = { ww | w ∈ {0,1}* } (Ex. 2.22)
Choose the string s ∈ D carefully.
Let p be the pumping length, take s = 0^p 1^p 0^p 1^p.
Three options for s = uvxyz with 1 ≤ |vxy| ≤ p:
1) If a part of y is to the left of | in 0^p 1^p | 0^p 1^p, then the second
half of uv^2xy^2z starts with “1”
2) Same reasoning if a part of v is to the right
of the middle of 0^p 1^p | 0^p 1^p, hence uv^2xy^2z ∉ D
3) If x is in the middle of 0^p 1^p | 0^p 1^p, then uxz
equals 0^p 1^i 0^j 1^p ∉ D (because i or j < p)
Contradiction: D is not context-free.
Pumping lemma for CFLs - remarks
Using the CFL pumping lemma is more difficult
than the pumping lemma for regular languages.
You have to choose the string s carefully, and divide the
options efficiently.
Additional CFL properties would be helpful (like we had
for regular languages).
What about closure under standard operations?
Union Closure Properties
Lemma: Let A1 and A2 be two CF languages, then the
union A1 ∪ A2 is context free as well.
Proof: Assume that the two grammars are
G1 = (V1, Σ, R1, S1) and G2 = (V2, Σ, R2, S2), with V1 and V2
disjoint (rename variables if necessary).
Construct a third grammar G3 = (V3, Σ, R3, S3) by:
V3 = V1 ∪ V2 ∪ { S3 } (new start variable) with
R3 = R1 ∪ R2 ∪ { S3 → S1 | S2 }.
It follows that L(G3) = L(G1) ∪ L(G2).
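The union construction is short enough to write out. The sketch below assumes each grammar is a tuple (variables, terminals, rules, start) with rules given as a dictionary from variables to lists of right-hand sides; this encoding and the function name are illustrative. It also assumes that the two variable sets are disjoint and that the new start symbol is fresh, and it checks both.

```python
def union_grammar(g1, g2, new_start="S3"):
    """Build G3 with L(G3) = L(G1) ∪ L(G2) from two grammars with disjoint variables."""
    v1, t1, r1, s1 = g1
    v2, t2, r2, s2 = g2
    assert not (v1 & v2), "rename variables first so that V1 and V2 are disjoint"
    assert new_start not in v1 | v2, "the new start variable must be fresh"
    # R3 = R1 ∪ R2 ∪ { S3 → S1 | S2 }
    rules = {**r1, **r2, new_start: [(s1,), (s2,)]}
    return (v1 | v2 | {new_start}, t1 | t2, rules, new_start)


# G1: S1 → 0 S1 1 | ε      G2: S2 → 1 S2 0 | ε
g1 = ({"S1"}, {"0", "1"}, {"S1": [("0", "S1", "1"), ()]}, "S1")
g2 = ({"S2"}, {"0", "1"}, {"S2": [("1", "S2", "0"), ()]}, "S2")
g3 = union_grammar(g1, g2)
print(g3[2]["S3"])
```

The first derivation step from S3 commits to one of the two grammars, and disjoint variable sets ensure that the rest of the derivation stays inside that grammar.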
Intersection, Complement?
Let again A1 and A2 be two CF languages.
One can prove that
the intersection A1 ∩ A2,
and
the complement Ā1 = Σ* \ A1
need not be context-free languages.
Intersection, Complement?
Proof for complement:
Recall that a problem in HW 5 shows that
L = { x#y | x, y are in {a, b}*, x != y} IS context-free.
Complement of this language is
L’ = { w | w has no # symbol } ∪
{ w | w has two or more # symbols } ∪
{ w#w | w is in {a,b}* }.
We can show that L’ is NOT context-free.
Context-free languages are NOT closed under
intersection
Proof by counterexample: Recall that in an earlier slide in
this lecture, we showed that
L = {a^n b^n c^n | n >= 0} is NOT context-free.
Let
A = {a^n b^n c^m | n, m >= 0} and
B = {a^n b^m c^m | n, m >= 0}. It is easy to see that both A
and B are context-free. (Design CFG’s.)
But A ∩ B = L, which is not context-free.
This shows that CFLs are not closed under intersection.
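The key fact in this counterexample, that a string lies in both A and B exactly when it lies in L, can be checked exhaustively on short strings. The sketch below does this for all strings over {a, b, c} of length at most 6; the length bound and the helper names are arbitrary choices for this illustration.

```python
import re
from itertools import product


def count_blocks(s):
    """Return (#a, #b, #c) if s has the shape a*b*c*, else None."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return tuple(len(g) for g in m.groups()) if m else None


def in_A(s):  # a^n b^n c^m
    k = count_blocks(s)
    return k is not None and k[0] == k[1]


def in_B(s):  # a^n b^m c^m
    k = count_blocks(s)
    return k is not None and k[1] == k[2]


def in_L(s):  # a^n b^n c^n
    k = count_blocks(s)
    return k is not None and k[0] == k[1] == k[2]


all_strings = ["".join(t) for n in range(7) for t in product("abc", repeat=n)]
print(all((in_A(s) and in_B(s)) == in_L(s) for s in all_strings))
```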