Closure Properties of CFLs
If A and B are context-free languages, then is…
• A^R a context-free language?
• A* a context-free language?
• Ā (the complement of A) a context-free language?
• A ∪ B a context-free language?
• A ∩ B a context-free language?
Some of these are true. Some of them are false.
CFLs Closed Under Reverse
Given a CFL A, is A^R a CFL?
Since A is a CFL, there is some CFG G that generates A.
Proof-by-construction:
There is a CFG G^R that generates A^R.
G = (V, Σ, R, S)
G^R = (V, Σ, R^R, S)
R^R = { A → α^R | A → α ∈ R }
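A minimal Python sketch of the G^R construction. The encoding is our own assumption, not the slides': a grammar is a 4-tuple (V, Σ, R, S) in which R maps each variable to a list of bodies, each body a tuple of symbols, with () standing for ε.

```python
def reverse_grammar(g):
    """Return G^R = (V, Σ, R^R, S) with R^R = { A -> α^R | A -> α in R }."""
    V, sigma, R, S = g
    # Reverse the body of every production; variables and start are unchanged.
    R_rev = {A: [tuple(reversed(body)) for body in bodies] for A, bodies in R.items()}
    return (V, sigma, R_rev, S)

# Example: G generates {a^n b^(2n) | n >= 0}, so G^R generates {b^(2n) a^n | n >= 0}.
G = ({"S"}, {"a", "b"}, {"S": [("a", "S", "b", "b"), ()]}, "S")
GR = reverse_grammar(G)
print(GR[2]["S"])  # [('b', 'b', 'S', 'a'), ()]
```

Because every production is reversed, each derivation in G maps step for step to a derivation in G^R whose yield is the reversed string.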
CFLs Closed Under *
Given a CFL A, is A* a CFL?
Since A is a CFL, there is some CFG G that generates A.
Proof-by-construction:
There is a CFG G* that generates A*.
G = (V, Σ, R, S)
G* = (V ∪ {S0}, Σ, R*, S0), where S0 ∉ V is a new start variable
R* = R ∪ { S0 → S } ∪ { S0 → S0S0 } ∪ { S0 → ε }
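A minimal Python sketch of the G* construction, using our own (assumed) encoding: a grammar is a 4-tuple (V, Σ, R, S), R maps each variable to a list of bodies, each body is a tuple of symbols, and () is ε. The name "S0" for the new start variable is only a default.

```python
def star_grammar(g, new_start="S0"):
    """Return G* = (V ∪ {S0}, Σ, R*, S0) with R* = R ∪ {S0 -> S, S0 -> S0 S0, S0 -> ε}."""
    V, sigma, R, S = g
    if new_start in V:
        raise ValueError("S0 must be a variable not already in V")
    R_star = {A: list(bodies) for A, bodies in R.items()}  # copy R unchanged
    R_star[new_start] = [(S,), (new_start, new_start), ()]
    return (V | {new_start}, sigma, R_star, new_start)

# Example: G generates {a^n b^n | n >= 0}.
G = ({"S"}, {"a", "b"}, {"S": [("a", "S", "b"), ()]}, "S")
G_star = star_grammar(G)
print(G_star[2]["S0"])  # [('S',), ('S0', 'S0'), ()]
```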
Closure Properties of CFLs
If A and B are context-free languages, then
• A^R is a context-free language: TRUE
• A* is a context-free language: TRUE
• Ā (complement) is a context-free language?
• A ∪ B is a context-free language?
• A ∩ B is a context-free language?
CFLs Closed Under Union
Given two CFLs A and B, is A ∪ B a CFL?
Proof-by-construction:
There is a CFG GA∪B that generates A ∪ B.
Since A and B are CFLs, there are CFGs GA = (VA, ΣA,
RA, SA) and GB = (VB, ΣB, RB, SB) that generate A and B.
GA∪B = (VA ∪ VB ∪ {S0}, ΣA ∪ ΣB, RA∪B, S0)
RA∪B = RA ∪ RB ∪ { S0 → SA } ∪ { S0 → SB }
Assumes VA and VB are disjoint (easy to arrange this by changing variable names) and that S0 is a new variable.
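A minimal Python sketch of the union construction, under the same assumed encoding (a grammar is a 4-tuple (V, Σ, R, S); bodies are tuples of symbols; () is ε). Instead of renaming automatically, this version simply refuses overlapping variable sets; the example grammars are our own.

```python
def union_grammar(ga, gb, new_start="S0"):
    """Return (VA ∪ VB ∪ {S0}, ΣA ∪ ΣB, RA ∪ RB ∪ {S0 -> SA, S0 -> SB}, S0)."""
    VA, sigma_a, RA, SA = ga
    VB, sigma_b, RB, SB = gb
    if VA & VB:
        raise ValueError("VA and VB must be disjoint; rename variables first")
    if new_start in VA | VB:
        raise ValueError("S0 must be a new variable")
    R = {**RA, **RB, new_start: [(SA,), (SB,)]}
    return (VA | VB | {new_start}, sigma_a | sigma_b, R, new_start)

GA = ({"SA"}, {"a", "b"}, {"SA": [("a", "SA", "b"), ()]}, "SA")  # {a^n b^n | n >= 0}
GB = ({"SB"}, {"c"}, {"SB": [("c", "SB"), ("c",)]}, "SB")         # {c^m | m >= 1}
G_union = union_grammar(GA, GB)
print(G_union[3], G_union[2]["S0"])  # S0 [('SA',), ('SB',)]
```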
CFLs Closed Under Concatenation
Given two CFLs A and B, is A•B a CFL?
Proof-by-construction:
There is a CFG GAB that generates A•B.
Since A and B are CFLs, there are CFGs GA = (VA, ΣA,
RA, SA) and GB = (VB, ΣB, RB, SB) that generate A and B.
Construct GAB = (VAB, ΣA ∪ ΣB, RAB, SAB):
- rename elements of VB so that VA ∩ VB = ∅
- define VAB = VA ∪ VB ∪ {SAB}, where SAB ∉ VA ∪ VB
- define RAB = RA ∪ RB ∪ {SAB → SASB}
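A minimal Python sketch of the concatenation construction, including the renaming step, under the same assumed encoding. The renaming scheme (appending the hypothetical suffix "_B") and the default name "SAB" are our own choices; the code assumes terminal symbols never coincide with variable names.

```python
def rename_variables(g, suffix):
    """Rename every variable A to A + suffix; terminals are left alone."""
    V, sigma, R, S = g
    m = {A: A + suffix for A in V}
    R2 = {m[A]: [tuple(m.get(x, x) for x in body) for body in bodies]
          for A, bodies in R.items()}
    return (set(m.values()), sigma, R2, m[S])

def concat_grammar(ga, gb, new_start="SAB", suffix="_B"):
    """Return (VA ∪ VB ∪ {SAB}, ΣA ∪ ΣB, RA ∪ RB ∪ {SAB -> SA SB}, SAB)."""
    VA, sigma_a, RA, SA = ga
    if VA & gb[0]:
        gb = rename_variables(gb, suffix)  # make VA ∩ VB = ∅
    VB, sigma_b, RB, SB = gb
    if VA & VB or new_start in VA | VB:
        raise ValueError("renaming did not produce disjoint, fresh variables")
    R = {**RA, **RB, new_start: [(SA, SB)]}
    return (VA | VB | {new_start}, sigma_a | sigma_b, R, new_start)

GA = ({"S"}, {"a", "b"}, {"S": [("a", "S", "b"), ()]}, "S")  # {a^n b^n | n >= 0}
GB = ({"S"}, {"c"}, {"S": [("c", "S"), ("c",)]}, "S")         # {c^m | m >= 1}
G_cat = concat_grammar(GA, GB)
print(G_cat[2]["SAB"])  # [('S', 'S_B')]
```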
Closure Properties of CFLs
If A and B are context-free languages, then:
• A^R is a context-free language: TRUE
• A* is a context-free language: TRUE
• Ā is a context-free language?
• A ∪ B is a context-free language: TRUE
• A ∩ B is a context-free language?
Non-closure Under Intersection
CFLs are not closed under intersection
• Example:
– L = {a^n b^n c^n | n ≥ 1} is not context-free
– L1 = {a^n b^n c^i | n ≥ 1, i ≥ 1} and L2 = {a^i b^n c^n | n ≥ 1, i ≥ 1} are
CFLs with corresponding grammars:
• L1: S → AB; A → aAb | ab; B → cB | c
• L2: S → AB; A → aA | a; B → bBc | bc
– However, L = L1 ∩ L2
– Thus the intersection of two CFLs need not be a CFL
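To make the example concrete, the following Python sketch enumerates every string of length at most 9 that the two grammars above generate and intersects the results; only strings of the form a^n b^n c^n survive. It is a finite sanity check, not a proof. The encoding (one character per symbol, uppercase letters as variables) and the breadth-first enumerator `language_up_to` are our own; the enumerator relies on these grammars having no ε-productions, so sentential forms never shrink and anything longer than the bound can be discarded.

```python
from collections import deque

def language_up_to(rules, start, max_len):
    """Every terminal string of length <= max_len derivable from start.
    Keys of `rules` are variables; any other character is a terminal.
    Assumes no ε-productions, so sentential forms never get shorter."""
    result, seen = set(), {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None if form is all terminals.
        i = next((k for k, ch in enumerate(form) if ch in rules), None)
        if i is None:
            result.add(form)
            continue
        for body in rules[form[i]]:
            new = form[:i] + body + form[i + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return result

G1 = {"S": ["AB"], "A": ["aAb", "ab"], "B": ["cB", "c"]}  # L1
G2 = {"S": ["AB"], "A": ["aA", "a"], "B": ["bBc", "bc"]}  # L2
L1 = language_up_to(G1, "S", 9)
L2 = language_up_to(G2, "S", 9)
print(sorted(L1 & L2, key=len))  # ['abc', 'aabbcc', 'aaabbbccc']
```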
Non-closure Under Intersection
Another example
• The following language L = {0^i 1^j 2^k 3^l | i = k and j = l} is not
a CFL
– Intuitively, you need a variable and productions like A → 0A2 | 02 to
generate the matching 0's and 2's, while you need another variable to
generate the matching 1's and 3's. But the strings generated by two
variables in a parse tree cannot interleave, while the 0…2 and 1…3 regions do
• However, the simpler language {0^i 1^j 2^k 3^l | i = k} is a CFL
– A grammar:
S → S3 | A
A → 0A2 | B
B → 1B | ε
• Likewise the CFL {0^i 1^j 2^k 3^l | j = l}
• Their intersection is L
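Intersection with Regular Languages
• Theorem: If L is a CFL and R is a regular
language, then L ∩ R is a CFL
[Figure: a PDA (with its stack) and an FA read the same input in parallel; the combined machine accepts only if both the PDA and the FA accept]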
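The figure describes the usual product construction: the combined machine's states are pairs (PDA state, FA state), PDA moves that read a symbol also advance the FA, ε-moves leave the FA state unchanged, and a pair is accepting when both components are. A minimal Python sketch under our own assumptions: the PDA accepts by final state and is encoded as a hypothetical dictionary `(state, input or "", pop or "") -> [(next state, push string)]` with the top of the stack at the right end of a string; the FA is a complete DFA; and the simulator `pda_accepts` caps the stack height (`max_stack`) so its search always terminates, which makes it a bounded check rather than a general decision procedure.

```python
from collections import deque

def pda_accepts(pda, w, max_stack=64):
    """Acceptance by final state, via breadth-first search over
    configurations (state, input position, stack)."""
    delta, start, finals = pda
    seen = set()
    queue = deque([(start, 0, "")])
    while queue:
        config = queue.popleft()
        if config in seen:
            continue
        seen.add(config)
        state, pos, stack = config
        if pos == len(w) and state in finals:
            return True
        for (q, a, x), moves in delta.items():
            if q != state:
                continue
            if a and (pos >= len(w) or w[pos] != a):
                continue  # the move reads a symbol that is not next in w
            if x and not stack.endswith(x):
                continue  # the move pops a symbol that is not on top
            rest = stack[:-len(x)] if x else stack
            for q2, push in moves:
                if len(rest) + len(push) <= max_stack:  # safeguard only
                    queue.append((q2, pos + (1 if a else 0), rest + push))
    return False

def product(pda, dfa):
    """PDA for L(pda) ∩ L(dfa): states are (PDA state, DFA state) pairs."""
    delta, start, finals = pda
    d_delta, d_start, d_finals = dfa
    d_states = {p for (p, _) in d_delta} | set(d_delta.values())
    new_delta = {}
    for (q, a, x), moves in delta.items():
        for p in d_states:
            p2 = d_delta[(p, a)] if a else p  # ε-moves leave the DFA where it is
            new_delta.setdefault(((q, p), a, x), []).extend(
                ((q2, p2), push) for q2, push in moves)
    new_finals = {(q, p) for q in finals for p in d_finals}
    return (new_delta, (start, d_start), new_finals)

# PDA for {a^n b^n | n >= 0}; "Z" marks the bottom of the stack.
ANBN = ({
    ("q0", "", ""): [("q1", "Z")],
    ("q1", "a", ""): [("q1", "a")],
    ("q1", "b", "a"): [("q2", "")],
    ("q2", "b", "a"): [("q2", "")],
    ("q1", "", "Z"): [("q3", "")],
    ("q2", "", "Z"): [("q3", "")],
}, "q0", {"q3"})

# Complete DFA over {a, b} accepting strings with an even number of a's.
EVEN_A = ({("even", "a"): "odd", ("odd", "a"): "even",
           ("even", "b"): "even", ("odd", "b"): "odd"}, "even", {"even"})

BOTH = product(ANBN, EVEN_A)
print(pda_accepts(BOTH, "aabb"), pda_accepts(BOTH, "ab"))  # True False
```

Here the PDA recognizes {a^n b^n | n ≥ 0} and the DFA accepts strings with an even number of a's, so the product recognizes {a^n b^n | n even}. Only one stack is needed because the FA contributes nothing but finite state.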
Closure Properties of CFLs
If A and B are context-free languages, then:
• A^R is a context-free language: TRUE
• A* is a context-free language: TRUE
• Ā is a context-free language?
• A ∪ B is a context-free language: TRUE
• A ∩ B is a context-free language: FALSE
Closure under Complement?
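• The complements of some CFLs are
also CFLs
• Example: {a^n b^n | n ≥ 0}
• The complement can be accepted by a
PDA:
– swap the accepting and non-accepting states of a deterministic PDA that recognizes
{a^n b^n} (the trick relies on determinism and needs some care with ε-moves; it does not work for nondeterministic PDAs)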
Non-closure of CFLs Under Complement
But not always!
The complement of the non-CFL L = {0^i 1^j 2^k 3^l | i = k
and j = l} is a CFL (what is L̄?)
Here is a PDA P recognizing it:
• Nondeterministically choose whether to check i ≠ k or j ≠ l.
– Nondeterministic PDA: checks one or the other, but capable of
checking either one
Say we want to check i ≠ k.
As long as 0's come in, count them on the stack.
Ignore 1's.
Pop the stack for each 2.
As long as we have not just exposed the bottom-of-stack marker
when the first 3 comes in, accept, and keep accepting as long as 3's
come in.
• But we also have to accept, and keep accepting, as soon as we see
that the input is not in L(0*1*2*3*).
• Since L̄ is a CFL but its complement, L, is not, CFLs are not closed under complement.
Closure Properties of CFLs
If A and B are context-free languages, then:
• A^R is a context-free language: TRUE
• A* is a context-free language: TRUE
• Ā is a context-free language: MAYBE
• A ∪ B is a context-free language: TRUE
• A ∩ B is a context-free language: MAYBE
Closure Properties of CFLs
• CFLs closed under reversal, Kleene
star, union, and concatenation
• CFLs not closed under intersection and
complement
Using Closure to Prove a Language is not Context-free
L = {w in {a,b,c}* with equal numbers of a's, b's, and c's}
• Suppose L is context-free.
• Consider L1 = L ∩ a*b*c*
• Because context-free languages are closed under
intersection with regular languages, L1 must be
context free
• But L1 = {a^n b^n c^n | n ≥ 0}, which we know is not context-free
• So we must have been wrong in our assumption that
L is context-free
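The argument rests on one concrete fact: intersecting L with a*b*c* leaves exactly the strings a^n b^n c^n. A brute-force Python check of that equality for all strings of length at most 9 (a sanity check on small cases, not a proof; the length bound is our own choice):

```python
import itertools
import re

def strings_up_to(max_len, alphabet="abc"):
    """All strings over the alphabet of length 0..max_len."""
    for length in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=length):
            yield "".join(letters)

def equal_counts(w):
    """Membership test for L: equal numbers of a's, b's, and c's."""
    return w.count("a") == w.count("b") == w.count("c")

# L ∩ a*b*c*, restricted to strings of length <= 9
L1 = {w for w in strings_up_to(9)
      if equal_counts(w) and re.fullmatch(r"a*b*c*", w)}
print(sorted(L1, key=len))  # ['', 'abc', 'aabbcc', 'aaabbbccc']
```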
Using Closure to Prove a Language is not Context-free
L = {www | w ∈ {a,b}*}
• Suppose L is context-free
• Intersect L with a*ba*ba*b to get L1 =
{a^n b a^n b a^n b | n ≥ 0}
• If L1 is not context-free (can prove it is
not with the pumping lemma), L is not
context-free either
Using Closure to Prove a Language is not Context-free
• In general:
– Can simplify showing that a language is
not context-free by using closure properties
– Assume L is context-free
– Transform to a simpler language, L’, by
using some operation(s) under which CFLs
are closed
– Use the pumping lemma to show L’ is not
context-free, so neither is L
Testing Emptiness of a CFL
• As for regular languages, we really take
a representation of some language and
ask whether it represents ∅
– In this case, the representation can be a
CFG or PDA
• Our choice, since there are algorithms to
convert one to the other
– The test: Use a CFG; check if the start
symbol is useless
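A minimal Python sketch of this test, assuming (our own encoding) a grammar given as a dictionary from single-uppercase-letter variables to lists of bodies, with every other character treated as a terminal and "" standing for ε. A symbol is useless if it is non-generating or unreachable; the start symbol is always reachable, so the code only decides whether S is generating, using the usual fixpoint iteration.

```python
def generating_variables(rules):
    """Fixpoint: A is generating if some body of A consists only of
    terminals and already-known generating variables."""
    gen = set()
    changed = True
    while changed:
        changed = False
        for A, bodies in rules.items():
            if A in gen:
                continue
            if any(all(x in gen or x not in rules for x in body) for body in bodies):
                gen.add(A)
                changed = True
    return gen

def is_empty(rules, start):
    """L(G) is empty exactly when the start symbol is not generating."""
    return start not in generating_variables(rules)

G1 = {"S": ["AB", "a"], "A": ["aA"], "B": ["b"]}  # S -> a, so L(G1) is nonempty
G2 = {"S": ["AB"], "A": ["aA"], "B": ["b"]}       # A never derives a terminal string
print(is_empty(G1, "S"), is_empty(G2, "S"))  # False True
```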
Testing Finiteness of a CFL
• Let L be a CFL. Then there is some pumping lemma
constant n for L
• Test all strings of length between n and 2n - 1 for
membership
• If there is any such string, it can be pumped, and the
language is infinite
• If there is no such string, then n - 1 is an upper limit on
the length of strings, so the language is finite
– Trick: If there were a string z = uvwxy of length 2n or longer, you can
find a shorter string uwy in L, but it's at most n shorter (why?). Thus, if
there are any strings of length 2n or more, you can repeatedly cut out
vx to get, eventually, a string whose length is in the range n to 2n - 1.
Testing Membership of a String in a CFL
• Simulating a PDA for L on string w doesn't
quite work, because the PDA can grow its
stack indefinitely on ε input, and we never
finish, even if the PDA is deterministic
• There is an O(n^3) algorithm (n = length of w)
that uses a "dynamic programming" technique.
– Called Cocke-Younger-Kasami (CYK) algorithm.
CYK Algorithm
• Start with a CNF grammar for L
• Build a two-dimensional table:
– Row = length of a substring of w
– Column = beginning position of the substring
– Entry in row i and column j = set of variables that
generate the substring of w beginning at position j
and extending for i positions
– These entries are denoted Xj,i+j-1 i.e., the subscripts
are the first and last positions of the string
represented, so the first row is X11,X22, …,Xnn, the
second row is X12,X23, …,Xn-1,n, and so on
Table
• The horizontal axis corresponds to the
positions of the string w = a1a2…an
• Table entry Xij is the set of nonterminals A such that A ⇒* aiai+1…aj
– We are particularly interested in whether S is in X1n, because that is the same as saying S ⇒* w (that is, w is in L)
• Basis: (row 1) Xii = the set of variables A such
that A → a is a production, and a is the symbol
at position i of w.
– The grammar is in CNF, therefore the only way to derive a
terminal is with a production of the form A → a, so Xii is the set of
nonterminals A such that A → ai is a production of G
• Induction: Suppose we want to compute Xij,
which is in row j – i +1
– We can derive aiai+1…aj from A if there is a production A → BC
such that B derives some prefix aiai+1…ak and C derives the rest, ak+1…aj.
– Thus, we must ask if there is any value of k such that
• i ≤ k < j
• B is in Xik
• C is in Xk+1,j
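A minimal Python sketch of the table-filling procedure above, assuming a CNF grammar encoded (our own convention) as a dictionary from each variable to its bodies, where each body is either a one-character terminal ("a") or a two-character pair of variables ("AB"), and single uppercase letters are variables. Indices are 0-based in the code, so the slides' Xij is `X[i-1][j-1]`.

```python
def cyk_table(rules, w):
    """Fill the CYK table for a CNF grammar. X[i][j] (0-based, inclusive)
    is the set of variables deriving w[i..j]; the slides' Xij is X[i-1][j-1]."""
    n = len(w)
    X = [[set() for _ in range(n)] for _ in range(n)]
    # Basis (row 1): variables A with a production A -> w[i]
    for i, a in enumerate(w):
        X[i][i] = {A for A, bodies in rules.items() if a in bodies}
    # Induction: longer substrings, split at every k with i <= k < j
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for A, bodies in rules.items():
                    for body in bodies:
                        if (len(body) == 2 and body[0] in X[i][k]
                                and body[1] in X[k + 1][j]):
                            X[i][j].add(A)
    return X

def cyk_accepts(rules, start, w):
    """w is in L(G) iff the start symbol is in the top cell (the slides' X1n)."""
    if not w:
        return False  # a CNF grammar without S -> ε derives no empty string
    return start in cyk_table(rules, w)[0][len(w) - 1]

# The grammar and string from the example that follows.
G = {"S": ["AB"], "A": ["BB", "a"], "B": ["AB", "b"]}
X = cyk_table(G, "aabbb")
print(sorted(X[0][4]), cyk_accepts(G, "S", "aabbb"))  # ['B', 'S'] True
```

The three nested loops over substring length, start position, and split point give the O(n^3) bound (times a factor for the grammar size).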
Example
• We'll use the algorithm to determine if the string w =
aabbb is in the language generated by the grammar
S → AB
A → BB | a
B → AB | b
• Note that the first symbol of w is a, so X11 is the set of all variables that
immediately derive a, that is, X11 = {A}. Since the second symbol is also a,
we also have X22 = {A}, and so on to get
X11 = {A}, X22 = {A}, X33 = {B}, X44 = {B}, X55 = {B}
[Figure: the CYK table for w = aabbb with only row 1 filled in]
• Compute X12 : since X11 = {A} and X22 =
{A}, X12 consists of all variables on the left
side of a production whose right side is
AA. None, so X12 is empty.
• Next X23 = {A | A BB, B  X22, B  X33}
so the required right side is AB, thus X23 =
{S,B}
• Rest is easy:
–
–
–
–
X12 = , X23 = {S,B}, X34 = {A}, X45 = {A},
X13 = {S,B}, X24 = {A}, X35 = {S,B},
X14 = {A}, X25 = {S,B},
X15 = {S,B}
 Since S is in X15, w  L(G)
[Figure: the completed CYK table for w = aabbb; cell 1,5 contains S and B]
Another Example
• X → aXb | ab
• Step 1: put into CNF
• Apply CYK algorithm to aaabbb
[Figure: empty CYK table for w = aaabbb, cells 1,1 through 1,6]
Another Example
Test for string baaba with the CNF grammar
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
[Figure: empty CYK table for w = baaba, cells 1,1 through 1,5]
CYK as a Parsing Algorithm
• Applicability of the CYK algorithm as a
parser limited by the computational
requirements needed to find a
derivation
– For an input string of length n, (n^2+n)/2 sets need
to be constructed to complete the dynamic
programming table
– Each of these sets may require the consideration
of several decompositions of the associated
substring
Preview of Undecidable CFL Problems
• Is a given CFG ambiguous?
• Is a given CFL inherently ambiguous?
• Is the intersection of two CFLs empty?
• Are two CFLs the same?
• Is a given CFL equal to Σ*, where Σ is the alphabet of the language?
The Chomsky Hierarchy
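[Figure: nested language classes, from outermost to innermost, each paired with its machine model]
• Recursively enumerable languages: Turing machines
• Context-sensitive languages: linear bounded automata
• Context-free languages: pushdown automata
• Regular languages: finite automata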
Context-Sensitive Grammars
The next grammar type, more powerful than
CFGs, is a "somewhat restricted" grammar
A grammar is context-sensitive if all
productions are of the form x → y, where x, y are
in (V ∪ T)+ and |x| ≤ |y|
• Fundamental property:
• grammar is non-contracting, i.e., the length of successive
sentential forms can never decrease
• Why "context-sensitive"?
• All productions can be rewritten in a normal form xAy → xvy
• Effectively, "A can be replaced by v only in the context of a
preceding x and a following y"
Example
• CSG for {a^n b^n c^n | n ≥ 1}
S → abc | aAbc
Ab → bA
Ac → Bbcc
bB → Bb
aB → aa | aaA
• Try to derive a^3b^3c^3
S ⇒ aAbc ⇒ abAc
⇒ abBbcc ⇒ aBbbcc
⇒ aaAbbcc ⇒ aabAbcc
⇒ aabbAcc ⇒ aabbBbccc
⇒ aabBbbccc ⇒ aaBbbbccc
⇒ aaabbbccc
A and B are "messengers": an A is
created on the left, travels to the
right to the first c, and creates another b
and c. It then becomes a B, which travels back to create
the corresponding a.
Similar to the way one would
program a TM to accept the
language.
Linear-Bounded Automata
A limited Turing Machine in which tape
use is restricted
• Use only part of the tape occupied by the input
• I.e., has an unbounded tape, but the amount that can
be used is a function of the input
• Restrict usable part of tape to exactly the cells taken by the
input
LBA is assumed to be nondeterministic
Relation between CSLs and LBAs
If a language L is accepted by some
linear bounded automaton, then
there is a context-sensitive grammar
that generates L
• Conversely, every sentential form in a derivation of w from a CSG
has length at most |w|, because any CSG G is non-contracting;
so an LBA can check a derivation of w using only the cells occupied by w
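Non-contraction is also what makes membership decidable: every sentential form on the way to w has length at most |w|, so a search over those finitely many forms terminates. A brute-force Python sketch (exponential in general; it illustrates the length bound, not how an LBA is built), encoding the CSG for {a^n b^n c^n | n ≥ 1} from the example with one character per symbol; the function name `csg_derives` is our own.

```python
from collections import deque

# The CSG for {a^n b^n c^n | n >= 1} from the example, one character per symbol.
CSG = [("S", "abc"), ("S", "aAbc"), ("Ab", "bA"), ("Ac", "Bbcc"),
       ("bB", "Bb"), ("aB", "aa"), ("aB", "aaA")]

def csg_derives(productions, start, w):
    """Exhaustive breadth-first search over sentential forms of length <= |w|.
    Correct only for non-contracting grammars, where a form longer than w
    can never shrink back to w."""
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == w:
            return True
        for lhs, rhs in productions:
            i = form.find(lhs)
            while i != -1:  # rewrite each occurrence of lhs, one at a time
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= len(w) and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return False

print(csg_derives(CSG, "S", "aaabbbccc"), csg_derives(CSG, "S", "aabbbcc"))  # True False
```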