Module 28 • Context Free Grammars – Definition of a grammar G

Module 28
• Context Free Grammars
– Definition of a grammar G
– Deriving strings and defining L(G)
• Context-Free Language definition
1
Context-Free Grammars
Definition
2
Definition
• A context-free grammar G = (V, Σ, S, P)
– V: finite set of variables (nonterminals)
– Σ: finite set of characters (terminals)
– S: start variable
• element of V
• role is similar to that of q0 for an FSA or NFA
– P: finite set of grammar rules or production rules
• Syntax of a production
• variable → string of variables and terminals
3
English Context-Free Grammar
• ECFG = (V, Σ, S, P)
– V = {<sentence>, <noun phrase>, <verb phrase>, ... }
• people sometimes use < > to delimit variables
• In this course, we generally will use capital letters to denote
variables
– Σ = {a, b, c, ..., z, ;, ,, ., ...}
– S = <sentence>
– P = { <sentence> → <noun phrase> <verb phrase>
<pct>, <noun phrase> → <article> <adj> <noun>, ...}
4
{a^i b^i | i > 0} CFG
• ABG = (V, Σ, S, P)
– V = {S}
– Σ = {a, b}
– S = S
– P = {S → aSb, S → ab} or S → aSb | ab
• second format saves some space
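A minimal Python sketch of the grammar ABG above. The dictionary layout and the helper name `generate` are illustrative assumptions, not part of the course material; variables and terminals are single characters.

```python
# Hypothetical encoding of ABG = (V, Sigma, S, P) as a plain dictionary.
ABG = {
    "variables": {"S"},
    "terminals": {"a", "b"},
    "start": "S",
    "productions": {"S": ["aSb", "ab"]},  # S -> aSb | ab
}

def generate(i):
    """Derive a^i b^i (i >= 1): apply S -> aSb (i - 1) times, then S -> ab."""
    if i < 1:
        raise ValueError("ABG only generates a^i b^i for i >= 1")
    form = ABG["start"]
    for _ in range(i - 1):
        form = form.replace("S", "aSb", 1)
    return form.replace("S", "ab", 1)
```

Calling `generate(3)` follows the derivation S ==> aSb ==> aaSbb ==> aaabbb.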
5
Context-Free Grammars
Deriving strings, defining L(G), and
defining context-free languages
6
Defining →, ==> notation
• First: → notation
– This is used to define the productions of a grammar
• S → aSb | ab
• Second: ==>G notation
– This is used to denote the application of a production
rule from a grammar G
• S ==>ABG aSb ==>ABG aaSbb ==>ABG aaabbb
– We say that string S derives string aSb (in one step)
– We say that string aSb derives string aaSbb (in one step)
– We say that string aaSbb derives string aaabbb (in one step)
• We often omit the grammar subscript when the intended
grammar is unambiguous
7
Defining ==> continued
• Third: ==>kG notation
– This is used to denote k applications of production rules
from a grammar G
• S ==>2ABG aaSbb
– We say that string S derives string aaSbb in two steps
• aSb ==>2ABG aaabbb
– We say that string aSb derives string aaabbb in two steps
• We often omit the grammar subscript when the intended
grammar is unambiguous
8
Defining ==> continued
• Fourth: ==>*G notation
– This is used to denote 0 or more applications of
production rules from a grammar G
• S ==>*ABG S
– We say that string S derives string S in 0 or more steps
• S ==>*ABG aaSbb
– We say that string S derives string aaSbb in 0 or more steps
• aSb ==>*ABG aaSbb
– We say that string aSb derives string aaSbb in 0 or more steps
• aSb ==>*ABG aaabbb
– We say that string aSb derives string aaabbb in 0 or more steps
• We often omit the grammar subscript when the intended
grammar is unambiguous
9
Defining derivations *
• Derivation of a string x
– The complete step by step derivation of a string x from
the start variable S
– Key fact: each step in a derivation makes only one
application of a production rule from G
– Example: Derivation of string aaabbb using ABG
• S ==>ABG aSb ==>ABG aaSbb ==>ABG aaabbb
– Example 2: AG = (V, Σ, S, P) where P = S → SS | a
• Deriving string aaa
• S ==> SS ==> Sa ==> SSa ==> aSa ==> aaa
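The key fact that each derivation step applies exactly one production can be checked mechanically. The sketch below assumes single-character symbols and a dictionary from variable to right-hand sides; the function names are hypothetical.

```python
def one_step(productions, form):
    """All strings obtainable from `form` by one application of one production."""
    results = set()
    for pos, symbol in enumerate(form):
        for rhs in productions.get(symbol, []):
            results.add(form[:pos] + rhs + form[pos + 1:])
    return results

def is_derivation(productions, steps):
    """True if each string in `steps` derives the next one in exactly one step."""
    return all(b in one_step(productions, a) for a, b in zip(steps, steps[1:]))

AG_RULES = {"S": ["SS", "a"]}     # S -> SS | a
ABG_RULES = {"S": ["aSb", "ab"]}  # S -> aSb | ab
```

For example, `is_derivation(AG_RULES, ["S", "SS", "Sa", "SSa", "aSa", "aaa"])` checks the derivation of aaa shown above.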
10
Defining L(G) *
• Generating strings
– If S ==>*G x, then grammar G generates string x
• Note G generates strings which contain terminals and
nonterminals
– aSb contains nonterminals and terminals
– S contains only nonterminals
– aaabbb contains only terminals
• L(G)
– The set of strings over Σ generated by grammar G
• Note we only consider terminal strings generated by G
– {a^i b^i | i > 0} = L(ABG)
– {a^i | i > 0} = L(AG)
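To see L(G) concretely, one can enumerate the terminal strings a grammar generates up to a length bound. This is a sketch under two assumptions: symbols are single characters, and the grammar has no λ-productions, so a sentential form never gets shorter and long forms can be discarded. The name `language_up_to` is hypothetical.

```python
from collections import deque

def language_up_to(productions, start, max_len):
    """Terminal strings of length <= max_len generated from `start`.

    Assumes no production has an empty right-hand side, so any form
    longer than max_len can never shrink back and is safely pruned.
    """
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if not any(symbol in productions for symbol in form):
            words.add(form)  # only terminals remain
            continue
        for pos, symbol in enumerate(form):
            for rhs in productions.get(symbol, []):
                new_form = form[:pos] + rhs + form[pos + 1:]
                if len(new_form) <= max_len and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
    return words
```

With the productions of ABG and a bound of 6 this yields ab, aabb, and aaabbb; with AG and a bound of 3 it yields a, aa, and aaa.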
11
Context-Free Languages *
• Context-Free Languages
– A language L is a context-free language (CFL) iff there exists a CFG G such that L(G) = L
• Results so far
– {a^i | i > 0} is a CFL
• One CFG G such that L(G) = this language is AG
• Note this language is also regular
– {a^i b^i | i > 0} is a CFL
• One CFG G such that L(G) = this language is ABG
• Note this language is NOT regular
12
Example *
• Let BAL = the set of strings over {(,)} in which
the parentheses are balanced
• Prove that BAL is a CFL
– To prove this, you need to come up with a CFG BALG
such that L(BALG) = BAL
• BALG = (V, Σ, S, P)
– V = {S}
– Σ = {(, )}
– S = S
– P = ?
• Give derivations of ((( ))) and ( )(( )) with your grammar
13
Module 29
• Parse/Derivation Trees
– Leftmost derivations, rightmost derivations
• Ambiguous Grammars
– Examples
• Arithmetic expressions
• If-then-else Statements
– Inherently ambiguous CFL’s
14
Context-Free Grammars
Parse Trees
Leftmost/rightmost derivations
Ambiguous grammars
15
Parse Tree
• Parse/derivation trees are structured
derivations
– The structure graphically illustrates semantic
information about the string
• Formalization of concept we encountered in
regular languages unit
– Note, what we saw before were not exactly
parse trees as we define them now, but they
were close
16
Parse Tree Example
• Parse tree for string ( )(( )) and grammar BALG
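– BALG = (V, Σ, S, P)
• V = {S}, Σ = {(, )}, S = S
• P = S → SS | (S) | λ
– One derivation of ( )(( ))
• S ==> SS ==> (S)S ==> ( )S ==> ( )(S) ==> ( )((S)) ==> ( )(( ))
– Parse tree
[Figure: parse tree for ( )(( )). The root S has two S children. The left child expands to (S) with the inner S → λ. The right child expands to (S), whose inner S expands to (S) with the innermost S → λ.]
17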
Comments about Example *
• Syntax:
[Figure: the parse tree for ( )(( )) and grammar BALG]
– draw a unique arrow from each
variable to each character that is a
direct child of that variable
• A line instead of an arrow is ok
– The derived string can be read in a left
to right traversal of the leaves
• Semantics
– The tree graphically illustrates the
nesting structure of the string of
parentheses
18
Leftmost/Rightmost Derivations
• There is more than one derivation of the
string ( )(( )).
[Figure: the parse tree for ( )(( )) and grammar BALG]
– S ==> SS ==> (S)S ==>( )S ==> ( )(S)
==> ( )((S)) ==> ( )(( ))
– S ==> SS ==> (S)S ==> (S)(S) ==> ( )(S)
==> ( )((S)) ==> ( )(( ))
– S ==> SS ==> S(S) ==> S((S)) ==> S(( ))
==> (S)(( )) ==>( )(( ))
• Leftmost derivation
– Leftmost variable is always expanded
– Which one of the above is leftmost?
• Rightmost derivation
– Rightmost variable is always expanded
– Which one of the above is rightmost?
19
Comments
[Figure: the parse tree for ( )(( )) and grammar BALG]
– S ==> SS ==> (S)S ==>( )S ==> ( )(S)
==> ( )((S)) ==> ( )(( ))
– S ==> SS ==> (S)S ==> (S)(S) ==> ( )(S)
==> ( )((S)) ==> ( )(( ))
– S ==> SS ==> S(S) ==> S((S)) ==> S(( ))
==> (S)(( )) ==>( )(( ))
• Fix a string and a grammar
– Any derivation corresponds to a
unique parse tree
– Any parse tree can correspond to
many different derivations
• Example
– The one parse tree corresponds to all
three derivations
• Unique mappings
– For any parse tree, there is a unique
leftmost/rightmost derivation that it
corresponds to
20
Example *
• S ==> SS ==> SSS ==> (S)SS ==> ( )SS ==> ( )S ==> ( )
– The above is a leftmost derivation of the string ( ) from the
grammar BALG
– Draw the corresponding parse tree
– Give the corresponding rightmost derivation
• S ==> (S) ==> (SS) ==> (S(S)) ==> (S( )) ==> (( ))
– The above is a rightmost derivation of the string (( )) from the
grammar BALG
– Draw the corresponding parse tree
– Give the corresponding leftmost derivation
21
Ambiguous Grammars
Examples:
Arithmetic Expressions
If-then-else statements
Inherently ambiguous grammars
22
Ambiguous Grammars
• A grammar G is ambiguous if there exists a string
x in L(G) with two or more distinct parse trees
– (2 or more distinct leftmost/rightmost derivations)
• Example
– Grammar AG is ambiguous
• String aaa in L(AG) has 2 rightmost derivations
– S ==> SS ==> SSS ==> SSa ==> Saa ==> aaa
– S ==> SS ==> Sa ==> SSa ==> Saa ==> aaa
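Since each parse tree corresponds to exactly one rightmost derivation, ambiguity can be checked by counting parse trees. The sketch below counts parse trees of the string a^n under P = S → SS | a; the function name is hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_parse_trees(n):
    """Number of parse trees for the string a^n under P = S -> SS | a."""
    if n < 1:
        return 0
    trees = 1 if n == 1 else 0  # S -> a derives only the single character a
    for k in range(1, n):       # S -> SS: left S derives a^k, right S derives a^(n-k)
        trees += count_parse_trees(k) * count_parse_trees(n - k)
    return trees
```

For n = 3 the count is 2, matching the two rightmost derivations of aaa listed above, so AG is ambiguous.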
23
2 Simple Examples
• Grammar BALG is ambiguous
– String ( ) in L(BALG) has >1 leftmost
derivation
• S ==> (S) ==> ( )
• S ==> (S) ==> (SS) ==>(S) ==>( )
• Give another leftmost derivation of ( ) from BALG
• Grammar ABG is NOT ambiguous
– Consider any string x in {a^i b^i | i > 0}
• There is a unique parse tree for x
24
Legal Arithmetic Expressions
• Develop a grammar MATHG = (V, Σ, S, P) for the
language of legal arithmetic expressions
– Σ = {0, 1, +, *, -, /, (, )}
– Strings in the language include
• 0
• 10
• 10*11111+100
• 10*(11111+100)
– Strings not in the language include
• 10+
• 11++101
• )(
25
Grammar MATHG1
• V = {E, N}
• Σ = {0, 1, +, *, -, /, (, )}
• S = E
• P:
– E → N | E+E | E*E | E/E | E-E | (E)
– N → N0 | N1 | 0 | 1
26
MATHG1 is
ambiguous
E → N | E+E | E*E | E/E | E-E | (E)
N → N0 | N1 | 0 | 1
• Come up with two distinct leftmost derivations of
the string 11+0*11
– E ==> E+E ==> N+E ==> N1+E ==> 11+E ==>
11+E*E ==> 11+N*E ==> 11+0*E ==> 11+0*N ==>
11+0*N1 ==> 11+0*11
– E ==> E*E ==> E+E*E ==> N+E*E ==> N1+E*E ==>
11+E*E ==> 11+N*E ==> 11+0*E ==> 11+0*N ==>
11+0*N1 ==>11+0*11
• Draw the corresponding parse trees
27
Corresponding Parse Trees
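• E ==> E+E ==> N+E ==> N1+E ==>
11+E ==> 11+E*E ==> 11+N*E ==>
11+0*E ==> 11+0*N ==> 11+0*N1
==> 11+0*11
• E ==> E*E ==> E+E*E ==> N+E*E
==> N1+E*E ==> 11+E*E ==>
11+N*E ==> 11+0*E ==> 11+0*N ==>
11+0*N1 ==> 11+0*11
[Figure: the two parse trees. In the first, the root expands as E+E: the left E derives 11 and the right E expands as E*E to derive 0*11. In the second, the root expands as E*E: the left E expands as E+E to derive 11+0 and the right E derives 11.]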
28
Parse Tree Meanings
[Figure: the two parse trees for the string 11+0*11, one with E+E at the root and one with E*E at the root]
Note how the parse trees capture the semantic meaning of string 11+0*11.
More specifically, what number does the first parse tree represent?
What number does the second parse tree represent?
29
Implications
• Two interpretations of string 11+0*11
– 11+(0*11) = 11
– (11+0)*11 = 1001
• What if a line in a program is
– MSU_Tuition = 11+0*11;
– What is MSU_Tuition?
• Depends on how the expression 11+0*11 is parsed.
• This is not good.
• Ambiguity in grammars is undesirable, particularly if the grammar is
used to develop a compiler for a programming language like C++.
• In this case, there is an unambiguous grammar for the
language of arithmetic expressions
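The two interpretations can be checked by evaluating each parse tree directly. In this sketch a tree is a nested tuple `(operator, left, right)` and a leaf is a binary numeral; this encoding is an illustrative assumption.

```python
def evaluate(tree):
    """Evaluate a parse tree whose leaves are binary numerals."""
    if isinstance(tree, str):
        return int(tree, 2)
    operator, left, right = tree
    left_value, right_value = evaluate(left), evaluate(right)
    return left_value + right_value if operator == "+" else left_value * right_value

plus_at_root = ("+", "11", ("*", "0", "11"))   # 11+(0*11)
times_at_root = ("*", ("+", "11", "0"), "11")  # (11+0)*11

def to_binary(value):
    return format(value, "b")
```

`to_binary(evaluate(plus_at_root))` gives 11 and `to_binary(evaluate(times_at_root))` gives 1001, the two values listed above.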
30
If-Then-Else Statements
• A grammar ITEG = (V, Σ, S, P) for the language
of legal If-Then-Else statements
– V = {S, BOOL}
– Σ = {D<85, D>50, grade=3.5, grade=3.0, if, then, else}
– S = S
– P:
• S → if BOOL then S else S | if BOOL then S |grade=3.5 |
grade=3.0
• BOOL → D<85 | D>50
31
ITEG is
ambiguous
S → if BOOL then S |grade=3.5 |
grade=3.0 | if BOOL then S else S
BOOL → D<85 | D>50
• Come up with two distinct leftmost derivations of
the string
– if D<85 then if D>50 then grade=3.5 else grade=3.0
– S ==>if BOOL then S else S ==> if D<85 then S else S ==> if D<85 then
if BOOL then S else S ==> if D<85 then if D>50 then S else S ==> if
D<85 then if D>50 then grade=3.5 else S ==> if D<85 then if D>50 then
grade=3.5 else grade=3.0
– S ==>if BOOL then S ==> if D<85 then S ==> if D<85 then if BOOL
then S else S ==> if D<85 then if D>50 then S else S ==> if D<85 then if
D>50 then grade=3.5 else S ==> if D<85 then if D>50 then grade=3.5 else
grade=3.0
• Draw the corresponding parse trees
32
Corresponding Parse Trees
• S ==> if BOOL then S else S ==> if D<85
then S else S ==> if D<85 then if BOOL then
S else S ==> if D<85 then if D>50 then S else
S ==> if D<85 then if D>50 then grade=3.5
else S ==> if D<85 then if D>50 then
grade=3.5 else grade=3.0
• S ==> if BOOL then S ==> if D<85 then S
==> if D<85 then if BOOL then S else S ==>
if D<85 then if D>50 then S else S ==> if
D<85 then if D>50 then grade=3.5 else S ==>
if D<85 then if D>50 then grade=3.5 else
grade=3.0
[Figure: the two parse trees (BOOL abbreviated as B). In the first, the root expands as if B then S else S: the then-branch S expands as if B then S and derives if D>50 then grade=3.5, and the else-branch S derives grade=3.0. In the second, the root expands as if B then S, and that S expands as if B then S else S, deriving if D>50 then grade=3.5 else grade=3.0.]
33
Parse Tree Meanings
[Figure: the two parse trees for if D<85 then if D>50 then grade=3.5 else grade=3.0, one with the else attached to the outer if and one with the else attached to the inner if]
If you receive a 90 on type D points, what is your grade?
By parse tree 1
By parse tree 2
34
Implications
• Two interpretations of string
– if D<85 then if D>50 then grade=3.5 else grade=3.0
– Issue is which if-then does the last ELSE attach to?
– This phenomenon is known as the “dangling else”
– Answer: Typically, else binds to NEAREST if-then
• In this case, there is an unambiguous grammar for handling
if-then’s as well as if-then-else’s
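The two readings of the dangling else can be written as two Python functions, one per attachment of the else. The function names and the use of `None` for "no grade assigned" are assumptions made for this sketch.

```python
def else_on_outer_if(d):
    """Reading where the else belongs to the first (outer) if."""
    grade = None
    if d < 85:
        if d > 50:
            grade = 3.5
    else:
        grade = 3.0
    return grade

def else_on_inner_if(d):
    """Reading where the else belongs to the nearest (inner) if."""
    grade = None
    if d < 85:
        if d > 50:
            grade = 3.5
        else:
            grade = 3.0
    return grade
```

The two functions agree when 50 < d < 85 but disagree elsewhere: for d = 90 the first assigns 3.0 while the second assigns nothing.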
35
Inherently ambiguous CFL’s
• A CFL L is inherently ambiguous iff for all CFG’s G such
that L(G) = L, G is ambiguous
• Examples so far
– None of the CFL’s we’ve seen so far are inherently ambiguous
– While some of the CFG’s we’ve seen are ambiguous, there do exist
unambiguous CFG’s for those CFL’s.
• Later result
– There exist inherently ambiguous CFL’s
– Example: {a^i b^j c^k | i=j or j=k or i=j=k}
• Note i=j=k is unnecessary, but I added it here for clarity
36
Summary
• Parse trees illustrate “semantic” information about strings
• Ambiguous grammars are undesirable
– This means there are multiple parse trees for some string
– These strings can be interpreted in multiple ways
• There are some heuristics people use for taking an
ambiguous grammar and making it unambiguous, but this
is not the focus of this course
• There are some inherently ambiguous CFL’s
– Thus, the above heuristics do not always work
37
Module 30
• EQUAL language
– Designing a CFG
– Proving the CFG is correct
38
EQUAL language
Designing a CFG
39
EQUAL
• EQUAL is the set of strings over {a,b} with an equal
number of a’s and b’s
• Strings in EQUAL include
– aabbab
– bbbaaa
– abba
• Strings in {a,b}* not in EQUAL include
– aaa
– bbb
– aab
– ababa
40
Designing a CFG for EQUAL
• Think recursively
• Base Case
– What is the shortest possible string in EQUAL?
– Production Rule:
41
Recursive Case
• Recursive Case
– Now consider a longer string x in EQUAL
– Since x has length > 0, x must have a first character
• This must be a or b
– Two possibilities for what x looks like
• x = ay
– What must be true about relative number of a’s and b’s in y?
• x = bz
– What must be true about relative number of a’s and b’s in z?
42
Case 1: x=ay
• x = ay where y has one extra b
– What must y look like?
• Some examples
– b
– babba
– aabbbab
– aaabbbb
• Is there a general pattern that applies to all of the above
examples?
• More specifically, show how we can decompose all of the
above strings y into 3 pieces, two of which belong to EQUAL.
– Some of these pieces might be the empty string λ
43
Decomposing y
• y has one extra b
– Possible examples
• b, babba, aabbbab, aaabbbb
– Decomposition
• y = ubv where
– u and v both have an equal number of a’s and b’s
• Decompose the 4 strings above into u, b, v
– b = λ·b·λ, babba = λ·b·abba, aabbbab = aabb·b·ab, aaabbbb = aaabbb·b·λ
44
Implication
• Case 1: x=ay
– y has one extra b
• Case 1 refined: x=aubv
– u, v belong to EQUAL
• Production rule for this case?
45
Case 2: x=bz
• Case 2: x=bz
– z has one extra a
• Case 2 refined: x=buav
– u, v belong to EQUAL
• Production rule for this case?
46
Final Grammar
• EG = (V, Σ, S, P)
– V = {S}
– Σ = {a,b}
– S = S
– P:
47
EQUAL language
Proving CFG is correct
48
Is our grammar correct?
• How do we prove our grammar is correct?
– Informal
• Test some strings
• Review logic behind program (CFG) design
– Formal
• First, show every string derived by EG belongs to EQUAL
– That is, show L(EG) is a subset of EQUAL
• Second, show every string in EQUAL can be derived by EG
– That is, show EQUAL is a subset of L(EG)
• Both proofs will be inductive proofs
– Inductive proofs and recursive algorithms go well together
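The informal "test some strings" check can be automated. The sketch below assumes EG has the productions S → aSbS | bSaS | λ, which is read off the derivations S ==> aSbS and S ==> bSaS used later in these notes plus the empty-string base case. The function names are hypothetical.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def derivable(x):
    """True if S ==>* x under the assumed rules S -> aSbS | bSaS | lambda."""
    if x == "":
        return True  # S -> lambda
    other = "b" if x[0] == "a" else "a"
    # x = a u b v (or b u a v) with u and v both derivable from S
    return any(
        x[k] == other and derivable(x[1:k]) and derivable(x[k + 1:])
        for k in range(1, len(x))
    )

def in_equal(x):
    return x.count("a") == x.count("b")

def agree_up_to(n):
    """Compare the grammar with EQUAL on every string over {a,b} of length <= n."""
    return all(
        derivable("".join(w)) == in_equal("".join(w))
        for m in range(n + 1)
        for w in product("ab", repeat=m)
    )
```

Agreement on all short strings is evidence only; the inductive proofs are still needed to establish L(EG) = EQUAL.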
49
L(EG) subset of EQUAL
• Let x be an arbitrary string in L(EG)
• What does this mean?
– S ==>*EG x
• Follows from definition of x in L(EG)
– We will prove the following
• If S ==>1EG x, then x is in EQUAL
• If S ==>2EG x, then x is in EQUAL
• If S ==>3EG x, then x is in EQUAL
• If S ==>4EG x, then x is in EQUAL
• ...
50
Base Case
• Statement to be proven:
– For all n ≥ 1, if S ==>nEG x, then x is in EQUAL
– Prove this by induction on n
• Base Case:
– n=1
– What is the set of strings {x | S ==>1EG x}?
– What do we need to prove about this set of strings?
51
Inductive Case
• Inductive Hypothesis:
– For 1 ≤ j ≤ n, if S ==>jEG x, then x is in EQUAL
• Note, this is a “strong” induction hypothesis
• Traditional inductive hypothesis would take form:
– For some n ≥ 1, if S ==>nEG x, then x is in EQUAL
• The difference is we assume the basic hypothesis for all
integers between 1 and n, not just n
• Statement to be Proven in Inductive Case:
– If S ==>n+1EG x, then x is in EQUAL
52
“Regular” induction vs Strong
induction
• Infinite Set of Facts
– Fact 1
– Fact 2
– Fact 3
– Fact 4
– Fact 5
– Fact 6
– …
• Base Case
– Prove fact 1
• Regular inductive case
– For n ≥ 1,
• Fact n --> Fact n+1
• Strong inductive case
– For n ≥ 1,
• Fact 1 to Fact n --> Fact n+1
53
Visualization of Induction
[Figure: two columns labeled "Regular Induction" and "Strong Induction", each listing Fact 1 through Fact 9, …]
54
Proving Inductive Case
• If S ==>n+1EG x, then x is in EQUAL
– Let x be an arbitrary string such that S ==>n+1EG x
– Examining EG, what are the three possible first
derivation steps
• Case 1: S ==> ? ==>nEG x
• Case 2: S ==> ? ==>nEG x
• Case 3: S ==> ? ==>nEG x
– One of the cases is impossible. Which one and why?
55
Case 2: S ==> ? ==>nEG x
• This means x has the form aubv where
– What can we conclude about u (don’t apply IH)?
– What can we conclude about v (don’t apply IH)?
• Apply the inductive hypothesis
– u and v belong to EQUAL
– Why do we need the strong inductive hypothesis?
• Conclude x belongs to EQUAL
– x = aubv where u and v belong to EQUAL
• Clearly the number of a’s in x equals the number of b’s in x
56
Case 3: S ==> ? ==>nEG x
• This means x has the form buav where
– What can we conclude about u (no IH)?
– What can we conclude about v (no IH)
• Apply the inductive hypothesis
– u and v belong to EQUAL
– Why do we need the strong inductive hypothesis?
• Conclude x belongs to EQUAL
– x = buav where u and v belong to EQUAL
• Clearly the number of a’s in x equals the number of b’s in x
57
L(EG) subset of EQUAL
• Wrapping up inductive case
– In all possible derivations of x, we have shown that x
belongs to EQUAL
– Thus, we have proven the inductive case
• Conclusion
– By the principle of mathematical induction, we have
shown that L(EG) is a subset of EQUAL
58
EQUAL subset of L(EG)
• Let x be an arbitrary string in EQUAL
• What does this mean?
• We will prove the following
– If |x| = 0 and x is in EQUAL, then x is in L(EG)
– If |x| = 1 and x is in EQUAL, then x is in L(EG)
– If |x| = 2 and x is in EQUAL, then x is in L(EG)
– If |x| = 3 and x is in EQUAL, then x is in L(EG)
– ...
59
EQUAL subset of L(EG)
• Statement to be proven:
– For all n ≥ 0, if |x| = n and x is in EQUAL, then x is in
L(EG)
– Prove this by induction on n
• Base Case:
– n=0
– What is the only string x such that |x|=0 and x is in
EQUAL?
– Prove this string belongs to L(EG)
60
Inductive Case
• Inductive Hypothesis:
– For 0 ≤ j ≤ n, if |x| =j and x is in EQUAL, then x is in
L(EG)
• Again, this is a “strong” induction hypothesis
• Statement to be Proven in Inductive Case:
– For n ≥ 0,
– if |x| = n+1 and x is in EQUAL, then x is in L(EG)
61
Proving Inductive Case
• If |x|=n+1 and x is in EQUAL, then x is in L(EG)
– Let x be an arbitrary string such that |x|=n+1 and x is in
EQUAL
– Examining Σ, what are the two possibilities for the first
character in x?
• Case 1: first character in x is
• Case 2: first character in x is
– In each case, what can we say about the remainder of
x?
• Case 1: the remainder of x
• Case 2: the remainder of x
62
Case 1: x = ay
• What can we say about y in this case?
• This means x has the form aubv where
– u is in EQUAL and has length ≤ n
– v is in EQUAL and has length ≤ n
– Proving this statement true
• Consider all the prefixes of string y
– length 0: λ
– length 1: y1
– length 2: y1y2
– …
– length n: y1y2 … yn = y
63
Case 1: x = ay
• Consider all the prefixes of string y
– length 0: λ
– length 1: y1
– length 2: y1y2
– …
– length n: y1y2 … yn = y
• The first prefix λ has the same number of a’s as b’s
• The last prefix y has one extra b
• The relative number of a’s and b’s in the length i
prefix differs by only one from that in the length i-1 prefix
• Thus, there must be a first prefix t of y where t has one extra b
• Furthermore, the last character of t must be b
– Otherwise, t would not be the FIRST prefix of y with one extra b
• Break t into u and b and let the remainder of y be v
• The statement follows
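The prefix argument above is constructive, so it can be sketched as code: scan the prefixes of y and stop at the first one with one extra b. The function name is hypothetical, and y is assumed to be a string over {a,b} with exactly one more b than a.

```python
def split_at_first_extra_b(y):
    """Return (u, v) with y = u + "b" + v, where u + "b" is the first
    prefix of y containing one more b than a."""
    balance = 0  # (number of b's) - (number of a's) in the prefix read so far
    for i, ch in enumerate(y):
        balance += 1 if ch == "b" else -1
        if balance == 1:
            return y[:i], y[i + 1:]
    raise ValueError("y has no prefix with one extra b")
```

For example, `split_at_first_extra_b("aabbbab")` returns `("aabb", "ab")`, and both pieces have equal numbers of a's and b's.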
64
Case 1: x = aubv *
• x = aubv
– u is in EQUAL and has length ≤ n
– v is in EQUAL and has length ≤ n
• Apply the induction hypothesis
– What can we conclude from applying the IH?
– Why did we need a strong inductive hypothesis?
• Conclude x is in L(EG) by constructing a derivation
– S ==> aSbS ==>*EG aubS ==>*EG aubv
65
Case 2: x = buav
• x = buav
– u is in EQUAL and has length ≤ n
– v is in EQUAL and has length ≤ n
• Apply the induction hypothesis
– What can we conclude about u and v?
• Conclude x is in L(EG) by constructing a derivation
– S ==> bSaS ==>*EG buaS ==>*EG buav
• Justify each of the steps in this derivation
66
EQUAL subset of L(EG)
• Wrapping up inductive case
– For all possible first characters of x, we have shown
that x belongs to L(EG)
– Thus, we have proven the inductive case
• Conclusion
– By the principle of mathematical induction, we have
shown that EQUAL is a subset of L(EG)
67
Module 31
• Closure Properties for CFL’s
– Kleene Closure
• construction
• examples
• proof of correctness
– Others covered less thoroughly in lecture
• union, concatenation
• CFL’s versus regular languages
– regular languages subset of CFL
68
Closure Properties for CFL’s
Kleene Closure
69
CFL closed under Kleene Closure
• Let L be an arbitrary CFL
• Let G1 be a CFG s.t. L(G1) = L
– G1 exists by definition of L in CFL
• Construct CFG G2 from CFG G1
• Argue L(G2) = L*
• There exists CFG G2 s.t. L(G2) = L*
• L* is a CFL
70
Visualization
• Let L be an arbitrary CFL
• Let G1 be a CFG s.t. L(G1) = L
– G1 exists by definition of L in CFL
• Construct CFG G2 from CFG G1
• Argue L(G2) = L*
• There exists CFG G2 s.t. L(G2) = L*
• L* is a CFL
[Figure: L and L* shown inside the set CFL, with G1 and G2 shown inside the set of CFG’s]
71
Algorithm Specification
• Input
– CFG G1
• Output
– CFG G2 such that L(G2) = (L(G1))*
[Figure: algorithm A takes CFG G1 as input and produces CFG G2 as output]
72
Construction
• Input
– CFG G1 = (V1, Σ, S1, P1)
• Output
– CFG G2 = (V2, Σ, S2, P2)
• V2 = V1 union {T}
– T is a new symbol not in V1 or Σ
• S2 = T
• P2 = P1 union ??
73
Closure Properties for CFL’s
Kleene Closure Examples
74
Example 1
• Input grammar:
– V = {S}
– Σ = {a,b}
– S = S
– P: S → aa | ab | ba | bb
• Construction
– V2 = V1 union {T}
– T is a new symbol not in V1 or Σ
– S2 = T
– P2 = P1 union {T → ST | λ}
• Output grammar
– V =
– Σ = {a,b}
– Start symbol is
– P:
75
Example 2
• Input grammar:
– V = {S, T}
– Σ = {a,b}
– Start symbol is T
– P:
T → ST | λ
S → aa | ab | ba | bb
• Construction
– V2 = V1 union {T}
– T is a new symbol not in V1 or Σ
– S2 = T
– P2 = P1 union {T → ST | λ}
• Output grammar
– V =
– Σ = {a,b}
– Start symbol is
– P:
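The Kleene closure construction can be sketched as a function on grammars. In this sketch a grammar is a tuple (V, Σ, start, P), a right-hand side is a tuple of symbols, and λ is the empty tuple. The encoding, the candidate names for the fresh variable, and the function name are all assumptions.

```python
def kleene_closure(grammar):
    """Build G2 from G1 by adding a fresh start variable N with N -> S1 N | lambda."""
    variables, terminals, start, productions = grammar
    # Choose a name that is not already a variable or a terminal of G1.
    new_start = next(
        c for c in "TUVWXYZ" if c not in variables and c not in terminals
    )
    new_productions = {v: list(rhss) for v, rhss in productions.items()}
    new_productions[new_start] = [(start, new_start), ()]
    return (variables | {new_start}, terminals, new_start, new_productions)

PAIRS = [("a", "a"), ("a", "b"), ("b", "a"), ("b", "b")]
EXAMPLE_1 = ({"S"}, {"a", "b"}, "S", {"S": PAIRS})
EXAMPLE_2 = ({"S", "T"}, {"a", "b"}, "T", {"T": [("S", "T"), ()], "S": PAIRS})
```

For the Example 1 input the fresh variable is T. For the Example 2 input T is already in use, so the sketch picks U and adds U → TU | λ.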
76
Closure Properties for CFL’s
Kleene Closure Proof of Correctness
77
Is our construction correct?
• How do we prove our construction is correct?
– Informal
• Test some strings
• Review logic behind construction
– Formal
• First, show every string derived by G2 belongs to (L(G1))*
– That is, show L(G2) is a subset of (L(G1))*
• Second, show every string in (L(G1))* can be derived by G2
– That is, show (L(G1))* is a subset of L(G2)
• Both proofs will be inductive proofs
– Inductive proofs and recursive algorithms go well together
78
L(G2) is a subset of (L(G1))*
• We want to prove the following
– If x in L(G2), then x is in (L(G1))*
• This is equivalent to the following
– If T ==>*G2 x, then x is in (L(G1))*
– The two statements are equivalent because
• x in L(G2) means that T ==>*G2 x
• We break the second statement down as follows:
– If T ==>1G2 x, then x is in (L(G1))*
– If T ==>2G2 x, then x is in (L(G1))*
– If T ==>3G2 x, then x is in (L(G1))*
– ...
79
L(G2) is a subset of (L(G1))*
• Statement to be proven:
– For all n ≥ 1, if T ==>nG2 x, then x is in (L(G1))*
– Prove this by induction on n
• Base Case:
– n=1
– Examining grammar G2, what is the only string x such
that T ==>1G2 x ?
– Prove this string is in (L(G1))*
80
Inductive Case
• Inductive Hypothesis:
– For 1 ≤ j ≤ n, if T ==>jG2 x, then x is in (L(G1))*
• Note, this is a “strong” induction hypothesis
• Statement to be Proven in Inductive Case:
– For n above, if T ==>n+1G2 x, then x is in (L(G1))*
• Proving this statement
– Let x be an arbitrary string such that T ==>n+1G2 x
– Examining G2, what are the two possible first
derivation steps?
• Case 1: T ==>G2
• Case 2: T ==>G2
==>nG2 x
==>nG2 x
81
Case Analysis
• Case 1: T ==>G2 ==>n x is not possible
– Why not?
• Case 2: T ==>G2 ==>nG2 x
– This means x has the form uv where
• What can we say about u (no IH)?
• What can we say about v (no IH)?
– Applying the inductive hypothesis, what can we conclude?
82
Concluding Case 2:
T ==>G2 ==>nG2 x
– Concluding string u belongs to L(G1)
• Follows from S ==>* G2 u and
• Our construction ensures that all strings derived from S in L(G2) are
also in L(G1)
– How do we conclude that x belongs to (L(G1))*
• Wrapping up inductive case
– In all possible derivations of x, we have shown that x belongs to
(L(G1))*
– Thus, we have proven the inductive case
• Conclusion
– By the principle of mathematical induction, we have shown that
L(G2) is a subset of (L(G1))*
83
(L(G1))* is a subset of L(G2)
• We want to prove the following
– If x is in (L(G1))*, then x is in L(G2)
• This is equivalent to the following
– If x is in (L(G1))*, then T ==>*G2 x
– The two statements are equivalent because
• x in L(G2) means that T ==>*G2 x
• We break the second statement down as follows:
– If x is in (L(G1))^0, then T ==>*G2 x
– If x is in (L(G1))^1, then T ==>*G2 x
– If x is in (L(G1))^2, then T ==>*G2 x
– ...
84
(L(G1))* is a subset of L(G2)
• Statement to be proven:
– For all n ≥ 0, if x is in (L(G1))^n, then x is in L(G2)
– Prove this by induction on n
• Base Case:
– n=0
– What is the only string x in (L(G1))^0?
– Show this string belongs to L(G2)
85
Inductive Case
• Inductive Hypothesis:
– For n ≥ 0, if x is in (L(G1))^n, then T ==>*G2 x
• Note, this is a “normal” induction hypothesis
• Statement to be Proven in Inductive Case:
– For n ≥ 0, if x is in (L(G1))^(n+1), then T ==>*G2 x
• Proving this statement
– Let x be an arbitrary string in (L(G1))^(n+1)
– This means x = uv where
• u in L(G1)
• What can we say about v?
86
Deriving x
– x = uv where
• u is a string in L(G1)
• v is a string in
– Justify all the steps in the following derivation
– T ==> G2 ST ==>* G2 Sv ==>* G2 uv = x
• First step:
• Second step:
• Third step:
– Thus T ==>* G2 x
• The inductive case follows
• The result is proven by the principle of mathematical
induction
87
Construction for Set Union
• Input
– CFG G1 = (V1, Σ, S1, P1)
– CFG G2 = (V2, Σ, S2, P2)
• Output
– CFG G3 = (V3, Σ, S3, P3)
• V3 = V1 union V2 union {T}
– Variable renaming to ensure no names shared between V1 and V2
– T is a new symbol not in V1 or V2 or Σ
• S3 = T
• P3 =
88
Construction for Set
Concatenation
• Input
– CFG G1 = (V1, Σ, S1, P1)
– CFG G2 = (V2, Σ, S2, P2)
• Output
– CFG G3 = (V3, Σ, S3, P3)
• V3 = V1 union V2 union {T}
– Variable renaming to ensure no names shared between V1 and V2
– T is a new symbol not in V1 or V2 or Σ
• S3 = T
• P3 =
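One standard way to complete the union and concatenation constructions is sketched below. The slides leave P3 blank, so the rules used here (T → S1 | S2 for union, T → S1 S2 for concatenation) are the usual textbook choice and not necessarily the one intended in lecture. The grammar encoding (tuples of symbols as right-hand sides) and all function names are assumptions.

```python
def combine(g1, g2, new_start, start_rules):
    """Merge two grammars with disjoint variables under a fresh start variable."""
    v1, sigma, s1, p1 = g1
    v2, _, s2, p2 = g2
    if not v1.isdisjoint(v2) or new_start in v1 | v2 | sigma:
        raise ValueError("rename variables so that no names are shared")
    productions = {**p1, **p2, new_start: start_rules(s1, s2)}
    return (v1 | v2 | {new_start}, sigma, new_start, productions)

def union(g1, g2, new_start="T"):
    # T -> S1 | S2
    return combine(g1, g2, new_start, lambda s1, s2: [(s1,), (s2,)])

def concatenation(g1, g2, new_start="T"):
    # T -> S1 S2
    return combine(g1, g2, new_start, lambda s1, s2: [(s1, s2)])
```

Both functions reject inputs whose variable sets overlap, which mirrors the renaming requirement in the construction.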
89
CFL’s and regular languages
90
CFL Closure Properties
• What have we just proven
– CFL’s are closed under Kleene closure
– CFL’s are closed under set union
– CFL’s are closed under set concatenation
• What can we conclude from these 3 results?
– It follows that regular languages are a subset of
CFL’s
91
Regular languages subset of CFL
• Recursive definition of regular languages
– Base Case:
• {}, {λ}, {a}, {b} are regular languages over {a,b}
• P={}, P={S → λ}, P={S → a}, P={S → b}
– Inductive Case:
• If L1 and L2 are regular languages, then L1*,
L1L2, L1 union L2 are regular languages
• Use previous constructions to see that these
resulting languages are also context-free
92
Other CFL Closure Properties
• We will show that CFL’s are NOT closed under
many other set operations
• Examples include
– set complement
– set intersection
– set difference
93
Language class hierarchy
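[Figure: language class hierarchy with regions labeled REG, CFL, REC, RE, and "All languages over alphabet Σ"; example languages marked include Equal and H]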
94
Module 32
• Pushdown Automata (PDA’s)
– definition
– Example
• We define configurations and computations
of PDA’s
• We define L(M) for PDA’s
95
Pushdown Automata
Definition and Motivating Example
96
Pushdown Automata (PDA)
• In this presentation we introduce the PDA
model of computation (programming
language).
– The key addition to a PDA (from an NFA-/\) is
the addition of external memory in the form of
an infinite capacity stack
• The word “pushdown” comes from the stacks of
trays in cafeterias where you have to pushdown on
the stack to add a tray to it.
97
NFA for {a^m b^n | m,n ≥ 0}
[Figure: NFA with states I, B, C; a loop labeled a on I, a loop labeled b on B, and /\-transitions from I to B and from B to C]
• What strings end up in each state of the above NFA?
– I:
– B:
– C:
• Consider the language {a^n b^n | n ≥ 0}.
• This NFA can recognize strings which have the correct form,
– a’s followed by b’s.
• However, the NFA cannot remember the relative number of a’s and b’s seen at any point in time.
98
PDA for {a^n b^n | n ≥ 0} *
[Figure: the NFA with states I, B, C next to the PDA built from it. PDA labels: "a;push a" on I, /\ from I to B, "b;pop" on B, and "/\;only if stack is empty" from B to C. The stack is initialized to empty.]
Imagine we now have memory in the form of a stack which
we can use to help remember how many a’s we have seen by
pushing onto and popping from the stack
When we see an a in state I, we do the following two actions:
1) We push an a on the stack.
2) We stay in state I.
When we see a b in state B, we do the following two actions:
1) We pop an a from the stack.
2) We stay in state B.
From state B, we allow a /\-transition to state C only if
1) The stack is empty.
Finally, when we begin, the stack should be empty.
99
Formal PDA definition
• PDA M = (Q, Σ, Γ, q0, Z, A, δ)
• Modified elements
– Γ is the stack alphabet
• Z is a special character that is initially on the stack
• Often used to represent an empty stack
– δ is modified as follows
• Pop to read the top character on the stack
• Stack update action
– What to push back on the stack
– If we push /\, then the net result of the action is a pop
100
Example PDA
[Figure: state diagram with states I, B, C. Loops on I: a;a;aa and a;Z;aZ. From I to B: /\;Z;Z and /\;a;a. Loop on B: b;a;/\. From B to C: /\;Z;Z. The stack is initialized to contain only Z.]
Q = {I, B, C}
Σ = {a,b}
Γ = {Z, a}
q0 = I
Z is the initial stack character
A = {C}
δ:
State  Input  TopSt  NS  Stack update
I      a      a      I   push aa
I      a      Z      I   push aZ
I      /\     a      B   push a
I      /\     Z      B   push Z
B      b      a      B   push /\
B      /\     Z      C   push Z
101
Computing with PDA’s *
• Configurations change compared with NFA-/\’s
– Configuration components:
• current state
• remaining input to be processed
• stack contents
• Computations are essentially the same as with
NFA-/\’s given the modified configurations
– Determining which transitions of a PDA can be
applied to a given configuration is more
complicated though
102
Computation Graph of PDA
Q = {I, B, C}
Σ = {a,b}
Γ = {Z, a}
q0 = I
Z is the initial stack character
A = {C}
δ:
State  Input  TopSt  NS  Stack update
I      a      a      I   push aa
I      a      Z      I   push aZ
I      /\     a      B   push a
I      /\     Z      B   push Z
B      b      a      B   push /\
B      /\     Z      C   push Z
Computation graph for this PDA on the input string aabb
(I,aabb,Z) --> (B,aabb,Z) --> (C,aabb,Z)
(I,aabb,Z) --> (I,abb,aZ) --> (B,abb,aZ)
(I,abb,aZ) --> (I,bb,aaZ) --> (B,bb,aaZ) --> (B,b,aZ) --> (B,/\,Z) --> (C,/\,Z)
103
Definition of ├
[Figure: the example PDA state diagram (states I, B, C) and its computation graph on the input string aabb]
Input string aabb
(I, aabb, Z) ├ (I, abb, aZ)
(I, aabb, Z) ├ (B, aabb, Z)
(I, aabb, Z) ├^2 (C, aabb, Z)
(I, aabb, Z) ├^3 (B, bb, aaZ)
(I, aabb, Z) ├* (B, abb, aZ)
(I, aabb, Z) ├* (B, /\, Z)
(I, aabb, Z) ├* (C, /\, Z)
104
Acceptance and Rejection
[Figure: the example PDA state diagram (states I, B, C) and its computation graph on the input string aabb, with configurations annotated as accepting or not]
Input string aabb
M accepts string x if one of the
configurations reached is an accepting
configuration
(q0, x, Z) ├* (f, /\, α), f in A, α in Γ*
Stack contents can be anything
Annotations on the computation graph:
– (C,/\,Z): an accepting configuration
– (B,/\,Z): not an accepting configuration since state is not accepting
– Not an accepting configuration since input not completely processed: for example (C,aabb,Z)
M rejects string x if all configurations
reached are either not halting
configurations or are rejecting
configurations
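The acceptance definition can be tried out on the example PDA with a small simulator. This is a sketch: the transition table is transcribed from the example PDA's transitions, the empty string stands for /\, and the function names are hypothetical. It assumes the PDA has no cycle of /\-moves, which holds here because /\-moves only go from I to B and from B to C.

```python
# (state, input symbol or "" for a /\-move, top of stack) -> [(next state, pushed string)]
DELTA = {
    ("I", "a", "a"): [("I", "aa")],
    ("I", "a", "Z"): [("I", "aZ")],
    ("I", "", "a"): [("B", "a")],
    ("I", "", "Z"): [("B", "Z")],
    ("B", "b", "a"): [("B", "")],
    ("B", "", "Z"): [("C", "Z")],
}
ACCEPTING = {"C"}

def successors(config):
    """Configurations reachable from `config` in one step."""
    state, rest, stack = config
    if not stack:
        return []
    top, below = stack[0], stack[1:]
    result = [(q, rest, push + below) for q, push in DELTA.get((state, "", top), [])]
    if rest:
        result += [
            (q, rest[1:], push + below)
            for q, push in DELTA.get((state, rest[0], top), [])
        ]
    return result

def reachable(x):
    """All configurations reachable from (I, x, Z)."""
    start = ("I", x, "Z")
    seen, frontier = {start}, [start]
    while frontier:
        for nxt in successors(frontier.pop()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

def accepts(x):
    """Accept if some reachable configuration has an accepting state and no input left."""
    return any(state in ACCEPTING and rest == "" for state, rest, _ in reachable(x))
```

On input aabb the simulator reaches ten configurations, including the accepting configuration (C, /\, Z), written here as `("C", "", "Z")`.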
105
Defining L(M) and LPDA
M accepts string x if one of the
configurations reached is an accepting
configuration
(q0, x, Z) ├* (f, /\, α), f in A, α in Γ*
Stack contents can be anything
M rejects string x if all configurations
reached are either not halting
configurations or are rejecting
configurations
• L(M) (or Y(M))
– The set of strings ?
• N(M)
– The set of strings ?
• LPDA
– Language L is in language
class LPDA iff ?
106
Deterministic PDA’s
• A PDA is deterministic if its transition function
satisfies both of the following properties
– For all q in Q, a in Σ union {/\}, and X in Γ,
• the set δ(q,a,X) has at most one element
– For all q in Q and X in Γ,
• if δ(q, /\, X) ≠ { }, then δ(q,a,X) = { } for all a in Σ
• A computation graph is now just a path again
• Our default assumption is that PDA’s are
nondeterministic
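The two properties can be checked mechanically. The sketch below assumes a transition function stored as a Python dict from (state, input or "" for /\, stack top) to a set of moves; this encoding and the names is_deterministic, EXAMPLE and NO_GUESS are illustrative assumptions. NO_GUESS is a hypothetical variant of the earlier I, B, C example with its /\-moves out of I removed.

```python
def is_deterministic(delta, states, sigma, gamma):
    """Check both determinism properties for a PDA transition table."""
    for q in states:
        for top in gamma:
            lam_moves = delta.get((q, "", top), set())
            # Property 1 for a = /\: at most one move
            if len(lam_moves) > 1:
                return False
            for a in sigma:
                moves = delta.get((q, a, top), set())
                # Property 1 for a in the input alphabet
                if len(moves) > 1:
                    return False
                # Property 2: a /\-move excludes every input-consuming move
                if lam_moves and moves:
                    return False
    return True


# The earlier example PDA with states I, B, C ("" stands for /\).
EXAMPLE = {
    ("I", "a", "a"): {("I", "aa")},
    ("I", "a", "Z"): {("I", "aZ")},
    ("I", "", "a"): {("B", "a")},
    ("I", "", "Z"): {("B", "Z")},
    ("B", "b", "a"): {("B", "")},
    ("B", "", "Z"): {("C", "Z")},
}
# Hypothetical variant: drop the /\-moves out of state I.
NO_GUESS = {k: v for k, v in EXAMPLE.items() if not (k[0] == "I" and k[1] == "")}

print(is_deterministic(EXAMPLE, "IBC", "ab", "Za"))
print(is_deterministic(NO_GUESS, "IBC", "ab", "Za"))
```

The example PDA fails the second property in state I, where both a /\-move and a move on input a are defined for the same stack top.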
107
Two forms of nondeterminism
Trans  Current  Input  Top of  Next   Stack
#      State    Char.  Stack   State  Update
--------------------------------------------
1      q0       a      Z       q0     aZ
2      q0       a      Z       q0     aa
3      q0       /\     Z       q0     aZ
4      q0       a      Z       q0     aa
108
LPDA and DCFL
• A language L is in language class LPDA if and
only if there exists a PDA M such that L(M) = L
• A language L is in language class DCFL
(Deterministic Context-Free Languages) if and
only if there exists a deterministic PDA M such
that L(M) = L
• To be proven
– LPDA = CFL
– CFL is a proper superset of DCFL
109
PDA Comments
• Note, we can use the stack for much more
than just a counter
• See examples in chapter 7 for some details
110
Module 33
• Pushdown Automata (PDA’s)
– Another example
111
Palindromes
• Let PAL be the set of palindromes over {a,b}
– Let PAL1 be the following related language:
• {w c w^R | w consists only of a’s and b’s}, where w^R is the reverse of w
– we add c to the input alphabet as a special “marker” character
– Strings in PAL1
» aca, bcb, abcba, aabcbaa, c
– strings not in PAL1
» aaca, aaccaa, abccba, abcb, abba
– Let PAL2 be the set of even length palindromes
• {w w^R | w consists only of a’s and b’s}
112
PAL1
• Let’s first construct a PDA for PAL1
• Basic ideas
– Have one state remember first “half” of string
– Have one state “match” second half of string to
first half
– Transition between these two states when the
first c is encountered
113
PDA for PAL1
• M = (Q, Σ, Γ, q0, Z, A, δ)
– Q = {q0, qm, qf}
– Σ = {a, b, c}
– Γ = {Z, a, b}
– q0 = q0
– Z = Z
– A = {qf}
114
Transition Function
Trans  Current  Input  Top of  Next   Stack
#      State    Char.  Stack   State  Update
--------------------------------------------
1      q0       a      Z       q0     aZ
2      q0       a      a       q0     aa
3      q0       a      b       q0     ab
4      q0       b      Z       q0     bZ
5      q0       b      a       q0     ba
6      q0       b      b       q0     bb
7      q0       c      Z       qm     Z
8      q0       c      a       qm     a
9      q0       c      b       qm     b
10     qm       a      a       qm     /\
11     qm       b      b       qm     /\
12     qm       /\     Z       qf     Z
First three transitions push a on
top of the stack
Second three transitions push b
on the stack
Third three transitions switch
state q0 to qm
No change to stack
Transitions 10 and 11 “match”
characters from first and last
half of input string
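The twelve transitions can be traced with a short Python sketch. The encoding is an assumption made for illustration (stack as a string with the top at index 0, /\ written as ""), and the names DELTA, run and accepts are hypothetical. Because at most one move applies in each configuration of this machine, the sketch follows a single path instead of searching.

```python
# PAL1 transitions: key (state, input char or "" for /\, stack top),
# value (next state, string that replaces the popped top).
DELTA = {}
for top in "Zab":
    DELTA[("q0", "a", top)] = ("q0", "a" + top)   # transitions 1-3
    DELTA[("q0", "b", top)] = ("q0", "b" + top)   # transitions 4-6
    DELTA[("q0", "c", top)] = ("qm", top)         # transitions 7-9
DELTA[("qm", "a", "a")] = ("qm", "")              # transition 10
DELTA[("qm", "b", "b")] = ("qm", "")              # transition 11
DELTA[("qm", "", "Z")] = ("qf", "Z")              # transition 12


def run(x):
    """Return the list of configurations visited on input x."""
    state, rest, stack = "q0", x, "Z"
    trace = [(state, rest, stack)]
    while stack:
        top = stack[0]
        if (state, "", top) in DELTA:
            nxt, push = DELTA[(state, "", top)]
        elif rest and (state, rest[0], top) in DELTA:
            nxt, push = DELTA[(state, rest[0], top)]
            rest = rest[1:]
        else:
            break  # no transition applies
        state, stack = nxt, push + stack[1:]
        trace.append((state, rest, stack))
    return trace


def accepts(x):
    state, rest, _ = run(x)[-1]
    return state == "qf" and rest == ""


for config in run("abcba"):
    print(config)
print(accepts("abcba"), accepts("abcab"), accepts("acab"))
```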
115
Notation comment
We might represent transition 1
in two other ways
δ(q0,a,Z) = (q0, aZ)
(q0, a, Z, q0, aZ)
•Question
•Is this PDA deterministic?
116
Computation Graph 1
(q0, abcba, Z)
(q0, bcba, aZ)
(q0, cba, baZ)
(qm, ba, baZ)
(qm, a, aZ)
(qm, /\, Z)
(qf, /\, Z)
117
Computation Graph 2
(q0, abcab, Z)
(q0, bcab, aZ)
(q0, cab, baZ)
(qm, ab, baZ)
118
Computation Graph 3
(q0, acab, Z)
(q0, cab, aZ)
(qm, ab, aZ)
(qm, b, Z)
(qf, b, Z)
119
PAL2
• Let’s now construct a PDA for PAL2
• What is harder this time?
– When do we switch from putting strings on the
stack to matching?
– Example
• After seeing aab, should we switch to match mode
or stay in stack mode?
– Solution
• Do both using nondeterminism
120
PDA for PAL2
• M = (Q, Σ, Γ, q0, Z, A, δ)
– Q = {q0, qm, qf}
– Σ = {a, b}
– Γ = {Z, a, b}
– q0 = q0
– Z = Z
– A = {qf}
121
Transition Relation
Trans  Current  Input  Top of  Next   Stack
#      State    Char.  Stack   State  Update
--------------------------------------------
1      q0       a      Z       q0     aZ
2      q0       a      a       q0     aa
3      q0       a      b       q0     ab
4      q0       b      Z       q0     bZ
5      q0       b      a       q0     ba
6      q0       b      b       q0     bb
7      q0       /\     Z       qm     Z
8      q0       /\     a       qm     a
9      q0       /\     b       qm     b
10     qm       a      a       qm     /\
11     qm       b      b       qm     /\
12     qm       /\     Z       qf     Z
First three transitions push a on
top of the stack
Second three transitions push b
on the stack
Third three transitions switch
state q0 to qm
Is the PDA deterministic or
nondeterministic?
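One way to explore "do both" is to search every branch. The sketch below is an illustration under assumed conventions (stack as a string with the top at index 0, /\ written as ""; the names reachable and accepts are hypothetical). It collects all configurations the PAL2 machine can reach and accepts if any of them is in qf with no input left.

```python
from collections import deque

# PAL2 transitions: key (state, input char or "" for /\, stack top),
# value set of (next state, string that replaces the popped top).
DELTA = {}
for top in "Zab":
    DELTA[("q0", "a", top)] = {("q0", "a" + top)}   # transitions 1-3
    DELTA[("q0", "b", top)] = {("q0", "b" + top)}   # transitions 4-6
    DELTA[("q0", "", top)] = {("qm", top)}          # transitions 7-9 (the guess)
DELTA[("qm", "a", "a")] = {("qm", "")}              # transition 10
DELTA[("qm", "b", "b")] = {("qm", "")}              # transition 11
DELTA[("qm", "", "Z")] = {("qf", "Z")}              # transition 12


def reachable(x):
    """All configurations reachable from (q0, x, Z)."""
    start = ("q0", x, "Z")
    seen, queue = {start}, deque([start])
    while queue:
        state, rest, stack = queue.popleft()
        if not stack:
            continue
        top, below = stack[0], stack[1:]
        moves = [(n, rest, p + below) for n, p in DELTA.get((state, "", top), ())]
        if rest:
            moves += [(n, rest[1:], p + below)
                      for n, p in DELTA.get((state, rest[0], top), ())]
        for c in moves:
            if c not in seen:
                seen.add(c)
                queue.append(c)
    return seen


def accepts(x):
    return any(s == "qf" and rest == "" for s, rest, _ in reachable(x))


print(accepts("abba"), accepts("aba"))
print(len(reachable("abba")), len(reachable("aba")))
```

Each /\-move out of q0 is one guess that the middle of the string has been reached; only the correct guess leads to qf with the input used up.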
122
Computation Graph 1
Computation graph for the input string abba
(each indented configuration is reached in one step from the configuration it is indented under)
(q0, abba, Z)
    (q0, bba, aZ)
        (q0, ba, baZ)
            (q0, a, bbaZ)
                (q0, /\, abbaZ)
                    (qm, /\, abbaZ)
                (qm, a, bbaZ)
            (qm, ba, baZ)
                (qm, a, aZ)
                    (qm, /\, Z)
                        (qf, /\, Z)
        (qm, bba, aZ)
    (qm, abba, Z)
        (qf, abba, Z)
123
Computation Graph 2
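Computation graph for the input string aba
(each indented configuration is reached in one step from the configuration it is indented under)
(q0, aba, Z)
    (q0, ba, aZ)
        (q0, a, baZ)
            (q0, /\, abaZ)
                (qm, /\, abaZ)
            (qm, a, baZ)
        (qm, ba, aZ)
    (qm, aba, Z)
        (qf, aba, Z)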
124
PAL
• Challenge
– Construct a PDA for PAL
– First step
• Construct a PDA for odd length palindromes
– Then
• Combine PDA’s for odd length and even length
palindromes
125
Module 34
• CFG → PDA construction
– Shows that for any CFL L, there exists a PDA
M such that L(M) = L
– The reverse is true as well, but we do not prove
that here
126
CFL subset LPDA
• Let L be an arbitrary CFL
• Let G be a CFG such that L(G) = L
– G exists by the definition of L being context-free
• Construct a PDA M such that L(M) = L(G)
• Argue L(M) = L
• There exists a PDA M such that L(M) = L
• L is in LPDA
– By definition of L in LPDA
127
Visualization
• Let L be an arbitrary CFL
• Let G be a CFG such that L(G) = L
– G exists by the definition of L being context-free
• Construct a PDA M such that L(M) = L
– M is constructed from CFG G
• Argue L(M) = L
• There exists a PDA M such that L(M) = L
• L is in LPDA
– By definition of L in LPDA
[Diagram: L in CFL is generated by G in CFG’s; G is transformed into M in PDA’s; M accepts L, so L is in LPDA]
128
Algorithm Specification
• Input
– CFG G
• Output
– PDA M such that L(M) = L(G)
[Diagram: CFG G → algorithm A → PDA M]
129
Construction Idea
• The basic idea is to have a 2-phase PDA
– Phase 1:
• Derive all strings in L(G) on the stack nondeterministically
• Do not process any input while we are deriving the string on
the stack
– Phase 2:
• Match the input string against the derived string on the stack
– This is a deterministic process
• Move to an accepting state only when the stack is empty
130
Illustration
• Input Grammar G
– V = {S}
– Σ = {a,b}
– S = S
– P: S → aSb | /\
• What is L(G)?
1. Derive all strings in L(G) on the stack
2. Match the derived string against input
Illustration of how the PDA might work,
though not completely accurate.
(q0, aabb, Z)
/* put S on stack */
(q1, aabb, SZ)
/* derive aabb on stack */
(q1, aabb, aSbZ)
(q1, aabb, aaSbbZ)
(q1, aabb, aabbZ)
/* match stack vs input */
(q2, aabb, aabbZ)
(q2, abb, abbZ)
(q2, bb, bbZ)
(q2, b, bZ)
(q2, /\, Z)
(q3, /\, Z)
131
Difficulty
1. Derive all strings in L(G) on the stack
2. Match the derived string against input
(q0, aabb, Z)
/* put S on stack */
(q1, aabb, SZ)
/* derive aabb on stack */
(q1, aabb, aSbZ)
(q1, aabb, aaSbbZ)
(q1, aabb, aabbZ)
/* match stack vs input */
(q2, aabb, aabbZ)
(q2, abb, abbZ)
(q2, bb, bbZ)
(q2, b, bZ)
(q2, /\, Z)
(q3, /\, Z)
What is illegal with the
computation graph above?
132
Construction
• Input Grammar
– G = (V, Σ, S, P)
• Output PDA
– M = (Q, Σ, Γ, q0, Z, F, δ)
– Q = {q0, q1, q2}
– Σ = Σ
– Γ = V union Σ union {Z}
– Z = Z
– q0 = q0
– F = {q2}
• δ:
– δ(q0, /\, Z) = (q1, SZ)
– δ(q1, /\, Z) = (q2, Z)
– For all productions A → α
• δ(q1, /\, A) = (q1, α)
– For all a in Σ
• δ(q1, a, a) = (q1, /\)
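The construction above can be sketched in Python. Several things here are assumptions for illustration only: every grammar symbol is a single character, the character Z is not used as a variable, /\ is written as the empty string "", and the function names cfg_to_pda and accepts are hypothetical. The search in accepts also assumes the grammar has no left recursion (no variable derives a string beginning with itself); otherwise /\-moves could grow the stack forever and the search would not terminate.

```python
from collections import deque


def cfg_to_pda(terminals, start, productions):
    """Build the transition table of the 3-state PDA for a CFG.

    productions maps each variable to a list of right-hand sides,
    each a string over variables and terminals ("" for /\).
    """
    delta = {}

    def add(key, move):
        delta.setdefault(key, set()).add(move)

    add(("q0", "", "Z"), ("q1", start + "Z"))   # put the start variable on the stack
    add(("q1", "", "Z"), ("q2", "Z"))           # stack back down to Z: go to accepting state
    for variable, bodies in productions.items():
        for alpha in bodies:                    # one /\-move per production A -> alpha
            add(("q1", "", variable), ("q1", alpha))
    for a in terminals:                         # match a terminal on the stack against input
        add(("q1", a, a), ("q1", ""))
    return delta


def accepts(delta, x, accepting=("q2",)):
    """Search all computations of the PDA on x (stack top at index 0)."""
    start = ("q0", x, "Z")
    seen, queue = {start}, deque([start])
    while queue:
        state, rest, stack = queue.popleft()
        if state in accepting and rest == "":
            return True
        if not stack:
            continue
        top, below = stack[0], stack[1:]
        moves = [(n, rest, p + below) for n, p in delta.get((state, "", top), ())]
        if rest:
            moves += [(n, rest[1:], p + below)
                      for n, p in delta.get((state, rest[0], top), ())]
        for c in moves:
            if c not in seen:
                seen.add(c)
                queue.append(c)
    return False


anbn = cfg_to_pda("ab", "S", {"S": ["aSb", ""]})
print(accepts(anbn, "aabb"), accepts(anbn, "aab"))
```

Note that derivation and matching are interleaved in state q1, so terminals at the top of the stack are matched against the input before the next variable is expanded.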
133
Examples
134
Palindromes
• PALG:
– V = {S}
– Σ = {a,b}
– S = S
– P:
• S → aSa | bSb | a | b | /\
• Output PDA M = (Q, Σ, Γ, q0, Z, F, δ)
– Q = {q0, q1, q2}
– Γ = {a,b,S,Z}
– q0 = q0
– Z = Z
– F = {q2}
• δ:
– δ(q0, /\, Z) = (q1, SZ)
– δ(q1, /\, Z) = (q2, Z)
– Production transitions
• δ(q1, /\, S) = (q1, aSa)
• δ(q1, /\, S) = (q1, bSb)
• δ(q1, /\, S) = (q1, a)
• δ(q1, /\, S) = (q1, b)
• δ(q1, /\, S) = (q1, /\)
– Matching transitions
• δ(q1, a, a) = (q1, /\)
• δ(q1, b, b) = (q1, /\)
135
Palindrome Transition Table
Transition  Current  Input   Top of  Next   Stack
Number      State    Symbol  Stack   State  Update
---------------------------------------------------
1           q0       /\      Z       q1     SZ
2           q1       /\      Z       q2     Z
3           q1       /\      S       q1     aSa
4           q1       /\      S       q1     bSb
5           q1       /\      S       q1     a
6           q1       /\      S       q1     b
7           q1       /\      S       q1     /\
8           q1       a       a       q1     /\
9           q1       b       b       q1     /\
136
Partial Computation Graph
(q0, aba, Z)
(q1, aba, SZ)
(q1, aba, aSaZ) (other branches not shown)
(q1, ba, SaZ)
(q1, ba, baZ) (other branches not shown)
(q1, a, aZ)
(q1, /\, Z)
(q2, /\, Z)
On your own, draw computation trees for other strings
not in the language and see that they are not accepted.
137
{a^n b^n | n ≥ 0}
• Grammar G:
– V = {S}
– Σ = {a,b}
– S = S
– P:
• S → aSb | /\
• Output PDA M = (Q, Σ, Γ, q0, Z, F, δ)
– Q = {q0, q1, q2}
– Γ = {a,b,S,Z}
– q0 = q0
– Z = Z
– F = {q2}
• δ:
– δ(q0, /\, Z) = (q1, SZ)
– δ(q1, /\, Z) = (q2, Z)
– Production Transitions
•
– Matching transitions
•
138
{a^n b^n | n ≥ 0} Transition Table
Transition  Current  Input   Top of  Next   Stack
Number      State    Symbol  Stack   State  Update
---------------------------------------------------
1           q0       /\      Z
2           q1       /\      Z
3           q1       /\      S
4           q1       /\      S
5           q1       a       a
6           q1       b       b
139
Partial Computation Graph
(q0, aabb, Z)
(q1, aabb, SZ)
(q1, aabb, aSbZ) (other branch not shown)
(q1, abb, SbZ)
(q1, abb, aSbbZ) (other branch not shown)
(q1, bb, SbbZ)
(q1, bb, bbZ) (other branch not shown)
(q1, b, bZ)
(q1, /\, Z)
(q2, /\, Z)
140
{a^i b^j | i = j or i = 2j}
• Grammar G:
– V = {S,T,U}
– Σ = {a,b}
– S = S
– P:
• S → T | U
• T → aTb | /\
• U → aaUb | /\
• Output PDA M = (Q, Σ, Γ, q0, Z, F, δ)
– Q = {q0, q1, q2}
– Γ = {a,b,S,T,U,Z}
– q0 = q0
– Z = Z
– F = {q2}
• δ:
– δ(q0, /\, Z) = (q1, SZ)
– δ(q1, /\, Z) = (q2, Z)
– Production Transitions
•
– Matching transitions
•
141
{a^i b^j | i = j or i = 2j} Transition Table
Transition  Current  Input   Top of  Next   Stack
Number      State    Symbol  Stack   State  Update
---------------------------------------------------
1           q0       /\      Z       q1     SZ
2           q1       /\      Z       q2     Z
3           q1       /\      S       q1     T
4           q1       /\      S       q1     U
5           q1       /\      T       q1     aTb
6           q1       /\      T       q1     /\
7           q1       /\      U       q1     aaUb
8           q1       /\      U       q1     /\
9           q1       a       a       q1     /\
10          q1       b       b       q1     /\
142
Partial Computation Graph
(q0, aab, Z)
(q1, aab, SZ)
(q1, aab, UZ) (other branch not shown)
(q1, aab, aaUbZ) (other branch not shown)
(q1, ab, aUbZ)
(q1, b, UbZ)
(q1, b, bZ) (other branch not shown)
(q1, /\, Z)
(q2, /\, Z)
143
Things you should be able to do
• You should be able to execute this algorithm
– Given any CFG, construct an equivalent PDA
• You should understand the idea behind this
algorithm
– Derive string on stack and then match it against input
• You should understand how this construction can
help you design PDA’s
• You should understand that it can be used in
answer-preserving input transformations between
decision problems about CFL’s.
144
Module 35
• Attempt to prove that CFL’s are closed
under intersection
– Review previous constructions
– Translate previous constructions to current
setting
– Prove modified result
145
High Level Overview
146
CFL closed under set intersection
• Let L1 and L2 be arbitrary CFL’s
• Let M1 and M2 be PDA’s s.t. L(M1) = L1, L(M2) = L2
– M1 and M2 exist because L1 and L2 are CFL’s and
for every CFG, there is an equivalent PDA
• Construct PDA M3 from PDA’s M1 and M2
• Argue L(M3) = L1 intersect L2
• There exists a PDA M3 s.t. L(M3) = L1 intersect L2
• L1 intersect L2 is a CFL
147
Visualization
• Let L1 and L2 be arbitrary CFL’s
• Let M1 and M2 be PDA’s s.t. L(M1) = L1, L(M2) = L2
– M1 and M2 exist because L1 and L2 are CFL’s and for every CFG, there is an equivalent PDA
• Construct PDA M3 from PDA’s M1 and M2
• Argue L(M3) = L1 intersect L2
• There exists a PDA M3 s.t. L(M3) = L1 intersect L2
• L1 intersect L2 is a CFL
[Diagram: L1 and L2 in CFL are accepted by M1 and M2 in PDA’s; M3 is built from M1 and M2 and is meant to accept L1 intersect L2]
148
Algorithm Specification
• Input
– Two PDA’s M1 and M2
• Output
– PDA M3 such that L(M3) = L(M1) intersect L(M2)
[Diagram: PDA M1 and PDA M2 → algorithm A → PDA M3]
149
Review Previous Results
150
Underlying Idea
• Previous Results
– recursive languages are closed under set intersection
– r.e. languages are closed under set intersection
– regular languages are closed under set intersection
• What is the idea underlying the constructions used
to prove these previous results?
151
Implementation with FSA’s *
• Given the basic idea underlying these
constructions, how was this idea implemented
when dealing with FSA’s?
• That is, restate the construction used to prove that
the regular languages are closed under set
intersection.
– Specify the output FSA in terms of the input FSA’s
152
Applying previous approach to
PDA’s
153
Applying approach to PDA’s *
• Given the basic idea underlying these
constructions, try and implement this idea in a
construction working with PDA’s rather than
FSA’s.
• That is, give a construction specifying how the
output PDA is built out of the input PDA’s
154
Problem
• Describe what goes wrong when applying this
idea to PDA’s instead of FSA’s.
• Does this prove that CFL’s are NOT closed under
set intersection?
155
Modified Result *
• What happens if the inputs are 1 FSA and 1 PDA?
• What modified result does the resulting
construction prove?
156
Module 36
• Non context-free languages
– Examples and Intuition
• Pumping lemma for CFL’s
– Pumping condition
– No proof of pumping lemma
– Applying pumping lemma to prove that some
languages are not CFL’s
157
Examples and Intuition
158
Examples
• What are some examples of nonregular languages?
• Can we build on any of these languages to create a
non context-free language?
159
Intuition
• Try and prove that these languages are CFL’s and
identify the stumbling blocks
– Why can’t we construct a CFG to generate this language?
– Why can’t we construct a PDA to accept this language?
– Compare to similar CFL languages to try and identify
differences.
160
Pumping Lemma for CFL’s
161
Comparison to regular language
pumping lemma/condition
162
What’s different about CFL’s than
regular languages? *
• In regular languages, a single substring “pumps”
– Consider the language of even length strings over {a,b}
– We can identify a single substring which can be pumped
• In CFL’s, multiple substrings can “pump”
– Consider the language {a^n b^n | n > 0}
– No single substring can be pumped and allow us to stay
in the language
– However, there do exist pairs of substrings which can be
pumped resulting in strings which stay in the language
• This results in a modified pumping condition
163
Modified Pumping Condition
• A language L satisfies the regular language pumping condition if:
– there exists an integer n > 0 such that
– for all strings x in L of length at least n
– there exist strings u, v, w such that
• x = uvw and
• |uv| ≤ n and
• |v| ≥ 1 and
• For all k ≥ 0, u v^k w is in L
• A language L satisfies the CFL pumping condition if:
– there exists an integer n > 0 such that
– for all strings x in L of length at least n
– there exist strings u, v, w, y, z such that
• x = uvwyz and
• |vwy| ≤ n and
• |vy| ≥ 1 and
• For all k ≥ 0, u v^k w y^k z is in L
164
Pumping Lemma
• All CFL’s satisfy the CFL pumping
condition
[Diagram: CFL’s are contained in the “Pumping Languages”, which are contained in all languages over {a,b}]
165
Implications
[Diagram: CFL contained in Pumping]
• We can use the pumping lemma to prove a
language L is not a CFL
– Show L does not satisfy the CFL pumping
condition
• We cannot use the pumping lemma to prove
a language is context-free
– Showing L satisfies the pumping condition does
not guarantee that L is context-free
166
Pumping Lemma
What does it mean?
167
Pumping Condition
• A language L satisfies the CFL pumping condition if:
– there exists an integer n > 0 such that
– for all strings x in L of length at least n
– there exist strings u, v, w, y, z such that
• x = uvwyz and
• |vwy| ≤ n and
• |vy| ≥ 1 and
• For all k ≥ 0, u v^k w y^k z is in L
168
v and y can
be pumped
1) x in L
2) x = uvwyz
3) For all k ≥ 0, u v^k w y^k z is in L
• Let x = abcdefg be in L
• Then there exist 2 substrings v and y in x such that v and y
can be repeated (pumped) in place any number of times
and the resulting string is still in L
– u v^k w y^k z is in L for all k ≥ 0
• For example
– v = cd and y = f (so u = ab, w = e, z = g)
• u v^0 w y^0 z = uwz = abeg is in L
• u v^1 w y^1 z = uvwyz = abcdefg is in L
• u v^2 w y^2 z = uvvwyyz = abcdcdeffg is in L
• u v^3 w y^3 z = uvvvwyyyz = abcdcdcdefffg is in L
• …
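The pumped strings above can be reproduced with a one-line helper. The function name pump is a hypothetical choice for this sketch; it simply builds u v^k w y^k z by string repetition.

```python
def pump(u, v, w, y, z, k):
    """Return the string u v^k w y^k z."""
    return u + v * k + w + y * k + z


# The decomposition from the example: x = abcdefg, v = cd, y = f
u, v, w, y, z = "ab", "cd", "e", "f", "g"
for k in range(4):
    print(k, pump(u, v, w, y, z, k))
```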
169
What the other parts mean
• A language L satisfies the CFL pumping condition if:
– there exists an integer n > 0 such that
• Since we skip this proof, we will not see what n really means
– for all strings x in L of length at least n
• x must be in L and have sufficient length
– there exist strings u, v, w, y, z such that
• x = uvwyz and
• |vwy| ≤ n and
– v and y are contained within n characters of x
– Note: these are NOT necessarily the first n characters of x
• |vy| ≥ 1 and
– v and y cannot both be /\,
– One of them might be /\, but not both
• For all k ≥ 0, u v^k w y^k z is in L
170
Example
• Let L be the set of palindromes over {a,b}
– Let x = aabaa
– Let n = 3
– What are the possibilities for v and y ignoring the
pumping constraint?
– Which ones satisfy the pumping lemma?
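The possibilities can be enumerated by brute force. The sketch below is a rough aid under stated assumptions, not a proof: the names decompositions, pump and survives are hypothetical, and since "for all k ≥ 0" cannot be tested exhaustively, the pumped strings are only checked for k up to a small bound max_k.

```python
def decompositions(x, n):
    """Yield every (u, v, w, y, z) with x = uvwyz, |vwy| <= n and |vy| >= 1."""
    length = len(x)
    for i in range(length + 1):
        for j in range(i, length + 1):
            for p in range(j, length + 1):
                for q in range(p, length + 1):
                    u, v, w, y, z = x[:i], x[i:j], x[j:p], x[p:q], x[q:]
                    if len(v + w + y) <= n and len(v + y) >= 1:
                        yield (u, v, w, y, z)


def pump(u, v, w, y, z, k):
    return u + v * k + w + y * k + z


def is_palindrome(s):
    return s == s[::-1]


def survives(parts, max_k=4):
    """True if pumping stays in the language for every k from 0 to max_k."""
    return all(is_palindrome(pump(*parts, k)) for k in range(max_k + 1))


good = [d for d in decompositions("aabaa", 3) if survives(d)]
for d in good:
    print(d)
```

Decompositions that pump a matching a on each side of the b, or only the middle b, appear in the output; decompositions that pump a’s on one side only do not.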
171
Pumping Lemma
Applying it to prove a specific
language L is not context-free
172
How we use the Pumping Lemma
• We choose a specific language L
– For example, {a^j b^j c^j | j > 0}
• We show that L does not satisfy the
pumping condition
• We conclude that L is not context-free
173
Showing L “does not pump”
• A language L satisfies the CFL pumping condition if:
– there exists an integer n > 0 such that
– for all strings x in L of length at least n
– there exist strings u, v, w, y, z such that
• x = uvwyz and
• |vwy| ≤ n and
• |vy| ≥ 1 and
• For all k ≥ 0, u v^k w y^k z is in L
• A language L does not satisfy the CFL pumping condition if:
– for all integers n of sufficient size
– there exists a string x in L of length at least n such that
– for all strings u, v, w, y, z such that
• x = uvwyz and
• |vwy| ≤ n and
• |vy| ≥ 1
– there exists a k ≥ 0 such that u v^k w y^k z is not in L
174
Example Proof
• A language L does not satisfy the CFL pumping condition if:
– for all integers n of sufficient size
– there exists a string x in L of length at least n such that
– for all strings u, v, w, y, z such that
• x = uvwyz and
• |vwy| ≤ n and
• |vy| ≥ 1
– there exists a k ≥ 0 such that u v^k w y^k z is not in L
• Proof that L = {a^i b^i c^i | i > 0} does not satisfy the CFL pumping condition
• Let n be the integer from the pumping lemma
• Choose x = a^n b^n c^n
• Consider all strings u, v, w, y, z s.t.
– x = uvwyz and
– |vwy| ≤ n and
– |vy| ≥ 1
• Argue that u v^k w y^k z is not in L for some k ≥ 0
– Argument must apply to all possible u,v,w,y,z
– Continued on next slide
175
Example Proof Continued
• Proof that L = {a^i b^i c^i | i > 0} does not satisfy the CFL pumping condition
• Let n be the integer from the pumping lemma
• Choose x = a^n b^n c^n
• Consider all strings u, v, w, y, z s.t.
– x = uvwyz and
– |vwy| ≤ n and
– |vy| ≥ 1
• Argue that u v^k w y^k z is not in L for some k ≥ 0
– Argument must apply to all possible u,v,w,y,z
• Identify possible cases for vwy
• What is impossible for vwy?
• Case 1
– vwy contains no a’s
• Case 2
– vwy contains no c’s
• Must argue u v^k w y^k z is not in L for both cases described above
– Can use different values of k
– Continued on next slide
176
Example Proof Continued
• Identify possible cases for vwy
• What is impossible for vwy?
• Case 1
– vwy contains no a’s
• Case 2
– vwy contains no c’s
• Must argue u v^k w y^k z is not in L for both cases described above
– Can use different values of k
• Case 1: vwy contains no a’s
– vy contains at least 1 b or c
• follows from the facts that vwy contains no a’s and |vy| ≥ 1
– uwz is not in L
• uwz has n a’s
– follows from the fact that vwy contains no a’s and x originally had n a’s
• uwz has fewer than n b’s or fewer than n c’s
– follows from the fact that vy contains at least 1 b or c and x originally had only n b’s and n c’s
• Continued next slide
177
Example Proof Continued
• Case 1: vwy contains no a’s
– vy contains at least 1 b or c
• follows from the facts that vwy contains no a’s and |vy| ≥ 1
– uwz is not in L
• uwz has n a’s
– follows from the fact that vwy contains no a’s and x originally had n a’s
• uwz has fewer than n b’s or fewer than n c’s
– follows from the fact that vy contains at least 1 b or c and x originally had only n b’s and n c’s
• Case 2: vwy contains no c’s
– vy contains at least 1 a or b
• follows from the facts that vwy contains no c’s and |vy| ≥ 1
– u v^2 w y^2 z is not in L
• u v^2 w y^2 z has n c’s
– follows from the fact that vwy contains no c’s and x originally had n c’s
• u v^2 w y^2 z has more than n a’s or more than n b’s
– follows from the fact that vy contains at least 1 a or b and x originally had n a’s and n b’s
• Continued next slide
178
Example Proof Completed
• Case 2: vwy contains no c’s
– vy contains at least 1 a or b
– u v^2 w y^2 z is not in L
• u v^2 w y^2 z has n c’s
– follows from the fact that vwy contains no c’s and x originally had n c’s
• u v^2 w y^2 z has more than n a’s or more than n b’s
– follows from the fact that vy contains at least 1 a or b and x originally had n a’s and n b’s
• For all possible u, v, w, y, z, we have shown there exists a k ≥ 0 such that
– u v^k w y^k z is not in L
• Note, we used a different value of k for each case (though we didn’t have to)
• Therefore L does not satisfy the CFL pumping condition
• Therefore L is not a CFL
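For small n the case analysis can be cross-checked by brute force. This is only a sanity check under stated assumptions, not a replacement for the proof: it tries n = 1 to 4, uses the same x = a^n b^n c^n as the proof, and tests only k = 0 and k = 2. The names in_L, decompositions, pump and defeats are hypothetical.

```python
def in_L(s):
    """Membership in {a^i b^i c^i | i > 0}."""
    n = len(s) // 3
    return n > 0 and s == "a" * n + "b" * n + "c" * n


def decompositions(x, n):
    """Yield every (u, v, w, y, z) with x = uvwyz, |vwy| <= n and |vy| >= 1."""
    length = len(x)
    for i in range(length + 1):
        for j in range(i, length + 1):
            for p in range(j, length + 1):
                for q in range(p, length + 1):
                    u, v, w, y, z = x[:i], x[i:j], x[j:p], x[p:q], x[q:]
                    if len(v + w + y) <= n and len(v + y) >= 1:
                        yield (u, v, w, y, z)


def pump(u, v, w, y, z, k):
    return u + v * k + w + y * k + z


def defeats(n):
    """True if x = a^n b^n c^n shows that n cannot be the pumping constant:
    every allowed decomposition leaves L for k = 0 or k = 2."""
    x = "a" * n + "b" * n + "c" * n
    return all(
        any(not in_L(pump(u, v, w, y, z, k)) for k in (0, 2))
        for (u, v, w, y, z) in decompositions(x, n)
    )


print(all(defeats(n) for n in range(1, 5)))
```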
179
Other example languages
• TWOCOPIES = {ww | w is in {a,b}* }
– abbabb is in TWOCOPIES but abaabb is not
• EQUAL3 = the set of strings over {a, b, c}
such that the number of a’s equals the
number of b’s equals the number of c’s
• {a^i b^j c^k | i < j < k}
180
Pumping Lemma
Two rules of thumb
181
Two Rules of Thumb
• Try to use blocks of at least n characters in x
– For TWOCOPIES, choose x = a^n b^n a^n b^n rather than
a^n b a^n b
• Guarantees v and y cannot be in more than 2 blocks of x
• Try k=0 or k=2
– k=0
• This reduces number of occurrences of v and y
– k=2
• This increases number of occurrences of v and y
182
Summary
• We use the Pumping Lemma to prove a
language is not a CFL
– Note, does not work for all non CFL languages
– Can be strengthened to Ogden’s Lemma
• In book
• Choosing a good string x is first key step
• Choosing a good k is second key step
• Typically have several cases for v, w, y
183
Module 37
• Showing CFL’s not closed under set
intersection and set complement
184
Nonclosure Properties for CFL’s
185
CFL’s not closed under set
intersection
• How can we prove that CFL’s are not closed
under set intersection?
186
Counterexample
• What is a possible L1 intersect L2?
– What non-CFL languages do we know?
• What could L1 and L2 be?
– L1 =
– L2 =
– How can we prove that L1 and L2 are context-free?
187
CFL’s not closed under
complement
• How can we prove that CFL’s are not closed
under complement?
– Another way
• Use fact that any language class which is closed
under union and complement must also be closed
under intersection
188
Language class hierarchy
[Diagram: nested language classes REG, CFL, REC, RE inside all languages over alphabet Σ, with the example languages H, H, Equal and Equal-3 marked]
189