Context-Free Grammars

4.2 Context-Free Grammars
In this section, we shall introduce context-free grammars to
generate a class of languages larger than the class of regular
languages over the same alphabet.
Definition 3: A context-free grammar (CFG) is denoted by
G = (V, T, P, S), where
V is a finite set of variables, S ∈ V is the start symbol,
T is a finite set of terminals and T ∩ V = ∅, and
P is a finite set of productions or rewriting rules, each
production being of the form
A → α, where A ∈ V and α ∈ (V ∪ T)*.
Example 4: Let a CFG be G = (V, T, P, S), where V = {S},
T = {0, 1} and P contains the following productions:
S → 01 | 0S1
We have that
S ⇒ 01,
S ⇒ 0S1 ⇒ 0011, and
S ⇒ 0S1 ⇒ 00S11 ⇒ 000111.
Therefore, S ⇒* 0^n 1^n, for n = 1, 2, ....
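The derivation pattern of Example 4 can be replayed mechanically. The following Python sketch (the function name derive_0n1n is illustrative and not part of the text) applies S → 0S1 exactly n − 1 times and then S → 01, recording every sentential form.

```python
def derive_0n1n(n: int) -> list[str]:
    """Return the sentential forms of the derivation S =>* 0^n 1^n (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    forms = ["S"]
    for _ in range(n - 1):
        # Apply S -> 0S1 to the single variable in the current form.
        forms.append(forms[-1].replace("S", "0S1"))
    # Finish the derivation with S -> 01.
    forms.append(forms[-1].replace("S", "01"))
    return forms


print(" => ".join(derive_0n1n(3)))  # S => 0S1 => 00S11 => 000111
```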
Definition 4: The set generated by a context-free grammar G = (V,
T, P, S) is {ω ∈ T* | S ⇒* ω}, denoted by L(G), and L(G) is said
to be a context-free language (CFL). If S ⇒* ω and ω ∈ T*, then ω is
said to be a sentence generated by the grammar G.
We know that the set L = {0^n 1^n | n = 1, 2, ...} is not regular. By
Definition 4 and Example 4, L is context-free but not regular.
Regular grammars are a special type of context-free grammars.
Therefore, we have that the class of regular sets is a proper
subset of the class of context-free sets over the same alphabet.
Two grammars are equivalent if they generate the same set.
Example 5: The set of well-formed regular expressions over an
alphabet, say Σ = {0, 1}, is context-free.
Solution : It is sufficient to give the rewriting rules as follows.
S → ∅ | ε | 0 | 1 | (S*) | (S+S) | (SS)
Note: The set of productions mirrors the
recursive definition of regular expressions.
Example 6: Construct a CFG G such that L(G) = {ω ∈ {0, 1}* |
ω = ω^R}.
Solution :
S → 0S0 | 1S1 | 0 | 1 | ε
Example 7: Construct a CFG G such that L(G) = {ωω^R |
ω ∈ {0, 1}*}.
Solution :
S → 0S0 | 1S1 | ε
We know that the set {ωω^R | ω ∈ {0, 1}*} is a subset of the set
{ω ∈ {0, 1}* | ω = ω^R}. The grammar of Example 6 has the additional
productions S → 0 | 1, and the difference between the two sets is
the set of palindromes of odd length.
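The relation between Examples 6 and 7 can be checked on short strings. The sketch below is only an illustration (the names in_g6 and in_g7 are made up here): each function tests membership by peeling off a matching first and last symbol, which mirrors the productions S → 0S0 | 1S1.

```python
from itertools import product


def in_g6(w: str) -> bool:
    """Membership for S -> 0S0 | 1S1 | 0 | 1 | ε (Example 6)."""
    if w in ("", "0", "1"):
        return True
    return w[0] == w[-1] and in_g6(w[1:-1])


def in_g7(w: str) -> bool:
    """Membership for S -> 0S0 | 1S1 | ε (Example 7)."""
    if w == "":
        return True
    return len(w) >= 2 and w[0] == w[-1] and in_g7(w[1:-1])


words = ["".join(p) for n in range(6) for p in product("01", repeat=n)]
diff = [w for w in words if in_g6(w) and not in_g7(w)]
print(diff[:6])  # ['0', '1', '000', '010', '101', '111']
```

On this bounded sample, every string in the difference is a palindrome of odd length.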
Given a CFL L, constructing a CFG G that generates L is not an easy
job. Basically, there is no fixed method to solve the problem.
If the language L is regular, it is quite easy. Usually, we can
construct a regular grammar directly, or start from a DFA accepting
the language and convert it to a regular grammar.
For a non-regular language, constructing a CFG G is more
difficult. Most of the time, we need to take care of the complicated
recursive structure of the grammar, and a newly defined variable
often has to refer to previously defined variables in order to
complete the recurrence.
Sometimes it is easier to construct a pushdown
automaton M first and then convert M to a CFG G. We
have to wait until chapter 5 to apply this method.
In some cases, we need to grasp the whole picture in order to
construct a CFG, and it is easy to miss something.
Example 8 : Construct a CFG G such that L = L(G) = {ω ∈
{0, 1}* | the number of 0's equals the number of 1's in ω}.
Solution : Method 1 : We may get a CFG G as follows.
S → 01S | 0S1 | S01 | 10S | 1S0 | S10 | ε
But how do we know that the set generated by the grammar is what
we want? In most cases, we need to prove it by induction.
Although the grammar generates only strings with an equal number
of 0's and 1's, it cannot generate the string 00111100.
So the grammar is not sufficient. What is the set generated by
the above grammar?
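The claim about 00111100 can be checked with a small recognizer. This is a sketch written for this grammar only (the name in_method1 is illustrative): because every production of Method 1 contains exactly one S, a string is generated if and only if some production matches its ends and the remaining inner string is generated.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def in_method1(w: str) -> bool:
    """Membership for S -> 01S | 0S1 | S01 | 10S | 1S0 | S10 | ε."""
    if w == "":
        return True  # S -> ε
    if len(w) < 2:
        return False
    return (
        (w.startswith("01") and in_method1(w[2:]))                    # S -> 01S
        or (w[0] == "0" and w[-1] == "1" and in_method1(w[1:-1]))     # S -> 0S1
        or (w.endswith("01") and in_method1(w[:-2]))                  # S -> S01
        or (w.startswith("10") and in_method1(w[2:]))                 # S -> 10S
        or (w[0] == "1" and w[-1] == "0" and in_method1(w[1:-1]))     # S -> 1S0
        or (w.endswith("10") and in_method1(w[:-2]))                  # S -> S10
    )


print(in_method1("0011"))      # True
print(in_method1("00111100"))  # False
```

The string 00111100 starts with 00 and ends with 00, so none of the six patterns can match at the first step.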
Now, we need to modify the above grammar.
Method 2 : Consider the following grammar :
S → 0S1S | 1S0S | ε
The reason is that a string ω may start with 0, with a matching 1
somewhere after it. Between this 0 and 1 there are zero or
more pairs of 0's and 1's. Similarly, after the pair of 0 and 1, there
may be further pairs of 0's and 1's.
If a string ω starts with 1, the reasoning is similar.
So far, we only know that L(G) ⊆ L. Before we prove that L
⊆ L(G), we need to prove the following lemma.
Lemma : If ω has n 0's and n+1 1's, then there
exists a 1 such that ω = α1β, and each of α and β has an equal
number of 0's and 1's, i.e., α, β ∈ L.
Proof : Let ω = a_1 a_2 … a_{2n+1}, where each a_i is either 0 or 1.
Define f(1) = −1 if a_1 = 0, and
f(1) = 1 if a_1 = 1.
Define f(i+1) = f(i) − 1 if a_{i+1} = 0, and
f(i+1) = f(i) + 1 if a_{i+1} = 1, for i = 1, 2, …, 2n.
For 1 ≤ k ≤ 2n+1, let γ = a_1 a_2 … a_k. Then f(k)
= (number of 1's in γ) − (number of 0's in γ), and
f(k+1) is one greater or one less than f(k).
If a_1 = 1, then ω = 1β with α = ε. We are done.
If a_1 = 0, then f(1) = −1. Since f(2n+1) = 1, there exists a smallest
k, 3 ≤ k ≤ 2n+1, such that a_k = 1, f(k) = 1 and f(j) < 1 for 1 ≤ j
< k. The value f(k−1) must be 0, one less than f(k).
Let α = a_1 a_2 … a_{k−1} and β = a_{k+1} … a_{2n+1}. Then α has an
equal number of 0's and 1's, and so does β, since ω = α1β has n 0's
and n+1 1's.
Done!
For instance, let ω = 00101101110. Consider the graph of f(k). The
value of f(1) is −1. We have f(8) = 0 and f(9) = 1, where a_9 = 1.
And α = 00101101 has an equal number of 0's and 1's.
[Figure: graph of f(k) against k = 1, …, 11 for ω = 00101101110; the vertical axis runs from −3 to 3.]
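The function f of the lemma and the choice of the smallest k with f(k) = 1 can be computed directly. The sketch below uses illustrative names (prefix_balance, split_at_first_one) and assumes the input has n 0's and n+1 1's, as in the lemma.

```python
def prefix_balance(w: str) -> list[int]:
    """f(k) = (# of 1's) - (# of 0's) in the first k symbols, for k = 1..len(w)."""
    values, total = [], 0
    for symbol in w:
        total += 1 if symbol == "1" else -1
        values.append(total)
    return values


def split_at_first_one(w: str) -> tuple[str, str]:
    """Return (alpha, beta) with w = alpha + '1' + beta, at the smallest k with f(k) = 1.

    Assumes w has n 0's and n+1 1's; otherwise index() may raise ValueError.
    """
    k = prefix_balance(w).index(1) + 1  # 1-based position of the chosen 1
    return w[:k - 1], w[k:]


w = "00101101110"
print(prefix_balance(w))      # [-1, -2, -1, -2, -1, 0, -1, 0, 1, 2, 1]
print(split_at_first_one(w))  # ('00101101', '10')
```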
Now, we can show that L ⊆ L(G) by induction on the length of the string.
(1) ε ∈ L and S ⇒ ε. Hence, ε ∈ L(G).
(2) Assume that S ⇒* ω, i.e., ω ∈ L(G), for every ω ∈ L with |ω| ≤ 2n.
(3) Let ω' ∈ L with |ω'| = 2n+2.
(a) Consider ω' = 0ω ∈ L. Then ω has n 0's and n+1 1's.
By the lemma, there exists a 1 such that ω = α1β, and each of α
and β has an equal number of 0's and 1's, i.e., α, β ∈ L.
Since |α| ≤ 2n and |β| ≤ 2n, the assumption gives S ⇒* α and S ⇒* β.
We have S ⇒ 0S1S ⇒* 0α1S ⇒* 0α1β = ω' ∈ L(G).
(b) Consider ω' = 1ω ∈ L. Then ω has n 1's and n+1
0's. By the same reasoning as in (a), ω = α0β with α, β ∈ L, and
S ⇒ 1S0S ⇒* 1α0S ⇒* 1α0β = ω' ∈ L(G). Done!
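The induction proof is constructive: it tells us how to split a string and which production to apply. The sketch below follows that recipe for S → 0S1S | 1S0S | ε. The function names and the tuple encoding of parse trees are assumptions made for this illustration.

```python
def parse_balanced(w: str):
    """Parse tree of w under S -> 0S1S | 1S0S | ε, following the induction proof.

    Assumes w has equally many 0's and 1's. The tree is () for S -> ε,
    otherwise (a, tree of alpha, b, tree of beta) with w = a alpha b beta.
    """
    if w == "":
        return ()
    first = w[0]
    other = "1" if first == "0" else "0"
    balance = 0
    # Find the first position where `other` outnumbers `first` within w[1:].
    for k in range(1, len(w)):
        balance += 1 if w[k] == other else -1
        if balance == 1:
            alpha, beta = w[1:k], w[k + 1:]
            return (first, parse_balanced(alpha), other, parse_balanced(beta))
    raise ValueError("input does not have equally many 0's and 1's")


def tree_yield(tree) -> str:
    """Read the terminal string back from a parse tree."""
    if tree == ():
        return ""
    a, left, b, right = tree
    return a + tree_yield(left) + b + tree_yield(right)


print(parse_balanced("0110"))  # ('0', (), '1', ('1', (), '0', ()))
print(tree_yield(parse_balanced("00111100")))  # 00111100
```

In particular, the string 00111100, which Method 1 misses, is parsed without difficulty.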
Sometimes, we need to set up the recurrence as follows.
Method 3 : Another solution for Example 8.
A required CFG is as follows.
S → 0B | 1A | ε
B → 1S | 0BB | 1
A → 0S | 1AA | 0
The reason is that we only need to keep track of 3 cases for a string ω:
(a) the number of 0's equals the number of 1's in ω; let it be S.
(b) the number of 0's is one more than the number of 1's in ω;
let it be A.
(c) the number of 0's is one less than the number of 1's in ω; let
it be B.
Let us consider the following steps:
(1) ε ∈ L(G), so we need a production S → ε.
(2) If S ⇒* 0ω, we need S → 0B, where B ⇒* ω, and ω has
one more 1 than 0's. Mark B as a state with one extra 1.
(3) If S ⇒* 1ω, we need S → 1A, where A ⇒* ω, and ω has
one more 0 than 1's. Mark A as a state with one extra 0.
(4) If B ⇒* 1ω, then ω has an equal number of 0's and 1's.
Hence, we need B → 1S | 1. The recurrence relation is set up.
(5) If B ⇒* 0ω, then ω has 2 more 1's than 0's. Hence, we
need B → 0BB. The recurrence relation is set up.
(6) The case for variable A is similar to that of variable B, and
we have that A → 0 | 0S | 1AA. Done!
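A bounded check is not a proof, but it is a quick way to catch a missing production. The sketch below computes, for the grammar of Method 3, every terminal string of length at most 8 by iterating the productions until no new string appears, and compares the result with the strings having equally many 0's and 1's. The helper name language_up_to and the encoding of the grammar as a dictionary are assumptions of this sketch.

```python
from itertools import product

# Method 3 grammar; the empty string "" encodes the ε-production.
GRAMMAR = {
    "S": ["0B", "1A", ""],
    "B": ["1S", "0BB", "1"],
    "A": ["0S", "1AA", "0"],
}


def language_up_to(grammar: dict[str, list[str]], start: str, max_len: int) -> set[str]:
    """All terminal strings of length <= max_len derivable from `start`."""
    lang = {var: set() for var in grammar}
    changed = True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            for body in bodies:
                # Concatenate already known strings for each symbol of the body.
                partial = {""}
                for sym in body:
                    choices = lang[sym] if sym in grammar else {sym}
                    partial = {p + c for p in partial for c in choices
                               if len(p) + len(c) <= max_len}
                if not partial <= lang[var]:
                    lang[var] |= partial
                    changed = True
    return lang[start]


MAX_LEN = 8
balanced = {"".join(p) for n in range(MAX_LEN + 1)
            for p in product("01", repeat=n) if p.count("0") == p.count("1")}
print(language_up_to(GRAMMAR, "S", MAX_LEN) == balanced)  # True
```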
Example 9: Construct a CFG G such that L(G) = L = {ω ∈ {0, 1}* |
the number of 0's is twice the number of 1's in ω}.
Solution :
Method 1 : Consider the following CFG.
S → 0S0S1S | 0S1S0S | 1S0S0S | ε
Question: Is L(G) = L?
Method 2 : Consider the following CFG.
For any string ω, let n and m be the numbers of 0's and 1's
in ω, respectively. We keep track of the following 5 cases:
(a) n = 2m, let it be the state S.
(b) n = 2m + 2, or (n − 2) = 2m, let it be A.
(c) n = 2m − 1, or (n − 1) = 2(m − 1), let it be B.
(d) n = 2m − 2, or n = 2(m − 1), let it be C.
(e) n = 2m + 1, or (n − 1) = 2m, let it be D.
Let us consider the following steps:
(1) ε ∉ L(G), so we do not need a production S → ε.
(2) If S ⇒* 0ω, we need S → 0B, where B ⇒* ω, and ω has
one 0 fewer than twice the number of 1's. Mark B as a state of
one extra 0 and one extra 1, or <1, 1> for short.
(3) If S ⇒* 1ω, we need S → 1A, where A ⇒* ω, and ω has
two more 0's than twice the number of 1's. Mark A as a state
of two extra 0's, or <2, 0> for short.
(4) If B ⇒* 0ω, we need B → 0C, where C ⇒* ω, and ω has
two 0's fewer than twice the number of 1's. Mark C as a state
of one extra 1, or <0, 1> for short.
(5) If B ⇒* 1ω, we need B → 1D, where D ⇒* ω, and ω has
one more 0 than twice the number of 1's. Mark D as a state
of one extra 0, or <1, 0> for short.
(6) If A ⇒* 0ω, we need A → 0D.
(7) If A ⇒* 1ω, we need A → 1AA | 1DDDD | 1ADD |
1DAD | 1DDA.
(8) If C ⇒* 1ω, we need C → 1S | 1.
(9) If C ⇒* 0ω, we need C → 0BC | 0CB | 0CCD | 0CDC |
0DCC.
(10) If D ⇒* 0ω, we need D → 0S | 0.
(11) If D ⇒* 1ω, we need D → 1AD | 1DA | 1DDD.
The result is as follows.
(1) S → 0B | 1A.
(2) B → 0C | 1D.
(3) A → 0D | 1AA | 1DDDD | 1ADD | 1DAD | 1DDA.
(4) C → 1S | 1 | 0BC | 0CB | 0CCD | 0CDC | 0DCC.
(5) D → 0S | 0 | 1AD | 1DA | 1DDD.
It remains to prove that L(G) consists of exactly the nonempty strings of L.
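Before attempting a proof, the grammar of Method 2 can be tested on short strings. Every production here starts with a terminal followed only by variables, so a leftmost derivation can be simulated by reading the input from left to right while keeping the sequence of pending variables. The sketch below does this with memoization. The names derives and in_L are illustrative, and the check covers nonempty strings only, since the grammar has no ε-production.

```python
from functools import lru_cache
from itertools import product

GRAMMAR = {
    "S": ["0B", "1A"],
    "B": ["0C", "1D"],
    "A": ["0D", "1AA", "1DDDD", "1ADD", "1DAD", "1DDA"],
    "C": ["1S", "1", "0BC", "0CB", "0CCD", "0CDC", "0DCC"],
    "D": ["0S", "0", "1AD", "1DA", "1DDD"],
}


@lru_cache(maxsize=None)
def derives(pending: str, w: str) -> bool:
    """True if the sequence of variables `pending` derives exactly the string w."""
    if not pending:
        return w == ""
    if len(w) < len(pending):
        return False  # every variable produces at least one terminal
    head, rest = pending[0], pending[1:]
    # Try each production of the leftmost variable that matches the next symbol.
    return any(
        body[0] == w[0] and derives(body[1:] + rest, w[1:])
        for body in GRAMMAR[head]
    )


def in_L(w: str) -> bool:
    """The number of 0's is twice the number of 1's."""
    return w.count("0") == 2 * w.count("1")


words = ["".join(p) for n in range(1, 10) for p in product("01", repeat=n)]
print(all(derives("S", w) == in_L(w) for w in words))  # True
```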
Derivations and derivation trees (parse trees)
Example 10: Consider the following grammar
E → E + E | E − E | (E) | <integer>
where the productions of the variable <integer> are
defined in section 4.1.
Now, consider the derivation E ⇒* 9 − 4 + 3.
We use leftmost derivations, i.e., we always rewrite the leftmost
variable.
E ⇒ E − E ⇒ <integer> − E ⇒* 9 − E ⇒ 9 − E + E
⇒ 9 − <integer> + E ⇒* 9 − 4 + E ⇒ 9 − 4 + <integer>
⇒* 9 − 4 + 3.
Now, selecting a different production at the first step, we have
E ⇒ E + E ⇒ E − E + E ⇒ <integer> − E + E
⇒* 9 − E + E ⇒ 9 − <integer> + E ⇒* 9 − 4 + E
⇒ 9 − 4 + <integer> ⇒* 9 − 4 + 3.
For a production E → E − E, a derivation step E ⇒ E − E corresponds
to a subtree of the derivation (or parse) tree as follows. The variable E
on the left side of the production is the root of the subtree, and the
items on the right side of the production become the children of the
root, listed in the same order as on the right side of the
production.
E
├── E
├── −
└── E
For the following derivation of E ⇒* 9 − 4 + 3, we have the corresponding
parse tree listed below.
E ⇒ E − E ⇒ <integer> − E ⇒* 9 − E ⇒ 9 − E + E
⇒ 9 − <integer> + E ⇒* 9 − 4 + E ⇒ 9 − 4 + <integer>
⇒* 9 − 4 + 3.
E
├── E
│   └── <integer> ⇒* 9
├── −
└── E
    ├── E
    │   └── <integer> ⇒* 4
    ├── +
    └── E
        └── <integer> ⇒* 3
For the following derivation of E ⇒* 9 − 4 + 3, we have the corresponding
parse tree listed below.
E ⇒ E + E ⇒ E − E + E ⇒ <integer> − E + E
⇒* 9 − E + E ⇒ 9 − <integer> + E ⇒* 9 − 4 + E
⇒ 9 − 4 + <integer> ⇒* 9 − 4 + 3.
E
├── E
│   ├── E
│   │   └── <integer> ⇒* 9
│   ├── −
│   └── E
│       └── <integer> ⇒* 4
├── +
└── E
    └── <integer> ⇒* 3
The above two parse trees correspond to the following two
expression trees.
    −                +
   / \              / \
  9   +            −   3
     / \          / \
    4   3        9   4
The left tree evaluates to 2, which is 9 − (4 + 3).
The right tree evaluates to 8, which is (9 − 4) + 3.
Therefore, the two parse trees for E ⇒* 9 − 4 + 3 are
genuinely different trees, and they give the expression
different interpretations.
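The two interpretations can be enumerated mechanically. The following sketch (an illustration only; the function name all_values is made up) lists the value of every parse tree of a token sequence under E → E + E | E − E | <integer>. Parentheses are left out to keep it short, and the ASCII character "-" stands for the minus sign.

```python
def all_values(tokens: list[str]) -> list[int]:
    """Values of every parse tree of `tokens` under E -> E + E | E - E | <integer>."""
    if len(tokens) == 1:
        return [int(tokens[0])]
    values = []
    # Choose which operator sits at the root of the parse tree.
    for i in range(1, len(tokens), 2):
        for left in all_values(tokens[:i]):
            for right in all_values(tokens[i + 1:]):
                values.append(left + right if tokens[i] == "+" else left - right)
    return values


print(all_values(["9", "-", "4", "+", "3"]))  # [2, 8]
```

Two parse trees with two different values for the same string is exactly the situation that the notion of ambiguity captures.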
Definition 5: Let G = (V, T, P, S) be a context-free grammar. If
there is a string ω ∈ T* with two different parse trees for
the derivation S ⇒* ω, then the CFG G is said to be ambiguous.
Example 11: Modify the grammar
E → E + E | E − E | (E) | <integer>
to the new grammar
E → (E + E) | (E − E) | <integer>
This eliminates the ambiguity of the first grammar,
but sacrifices the convenience of dropping parentheses.
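With the fully parenthesized grammar, the next token always determines which production to use, so a simple recursive parser never has a choice to make. The sketch below assumes single-character tokens (one-digit integers, with the ASCII "-" standing for the minus sign); the function name parse is illustrative.

```python
def parse(tokens: list[str], pos: int = 0) -> tuple[int, int]:
    """Parse one E starting at tokens[pos]; return (value, position after E)."""
    if tokens[pos] == "(":
        # E -> (E + E) | (E - E): the operator is read after the first operand.
        left, pos = parse(tokens, pos + 1)
        op = tokens[pos]
        right, pos = parse(tokens, pos + 1)
        if op not in ("+", "-") or tokens[pos] != ")":
            raise SyntaxError("expected (E + E) or (E - E)")
        return (left + right if op == "+" else left - right), pos + 1
    # E -> <integer>, restricted here to a single digit.
    return int(tokens[pos]), pos + 1


print(parse(list("(9-(4+3))"))[0])  # 2
print(parse(list("((9-4)+3)"))[0])  # 8
```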
Left recursion and right recursion
Example 12 : Let L = {20^n 1 | n ≥ 0}. L is context-free. Consider
the following grammar G_1.
S → A1
A → A0 | 2
The production A → A0 is a left recursive production; for
instance,
S ⇒ A1 ⇒ A01 ⇒ A001 ⇒ A0001 ⇒ 20001
Consider the following grammar G_2.
S → A1
A → 2 | 2B
B → 0 | 0B
The production B → 0B is a right recursive production; for
instance,
S ⇒ A1 ⇒ 2B1 ⇒ 20B1 ⇒ 200B1 ⇒ 20001
Left recursive productions can be converted to right recursive
productions, and vice versa. The method is given
by the following theorem.
Definition 6: An A-production is a production with variable A on
the left side.
Theorem 4 : Let G = (V, T, P, S) be a CFG.
Let A → Aα_1 | Aα_2 | … | Aα_s be the set of left recursive A-productions.
Let A → β_1 | β_2 | … | β_t be the set of remaining A-productions.
Let G' = (V ∪ {B}, T, P', S) be a CFG, where B ∉ V is a new variable and
P' = (P \ {old A-productions of P}) ∪ {new A-productions and
B-productions defined as follows}
A → β_1 | β_2 | … | β_t | β_1B | β_2B | … | β_tB
B → α_1 | α_2 | … | α_s | α_1B | α_2B | … | α_sB
Then L(G) = L(G').
Proof : Left as an exercise.
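The construction of Theorem 4 is easy to carry out mechanically. The sketch below is a minimal illustration, assuming that every grammar symbol is a single character and that the new variable does not already occur in the grammar; the function name eliminate_left_recursion is made up for this example. Applied to the A-productions of G_1 in Example 12, it yields the A- and B-productions of G_2.

```python
def eliminate_left_recursion(a: str, bodies: list[str], new_var: str) -> dict[str, list[str]]:
    """Apply the construction of Theorem 4 to the A-productions with right sides `bodies`."""
    alphas = [b[1:] for b in bodies if b.startswith(a)]   # A -> A alpha_i
    betas = [b for b in bodies if not b.startswith(a)]    # A -> beta_j
    if not alphas:
        return {a: bodies}  # no left recursive production, nothing to change
    return {
        a: betas + [beta + new_var for beta in betas],
        new_var: alphas + [alpha + new_var for alpha in alphas],
    }


print(eliminate_left_recursion("A", ["A0", "2"], "B"))
# {'A': ['2', '2B'], 'B': ['0', '0B']}
```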