Uploaded by PRINCE KOFI GYIMAH

Context-free Grammars

CONTEXT-FREE GRAMMAR
Definition of Context-free Grammar:
A context-free grammar (G) is a 4-tuple (quadruple)
G = (V, T, S, P)
where
V = Finite set of objects called Variables
T = Finite set of objects called Terminal symbols
S ∈ V = Start variable
P = Finite set of Production rules, each rule pairing a
variable with a string of variables and terminals
A production rule in P is of the form
X → y where X is a variable and y is a string of
symbols from (V ∪ T)*.
• Given a string w of the form w = uxv, we can use the
production rule x → y to obtain a new string z = uyv.
• The set of all terminal strings that can be derived from S
using the Production rules is the “Language” generated by the Grammar.
• If the grammar is G = (V, T, S, P), then
L(G) = {w ∈ T* : S ⇒* w}
• If w ∈ L(G), then the sequence
S ⇒ w1 ⇒ w2 ⇒ w3 ⇒ … ⇒ wn ⇒ w
is a “derivation” of the sentence w.
• The strings S, w1, w2, …, wn, which
contain variables as well as terminals, are
called “sentential forms” of the derivation.
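The rewriting step described above (from w = uxv to z = uyv) can be sketched in a few lines of Python. This is only an illustration: the function name apply_rule, the use of the empty string for λ and the sample grammar S → aSb | λ are assumptions made for the example, not part of the slides' definition.

```python
def apply_rule(w, lhs, rhs, position):
    """Rewrite the occurrence of `lhs` that starts at index `position` in w.

    With w = u + lhs + v this returns z = u + rhs + v.
    """
    if not w.startswith(lhs, position):
        raise ValueError(f"{lhs!r} does not occur at index {position} of {w!r}")
    u, v = w[:position], w[position + len(lhs):]
    return u + rhs + v


# Hypothetical sample grammar: S -> aSb | λ, with λ written as "".
steps = ["S"]
for rhs in ["aSb", "aSb", ""]:
    current = steps[-1]
    steps.append(apply_rule(current, "S", rhs, current.index("S")))

# Every entry of `steps` except the last is a sentential form containing S.
print(" => ".join(steps))  # S => aSb => aaSbb => aabb
```

Each element of steps is a sentential form, and the last one, which contains no variable, is a sentence.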
Grammar:
S 
Derivation:
S aS a  a
S aS
• String Generators: Grammars specify
languages by generating strings in the
language using production rules, e.g.
S → aBb, B → bBa | Sa, etc.
• Pattern Recognizers: Grammars can be
viewed as a notation for describing a
family of recognition algorithms.
• Context-freeness: Context-free
grammars allow the following:
– An A-rule can be applied whenever A
occurs in a string, irrespective of the
context (that is, the non-terminals and
terminals around A).
S → aSb, S → λ
S ⇒ aSb ⇒ ab (a^1b^1)
S ⇒ aSb ⇒ aaSbb ⇒ aabb (a^2b^2)
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb (a^3b^3)
……….
S ⇒* a^n S b^n ⇒ a^n b^n
L(G) = {a^n b^n : n ≥ 0}
Example: Given a Grammar G = ({S}, {a, b}, S, P) with P defined
as S → aSb, S → λ
(i) Obtain a sentence in the language generated by G and a
sentential form of its derivation.
(ii) Obtain the language L(G).
Solution
(i) S ⇒ aSb ⇒ ab
S ⇒ aSb ⇒ aaSbb ⇒ aabb
Therefore we have S ⇒* aabb. So a sentence in the language
generated by G is aabb.
A sentential form of this derivation is aaSbb.
(ii) The rule S → aSb is recursive.
All sentential forms have the form w_i = a^i S b^i.
Applying the production rule S → aSb, we get
a^i S b^i ⇒ a^(i+1) S b^(i+1)
This is true for all i.
In order to get a sentence we apply S → λ.
Therefore we get
S ⇒* a^n S b^n ⇒ a^n b^n
Applying S → λ directly to S gives the empty string (n = 0).
Therefore L(G) = {a^n b^n : n ≥ 0}.
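The argument above can be checked mechanically for short strings. The sketch below is a minimal breadth-first enumeration, assuming single-letter symbols with uppercase letters as variables, lowercase letters as terminals and "" for λ; the function name generate and the length bound are choices made for this example. The pruning only works for grammars, like this one, in which sentential forms cannot grow without gaining terminals.

```python
from collections import deque


def generate(productions, start, max_len):
    """Return all sentences of length <= max_len derivable from `start`."""
    seen, sentences = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None if the form is a sentence.
        idx = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if idx is None:
            sentences.add(form)
            continue
        for rhs in productions[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            # Terminals never disappear, so forms with too many are dropped.
            if sum(ch.islower() for ch in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sentences


language = generate({"S": ["aSb", ""]}, "S", 6)
print(sorted(language, key=len))  # ['', 'ab', 'aabb', 'aaabbb']
```

The empty string appears in the result because S → λ can be applied to S directly, which is the case n = 0.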
Example: Given G1 = ({A, S}, {a, b}, S, P1)
with P1 defined by the production rules:
S → aAb | λ
A → aAb | λ
(i). Show that L(G1) = {a^n b^n : n ≥ 0}.
(ii). Show that G1 is equivalent to G, where
G = ({S}, {a, b}, S, P) and P
is given by
S → aSb
S → λ
Solution
(i) Given P1 as S → aAb | λ; A → aAb | λ
S ⇒ λ, i.e. a^0 b^0
S ⇒ aAb ⇒ aλb = ab
S ⇒ aAb ⇒ aaAbb ⇒ aabb, i.e. a^2 b^2, and so on.
Therefore L(G1) = {a^n b^n : n ≥ 0}.
(ii) Given G = ({S}, {a, b}, S, P) where P is S → aSb, S → λ.
The rule S → aSb is recursive.
All sentential forms have the form w_i = a^i S b^i.
Applying the production rule S → aSb, we get
a^i S b^i ⇒ a^(i+1) S b^(i+1)
This is true for all i.
In order to get a sentence, we apply S → λ.
Therefore we get S ⇒* a^n S b^n ⇒ a^n b^n
Hence L(G) = {a^n b^n : n ≥ 0}.
Hence G1 is equivalent to G, as both grammars
generate {a^n b^n : n ≥ 0}.
Example: Given a grammar G defined by the
production rules
S → AB
A → Aa
B → Bb
A → a
B → b
Show that the word w = a^2 b^4 ∈ L(G), where L(G) is the
language generated by G.
Solution
S ⇒ AB ⇒ AaB ⇒ aaB ⇒ aaBb ⇒ aaBbb
⇒ aaBbbb ⇒ aabbbb, i.e. a^2 b^4
Hence the word w = a^2 b^4 ∈ L(G).
Question:
Suppose a context-free grammar
G = ({S, A}, {a, b}, P, S) has the following
production rules:
S → aSb | aAb, A → bAa, A → ba
Determine its language.
Solution:
S ⇒ aAb ⇒ abab
S ⇒ aSb ⇒ aaAbb ⇒ aababb (using S → aAb)
S ⇒ aSb ⇒ aaSbb ⇒ aaaAbbb ⇒ aaababbb
S ⇒ aAb ⇒ abAab ⇒ abbaab (using A → bAa)
The variable A generates b^m a^m with m ≥ 1, so
L = {a^n b^m a^m b^n : n ≥ 1, m ≥ 1}.
Example: Give a simple description of the language
generated by the grammar with productions
(a). S → aA, A → bS, S → λ
(b). S → Aa, A → B, B → Aa
Solution
(a) For the given production rules
S ⇒ λ
S ⇒ aA ⇒ abS ⇒ ab
S ⇒ aA ⇒ abS ⇒ abaA ⇒ ababS ⇒ abab
S ⇒ aA ⇒ abS ⇒ abaA ⇒ ababS ⇒ ababaA ⇒ abababS
⇒ ababab, etc.
we have the language L = {(ab)^n | n ≥ 0}.
(b) For the given production rules
S ⇒ Aa ⇒ Ba ⇒ Aaa ⇒ Baa ⇒ Aaaa ⇒ Baaa ⇒ Aaaaa ⇒ …
No derivation ever terminates, because every right side contains a
variable; so no sentence is produced and L(G) = ∅.
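Whether a grammar can produce any sentence at all, as asked in part (b), can be decided by a simple fixed-point computation. The sketch below assumes single-letter symbols with uppercase variables, lowercase terminals and "" for λ; the name productive_variables is made up for this illustration.

```python
def productive_variables(productions):
    """Return the variables that can derive at least one terminal string."""
    productive = set()
    changed = True
    while changed:
        changed = False
        for var, alternatives in productions.items():
            if var in productive:
                continue
            for rhs in alternatives:
                # A right side qualifies if each symbol is a terminal or a
                # variable already known to be productive ("" qualifies too).
                if all(sym.islower() or sym in productive for sym in rhs):
                    productive.add(var)
                    changed = True
                    break
    return productive


grammar_a = {"S": ["aA", ""], "A": ["bS"]}
grammar_b = {"S": ["Aa"], "A": ["B"], "B": ["Aa"]}

print("S" in productive_variables(grammar_a))  # True: L is not empty
print("S" in productive_variables(grammar_b))  # False: L is empty
```

The language of a grammar is empty exactly when its start variable is not in the returned set.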
Right-Linear Grammars
• In a right-linear grammar, all productions have one of the
two forms: V → T*V
or
V → T*
i.e. the LHS must be a single variable and the RHS
consists of any number of terminals (members of T)
optionally followed by a single variable.
e.g. A → xyzB | xB | λ
• The following automaton and right-linear grammar both
describe the set of strings consisting of an even number
of 0’s and an even number of 1’s.
[Figure: automaton and right-linear grammar for this language]
Right-Linear Grammars and NFAs
• This is another Right-Linear Grammar:
A → a
A → aB
A → λ
where A, B ∈ V and a ∈ T.
Left-Linear Grammars
• In a left-linear grammar, all productions have one of
the two forms: V → VT* or V → T*
i.e. the LHS must consist of a single variable, and the
RHS consists of an optional single variable followed
by any number of terminals, e.g.
A → a
A → Ba
A → λ
where A, B ∈ V and a ∈ T.
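The two definitions (right-linear: terminals optionally followed by one variable; left-linear: one optional variable followed by terminals) translate directly into checks on the right sides. The sketch below assumes that variables are single uppercase letters, that every other character is a terminal, and that "" stands for λ; the function names are invented for the example.

```python
def is_right_linear(productions):
    """Every right side is terminals optionally followed by one variable."""
    return all(
        not any(sym.isupper() for sym in rhs[:-1])  # only the last symbol may be a variable
        for alternatives in productions.values()
        for rhs in alternatives
    )


def is_left_linear(productions):
    """Every right side is one optional variable followed by terminals."""
    return all(
        not any(sym.isupper() for sym in rhs[1:])  # only the first symbol may be a variable
        for alternatives in productions.values()
        for rhs in alternatives
    )


right = {"A": ["xyzB", "xB", ""]}   # A -> xyzB | xB | λ
left = {"A": ["a", "Ba", ""]}       # A -> a | Ba | λ
neither = {"S": ["aSb", ""]}        # S -> aSb | λ

print(is_right_linear(right), is_left_linear(right))      # True False
print(is_right_linear(left), is_left_linear(left))        # False True
print(is_right_linear(neither), is_left_linear(neither))  # False False
```

A grammar whose right sides contain no variable at all passes both checks, which matches the definitions since both allow the form V → T*.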
Example: Determine the context-free languages generated by the
grammars G = ({S, A, B}, {a, b}, S, P) with productions:
(a). S → aSa, S → bSb, S → λ
(b). S → abB, A → aaBb, B → bbAa, A → λ
Solution
(a) S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabbaa
The language is L(G) = {ww^R : w ∈ {a, b}*},
i.e. the even-length palindromes over {a, b}.
(b). S ⇒ abB ⇒ abbbAa ⇒ abbbaaBba ⇒ abbbaabbAaba
⇒ abbbaabbaaBbaba ⇒ abbbaabbaabbAababa
⇒ abbbaabbaabbababa
The language is L(G) = {ab(bbaa)^n bba(ba)^n : n ≥ 0}
DERIVATION TREES
A ‘derivation tree’ is an ordered tree in which the nodes are labeled
with the left sides of productions and in which the children of a
node represent its corresponding right side.
Definition of a Derivation Tree
Let G = (V, T, S, P) be a CFG. An ordered tree is a derivation tree
for G iff (if and only if) it has the following properties:
i. The root of the derivation tree is labeled S.
ii. Every leaf in the tree has a label from T ∪ {λ}.
iii. Every interior vertex (a vertex which is not a leaf) has
a label from V.
iv. If a vertex has label A ∈ V, and its children are labeled (from left to
right) a1, a2, …, an, then P must contain a production of the
form A → a1 a2 … an
v. A leaf labeled λ has no siblings, that is, a vertex with a child
labeled λ can have no other children.
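The five properties can be turned into a small checker. In the sketch below a tree node is a pair (label, list of children), productions map a variable to its right sides with "" for λ, and the names is_derivation_tree and tree_yield are assumptions of this example. The sample grammar S → aSb | λ is used only for illustration.

```python
LAMBDA = "λ"


def is_derivation_tree(tree, productions, start, terminals):
    """Check properties (i) to (v) for a tree given as (label, children)."""
    return tree[0] == start and _check(tree, productions, terminals)  # (i)


def _check(node, productions, terminals):
    label, children = node
    if not children:                                   # (ii) leaves
        return label in terminals or label == LAMBDA
    if label not in productions:                       # (iii) interior vertices
        return False
    child_labels = [child[0] for child in children]
    if LAMBDA in child_labels and len(children) > 1:   # (v) λ has no siblings
        return False
    rhs = "" if child_labels == [LAMBDA] else "".join(child_labels)
    if rhs not in productions[label]:                  # (iv) matches a production
        return False
    return all(_check(child, productions, terminals) for child in children)


def tree_yield(node):
    """Read the leaves from left to right, skipping λ."""
    label, children = node
    if not children:
        return "" if label == LAMBDA else label
    return "".join(tree_yield(child) for child in children)


productions = {"S": ["aSb", ""]}
inner = ("S", [("a", []), ("S", [(LAMBDA, [])]), ("b", [])])
tree = ("S", [("a", []), inner, ("b", [])])

print(is_derivation_tree(tree, productions, "S", {"a", "b"}))  # True
print(tree_yield(tree))                                        # aabb
```

Reading the leaves from left to right gives the word that the tree derives, here aabb.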
Sentential Form
For a given CFG with productions S → aA, A → aB,
B → bB, B → a.
The derivation tree is as shown below:
[Figure: derivation tree]
Rightmost/Leftmost/Mixed Derivation
Consider the grammar G with productions
1. S → aSS
2. S → b
Leftmost Derivation:
S ⇒ aSS ⇒ aaSSS ⇒ aabSS ⇒ aabaSSS ⇒ aababSS ⇒ aababbS
⇒ aababbb
The sequence followed is “1121222”
Mixed Derivation:
S ⇒ aSS ⇒ aSb ⇒ aaSSb ⇒ aabSb ⇒ aabaSSb ⇒ aabaSbb ⇒
aababbb
The sequence followed is “1212122”
Rightmost Derivation:
S ⇒ aSS ⇒ aSb ⇒ aaSSb ⇒ aaSaSSb ⇒ aaSaSbb ⇒ aaSabbb ⇒
aababbb
The sequence followed is “1211222”
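The rule sequences for the leftmost and rightmost derivations can be replayed in code. The sketch below is tailored to this grammar: because S is the only variable, the leftmost (rightmost) variable is simply the first (last) occurrence of S. The names RULES and derive are chosen for the example.

```python
RULES = {"1": ("S", "aSS"), "2": ("S", "b")}


def derive(sequence, start="S", leftmost=True):
    """Apply the numbered rules to the leftmost or rightmost variable."""
    form = start
    forms = [form]
    for number in sequence:
        lhs, rhs = RULES[number]
        # S is the only variable, so find/rfind locate the variable to rewrite.
        pos = form.find(lhs) if leftmost else form.rfind(lhs)
        if pos < 0:
            raise ValueError(f"no {lhs} left to rewrite in {form!r}")
        form = form[:pos] + rhs + form[pos + 1:]
        forms.append(form)
    return forms


print(" => ".join(derive("1121222")))                  # leftmost derivation
print(" => ".join(derive("1211222", leftmost=False)))  # rightmost derivation
```

Both calls end in the sentence aababbb; only the order in which the variables are rewritten differs.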
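A grammar G is context-free and has the
productions:
S → aAB, A → Bba, B → bB, B → c
(i). Derive the word acbabc.
(ii). Obtain the derivation tree.
Solution:
(i). The word w = acbabc is derived as follows:
S ⇒ aAB ⇒ a(Bba)B ⇒ acbaB
⇒ acba(bB) ⇒ acbabc.
(ii). In the derivation tree, the root S has children a, A, B;
A has children B, b, a, where this B has the single child c;
the right-hand B has children b, B, where this B has the single child c.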
A CFG given by productions is
S → a, S → aAS, A → bS
Obtain the derivation tree of the word w = abaabaa.
Given a CFG G = (N, T, P, S)
with N = {S}, T = {a, b}, P = {S → aSb, S → ab}
Obtain the derivation tree and the language generated
L(G).
Given G = (N, T, P, S) with
N = {E}, S = E, T = {id, +, *, c} with the productions:
E → E + E, E → E * E, E → E, E → id
Obtain the derivation tree.
Given a CFG G = (N, T, P, S)
with N = {S, A}, T = {a, b} and the productions:
S → aS, S → aA, A → bA, A → b
Obtain the derivation tree and L(G).
Question:
Sketch the derivation tree for the CFG given
by S → aA, A → aB, B → bB, B → a.
Solution:
[Figure: derivation tree]
Given a grammar G with production rules
S → aB, S → bA, A → aS, A → bAA, A → a, B → bS,
B → aBB, B → b
Obtain the (i) leftmost derivation, and (ii) rightmost
derivation for the string “aaabbabbba”.
Solution
(i) Leftmost derivation:
S ⇒ aB ⇒ aaBB ⇒ aaaBBB ⇒ aaabBB ⇒ aaabbB
⇒ aaabbaBB ⇒ aaabbabB ⇒ aaabbabbS ⇒ aaabbabbbA
⇒ aaabbabbba
(ii) Rightmost derivation:
S ⇒ aB ⇒ aaBB ⇒ aaBbS ⇒ aaBbbA ⇒ aaBbba ⇒ aaaBBbba
⇒ aaaBbSbba ⇒ aaaBbaBbba ⇒ aaaBbabbba
⇒ aaabbabbba
Example:
Let G = (V, Σ, P, S) be a CFG of the form:
G = ({S}, {a, b}, {S → λ, S → aSb}, S)
i. Show that L(G) = {a^n b^n | n ≥ 0}
ii. Draw the derivation tree for aabb
i. S ⇒ aSb ⇒ aaSbb ⇒ aabb
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaaaSbbbb ⇒ aaaabbbb
Thus, L(G) = {a^n b^n | n ≥ 0}
[See slide #5]
ii. Derivation tree for aabb is:
the root S has children a, S, b; the inner S has children a, S, b;
the innermost S has the single child λ.
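G = ({S, A, B}, {a, b},
{S → AB, A → aA | λ,
B → Bb | λ},
S)
L(G) = L(a*b*)
Leftmost Derivation:
S ⇒ AB ⇒ aAB ⇒ aB ⇒ aBb ⇒ ab
Rightmost Derivation:
S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aAb ⇒ ab
Derivation Tree
The root S has children A, B; A has children a, A, where the inner A
has the single child λ; B has children B, b, where the inner B has
the single child λ.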
More Examples of CFGs and CFLs
S → aSa | aBa
B → bB | b
L(S) = {a^n b^m a^n : n, m > 0}
S → aSa | B
B → bB | λ
L(S) = {a^n b^m a^n | n ≥ 0 ∧ m ≥ 0}
S → abSc | λ
L(S) = {(ab)^n c^n | n ≥ 0}
S → AB
A → aA | a
B → bB | λ
and, equivalently,
S → aS | aB
B → bB | λ
L(S) = {a^n b^m | n > 0, m ≥ 0}
L(S) = L(a⁺b*)
S → AbAbA
A → aA | λ
and, equivalently,
S → aS | B
B → bA
A → aA | bC
C → aC | λ
L(S) = L(a*ba*ba*), the strings over {a, b} with exactly two b’s
S → λ | aO | bO
O → aS | bS
or, equivalently,
S → λ | aaS | abS | baS | bbS
L(S) = {w ∈ {a, b}* | length(w) is EVEN}
S → λ | aS | bO
O → aO | bS
L(S) = {w ∈ {a, b}* | w has an EVEN number of b’s}
Example: Given the grammar G = (V, T,
P, A) with the following productions:
A → AbA
A → B
B → aBa
B → b
Derive the string aabaababa.
Solution:
A ⇒ AbA ⇒ BbA ⇒ aBabA ⇒ aaBaabA
⇒ aabaabA ⇒ aabaabB ⇒ aabaabaBa
⇒ aabaababa
Consider the grammar G = (V, T, P, E) where V =
{E, N}, T = {+, *, (, ), 0, 1}, and P contains the
following productions:
E → E + E | E * E | (E) | N
N → 0N | 1N | 0 | 1
All the following words are in the language L(G):
0
0 * 1 + 111
(1 + 1) * 0
(1 * 1) + (((0000)) * 1111)
For instance, (1 + 1) * 0 is derived by
E ⇒ E * E ⇒ (E) * E ⇒ (E + E) * E ⇒* (N + N) * N
⇒* (1 + 1) * 0
The derivation tree for the word 01 + (1 * 0) is:
[Figure: derivation tree]
Leftmost derivation:
E ⇒ E + E ⇒ N + E ⇒ 0N + E ⇒ 01 + E ⇒ 01 +
(E) ⇒ 01 + (E * E) ⇒ 01 + (N * E) ⇒ 01 + (1 *
E) ⇒ 01 + (1 * N) ⇒ 01 + (1 * 0)
Rightmost derivation:
E ⇒ E + E ⇒ E + (E) ⇒ E + (E * E) ⇒ E + (E * N)
⇒ E + (E * 0) ⇒ E + (N * 0) ⇒ E + (1 * 0) ⇒ N +
(1 * 0) ⇒ 0N + (1 * 0) ⇒ 01 + (1 * 0)
• A leftmost derivation expands the variables in the order in
which a depth-first traversal of the tree from left to right encounters them.
• A rightmost derivation corresponds to the depth-first
traversal from right to left.
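The correspondence between a derivation tree and its leftmost or rightmost derivation can be made concrete. In the sketch below a node is a pair (label, list of children), leaves have an empty list, and the function name derivation is an assumption of this example. The sample tree is a small one for the word 1*0 in the grammar above, chosen to keep the output short.

```python
def derivation(tree, leftmost=True):
    """List the sentential forms of the leftmost (or rightmost) derivation."""
    form = [tree]  # the current sentential form, kept as a list of nodes
    steps = ["".join(node[0] for node in form)]
    while True:
        # Indices of the variables that still have to be expanded.
        pending = [i for i, (_, children) in enumerate(form) if children]
        if not pending:
            return steps
        i = pending[0] if leftmost else pending[-1]
        form[i:i + 1] = form[i][1]  # replace the variable by its children
        steps.append("".join(node[0] for node in form))


# Tree for 1*0 using E -> E * E, E -> N, N -> 1, N -> 0.
one = ("E", [("N", [("1", [])])])
zero = ("E", [("N", [("0", [])])])
tree = ("E", [one, ("*", []), zero])

print(" => ".join(derivation(tree)))                  # E => E*E => N*E => 1*E => 1*N => 1*0
print(" => ".join(derivation(tree, leftmost=False)))  # E => E*E => E*N => E*0 => N*0 => 1*0
```

The same tree yields both derivations; only the order in which its interior vertices are visited changes.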
Ambiguity in Context-free Grammars (CFGs)
and Context-free Languages (CFLs)
• A context-free grammar G is called ambiguous if
some word has more than one leftmost
derivation (equivalently: more than one
derivation tree).
• Otherwise the grammar is unambiguous.
E.g. in the grammar above, the word 1+0+1 has the following
two leftmost derivations
• E ⇒ E + E ⇒ E + E + E ⇒* 1 + E + E
⇒* 1 + 0 + E ⇒* 1 + 0 + 1 and
• E ⇒ E + E ⇒* 1 + E ⇒ 1 + E + E ⇒* 1 + 0 + E
⇒* 1 + 0 + 1
These correspond to different derivation trees;
thus the CFG is ambiguous, as the word 1+0+1 shows.
Ambiguity in CFGs
Example:
S ==> AS | λ
A ==> A1 | 0A1 | 01
Input string: 00111
• Can be derived in two ways
Leftmost derivation #1:
S => AS
=> 0A1S
=> 0A11S
=> 00111S
=> 00111
Leftmost derivation #2:
S => AS
=> A1S
=> 0A11S
=> 00111S
=> 00111
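The claim that 00111 has exactly two leftmost derivations can be checked by exhaustive search. The sketch below is written for this particular grammar: the table MIN_LEN (the shortest string each variable derives) was worked out by hand and is what keeps the search finite despite the left-recursive rule A → A1. All names are assumptions of the example.

```python
PRODUCTIONS = {"S": ["AS", ""], "A": ["A1", "0A1", "01"]}
MIN_LEN = {"S": 0, "A": 2}  # shortest yields: S => λ, A => 01


def min_length(form):
    """Length of the shortest sentence derivable from a sentential form."""
    return sum(MIN_LEN.get(sym, 1) for sym in form)


def count_leftmost_derivations(form, target):
    """Count the leftmost derivations leading from `form` to `target`."""
    idx = next((i for i, sym in enumerate(form) if sym in PRODUCTIONS), None)
    if idx is None:
        return 1 if form == target else 0
    # Prune forms whose terminal prefix is wrong or that are already too long.
    if not target.startswith(form[:idx]) or min_length(form) > len(target):
        return 0
    return sum(
        count_leftmost_derivations(form[:idx] + rhs + form[idx + 1:], target)
        for rhs in PRODUCTIONS[form[idx]]
    )


print(count_leftmost_derivations("S", "00111"))  # 2
print(count_leftmost_derivations("S", "01"))     # 1
```

A count greater than one for some word shows that the grammar is ambiguous; a count of one for a single word says nothing about the grammar as a whole.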
• The grammar G1 = ({S}, {a, b}, P1, S), where P1
contains the productions
S → aSb | aaS | ε, is ambiguous because the word
aaab has two different leftmost derivations:
S ⇒ aaS ⇒ aaaSb ⇒ aaab and
S ⇒ aSb ⇒ aaaSb ⇒ aaab
• The language {a^(2k+n) b^n | k, n ≥ 0} it generates is not
inherently ambiguous because it is generated by
the equivalent unambiguous grammar ({S, A}, {a, b},
P11, S) with productions
S → aSb | A, A → aaA | ε
Note: ε and λ are used synonymously.
Why does ambiguity matter?
Given E ==> E + E | E * E | (E) | a | b | c | 0 | 1
Derive the string: a * b + c
LM derivation #1:
E => E + E => E * E + E => a * E + E
=> a * b + E => a * b + c
The parse tree has + at the root, so the string is read as (a*b)+c.
LM derivation #2:
E => E * E => a * E => a * E + E
=> a * b + E => a * b + c
The parse tree has * at the root, so the string is read as a*(b+c).
The calculated value depends on which
of the two parse trees is actually used.
The values are different!
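To see the difference in numbers, the two parse trees of a * b + c can be evaluated directly. The sketch below writes each tree as a nested tuple (operator, left, right); the values chosen for a, b and c are arbitrary and serve only as an illustration.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tree, env):
    """Evaluate a parse tree given as (operator, left, right) or a name."""
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))


tree_1 = ("+", ("*", "a", "b"), "c")  # reading (a*b)+c
tree_2 = ("*", "a", ("+", "b", "c"))  # reading a*(b+c)
env = {"a": 2, "b": 3, "c": 4}        # arbitrary sample values

print(evaluate(tree_1, env))  # 10
print(evaluate(tree_2, env))  # 14
```

The same string therefore gets two different values, depending on the tree that the parser happens to build.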
Removing Ambiguity in
Expression Evaluations
• It may be possible to remove ambiguity for some CFLs
– E.g. in a CFG for expression evaluation, by imposing rules &
restrictions such as precedence
– This would imply a re-write of the grammar
Order of Precedence: (), *, +
Ambiguous version:
E ==> E + E | E * E | (E) | a | b | c | 0 | 1
Modified/unambiguous version:
E ==> E + T | T
T ==> T * F | F
F ==> I | (E)
I ==> a | b | c | 0 | 1
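A parser for the unambiguous version can be sketched by recursive descent. One design choice should be named plainly: the left-recursive rules E ==> E + T and T ==> T * F are read here as the loops T (+ T)* and F (* F)*, a standard substitution that accepts the same strings and keeps the left-to-right grouping. The function names and the tuple representation of the tree are assumptions of the example.

```python
def parse(text):
    """Parse an expression into nested tuples (operator, left, right)."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expression():  # E ==> E + T | T, read as T (+ T)*
        nonlocal pos
        node = term()
        while peek() == "+":
            pos += 1
            node = ("+", node, term())
        return node

    def term():  # T ==> T * F | F, read as F (* F)*
        nonlocal pos
        node = factor()
        while peek() == "*":
            pos += 1
            node = ("*", node, factor())
        return node

    def factor():  # F ==> I | (E), with I ==> a | b | c | 0 | 1
        nonlocal pos
        tok = peek()
        if tok == "(":
            pos += 1
            node = expression()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return node
        if tok is not None and tok in "abc01":
            pos += 1
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expression()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return tree


print(parse("a * b + c"))    # ('+', ('*', 'a', 'b'), 'c')
print(parse("a * (b + c)"))  # ('*', 'a', ('+', 'b', 'c'))
```

With this grammar a * b + c has only the reading (a*b)+c; the other grouping must be requested explicitly with parentheses.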
Inherently Ambiguous CFLs
• However, for some languages, it may not be
possible to remove ambiguity
• A CFL is said to be inherently ambiguous if
every CFG that describes it is ambiguous
Example:
L = {a^n b^n c^m d^m | n, m ≥ 1} ∪ {a^n b^m c^m d^n | n, m ≥ 1}
L is inherently ambiguous
This can be proved using input strings of the form a^n b^n c^n d^n
[The proof is beyond the scope of this course; it will be done
in Theory of Computing (in Level 400)]
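Converting from Grammars to Finite Automata
Convert the following Grammar to a Finite Automaton
S -> aA | cF
A -> bB | bA
B -> λ
F -> λ
Solution:
The automaton has the states S, A, B, F, with start state S and
final states B and F. Its transitions are:
S --a--> A
S --c--> F
A --b--> B
A --b--> A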
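The conversion for this grammar can be sketched as follows: each rule X -> tY becomes a transition from X to Y on t, and each rule X -> λ makes X a final state. The sketch assumes that every right side is either "" (for λ) or exactly one terminal followed by one variable, which holds for the grammar above but not for right-linear grammars in general; the function names are invented for the example.

```python
GRAMMAR = {"S": ["aA", "cF"], "A": ["bB", "bA"], "B": [""], "F": [""]}


def grammar_to_nfa(productions):
    """Build (transitions, accepting) from rules X -> tY and X -> λ."""
    transitions, accepting = {}, set()
    for var, alternatives in productions.items():
        for rhs in alternatives:
            if rhs == "":
                accepting.add(var)  # X -> λ: X is a final state
            else:
                symbol, target = rhs[0], rhs[1]
                transitions.setdefault((var, symbol), set()).add(target)
    return transitions, accepting


def accepts(word, transitions, accepting, start="S"):
    """Simulate the NFA by tracking the set of reachable states."""
    states = {start}
    for symbol in word:
        states = {t for s in states for t in transitions.get((s, symbol), ())}
    return bool(states & accepting)


transitions, accepting = grammar_to_nfa(GRAMMAR)
for word in ["ab", "abbb", "c", "a", "cb"]:
    print(word, accepts(word, transitions, accepting))
```

The automaton is nondeterministic because A has two b-transitions, one back to A and one to the final state B.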
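Convert the following Grammars to Finite Automata
Right-Linear Grammar
S -> aA | cF
A -> bB | bA
B -> λ
F -> λ
Left-Linear Grammar
S -> λ
A -> Sa | Ab
B -> Ab
F -> Sc
Z -> B | F
Solution:
The automaton has the states S, A, B, F, with start state S,
final states B and F, and the transitions
S --a--> A, S --c--> F, A --b--> A, A --b--> B.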
Converting from Finite Automata to Grammars
Note: λ and ε are used
interchangeably as
non-input symbols.
[Figure: finite automaton]
i.e. A → aA | bC | aW
C → cC | ε
W → cX
X → ε
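The reverse conversion can be sketched in the same style: a transition from state X to state Y on symbol t gives the rule X → tY, and every final state X gives the rule X → ε. The automaton's figure is not part of the text, so the transition table below is an assumption, read back from the grammar above; the function name is also invented for the example.

```python
def nfa_to_grammar(transitions, accepting):
    """Turn transitions {(state, symbol): targets} into right-linear rules."""
    productions = {}
    for (state, symbol), targets in transitions.items():
        for target in sorted(targets):
            productions.setdefault(state, []).append(symbol + target)
    for state in sorted(accepting):
        productions.setdefault(state, []).append("ε")  # final state: X -> ε
    return productions


# Assumed automaton, reconstructed from the grammar A -> aA | bC | aW, ...
transitions = {
    ("A", "a"): {"A", "W"},
    ("A", "b"): {"C"},
    ("C", "c"): {"C"},
    ("W", "c"): {"X"},
}
accepting = {"C", "X"}

for variable, alternatives in nfa_to_grammar(transitions, accepting).items():
    print(variable, "->", " | ".join(alternatives))
```

Up to the order of the alternatives, the printed rules agree with the grammar listed above.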