CONTEXT-FREE GRAMMAR

Definition of Context-free Grammar: A context-free grammar (G) is a 4-tuple (quadruple) G = (V, T, S, P) where
V = finite set of objects called variables
T = finite set of objects called terminal symbols
S ∈ V = start variable
P = finite set of production rules, each rule pairing a variable with a string of variables and terminals

A production rule is of the form X → y, where X is a variable and y is a string of symbols from (V ∪ T)*.
• Given a string w of the form w = uXv, we can use the production rule X → y to obtain a new string z = uyv; we write w ⇒ z.
• The set of all terminal strings obtained from S by using production rules is the “language” generated by the grammar.
• If the grammar is G = (V, T, S, P), then L(G) = {w ∈ T* : S ⇒* w}.
• If w ∈ L(G), then the sequence S ⇒ w1 ⇒ w2 ⇒ w3 ⇒ … ⇒ wn ⇒ w is a “derivation” of the sentence w.
• The strings S, w1, w2, …, wn, which contain variables as well as terminals, are called “sentential forms” of the derivation.

[Figure: an example grammar and a derivation starting from S]

• String Generators: Grammars specify languages by generating strings in the language using production rules, e.g. S → aBb, B → bBa | Sa, etc.
• Pattern Recognizers: Grammars can be viewed as a notation for describing a family of recognition algorithms.
• Context-freeness: A context-free grammar allows the following:
  – An A-rule can be applied whenever A occurs in a string, irrespective of the context (that is, the non-terminals and terminals around A).

S → aSb, S → λ
S ⇒ aSb ⇒ ab (a^1 b^1)
S ⇒ aSb ⇒ aaSbb ⇒ aabb (a^2 b^2)
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb (a^3 b^3)
…
In general S ⇒* a^n b^n, so L(G) = {a^n b^n : n ≥ 0}.

Example: Given a grammar G = ({S}, {a, b}, S, P) with P defined as S → aSb, S → λ:
(i) Obtain a sentence in the language generated by G and a sentential form.
(ii) Obtain the language L(G).
Solution
(i) S ⇒ aSb ⇒ ab
S ⇒ aSb ⇒ aaSbb ⇒ aabb
Therefore we have S ⇒* aabb. So a sentence in the language generated by G is aabb. A sentential form is aaSbb.
(ii) The rule S → aSb is recursive.
All sentential forms have the form w_i = a^i S b^i. Applying the production rule S → aSb, we get a^i S b^i ⇒ a^(i+1) S b^(i+1). This is true for all i. In order to get a sentence we apply S → λ. Therefore we get S ⇒* a^n S b^n ⇒ a^n b^n. Therefore L(G) = {a^n b^n : n ≥ 0}; the case n = 0 is the empty string, obtained directly by S ⇒ λ.

Example: Given G1 = ({A, S}, {a, b}, S, P1) with P1 defined by the production rules:
S → aAb | λ
A → aAb | λ
(i) Show that L(G1) = {a^n b^n : n ≥ 0}.
(ii) Show that G1 is equivalent to G, where G = ({S}, {a, b}, S, P) and P is given by S → aSb, S → λ.
Solution
Given P1 as S → aAb | λ; A → aAb | λ:
S ⇒ λ, i.e. a^0 b^0
S ⇒ aAb ⇒ aλb = ab
S ⇒ aAb ⇒ aaAbb ⇒ aabb, i.e. a^2 b^2, and so on.
Therefore L(G1) = {a^n b^n : n ≥ 0}.
Given G = ({S}, {a, b}, S, P) where P is S → aSb, S → λ. The rule S → aSb is recursive. All sentential forms have the form w_i = a^i S b^i. Applying the production rule S → aSb, we get a^i S b^i ⇒ a^(i+1) S b^(i+1). This is true for all i. In order to get a sentence, we apply S → λ. Therefore we get S ⇒* a^n S b^n ⇒ a^n b^n. Hence L(G) = {a^n b^n : n ≥ 0}.
Hence G1 is equivalent to G, as both grammars generate {a^n b^n : n ≥ 0}.

Example: Given a grammar G defined by the production rules S → AB, A → Aa, B → Bb, A → a, B → b. Show that the word w = a^2 b^4 ∈ L(G), where L(G) is the language determined by G.
Solution
S ⇒ AB ⇒ AaB ⇒ aaB ⇒ aaBb ⇒ aaBbb ⇒ aaBbbb ⇒ aabbbb, i.e. a^2 b^4.
Hence the word w = a^2 b^4 ∈ L(G).

Question: Suppose a context-free grammar G = ({S, A}, {a, b}, P, S) with the following production rules: S → aSb | aAb, A → bAa, A → ba. Determine its language.
Solution:
S ⇒ aAb ⇒ abab
S ⇒ aSb ⇒ aaAbb ⇒ aababb (substituting S → aAb)
S ⇒ aSb ⇒ aaSbb ⇒ aaaAbbb ⇒ aaababbb
Since A generates b^m a^m for m ≥ 1, L = {a^n b^m a^m b^n : n ≥ 1, m ≥ 1}.

Example: Give a simple description of the language generated by the grammar with productions
(a) S → aA, A → bS, S → λ
(b) S → Aa, A → B, B → Aa
Solution
(a) For the given production rules:
S ⇒ aA ⇒ abS ⇒ ab
S ⇒ aA ⇒ abS ⇒ abaA ⇒ ababS ⇒ abab
S ⇒ aA ⇒ abS ⇒ abaA ⇒ ababS ⇒ ababaA ⇒ abababS ⇒ ababab, etc.
Because S ⇒ λ is also possible, the language is L = {(ab)^n | n ≥ 0}.
(b) For the given production rules:
S ⇒ Aa ⇒ Ba ⇒ Aaa ⇒ Baa ⇒ Aaaa ⇒ Baaa ⇒ Aaaaa ⇒ …
Every sentential form still contains a variable, so no derivation terminates; the grammar generates no strings and L = ∅.
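The derivations in the examples above can be reproduced mechanically. The following Python sketch enumerates the short sentences of a grammar by breadth-first leftmost derivation. It is an illustration only: the function name `generate`, the dictionary encoding of productions (one uppercase letter per variable, "" for λ) and the `slack` cut-off are assumptions made here, not part of the course material.

```python
from collections import deque


def generate(productions, start, max_len, slack=2):
    """Return all sentences of length <= max_len, shortest first.

    productions: dict mapping a variable (one character) to a list of
                 right-hand sides; the empty string "" stands for λ.
    slack:       how many symbols beyond max_len a sentential form may
                 hold (an arbitrary cut-off that keeps the search finite).
    """
    seen = {start}
    queue = deque([start])
    sentences = set()
    while queue:
        form = queue.popleft()
        # Find the leftmost variable; a form without one is a sentence.
        pos = next((i for i, ch in enumerate(form) if ch in productions), None)
        if pos is None:
            if len(form) <= max_len:
                sentences.add(form)
            continue
        for rhs in productions[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(1 for ch in new if ch not in productions)
            # Prune forms that can no longer yield a short enough sentence.
            if terminals > max_len or len(new) > max_len + slack:
                continue
            if new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(sentences, key=lambda s: (len(s), s))


# Grammars taken from the examples above.
ANBN = {"S": ["aSb", ""]}
AB_STAR = {"S": ["aA", ""], "A": ["bS"]}
NON_TERMINATING = {"S": ["Aa"], "A": ["B"], "B": ["Aa"]}
NESTED = {"S": ["aSb", "aAb"], "A": ["bAa", "ba"]}
```

With these definitions, `generate(ANBN, "S", 6)` returns `['', 'ab', 'aabb', 'aaabbb']`, and `generate(NON_TERMINATING, "S", 4)` returns an empty list, matching the observation that grammar (b) generates no strings. The bound on sentential-form length is a pragmatic choice; grammars with many λ-productions may need a larger `slack`.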
Right-Linear Grammars
• In a right-linear grammar, all productions have one of the two forms V → T*V or V → T*, i.e. the LHS is a single variable and the RHS consists of any number of terminals (members of T) optionally followed by a single variable, e.g. A → xyzB | xB.
• The following automaton and right-linear grammar both recognize the set of strings consisting of an even number of 0’s and an even number of 1’s.
[Figure: right-linear grammars and NFAs; automaton and equivalent grammar]
• This is another right-linear grammar: A → a, A → aB, A → λ, where A, B ∈ V and a ∈ T.

Left-Linear Grammars
• In a left-linear grammar, all productions have one of the two forms V → VT* or V → T*, i.e. the LHS must consist of a single variable, and the RHS consists of an optional single variable followed by any number of terminals, e.g. A → a, A → Ba, A → λ, where A, B ∈ V and a ∈ T.

Example: Determine the context-free languages generated by the grammars with terminals {a, b}, start variable S, and productions:
(a) S → aSa, S → bSb, S → λ
(b) S → abB, A → aaBb, B → bbAa, A → λ
Solution
(a) S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabbaa
Each step adds the same symbol at both ends, so the language is L(G) = {ww^R : w ∈ {a, b}*}, the even-length palindromes over {a, b}.
(b) S ⇒ abB ⇒ abbbAa ⇒ abbbaaBba ⇒ abbbaabbAaba ⇒ abbbaabbaaBbaba ⇒ abbbaabbaabbAababa ⇒ abbbaabbaabbababa
The language is L(G) = {ab(bbaa)^n bba(ba)^n : n ≥ 0}.

DERIVATION TREES
A ‘derivation tree’ is an ordered tree in which the nodes are labeled with the left sides of productions and in which the children of a node represent its corresponding right sides.

Definition of a Derivation Tree
Let G = (V, T, S, P) be a CFG. An ordered tree is a derivation tree for G iff (if and only if) it has the following properties:
i. The root of the derivation tree is labeled S.
ii. Every leaf in the tree has a label from T ∪ {λ}.
iii. Every interior vertex (a vertex which is not a leaf) has a label from V.
iv. If a vertex has label A ∈ V, and its children are labeled (from left to right) a1, a2, …, an, then P must contain a production of the form A → a1 a2 … an.
v.
A leaf labeled λ has no siblings; that is, a vertex with a child labeled λ can have no other children.

Sentential Form
For a given CFG with productions S → aA, A → aB, B → bB, B → a, the derivation tree is:
[Figure: derivation tree]

Right Most/Left Most/Mixed Derivation
Consider the grammar G with productions
1. S → aSS
2. S → b
Leftmost derivation: S ⇒ aSS ⇒ aaSSS ⇒ aabSS ⇒ aabaSSS ⇒ aababSS ⇒ aababbS ⇒ aababbb
The sequence of rules followed is “1121222”.
Mixed derivation: S ⇒ aSS ⇒ aSb ⇒ aaSSb ⇒ aabSb ⇒ aabaSSb ⇒ aabaSbb ⇒ aababbb
The sequence of rules followed is “1212122”.
Rightmost derivation: S ⇒ aSS ⇒ aSb ⇒ aaSSb ⇒ aaSaSSb ⇒ aaSaSbb ⇒ aaSabbb ⇒ aababbb
The sequence of rules followed is “1211222”.

A grammar G is context-free and has the productions: S → aAB, A → Bba, B → bB, B → c.
(i) Derive the word acbabc.
(ii) Obtain the derivation tree.
Solution:
(i) The word w = acbabc is derived as follows: S ⇒ aAB ⇒ a(Bba)B ⇒ acbaB ⇒ acba(bB) ⇒ acbabc.
(ii) The derivation tree, written in bracket form with children listed left to right: S(a, A(B(c), b, a), B(b, B(c))).

A CFG is given by the productions S → a, S → aAS, A → bS. Obtain the derivation tree of the word w = abaabaa.

Given a CFG G = (N, T, P, S) with N = {S}, T = {a, b}, P = {S → aSb, S → ab}. Obtain the derivation tree and the language generated, L(G).

Given G = (N, T, P, S) with N = {E}, S = E, T = {id, +, *, c} with the productions: E → E + E, E → E * E, E → E, E → id. Obtain the derivation tree.

Given a CFG G = (N, T, P, S) with N = {S, A}, T = {a, b} and the productions: S → aS, S → aA, A → bA, A → b. Obtain the derivation tree and L(G).

Question: Sketch the derivation tree for the CFG given by S → aA, A → aB, B → bB, B → a.
Solution:
[Figure: derivation tree]

Given a grammar G with production rules S → aB, S → bA, A → aS, A → bAA, A → a, B → bS, B → aBB, B → b. Obtain the (i) leftmost derivation and (ii) rightmost derivation for the string “aaabbabbba”.
Solution
(i) Leftmost derivation: S ⇒ aB ⇒ aaBB ⇒ aaaBBB ⇒ aaabBB ⇒ aaabbB ⇒ aaabbaBB ⇒ aaabbabB ⇒ aaabbabbS ⇒ aaabbabbbA ⇒ aaabbabbba
(ii) Rightmost derivation: S ⇒ aB ⇒ aaBB ⇒ aaBbS ⇒ aaBbbA ⇒ aaBbba ⇒ aaaBBbba ⇒ aaaBbSbba ⇒ aaaBbaBbba ⇒ aaaBbabbba ⇒ aaabbabbba

Example: Let G = (V, Σ, P, S) be a CFG of the form G = ({S}, {a, b}, {S → λ, S → aSb}, S).
i. Show that L(G) = {a^n b^n | n ≥ 0}.
ii. Draw the derivation tree for aabb.
i. S ⇒ aSb ⇒ aaSbb ⇒ aabb
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaaaSbbbb ⇒ aaaabbbb
Thus L(G) = {a^n b^n | n ≥ 0} (see the a^n b^n derivations earlier).
ii. The derivation tree for aabb, in bracket form with children listed left to right: S(a, S(a, S(λ), b), b).

G = ({S, A, B}, {a, b}, {S → AB, A → aA | λ, B → Bb | λ}, S)
L(G) = L(a*b*)
Leftmost derivation: S ⇒ AB ⇒ aAB ⇒ aB ⇒ aBb ⇒ ab
Rightmost derivation: S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aAb ⇒ ab
Derivation tree for ab: S(A(a, A(λ)), B(B(λ), b))

More Examples of CFGs and CFLs
• S → aSa | aBa, B → bB | b
  L(S) = {a^n b^m a^n : n > 0, m > 0}
• S → aSa | B, B → bB | λ
  L(S) = {a^n b^m a^n | n ≥ 0, m ≥ 0}
• S → abSc | λ
  L(S) = {(ab)^n c^n | n ≥ 0}
• S → AB, A → aA | a, B → bB | λ
  and equivalently S → aS | aB, B → bB | λ
  L(S) = {a^m b^n | m > 0, n ≥ 0} = L(aa*b*)
• S → AbAbA, A → aA | λ
  and equivalently S → aS | B, B → bA, A → aA | bC, C → aC | λ
  L(S) = L(a*ba*ba*), the strings over {a, b} with exactly two b’s
• E → λ | aO | bO, O → aE | bE (start variable E)
  and equivalently E → λ | aaE | abE | baE | bbE
  L = {w ∈ {a, b}* | length(w) is even}
• E → λ | aE | bO, O → aO | bE (start variable E)
  L = {w ∈ {a, b}* | w has an even number of b’s}

Example: Given the grammar G = (V, T, P, A) with the following productions: A → AbA, A → B, B → aBa, B → b. Derive the string aabaababa.
Solution: A ⇒ AbA ⇒ BbA ⇒ aBabA ⇒ aaBaabA ⇒ aabaabA ⇒ aabaabB ⇒ aabaabaBa ⇒ aabaababa

Consider the grammar G = (V, T, P, E) where V = {E, N}, T = {+, *, (, ), 0, 1}, and P contains the following productions:
E → E + E | E * E | (E) | N
N → 0N | 1N | 0 | 1
All the following words are in the language L(G):
0
0 * 1 + 111
(1 + 1) * 0
(1 * 1) + (((0000)) * 1111)
For instance, (1 + 1) * 0 is derived by E ⇒ E * E ⇒ (E) * E ⇒ (E + E) * E ⇒* (N + N) * N ⇒* (1 + 1) * 0.
[Figure: derivation tree for 01 + (1 * 0)]
Leftmost derivation of 01 + (1 * 0): E ⇒ E + E ⇒ N + E ⇒ 0N + E ⇒ 01 + E ⇒ 01 + (E) ⇒ 01 + (E * E) ⇒ 01 + (N * E) ⇒ 01 + (1 * E) ⇒ 01 + (1 * N) ⇒ 01 + (1 * 0)
Rightmost derivation of 01 + (1 * 0): E ⇒ E + E ⇒ E + (E) ⇒ E + (E * E) ⇒ E + (E * N) ⇒ E + (E * 0) ⇒ E + (N * 0) ⇒ E + (1 * 0) ⇒ N + (1 * 0) ⇒ 0N + (1 * 0) ⇒ 01 + (1 * 0)
• A leftmost derivation expands the variables in the order in which a depth-first, left-to-right traversal of the tree encounters them.
• A rightmost derivation corresponds to the depth-first traversal from right to left.

Ambiguity in Context-free Grammars (CFGs) and Context-free Languages (CFLs)
• A context-free grammar G is called ambiguous if some word has more than one leftmost derivation (equivalently: more than one derivation tree).
• Otherwise the grammar is unambiguous.
E.g. in the grammar above, the word 1+0+1 has the following two leftmost derivations (⇒* abbreviates the steps through N):
• E ⇒ E+E ⇒ E+E+E ⇒* 1+E+E ⇒* 1+0+E ⇒* 1+0+1 and
• E ⇒ E+E ⇒* 1+E ⇒ 1+E+E ⇒* 1+0+E ⇒* 1+0+1
These correspond to different derivation trees; thus the CFG is ambiguous.

Ambiguity in CFGs
Example:
S → AS | λ
A → A1 | 0A1 | 01
Input string: 00111
• It can be derived in two ways:
Leftmost derivation #1: S ⇒ AS ⇒ 0A1S ⇒ 0A11S ⇒ 00111S ⇒ 00111
Leftmost derivation #2: S ⇒ AS ⇒ A1S ⇒ 0A11S ⇒ 00111S ⇒ 00111

• The grammar G1 = ({S}, {a, b},
P1, S) where P1 contains the productions S → aSb | aaS | ε is ambiguous because the word aaab has two different leftmost derivations: S ⇒ aaS ⇒ aaaSb ⇒ aaab and S ⇒ aSb ⇒ aaaSb ⇒ aaab.
• The language {a^(2k+n) b^n | k, n ≥ 0} it generates is not inherently ambiguous because it is generated by the equivalent unambiguous grammar ({S, A}, {a, b}, P11, S) with productions S → aSb | A, A → aaA | ε.
Note: ε and λ are used synonymously.

Why does ambiguity matter?
Given E → E + E | E * E | (E) | a | b | c | 0 | 1
Derive the string a * b + c.
LM derivation #1: E ⇒ E + E ⇒ E * E + E ⇒* a * b + c
The parse tree groups the string as (a * b) + c.
LM derivation #2: E ⇒ E * E ⇒ a * E ⇒ a * E + E ⇒* a * b + c
The parse tree groups the string as a * (b + c).
The calculated value depends on which of the two parse trees is actually used, and the two values are in general different.

Removing Ambiguity in Expression Evaluations
• It may be possible to remove ambiguity for some CFLs
  – e.g. in a CFG for expression evaluation, by imposing rules and restrictions such as precedence
  – This would imply a rewrite of the grammar
Order of precedence: (), *, +
Ambiguous version:
E → E + E | E * E | (E) | a | b | c | 0 | 1
Modified/unambiguous version:
E → E + T | T
T → T * F | F
F → I | (E)
I → a | b | c | 0 | 1

Inherently Ambiguous CFLs
• However, for some languages, it may not be possible to remove ambiguity.
• A CFL is said to be inherently ambiguous if every CFG that describes it is ambiguous.
Example: L = {a^n b^n c^m d^m | n, m ≥ 1} ∪ {a^n b^m c^m d^n | n, m ≥ 1}
L is inherently ambiguous. This can be proved using the input string a^n b^n c^n d^n.
[The proof is beyond the scope of this course; it will be done in Theory of Computing (in Level 400)]

Converting from Grammars to Finite Automata
Convert the following grammar to a finite automaton:
S → aA | cF
A → bB | bA
B → λ
F → λ
Solution:
[Figure: automaton with states S, A, B, F and transitions S –a→ A, S –c→ F, A –b→ A, A –b→ B; S is the start state, B and F are accepting]

Convert the following grammars to finite automata:
Right-Linear Grammar
S → aA | cF
A → bB | bA
B → λ
F → λ
Left-Linear Grammar
S → λ
A → Sa | Ab
B → Ab
F → Sc
Z → B | F
Solution:
[Figure: the same automaton with states S, A, B, F; in the left-linear grammar Z is the start variable and collects the accepting states B and F]

Converting from Finite Automata to Grammars
Note: λ and ε are used interchangeably to denote the empty string, which is not an input symbol.
[Figure: finite automaton with states A, C, W, X]
i.e.
A → aA | bC | aW
C → cC | ε
W → cX
X → ε
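The correspondence between right-linear productions and automaton transitions can be sketched in Python. This is a minimal illustration under stated assumptions, not the construction used in the course: it only handles productions of the restricted forms V → aW, V → a and V → λ (a single terminal per rule), and the names `grammar_to_nfa`, `accepts` and the extra state `<final>` are hypothetical.

```python
def grammar_to_nfa(productions):
    """Build NFA transitions from a restricted right-linear grammar.

    productions: dict mapping a variable to a list of right-hand sides,
                 each being "" (λ), one terminal "a", or a terminal
                 followed by a variable "aB".
    Returns (delta, accepting); delta maps (state, symbol) to a set of states.
    """
    final = "<final>"  # assumed extra accepting state for rules V -> a
    delta = {}
    accepting = {final}
    for var, rhss in productions.items():
        for rhs in rhss:
            if rhs == "":
                # V -> λ makes the state V accepting.
                accepting.add(var)
            elif len(rhs) == 1 and rhs not in productions:
                # V -> a becomes a transition into the extra final state.
                delta.setdefault((var, rhs), set()).add(final)
            elif len(rhs) == 2 and rhs[1] in productions:
                # V -> aW becomes a transition V --a--> W.
                delta.setdefault((var, rhs[0]), set()).add(rhs[1])
            else:
                raise ValueError(f"unsupported production {var} -> {rhs}")
    return delta, accepting


def accepts(delta, accepting, start, word):
    """Simulate the NFA on word by tracking the set of reachable states."""
    current = {start}
    for symbol in word:
        current = set().union(*(delta.get((q, symbol), set()) for q in current))
    return bool(current & accepting)


# The two right-linear grammars from this section ("" stands for λ / ε).
G1 = {"S": ["aA", "cF"], "A": ["bB", "bA"], "B": [""], "F": [""]}
G2 = {"A": ["aA", "bC", "aW"], "C": ["cC", ""], "W": ["cX"], "X": [""]}
```

For example, `accepts(*grammar_to_nfa(G1), "S", "abb")` is true while `accepts(*grammar_to_nfa(G1), "S", "a")` is false, since A is not an accepting state. Reading the transition table back into productions (one rule V → aW per transition, one rule V → λ per accepting state) gives the reverse conversion from automaton to grammar.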