Context free languages

1. Equivalence of context free grammars
2. Normal forms

Context-free grammars

In a context-free grammar, all productions are of the form A -> w, where A is a nonterminal (possibly the start symbol S) and w is a string from (N + T)*.

Handles, recursive productions

In the production A -> xw, the prefix x, if it is a single symbol, is called the handle of the production, whether x is in N or in T.
A production A -> Aw is called left-recursive.
A production A -> wA is called right-recursive.

Repeated sentential forms

In a derivation S -> ... -> wAx -> ... -> wAx -> ..., the sentential form wAx is called a repeated sentential form. All the steps between the two occurrences are wasted steps.

Leftmost derivations, minimal leftmost derivations

A derivation is a leftmost derivation if at each step only the leftmost nonterminal symbol is replaced using some rule of the grammar.
A leftmost derivation is called minimal if no sentential form is repeated in the derivation.

Weak equivalence

Two context-free grammars G1 and G2 are called weakly equivalent if L(G1) = L(G2).

Example of weak equivalence

G1: S -> S01; S -> 1
L(G1) = { 1(01)* }
G2: S -> S0S; S -> 1
L(G2) = { 1(01)* }

Strong equivalence

Two CFGs G1 and G2 are called strongly equivalent if they are weakly equivalent and, for each string w of terminals in L(G1) = L(G2), the minimal leftmost derivations of w in G1 and the minimal leftmost derivations of w in G2 are exactly the same in number, and so can be put into one-to-one correspondence.
Thus G1 and G2 must both be unambiguous, or must both be ambiguous in exactly the same number of ways, for each string w in T*.

Weakly equivalent but not strongly equivalent

G1: Grammar of expressions S:
S -> T | S + T; T -> F | T * F; F -> a | ( S )
G2: Grammar of expressions S:
S -> E; E -> E + E | E * E | (E) | a
L(G1) = L(G2) = valid expressions using a, +, *, (, and ).
G1 has operator precedence and is unambiguous. G2 is ambiguous (for example, a + a * a has two leftmost derivations in G2), so the two grammars are not strongly equivalent.

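
The definition of strong equivalence can be checked on small cases by counting derivations. The following is a minimal sketch, not part of the original slides: the function name count_derivations and the dictionary encoding of the rules are assumptions made for illustration. It counts parse trees, which correspond one-to-one to minimal leftmost derivations, and it assumes the grammar has no empty right-hand sides and no cycles of unit rules.

```python
from functools import lru_cache

def count_derivations(rules, start, s):
    """Count the parse trees of the string s from the symbol start.

    rules maps each nonterminal to a list of right-hand sides (tuples of
    symbols). Any symbol that is not a key of rules is treated as a terminal.
    Assumes no empty right-hand sides and no cycles of unit rules.
    """
    @lru_cache(maxsize=None)
    def sym(x, i, j):
        # Number of ways the single symbol x derives s[i:j].
        if x not in rules:
            return 1 if s[i:j] == x else 0
        return sum(seq(rhs, i, j) for rhs in rules[x])

    @lru_cache(maxsize=None)
    def seq(rhs, i, j):
        # Number of ways the symbol sequence rhs derives s[i:j].
        if len(rhs) == 1:
            return sym(rhs[0], i, j)
        # Every symbol derives at least one character, so the first symbol
        # must leave at least len(rhs) - 1 characters for the rest.
        return sum(sym(rhs[0], i, k) * seq(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return sym(start, 0, len(s))

# The two grammars from the weak equivalence example.
G1 = {"S": [("S", "0", "1"), ("1",)]}
G2 = {"S": [("S", "0", "S"), ("1",)]}

print(count_derivations(G1, "S", "10101"))  # 1
print(count_derivations(G2, "S", "10101"))  # 2
```

Under these assumptions, the string 10101 has one derivation in G1 but two in G2, so the grammars of the weak equivalence example are weakly but not strongly equivalent. The same function can be applied to the two expression grammars with the string a+a*a.
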
Example: Strong equivalence

G1: S->A; A->1B; A->1; B->0A
L(G1) = { (10)*1 }
G2: S->B; B->A1; B->1; A->B0
L(G2) = { 1(01)* }
The expressions (10)*1 and 1(01)* denote the same language, and each string in it has exactly one minimal leftmost derivation in each grammar.

Elementary transformations of context free grammars

- substitution
- expansion
- removal of useless productions
- removal of non-generative productions
- removal of left-recursive productions

Substitution

If G has the A-rule A->uBv, and all the B-rules are B->w1, B->w2, ..., B->wk, then:
1. Remove the A-rule A->uBv.
2. Add the A-rules A->uw1v, A->uw2v, ..., A->uwkv.
3. Keep all the other rules of G, including the B-rules.

Example of substitution

G1: S->H; H->TT; T->S; T->aSb; T->c
G2: S->H; H->ST; H->aSbT; H->cT; T->S; T->aSb; T->c
Here G2 is obtained from G1 by substituting for the first T in H->TT.

Strong equivalence after substitution

The grammar G, and the grammar G' obtained by substitution of B into the A-rule, are strongly equivalent if steps 2 and 3 do not introduce duplicate rules.

Expansion

If a grammar has the A-rule A->uv, remove this A-rule and replace it with the two rules
A->Xv; X->u
or with
A->uY; Y->v
where X (or Y) is a new non-terminal symbol of the grammar.

Strong equivalence after expansion

If G is context free, and G' is obtained from G by expansion, then G and G' are strongly equivalent.

Useful production

A production A->w of a CFG G is useful if there is a string x from T* such that
S -> ... -> uAv -> uwv -> ... -> x
Otherwise the production A->w is useless.
Thus, a production that is never used to derive a string of terminals is useless.

Removing useless productions

Useless productions are found in two passes, T-marking and S-marking.
Productions that are both T-marked and S-marked are useful. All other productions can be removed.

T-marking

Construct a sequence P0, P1, P2, ... of subsets of P, and a sequence N0, N1, N2, ... of subsets of N, as follows:
P0 = empty, N0 = empty, j = 0
P[j+1] = { A->w in P | w in (N[j] + T)* }
N[j+1] = { A in N | P[j+1] contains a rule A->w }
Continue until P[j] = P[j+1] = P[T].

S-marking

Construct a sequence Q1, Q2, Q3, ... of subsets of P[T] as follows:
Q1 = { S->w in P[T] }
Q[j+1] = Q[j] + { A->w in P[T] | Q[j] contains a rule B->uAv }
Continue until Q[j] = Q[j+1] = P[S].
The productions in P[S] are the useful productions.

Example: T/S-marking

Rule         T mark   S mark
1. S->H        2        1
2. H->AB
3. H->aH       2        2
4. H->a        1        2
5. B->Hb       2
6. C->aC

Thus only rules 1, 3, and 4 are useful.

Strong equivalence after removal of useless productions

If grammar G' is obtained from grammar G by removing the useless productions of G, then G and G' are strongly equivalent.

Removing non-generative productions

Removing left-recursive rules

Let all the X-rules of grammar G be:
X->u1 | u2 | ... | uk
X->Xw1 | Xw2 | ... | Xwh
where no ui begins with X. Then these rules may be replaced by the following:
X->u1 | u2 | ... | uk
X->u1Z | u2Z | ... | ukZ
Z->w1 | w2 | ... | wh
Z->w1Z | w2Z | ... | whZ
where Z is a new non-terminal symbol.

Example: Removing left-recursive rules

Before:
S->E; E->T | aT | bT; E->EaT | EbT; T->F; T->TcF | TdF; F->n | xEy
After:
S->E; E->T | aT | bT; E->TG | aTG | bTG; G->aT | bT; G->aTG | bTG; T->F; T->FH; H->cF | dF; H->cFH | dFH; F->n | xEy

Strong equivalence after removal of left-recursive rules

If grammar G' is obtained from grammar G by replacing the left-recursive rules of G by right-recursive rules, then G and G' are strongly equivalent.

Well-formed grammars

A context free grammar G=(N,T,P,S) is well-formed if each production has one of the forms:
S->ε (the empty string)
S->A
A->w
where A in N and w in (N+T)* - N, and each production is useful.

Example of well-formed grammars

Parenthesis grammar:
S->A; A->AA; A->(A); A->()

Chomsky Normal form

A context free grammar G=(N,T,P,S) is in normal form (Chomsky normal form) if each production has one of the forms:
S->ε (the empty string)
S->A
A->BC
A->a
where A, B, C in N and a in T.

Example of Chomsky normal form grammar

Parenthesis grammar:
S->A; A->AA; A->(A); A->()
Chomsky normal form grammar:
S->A; A->AA; A->BC; B->(; C->AD; D->); A->BD

Chomsky Normal Form Theorem

From any context free grammar, one can construct a strongly equivalent grammar in Chomsky normal form.

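
The T-marking and S-marking procedure from the section on removing useless productions can be sketched in code. This is a minimal illustration rather than a reference implementation: the function name useful_productions and the encoding of a rule as a (left side, right side) pair of strings with single-character symbols are assumptions made here, not part of the slides.

```python
def useful_productions(rules, nonterminals, start):
    """Return the rules that are both T-marked and S-marked.

    rules is a list of (lhs, rhs) pairs; rhs is a string in which every
    character is one grammar symbol (an assumption of this sketch).
    """
    # T-marking: repeatedly collect the rules whose right-hand side uses only
    # terminals and nonterminals already known to derive a terminal string.
    generating = set()   # plays the role of N[j]
    t_marked = set()     # indices of the rules in P[j]
    while True:
        new_marked = {i for i, (lhs, rhs) in enumerate(rules)
                      if all(c not in nonterminals or c in generating
                             for c in rhs)}
        if new_marked == t_marked:
            break
        t_marked = new_marked
        generating = {rules[i][0] for i in t_marked}

    # S-marking: among the T-marked rules, collect those whose left-hand side
    # is the start symbol or occurs in an already S-marked right-hand side.
    reachable = {start}
    s_marked = set()     # indices of the rules in Q[j]
    while True:
        new_marked = {i for i in t_marked if rules[i][0] in reachable}
        if new_marked == s_marked:
            break
        s_marked = new_marked
        reachable |= {c for i in s_marked for c in rules[i][1]
                      if c in nonterminals}

    return [rules[i] for i in sorted(s_marked)]

# The grammar from the T/S-marking example.
example_rules = [("S", "H"), ("H", "AB"), ("H", "aH"),
                 ("H", "a"), ("B", "Hb"), ("C", "aC")]
useful = useful_productions(example_rules, set("SHABC"), "S")
print(useful)  # [('S', 'H'), ('H', 'aH'), ('H', 'a')]
```

On the example grammar, the sketch keeps rules 1, 3, and 4, in agreement with the marking table. Rule 5 (B->Hb) is T-marked but not S-marked, because B occurs only in the right-hand side of H->AB, which is not T-marked.
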
Greibach normal form (standard form)

A context free grammar G=(N,T,P,S) is in standard form (Greibach normal form) if each production has one of the forms:
S->ε (the empty string)
S->A
A->aw
where A in N, a in T, and w in (N+T)*.

Example: converting to Greibach standard form

Original grammar:
S->E; E->T; E->EaT; T->n; T->xEy
First remove left-recursive rules:
S->E; E->T; E->TF; F->aT; F->aTF; T->n; T->xEy
Then substitute for the nonterminal handles, so that every handle is a terminal:
S->E; E->n | xEy; E->nF | xEyF; F->aT; F->aTF; T->n; T->xEy

Standard Form Theorem

From any context free grammar, one can construct a strongly equivalent grammar in standard form (Greibach normal form).

Pumping Lemma for context free languages

If L is a context free language, then there exists a positive integer p such that:
if w in L and |w| > p, then w = xuyvz, with uv and y nonempty, and x u^k y v^k z in L for all k >= 0.
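
The conclusion of the pumping lemma can be illustrated on a concrete case. The sketch below is not taken from the slides: the language L = { a^n b^n | n >= 1 }, the chosen decomposition of w = aaabbb, and the helper names in_language and pump are all assumptions made for illustration. It only checks a few values of k for one decomposition and does not prove the lemma.

```python
def in_language(s):
    """Membership test for the assumed language L = { a^n b^n | n >= 1 }."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n

def pump(x, u, y, v, z, k):
    """Build the string x u^k y v^k z."""
    return x + u * k + y + v * k + z

# One decomposition w = xuyvz of w = aaabbb with uv and y nonempty.
x, u, y, v, z = "a", "a", "ab", "b", "b"

pumped = [pump(x, u, y, v, z, k) for k in range(4)]
print(pumped)                                # ['aabb', 'aaabbb', 'aaaabbbb', 'aaaaabbbbb']
print(all(in_language(s) for s in pumped))   # True
```

Because u and v are pumped together, the numbers of a's and b's stay equal for every k, which is why each pumped string remains in L.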