Chomsky Normal Form of CFG’s Definition Purpose Method of Constuction 1 Chomsky Normal Form: Purpose A construct used to establish properties of context-free languages (CFLs) Every CFL without e can be generated by a CFG in Chomsky normal form. To show that language without e is a CFL it is sufficient to show that it has a CFG in Chomsky normal form. Typical approach to closure properites 2 Chomsky Normal Form: Definition A context free grammar (CFG) in which all production are of the form A->BC or A->a, where A, B and C are variables and a is a terminal 3 Chomsky Normal Form: method of construction Eliminate “useless: symbols Variables or terminals that do not appear in any derivation of a terminal string from the start symbol Eliminate e-productions A->e Eliminate unit-productions A->B for variables A and B 4 Chomsky Normal Form: method of construction - 2 For each elimination task, a method will be defined reclusively by an inductive proof. Order in which tasks are preformed is important 5 Generating and Reachable Symbols X is generating if X =>* w (terminal string) If X is a terminal, then it can generate itself in zero steps. X is reachable if S =>* aXb for some a and b, (S is a start symbol) Any symbol that is not generating and reachable is useless 6 Induction to find generating variables Basis: If there is a production A -> w, where w is a terminal string, then A is generating. Induction: If there is a production A -> a, where a consists only of terminals and variables known to derive a terminal string, then A derives a terminal string; hence is generating. 7 Algorithm to eliminate nongenerating variables 1. Discover all variables that derive terminal strings. 2. For all other variables, remove all productions in which they appear either on the LHS or RHS of ->. 8 Example: finding generating variables S->AB|C, A->aA|a, B->bB, C->c Basis: A and C are generating due to productions A->a and C->c. Induction: S is generating due to production S->C. Eliminate B->bB and S->AB Result: S->C, A->aA|a, C->c Still have unreachable variables 9 Finding reachable symbols Basis: Obviously, start symbol is reachable. Induction: if we can reach A, and there is a production A->a, then we can reach all symbols of a. In result from previous slide S->C, A->aA|a, C->c Only S and C are reachable 10 Epsilon Productions Theorem: If L is a CFL with no empty string, then it has a CFG which can be put in Chomsky form with no e-productions. A->e is clearly an e-production To eliminate all types e-productions, we must first discover the nullable variables, i.e. variables A such that A =>* ε. 11 Inductive definition of nullable symbols Basis: If there is a production A -> ε, then A is nullable. Induction: If there is a production A -> a, and all symbols of a are nullable, then A is nullable. 12 Example: Nullable Symbols S->AB, A->aA|ε, B->bB|A A is nullable because of A -> ε. B is nullable because of B -> A. S is nullable because of S -> AB. 13 Algorithm to eliminate e-productions Identify all nullable symbols. Consider each production A->X1…Xn that contains nullable symbols Suppose A->X1…Xn contains m<n nullable symbols Construct a family of productions with 2m members that are all combinations of nullable symbols present or absent If m=n exclude case with all symbols absent 14 Eliminating e-productions The new CFG with no e-productions consist of all families of productions derived from productions with nullable symbols Plus all productions from the original CFG that did not contain nullable symbols 15 Example: Eliminating ε-Productions S->ABC, A->aA|ε, B->bB|ε, C->ε A, B, C, and S are all nullable. Productions S->ABC|AB|AC|BC|A|B|C come from S->ABC Productions A->aA|a come from A->aA Productions B->bB|b come from B->bB 16 Eliminating ε-Productions continued S->ABC, A->aA|ε, B->bB|ε, C->ε No contribution to CNF from original CFG C is not generating Eliminate C in productions of the new CFG S -> ABC | AB | AC | BC | A | B | C A -> aA | a B -> bB | b 17 Define Unit Productions A unit production is a production whose right side consists of exactly one variable. A->a is not a unit production if a is terminal Eliminate by expansion is most common approach 18 Eliminate by expansion In the CFG defined by E->T|E+T T->F|T*F F->I|(E) I->a|Ia E->T eliminated by E->F|T*F|E+T E->F eliminated by E->I|(E)|T*F|E+T E->I eliminated by E->a|Ia|(E)|T*F|E+T 19 Eliminate by expansion Will not work on cycles of unit productions A->B B->C C->A Alternative: find all pairs (A,B) such that A=>*B by a sequence of unit productions Works in all cases. 20 Alternative to expansion in eliminating unit productions Basic idea: If A=>*B by a series of unit productions, and B->a is a nonunit-production, then add production A-> a and drop the unit productions. Example 21 Example of basic idea In the CFG defined by E->T|E+T T->F|T*F F->I|(E) I->a|Ia E=>*I by the series of unit productions E->T, T->F, F->I I->a is a non-unit production. Replace by E->a E->a|Ia|(E)|T*F|E+T (same as expansion method) 22 Pair search defined by induction Find all pairs (A,B) such that A=>*B by a sequence of unit productions only. Basis: A=>*A, therefor (A,A). Induction: If we have found (A,B), and B->C is a unit production, then add (A,C) 23 Example of pair search In CFG defined by E->T|E+T T->F|T*F F->I|(E) I->a|Ia Obviously (E,T), (T,F), (F,I) (T,I) and (E,F) also 24 Cleaning up a Grammar Theorem: if L is a CFL, then there is a CFG for L – {ε} that has: 1. No useless symbols. 2. No ε-productions. 3. No unit productions. every right side of a production is either a single terminal or has length > 2. 25 Clean-up continued Proof: Start with a CFG for L. Perform the following steps in order: 1. Eliminate ε-productions. 2. Eliminate unit productions. 3. Eliminate variables that derive no terminal string. 4. Eliminate variables not reached from the start symbol. Must be first. Can create unit productions and useless variables. 26 Chomsky Normal Form A CFG is said to be in Chomsky Normal Form if every production is of one of these two forms: 1. A -> BC (right side is two variables). 2. A -> a (right side is a single terminal). Theorem: If L is a CFL, then L – {ε} has a CFG in CNF. 27 Proof by construction Step 1: “Clean” the grammar, so every production has right side either a single terminal or length >2. Step 2: For each right side a single terminal, make the right side all variables. For each terminal a create new variable Aa and production Aa -> a. (not a unit production) Replace a by Aa in right sides of productions. 28 Example: Step 2 Consider production A -> BcDe. We need variables Ac and Ae. with productions Ac -> c and Ae -> e. Note: you create at most one variable for each terminal, and use it everywhere it is needed. Replace A -> BcDe by A -> BAcDAe. 29 CNF construction: final step Step 3: Break right sides longer than 2 into a chain of productions with right sides of two variables. Example: A -> BCDE is replaced by A -> BF, F -> CG, and G -> DE. F and G must be used nowhere else. 30 Example text p266 S->AB A->aAA|e B->bBB|e 31 Assignment 11, Due 11-19-14 Exercise 7.1.2 text p 275 and 277 32 33