context-free grammars 3

Chomsky Normal Form of CFGs
Definition
Purpose
Method of Construction
Chomsky Normal Form: Purpose
A construct used to establish properties
of context-free languages (CFLs)
Every CFL without ε can be generated
by a CFG in Chomsky normal form.
To show that a language without ε is a
CFL, it is sufficient to show that it has a
CFG in Chomsky normal form.
Typical approach to closure properties
Chomsky Normal Form:
Definition
A context-free grammar (CFG) in which all
productions are of the form A->BC or A->a,
where A, B and C are variables and a is a
terminal
Chomsky Normal Form: method
of construction
Eliminate “useless” symbols
- Variables or terminals that do not appear in
any derivation of a terminal string from the
start symbol
Eliminate ε-productions
- A->ε
Eliminate unit productions
- A->B for variables A and B
Chomsky Normal Form: method
of construction - 2
For each elimination task, a method will
be defined recursively by an inductive
proof.
The order in which the tasks are performed is
important.
Generating and Reachable Symbols
X is generating if X =>* w for some terminal
string w.
If X is a terminal, then it generates
itself in zero steps.
X is reachable if S =>* αXβ for some α
and β (S is the start symbol).
Any symbol that is not both generating and
reachable is useless.
Induction to find generating
variables
Basis: If there is a production A -> w,
where w is a terminal string, then A is
generating.
Induction: If there is a production
A -> α, where α consists only of
terminals and variables known to derive
a terminal string, then A derives a
terminal string and hence is generating.
Algorithm to eliminate nongenerating variables
1. Discover all variables that derive
terminal strings.
2. For all other variables, remove all
productions in which they appear
either on the LHS or RHS of ->.
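The basis/induction rules and the two-step algorithm can be sketched as a fixpoint loop. This is a minimal sketch under assumed conventions, not the text's own code: a grammar is a dict mapping each variable to a list of right sides (tuples of symbols), any symbol that is not a dict key is treated as a terminal, and the function names are hypothetical.

```python
# Assumed representation: variable -> list of right sides (tuples of symbols).
# Any symbol that is not a key of the dict is treated as a terminal.
Grammar = dict[str, list[tuple[str, ...]]]


def generating_variables(g: Grammar) -> set[str]:
    """Repeat the induction step until no new generating variable appears.

    A variable becomes generating once one of its right sides consists only of
    terminals and variables already known to be generating (the basis is the
    case where the right side is all terminals).
    """
    gen: set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in g.items():
            if head not in gen and any(
                all(s not in g or s in gen for s in body) for body in bodies
            ):
                gen.add(head)
                changed = True
    return gen


def remove_nongenerating(g: Grammar) -> Grammar:
    """Drop every production with a nongenerating variable on the LHS or RHS."""
    gen = generating_variables(g)
    return {
        head: [body for body in bodies if all(s not in g or s in gen for s in body)]
        for head, bodies in g.items()
        if head in gen
    }


# Example grammar from the slides: S->AB|C, A->aA|a, B->bB, C->c
g = {
    "S": [("A", "B"), ("C",)],
    "A": [("a", "A"), ("a",)],
    "B": [("b", "B")],
    "C": [("c",)],
}
print(sorted(generating_variables(g)))  # ['A', 'C', 'S']
print(remove_nongenerating(g))  # {'S': [('C',)], 'A': [('a', 'A'), ('a',)], 'C': [('c',)]}
```

On the slide's example this yields S->C, A->aA|a, C->c; A is still unreachable, which the reachability step handles.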
Example: finding generating variables
S->AB|C, A->aA|a, B->bB, C->c
- Basis: A and C are generating due to
productions A->a and C->c.
- Induction: S is generating due to
production S->C.
- B is not generating, so eliminate B->bB and S->AB.
- Result: S->C, A->aA|a, C->c
- Still have an unreachable variable (A)
Finding reachable symbols
Basis: The start symbol is reachable.
Induction: If A is reachable and there is
a production A->α, then all symbols of α
are reachable.
In the result from the previous slide
- S->C, A->aA|a, C->c
only the variables S and C are reachable, so A is useless.
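The reachability induction is a graph search from the start symbol. A minimal sketch, assuming the same hypothetical dict-of-tuples representation (dict keys are variables, everything else is a terminal); the helper names are illustrative only.

```python
from collections import deque


def reachable_symbols(g, start):
    """Basis: the start symbol is reachable.
    Induction: if A is reachable and A -> alpha, every symbol of alpha is reachable.
    Terminals have no productions, so they are reached but never expanded."""
    seen = {start}
    queue = deque([start])
    while queue:
        sym = queue.popleft()
        for body in g.get(sym, []):
            for s in body:
                if s not in seen:
                    seen.add(s)
                    queue.append(s)
    return seen


def remove_unreachable(g, start):
    """Keep only the productions whose head is reachable from the start symbol."""
    reach = reachable_symbols(g, start)
    return {head: bodies for head, bodies in g.items() if head in reach}


# Result from the previous example: S->C, A->aA|a, C->c
g = {"S": [("C",)], "A": [("a", "A"), ("a",)], "C": [("c",)]}
print(sorted(reachable_symbols(g, "S")))  # ['C', 'S', 'c']
print(remove_unreachable(g, "S"))  # {'S': [('C',)], 'C': [('c',)]}
```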
Epsilon Productions
Theorem: If L is a CFL with no empty
string, then it has a CFG in Chomsky
normal form, which has no ε-productions.
A->ε is clearly an ε-production.
To eliminate ε-productions of all kinds, we
must first discover the nullable variables,
i.e., variables A such that A =>* ε.
Inductive definition of nullable
symbols
Basis: If there is a production A -> ε,
then A is nullable.
Induction: If there is a production
A -> α, and all symbols of α are
nullable, then A is nullable.
Example: Nullable Symbols
S->AB, A->aA|ε, B->bB|A
A is nullable because of A -> ε.
B is nullable because of B -> A.
S is nullable because of S -> AB.
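The inductive definition of nullable symbols, applied to this example, becomes another fixpoint loop. A minimal sketch, assuming the hypothetical dict-of-tuples representation (dict keys are variables) with ε written as the empty tuple; the function name is illustrative.

```python
def nullable_variables(g):
    """Basis: A -> ε (the empty tuple) makes A nullable.
    Induction: A -> alpha with every symbol of alpha nullable makes A nullable.
    Only variables are ever added, so a body containing a terminal never qualifies."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in g.items():
            if head not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(head)
                changed = True
    return nullable


# Example from the slides: S->AB, A->aA|ε, B->bB|A
g = {
    "S": [("A", "B")],
    "A": [("a", "A"), ()],
    "B": [("b", "B"), ("A",)],
}
print(sorted(nullable_variables(g)))  # ['A', 'B', 'S']
```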
Algorithm to eliminate e-productions
Identify all nullable symbols.
Consider each production A->X1…Xn that contains
nullable symbols.
Suppose A->X1…Xn contains m ≤ n nullable
symbols.
Construct a family of 2^m productions
covering all combinations of the nullable
symbols being present or absent.
If m = n, exclude the case with all symbols absent.
Eliminating e-productions
The new CFG with no ε-productions
consists of all families of productions
derived from productions with nullable
symbols,
plus all productions from the original
CFG that did not contain nullable
symbols.
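The construction can be sketched by enumerating the present/absent choices with `itertools.product`. A minimal sketch, assuming the hypothetical dict-of-tuples representation with ε as the empty tuple; it is self-contained, so it repeats a nullable-variable helper, and the function names are illustrative.

```python
from itertools import product


def nullable_variables(g):
    """Fixpoint of the nullable rules; ε is the empty tuple."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in g.items():
            if head not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(head)
                changed = True
    return nullable


def eliminate_epsilon(g):
    """Replace each production by its family of (up to) 2**m variants, where m is
    the number of nullable symbols in the body; duplicates are merged and the
    empty body is never emitted. Bodies with m = 0 are carried over unchanged,
    and the original A -> ε productions vanish because their only variant is empty."""
    nullable = nullable_variables(g)
    new_g = {}
    for head, bodies in g.items():
        new_bodies = []
        for body in bodies:
            # A nullable symbol may be present or absent; any other symbol is kept.
            options = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for choice in product(*options):
                variant = tuple(sym for part in choice for sym in part)
                if variant and variant not in new_bodies:
                    new_bodies.append(variant)
        new_g[head] = new_bodies
    return new_g


# Example from the slides: S->ABC, A->aA|ε, B->bB|ε, C->ε
g = {
    "S": [("A", "B", "C")],
    "A": [("a", "A"), ()],
    "B": [("b", "B"), ()],
    "C": [()],
}
for head, bodies in eliminate_epsilon(g).items():
    print(head, "->", " | ".join("".join(b) for b in bodies))
```

For the slide's example, S receives the seven right sides ABC, AB, AC, BC, A, B, C (in a different order), A -> aA | a, B -> bB | b, and C is left with no productions at all.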
Example: Eliminating ε-Productions
S->ABC, A->aA|ε, B->bB|ε, C->ε
A, B, C, and S are all nullable.
Productions S->ABC|AB|AC|BC|A|B|C
come from S->ABC
Productions A->aA|a come from A->aA
Productions B->bB|b come from B->bB
Eliminating ε-Productions continued
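S->ABC, A->aA|ε, B->bB|ε, C->ε
No production of the original CFG is carried
over unchanged: every non-ε production contains
a nullable symbol.
C is not generating: its only production, C->ε,
has been removed.
Eliminate C from the new CFG:
S -> ABC | AB | AC | BC | A | B | C
A -> aA | a
B -> bB | b
Dropping every production that contains C leaves
S -> AB | A | B.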
Define Unit Productions
A unit production is a production whose
right side consists of exactly one variable.
A->a is not a unit production if a is a
terminal.
Elimination by expansion is the most common
approach.
Eliminate by expansion
In the CFG defined by
  E->T|E+T
  T->F|T*F
  F->I|(E)
  I->a|Ia
E->T eliminated by E->F|T*F|E+T
E->F eliminated by E->I|(E)|T*F|E+T
E->I eliminated by E->a|Ia|(E)|T*F|E+T
Eliminate by expansion
Expansion does not work on cycles of unit productions:
  A->B
  B->C
  C->A
Alternative: find all pairs (A,B) such that
A=>*B by a sequence of unit productions.
- Works in all cases.
Alternative to expansion in
eliminating unit productions
Basic idea: If A=>*B by a series of
unit productions, and B->α is a non-unit production, then add the production A->α
and drop the unit productions.
Example
Example of basic idea
In the CFG defined by
  E->T|E+T
  T->F|T*F
  F->I|(E)
  I->a|Ia
E=>*I by the series of unit productions
E->T, T->F, F->I.
I->a is a non-unit production,
so add E->a.
Doing the same for every pair (E,B) gives
E->a|Ia|(E)|T*F|E+T (same as the
expansion method).
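The basic idea can be sketched directly: compute the unit pairs, then copy every non-unit right side of B to A for each pair (A, B). A minimal sketch, assuming the hypothetical dict-of-tuples representation (dict keys are variables, other symbols are terminals); the function names are illustrative, not the text's.

```python
def unit_pairs(g):
    """All pairs (A, B) with A =>* B using unit productions only.
    Basis: (A, A). Induction: (A, B) found and B -> C a unit production gives (A, C)."""
    pairs = {(a, a) for a in g}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):  # iterate over a snapshot while adding
            for body in g[b]:
                if len(body) == 1 and body[0] in g and (a, body[0]) not in pairs:
                    pairs.add((a, body[0]))
                    changed = True
    return pairs


def eliminate_unit_productions(g):
    """For each pair (A, B) and each non-unit production B -> alpha, add A -> alpha.
    Unit productions are never copied, so they are all dropped.
    Pairs are sorted only to make the output order deterministic."""
    new_g = {a: [] for a in g}
    for a, b in sorted(unit_pairs(g)):
        for body in g[b]:
            is_unit = len(body) == 1 and body[0] in g
            if not is_unit and body not in new_g[a]:
                new_g[a].append(body)
    return new_g


# Expression grammar from the slides: E->T|E+T, T->F|T*F, F->I|(E), I->a|Ia
g = {
    "E": [("T",), ("E", "+", "T")],
    "T": [("F",), ("T", "*", "F")],
    "F": [("I",), ("(", "E", ")")],
    "I": [("a",), ("I", "a")],
}
for head, bodies in eliminate_unit_productions(g).items():
    print(head, "->", " | ".join("".join(b) for b in bodies))
```

The E line lists the same five right sides as the slide, a | Ia | (E) | T*F | E+T, in a different order.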
Pair search defined by induction
Find all pairs (A,B) such that A=>*B by a
sequence of unit productions only.
Basis: A=>*A, therefore (A,A).
Induction: If we have found (A,B), and
B->C is a unit production, then add (A,C).
Example of pair search
In the CFG defined by
  E->T|E+T
  T->F|T*F
  F->I|(E)
  I->a|Ia
Obviously (E,T), (T,F), (F,I)
Also (T,I), (E,F), and (E,I), plus the
basis pairs (E,E), (T,T), (F,F), (I,I)
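The pair search can also be run as a breadth-first search over unit productions from each variable, which reaches the same fixpoint as the basis/induction rules. A minimal sketch, assuming the hypothetical dict-of-tuples representation (dict keys are variables); the function name is illustrative.

```python
from collections import deque


def unit_pairs(g):
    """Basis: (A, A) for every variable. Induction: if (A, B) is known and
    B -> C is a unit production, add (A, C). Implemented as a breadth-first
    search from each variable that follows unit productions only."""
    pairs = set()
    for a in g:
        seen = {a}
        queue = deque([a])
        while queue:
            b = queue.popleft()
            pairs.add((a, b))
            for body in g[b]:
                if len(body) == 1 and body[0] in g and body[0] not in seen:
                    seen.add(body[0])
                    queue.append(body[0])
    return pairs


g = {
    "E": [("T",), ("E", "+", "T")],
    "T": [("F",), ("T", "*", "F")],
    "F": [("I",), ("(", "E", ")")],
    "I": [("a",), ("I", "a")],
}
print(sorted(p for p in unit_pairs(g) if p[0] != p[1]))
# [('E', 'F'), ('E', 'I'), ('E', 'T'), ('F', 'I'), ('T', 'F'), ('T', 'I')]
```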
Cleaning up a Grammar
Theorem: If L is a CFL, then there is a
CFG for L – {ε} that has:
1. No useless symbols.
2. No ε-productions.
3. No unit productions.
I.e., every right side of a production is either
a single terminal or has length ≥ 2.
Clean-up continued
Proof: Start with a CFG for L.
Perform the following steps in order:
1. Eliminate ε-productions.
2. Eliminate unit productions.
3. Eliminate variables that derive no
terminal string.
4. Eliminate variables not reached from the
start symbol.
Step 1 must come first: eliminating ε-productions can create
unit productions and useless variables.
Chomsky Normal Form
A CFG is said to be in Chomsky
Normal Form if every production is of
one of these two forms:
1. A -> BC (right side is two variables).
2. A -> a (right side is a single terminal).
Theorem: If L is a CFL, then L – {ε}
has a CFG in CNF.
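The two allowed forms can be checked mechanically. A minimal sketch, not from the text, assuming the hypothetical dict-of-tuples representation in which the dict keys are the variables and every other symbol is a terminal.

```python
def is_cnf(g):
    """Check the two CNF forms: A -> BC (two variables) or A -> a (one terminal).
    Variables are the dict keys; every other symbol is taken to be a terminal."""
    for bodies in g.values():
        for body in bodies:
            two_variables = len(body) == 2 and all(s in g for s in body)
            one_terminal = len(body) == 1 and body[0] not in g
            if not (two_variables or one_terminal):
                return False
    return True


print(is_cnf({"S": [("A", "B")], "A": [("a",)], "B": [("b",)]}))  # True
print(is_cnf({"S": [("A",)], "A": [("a",)]}))  # False: S -> A is a unit production
print(is_cnf({"S": [("a", "B")], "B": [("b",)]}))  # False: mixes a terminal and a variable
```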
Proof by construction
Step 1: “Clean” the grammar, so every
production has a right side that is either a single
terminal or of length ≥ 2.
Step 2: For each right side that is not a single
terminal, make the right side all variables.
- For each terminal a, create a new variable A_a and
the production A_a -> a (not a unit production).
- Replace a by A_a in right sides of productions.
Example: Step 2
Consider production A -> BcDe.
We need variables A_c and A_e, with
productions A_c -> c and A_e -> e.
- Note: create at most one variable for
each terminal, and use it everywhere it is
needed.
Replace A -> BcDe by A -> B A_c D A_e.
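Step 2 can be sketched as one pass that creates a single new variable per terminal and reuses it. A minimal sketch, assuming the hypothetical dict-of-tuples representation (dict keys are variables); the naming scheme "A_" + a is a hypothetical convention assumed not to clash with existing variables.

```python
def replace_terminals(g):
    """Step 2: in every right side of length >= 2, replace each terminal a by a
    new variable A_a and add the production A_a -> a. One variable is created
    per terminal and reused everywhere. The name "A_" + a is a hypothetical
    convention, assumed not to clash with an existing variable."""
    new_var = {}  # terminal -> the variable created for it
    new_g = {}
    for head, bodies in g.items():
        new_bodies = []
        for body in bodies:
            if len(body) >= 2:
                body = tuple(
                    new_var.setdefault(s, "A_" + s) if s not in g else s
                    for s in body
                )
            new_bodies.append(body)
        new_g[head] = new_bodies
    for terminal, var in new_var.items():
        new_g[var] = [(terminal,)]
    return new_g


# The slide's production A -> BcDe (B and D get hypothetical bodies so they are variables)
g = {"A": [("B", "c", "D", "e")], "B": [("b",)], "D": [("d",)]}
print(replace_terminals(g))
# {'A': [('B', 'A_c', 'D', 'A_e')], 'B': [('b',)], 'D': [('d',)], 'A_c': [('c',)], 'A_e': [('e',)]}
```

Single-terminal right sides such as B -> b are left alone, since they already have the CNF form A -> a.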
CNF construction: final step
Step 3: Break right sides longer than 2
into a chain of productions with right
sides of two variables.
Example: A -> BCDE is replaced by
A -> BF, F -> CG, and G -> DE.
- F and G must be used nowhere else.
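Step 3 can be sketched as a loop that peels off the first symbol of a long right side until two symbols remain. A minimal sketch, assuming the hypothetical dict-of-tuples representation; the fresh names N1, N2, ... are hypothetical (the slide uses F and G) and are assumed not to clash with existing variables.

```python
from itertools import count


def break_long_bodies(g):
    """Step 3: replace A -> X1 X2 ... Xk (k > 2) by the chain
    A -> X1 N1, N1 -> X2 N2, ..., N(k-2) -> X(k-1) Xk.
    The fresh names N1, N2, ... are hypothetical and assumed not to clash with
    existing variables; each fresh variable is used nowhere else."""
    fresh = count(1)
    new_g = {head: [] for head in g}
    for head, bodies in g.items():
        for body in bodies:
            current = head
            while len(body) > 2:
                name = f"N{next(fresh)}"
                new_g[current].append((body[0], name))
                new_g[name] = []
                current, body = name, body[1:]
            new_g[current].append(body)
    return new_g


# The slide's example A -> BCDE (B, C, D, E stand for variables)
print(break_long_bodies({"A": [("B", "C", "D", "E")]}))
# {'A': [('B', 'N1')], 'N1': [('C', 'N2')], 'N2': [('D', 'E')]}
```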
Example (text, p. 266)
S->AB
A->aAA|ε
B->bBB|ε
Assignment 11, Due 11-19-14
Exercise 7.1.2 text p 275 and 277