Simplifying Context-Free Grammars.

advertisement
Lec 7 : Computation Theory
Simplification of CFG
1
Simplifying Context-Free Grammars.
We found already in the last section on CFGs and Parsing, that
parsing can be pretty time- and space-consuming. In order to construct
good and efficient parsing algorithms, we consider now transformations
of grammars, essentially simplifications of the form of productions,
which lead to certain normal forms for grammars. These normal forms
are better to deal with, and they allow more efficient parsing algorithms.
Methods for Transforming Grammars .
 Substitution Rule
 Removing useless productions
 Removing λ-productions
 Removing unit productions
1-Substitution Rule
If a grammar contains a production
A → x1 B x2 with A, B  V and A ≠ B
and
B → y1 | y2 | ... | yn (alternatives)
then we can construct a new grammar G' with a modified set of
productions P', in which we replace the rules for A and B above with one
set of rules:
A → x1 y1 x2 | x1 y2 x2 | ... | x1 yn x2
G and G' are equivalent, i.e. the generate the same language.
Lec 7 : Computation Theory
Simplification of CFG
2
2-Removing useless productions
Definition Useful Variable
A variable A is called useful if there is at least one wL(G) such that
S*xAy * w
with x, y  (V  T)*
Otherwise, it is called useless.
A production is useless, if it involves a useless variable.
Note that the concept of 'useless variable' includes the case that the
variable does not lead to a terminal string, and the case that the variable
cannot be reached from the start symbol. The elimination of useless
variables and productions from a grammar (or the selection of those
variables and productions, which are useful) proceeds in two phases:
A. Determine and select those variables, which can lead to a terminal
string, and subsequently determine and select the related productions.
B. Determine and select those variables, which can be reached from the
start symbol, and select related productions.
Algorithm - find useful variables and productions leading to wT*
1. Set V1 = ∅
2. Repeat until closure:
For every AV with a production
A → x1 x2 ... xn and xi  (V1  T)
add A to V1.
3. Set P1 as the subset of productions from P, which contain only
symbols from (V1  T).
Lec 7 : Computation Theory
Simplification of CFG
3
4. This defines a new grammar G1 = (V1, T, S, P1)
Algorithm - eliminate unreachable variables and related productions
Draw a 'Dependency Graph' for G1, above. The dependency graph
is a graph with vertices labelled with variables and arcs going from vertex
A to a vertex B, if the grammar contains a production
A → x1 B x2 .
Eliminate from G1 all variables and related productions, which cannot be
reached from the start symbol. This gives the new grammar G', which
does not contain useless variables or productions.
3-Removing -productions
A production A → 
is called a -production.
Definition Nullable Variable
A variable A is called nullable, if there is a derivation
A * 
If G is a CFG and λ∉L(G), then there exists an equivalent grammar G'
without λ-productions.
Proof by construction
This is also an algorithm to eliminate λ-productions from a given
grammar.
First, we find all nullable variables of G, and put them into the set VN.
1. VN = ∅
2. For all λ-production A → λ , add A to VN.
3. Repeat until closure:
For all productions
Lec 7 : Computation Theory
Simplification of CFG
4
B → A1 A2 ... An and Ai  VN for i = 1,...,n
add B to VN.
4. Determine P' based on the productions from P in the following
way:
For each production
A → x1 x2 ... xm , m≥1, and xi (VT)
we add to P' this production, as well as all productions, which we
can construct from this production by substituting nullable
variables xi with λ, in all possible combinations, which is
equivalent to removing one or the other or all of those variables,
i.e. for two nullable variable on the RHS, we remove one, remove
the other, remove both, remove none. This leads to a bunch of
productions, which we collect as P'. λ-productions A → λ will
not be included in P'.
This defines an equivalent grammar G' = (V, T, S, P'), i.e. L(G)=L(G'),
without λ-productions.
4-Removing Unit Productions
Definition Unit-Production
A production of the form
A → B
with A, BV
is called a
unit-production.
For any CFG G without λ-production, we can construct an equivalent
grammar G' without unit-productions.
1. We set up the dependency graph - for unit-productions only - and
determine all dependencies
Lec 7 : Computation Theory
Simplification of CFG
5
2. Then, we determine a subset P' of P, which contains all non-unit
productions of P.
3. This subset P' we extend in the following way:
For all A, B with B dependent on A as above, and productions for A in
P':
A → y1 | y2 |... | yn
we add new productions to P':
B → y1 | y2 |... | yn
This way, you eliminate unit-productions by substituting the rightside of a unit-production - even if it goes over several derivation steps
A  ...  B - with all possible options for A's productions from
productions in P', i.e. not unit-productions.
You can imagine that this is the same as if you "inherit" the proper
productions of A down to B. This does not change the language since B
could have been generated by A without any additional terminals or
variables around it (because A  ...  B), and thus it is legitimate to
consider B in this sense as equivalent to A.
Download