Simplification of Grammars

Lecture 17
Naveen Z Quazilbash
Overview
• Attendance
• Motivation
• Simplification of Grammars
• Eliminating useless variables
• Eliminating null productions
• Eliminating unit productions
• Quiz result
Motivation for grammar simplification
• Parsing problem: given a CFG G and a string w, determine whether w ∈ L(G).
• This is a fundamental problem in compiler design and natural language processing.
• If G is in general form, the procedure may be very inefficient, so the grammar is “transformed” into a simpler form to make the parsing problem easier.
Simplification of Grammars
• It involves the removal of:
1. Useless variables
2. ε-productions
3. Unit productions
• Useless variables: there are two types of useless variables:
1. Variables that cannot be reached
2. Variables that do not derive any strings
3. ε-productions, e.g. A → ε
• Note that if we remove these productions, the language no longer includes the empty string (if it was in the language to begin with).
4. Unit productions: they are of the form A → B or A → A
1) Unreachable Variables
• E.g.:
S → BS | B | E
A → DA | D | S
B → CB | C
C → aC | a
D → bD | b
E → cE | c
• To find unreachable variables, draw a dependency graph.
• Dependency graph:
• Vertices of the graph are variables.
• The graph doesn’t include alphabet symbols, such as “a” or “b”.
• If there is a production A → …B…, i.e., the left side is A and the right side includes B, then there is an edge A → B.
• A variable is reachable if there is a path from S to this variable.
• S itself is always reachable.
• After identifying unreachable variables, remove all productions with unreachable left side.
SBS|B|E
ADA|D|S
BCB|C
CaC|a
DbD|b
EcE|c
B
C
E
A
S
 Drawing its dependency graph:
 Reachable: S, B, C, E
D
• Grammar without unreachable variables:
S → BS | B | E
B → CB | C
C → aC | a
E → cE | c
• Ex: Determine its language!
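The reachability check can be sketched in Python as a breadth-first search over the dependency graph. The dictionary encoding (one uppercase letter per variable, lowercase letters as terminals) and the function names are assumptions made for this illustration, not part of the lecture.

```python
from collections import deque

# Assumed encoding: a grammar maps each variable to its right-hand sides;
# uppercase letters are variables, lowercase letters are terminals.
grammar = {
    "S": ["BS", "B", "E"],
    "A": ["DA", "D", "S"],
    "B": ["CB", "C"],
    "C": ["aC", "a"],
    "D": ["bD", "b"],
    "E": ["cE", "c"],
}

def reachable_variables(grammar, start="S"):
    """Breadth-first search over the dependency graph, starting from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        left = queue.popleft()
        for rhs in grammar.get(left, []):
            for symbol in rhs:
                # Every variable on a right-hand side is an outgoing edge.
                if symbol.isupper() and symbol not in seen:
                    seen.add(symbol)
                    queue.append(symbol)
    return seen

def remove_unreachable(grammar, start="S"):
    """Drop every production whose left side is unreachable."""
    keep = reachable_variables(grammar, start)
    return {left: rhss for left, rhss in grammar.items() if left in keep}

print(sorted(reachable_variables(grammar)))  # ['B', 'C', 'E', 'S']
print(remove_unreachable(grammar))
```

Only productions with an unreachable left side need to be dropped: an unreachable variable can never appear on the right side of a production whose left side is reachable.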
2) Variables that don’t terminate
• A variable A terminates if either:
• There is a production A → … with no variables on the right, e.g. A → aabc,
OR
• There is a production A → … where all variables on the right terminate; e.g. A → aBbaC, where B and C terminate.
• Note: to find all variables that terminate, keep looking for such productions until you cannot find any new ones.
TASK
Example:
SA|BC|DE
AaA|bA
BbB|b
CEF
DdD|BD|BA
EaE|a
FcFc|c
 Remove all productions that include a variable that doesn’t
terminate.
 Note: We remove a production if it has such a variable on either
side.
Solution
• A and D do not terminate, so every production containing A or D is crossed out: S → A, S → DE, A → aA | bA, and D → dD | BD | BA.
• Remaining grammar:
S → BC
B → bB | b
C → EF
E → aE | a
F → cFc | c
• Ex: Determine its language.
3) Eliminating ε-Productions
• Nullable variables: a variable is nullable if either:
• There is a production A → ε, or
• There is a production A → B1B2…Bn (only variables, no terminal symbols), where all variables on the right side are nullable.
• Note: to find all nullable variables, keep looking for such productions until you cannot find any new ones.
TASK
SSAB|SBC|BC
AaA|a
BbB|bC|C
CcC| ε
• First we find variables that can lead to the empty string:
C => ε
B => C => ε
S => BC => B => C => ε
• Thus, S, B, and C can lead to ε; they are called nullable variables.
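Finding the nullable variables is again a fixed-point iteration. The sketch below is only an illustration: it assumes that ε is encoded as the empty string and that uppercase letters are variables, and the function name is hypothetical.

```python
# Assumed encoding: "" stands for ε; uppercase letters are variables.
grammar = {
    "S": ["SAB", "SBC", "BC"],
    "A": ["aA", "a"],
    "B": ["bB", "bC", "C"],
    "C": ["cC", ""],
}

def nullable_variables(grammar):
    """Repeat the two nullability rules until nothing new is found."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for left, rhss in grammar.items():
            if left in nullable:
                continue
            # all() over an empty right-hand side is True, which covers the
            # rule A → ε; a terminal is never in `nullable`, so any
            # right-hand side containing one fails the check.
            if any(all(s in nullable for s in rhs) for rhs in rhss):
                nullable.add(left)
                changed = True
    return nullable

print(sorted(nullable_variables(grammar)))  # ['B', 'C', 'S']
```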
• For each production that has nullable variables, consider all possible ways to skip some of these variables and add the corresponding productions.
• E.g. W → aWXaYZb; suppose that X, Y and Z are nullable; then there are 8 ways to keep or skip each of them:
W → aWab | aWXab | aWaYb | aWaZb | aWXaYb | aWXaZb | aWaYZb | aWXaYZb
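The eight alternatives can be generated mechanically by choosing, for each nullable symbol, whether to keep it or skip it. This is a small sketch using `itertools.product`; the helper name `expansions` is hypothetical.

```python
from itertools import product

def expansions(rhs, nullable):
    """All right-hand sides obtained by keeping or skipping each nullable symbol."""
    # Each nullable symbol offers two choices (itself or nothing);
    # every other symbol offers exactly one.
    choices = [(s, "") if s in nullable else (s,) for s in rhs]
    # The set removes duplicates; the empty string (an ε-production) is left out.
    return {"".join(pick) for pick in product(*choices)} - {""}

result = expansions("aWXaYZb", {"X", "Y", "Z"})
print(len(result))     # 8
print(sorted(result))
```

With k nullable occurrences on the right side there are at most 2^k alternatives, so long right-hand sides can make the grammar grow noticeably.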
• Back to our grammar, where S, B and C are nullable:
S → A | AB | SA | SAB | S | B | C | SB | SC | BC | SBC
A → aA | a
B → b | bB | bC | C
C → c | cC | ε
• Now we can remove the ε-productions without changing the language.
• The only possible change is losing the empty string, if it is in the original language.
• So our grammar without null productions becomes:
S → A | AB | SA | SAB | S | B | C | SB | SC | BC | SBC
A → aA | a
B → b | bB | bC | C
C → c | cC
4) Eliminating Unit Productions
SAa|B
Aa|bc|B
BA|bb|C|cC
Ca|C
• First, for every variable, we find all single variables that can be reached from it:
• For S: S => B => A, S => B => C
• For A: A => B => C
• For B: B => A, B => C
• For C: NONE (C itself doesn’t count)
• For finding reachable single variables, what should we do?
• Use a dependency graph!
• Drawing the dependency graph:
• Vertices of the graph are variables.
• If there is a unit production A → B, then there is an edge A → B.
• A single variable B is reachable from A if there is a path from A to B.
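• Dependency graph:
[Figure: dependency graph on the vertices S, A, B, C, with edges S → B, A → B, B → A, B → C and a self-loop on C]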
• To construct an equivalent grammar without unit productions:
• Remove all unit productions.
• For each pair A =>* B, where B is a single variable reachable from A, consider all non-unit productions B → p1 | p2 | … | pn, and add the corresponding productions A → p1 | p2 | … | pn.
• For example, since A =>* B and B → bb | cC, add the productions A → bb | cC.
SAa|B
Aa|bc|B
BA|bb|C|cC
Ca|C
SAa
Bbb|cC
Aa|bc
Ca
Old non-unit
productions
Sbb|cC|a|bc|a
Ba|bc|a
Abb|cC|a
Ca
new
productions
• Note that the variable B has become useless (it can no longer be reached from S) and we need to remove it!
Summary
• Main steps of simplifying a grammar:
• Remove useless variables, which cannot be reached or do not terminate.
• Remove ε-productions.
• Remove unit productions.
• Remove useless variables again!