Increasing power of LL(k) parsers

Bc. Jozef Lang (xlangj01)
Bc. Zoltán Zemko (xzemko01)
Outline
 LL(1) parsers
 Why increase the power of LL(k) parsers?
 LL(k) parsers
 Linear approximate LL(k) parsing
 LL-regular parsing
 Parse tree grammars
 Extended LL(1) grammars
 Conclusion
LL(1) parsing
 Deterministic top-down parsing
 A prediction is based on a single symbol, so LL(1) parsing
uses one symbol of look-ahead
 Constructing the parse table requires the terminal symbols
with which each non-terminal symbol can start
 FIRSTk set: the set of terminal strings that can appear in the
first k positions of the strings derivable from a non-terminal
 FOLLOW set: the set of all terminal symbols that can follow a
non-terminal symbol in any sentential form derivable from
S#
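The two set definitions above can be sketched as a fixed-point computation for k = 1. This is a minimal sketch, not the authors' algorithm: the toy grammar (S -> A b, A -> a A | empty) and all function names are assumptions made for illustration, and `#` is used as the end marker as in S#.

```python
# Hypothetical toy grammar, not taken from the slides:
#   S -> A b
#   A -> a A | (empty)
EPS = ""   # stands for the empty string
END = "#"  # end-of-input marker, as in S#

GRAMMAR = {
    "S": [["A", "b"]],
    "A": [["a", "A"], []],
}
START = "S"


def first_of(seq, first):
    """FIRST set of a sequence of symbols, given FIRST of each non-terminal."""
    out = set()
    for sym in seq:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)  # every symbol of the sequence can derive the empty string
    return out


def compute_first():
    """Iterate until no FIRST set grows any more."""
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                new = first_of(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first):
    """Iterate until no FOLLOW set grows any more."""
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)
    changed = True
    while changed:
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in GRAMMAR:
                        continue  # only non-terminals have FOLLOW sets
                    rest = first_of(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[nt]  # the tail can vanish
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
print(FIRST)
print(FOLLOW)
```

For this toy grammar the empty string appears in FIRST of A, which is why the FOLLOW set of A is needed to decide when to predict the empty alternative.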
LL(1) versus strong-LL(1)
 If all entries of the parse table have at most one element then
the grammar is called strong-LL(1)
 Every LL(1) grammar is also a strong-LL(1) grammar
 A parse table entry that contains more than one element is
an LL(1) conflict
 A parser with an LL(1) conflict is not deterministic and is
therefore less efficient
LL(1) versus strong-LL(1) (2)
 LL(1) conflicts can be solved by
 Left recursion elimination
 Left factoring
 Conflict resolvers
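Left factoring, one of the techniques listed above, can be sketched as follows. This is a minimal single-pass sketch under assumptions: the rule A -> a b | a c | d is a made-up example, and the function names and the primed naming of new non-terminals are hypothetical.

```python
from collections import defaultdict


def common_prefix(seqs):
    """Longest prefix shared by all symbol sequences in seqs."""
    prefix = []
    for column in zip(*seqs):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


def left_factor(name, alternatives):
    """One left-factoring pass over the alternatives of non-terminal `name`.

    Alternatives that start with the same symbol are merged into a single
    alternative followed by a fresh non-terminal (name', name'', ...).
    """
    groups = defaultdict(list)
    for alt in alternatives:
        groups[alt[0] if alt else None].append(alt)

    rules = {name: []}
    fresh = name
    for head, group in groups.items():
        if head is None or len(group) == 1:
            rules[name].extend(group)  # nothing to factor out
            continue
        prefix = common_prefix(group)
        fresh += "'"
        rules[name].append(prefix + [fresh])
        rules[fresh] = [alt[len(prefix):] for alt in group]
    return rules


# Hypothetical rule: A -> a b | a c | d
factored = left_factor("A", [["a", "b"], ["a", "c"], ["d"]])
print(factored)
```

After the pass, the two alternatives that both started with `a` are replaced by one alternative `a A'`, and the choice between `b` and `c` is postponed to `A'`, where one token of look-ahead suffices.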
Why increase the power of LL(k) parsers?
 Consider the following grammar, where idf produces
identifiers:
 This fragment defines expression elements like x, sin(0.41),
T[2,3]
 The first token is common to all alternatives, but the second
token distinguishes between them
 A look-ahead of only one token is not enough
 It would be handy to increase the power of deterministic LL
parsing.
LL(k) parsers
 It is sometimes handy to look ahead k symbols, where
k > 1
 This requires FIRSTk sets to be defined
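 Let x be a sentential form. FIRSTk(x) is the set of terminal strings w
where:
$\mathrm{FIRST}_k(x) = \{\, w \mid x \Rightarrow^{*} wy,\ |w| = k \,\} \cup \{\, w \mid x \Rightarrow^{*} w,\ |w| < k \,\}$
where y is some sentential form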
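A FIRSTk computation can be sketched with the same fixed-point idea as for k = 1, using concatenation truncated to k tokens. This is a minimal sketch under assumptions: the grammar S -> a S b | empty is a made-up example, strings are represented as tuples of tokens, and all names are hypothetical.

```python
def concat_k(xs, ys, k):
    """All concatenations of a string from xs with one from ys, cut to k tokens."""
    return {(x + y)[:k] for x in xs for y in ys}


def first_k_of(seq, first, grammar, k):
    """FIRSTk of a sequence of grammar symbols."""
    result = {()}  # start with the empty string
    for sym in seq:
        sym_set = first[sym] if sym in grammar else {(sym,)}
        result = concat_k(result, sym_set, k)
    return result


def compute_first_k(grammar, k):
    """Iterate until no FIRSTk set grows any more."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_k_of(alt, first, grammar, k)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


# Hypothetical grammar: S -> a S b | (empty)
GRAMMAR = {"S": [["a", "S", "b"], []]}

FIRST2 = compute_first_k(GRAMMAR, 2)
# Appending k end markers pads every look-ahead string to length k.
PADDED = first_k_of(["S", "#", "#"], FIRST2, GRAMMAR, 2)
print(FIRST2["S"])
print(PADDED)
```

Without end markers, FIRST2 of S contains the empty string as well as two-token strings; once two `#` markers are appended, every look-ahead string has exactly two tokens.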
LL(k) parsers (2)
 Assume that grammar G contains a rule A → a1 | … | an
 Grammar G is an LL(k) grammar iff the sets FIRSTk(a1x#k)
… FIRSTk(anx#k) are pairwise disjoint.
 Symbol #k denotes a sequence of k end markers #, where k is the number of look-ahead symbols
 Every LL(k) grammar is also an LL(k+1) grammar; the
converse does not hold.
LL(k) parsers (3)
 As for LL(1) parsers, a parse table is needed, and producing
it for LL(k) parsers is difficult
 The FOLLOWk set of a non-terminal A is defined as the union of
FIRSTk(x#k) over all predictions Ax#k
 As for LL(1) parsers, the parse table is indexed by a
pair consisting of a non-terminal symbol and a string of
k terminals
 If every entry of the parse table has at most one element, then
the grammar described by this parse table is strong-LL(k)
 For k > 1 there are grammars that are LL(k) but not
strong-LL(k)
LL(k) parsers (4)
 Strong-LL(k) parsers are seldom used in practice
 A similar effect can be obtained by using conflict resolvers
Linear-approximate LL(k) parsing
 The difficult construction of LL(k) parse tables can be avoided
by a simple trick
 In addition to the FIRST set, introduce SECOND, THIRD, etc. sets
 The size complexity is reduced from O(t^k) to k tables of
size O(t), where k is the number of sets
 A linear-approximate LL(k) grammar is weaker than an LL(k)
grammar because it breaks the relationship between tokens
 Assume an LL(2) grammar with the look-ahead
sets { ab, cd } and { ad, cb }
 In linear-approximate LL(2), both alternatives get the FIRST set
{ a, c } and the SECOND set { b, d }, so they are not disjoint
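The example above can be checked mechanically. This is a minimal sketch: the function name is hypothetical, and the conflict test (two alternatives clash when their sets overlap at every position) is one reading of how a linear-approximate parser selects an alternative.

```python
def linear_approx(lookaheads):
    """Split k-token look-ahead strings into per-position token sets.

    Returns [FIRST, SECOND, ...] for one alternative.
    """
    k = len(next(iter(lookaheads)))
    return [{s[i] for s in lookaheads} for i in range(k)]


alt1 = {"ab", "cd"}  # look-ahead set of the first alternative
alt2 = {"ad", "cb"}  # look-ahead set of the second alternative

# Full LL(2): compare whole two-token strings.
full_conflict = bool(alt1 & alt2)

# Linear-approximate LL(2): compare position by position.
approx1 = linear_approx(alt1)
approx2 = linear_approx(alt2)
approx_conflict = all(p1 & p2 for p1, p2 in zip(approx1, approx2))

print(full_conflict, approx_conflict)
```

The full LL(2) sets are disjoint, but once each position is considered on its own the pairing of `a` with `b` and of `c` with `d` is lost, and both alternatives accept the same tokens at both positions.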
LL-regular parsing
 LL(k) provides bounded look-ahead
 There are grammars where a discriminating token can be
arbitrarily far away
 Unbounded look-ahead is needed
 The unbounded look-ahead is itself described by a context-free
grammar
 A context-free grammar can be approximated by a regular
grammar
 There is no algorithm to approximate a context-free grammar,
but there are several heuristics
Parse tree grammar from LL(1)
 A straightforward process
 The basic idea is to create a new rule for every prediction
 The non-terminals are numbered by an increasing global
counter
 They are then pushed onto the prediction stack
 The newly created rules form the parse tree grammar
 Because the parser is deterministic, a parse tree grammar
is obtained rather than a parse forest grammar
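The process above can be sketched with a small table-driven parser that emits one numbered rule per prediction. This is a minimal sketch under assumptions: the grammar S -> a S b | empty, its hand-written parse table, and all names are hypothetical, and `#` is the end marker.

```python
def parse_tree_grammar(tokens, table, start):
    """Run an LL(1) parse and return the rules of the parse tree grammar."""
    tokens = list(tokens) + ["#"]
    nonterminals = {nt for nt, _ in table}
    counter = 1                  # global counter for numbering non-terminals
    stack = [(start, counter)]   # prediction stack, top at the end
    rules = []
    pos = 0
    while stack:
        sym, num = stack.pop()
        if sym in nonterminals:
            rhs = table.get((sym, tokens[pos]))
            if rhs is None:
                raise SyntaxError(f"no prediction for {sym} on {tokens[pos]!r}")
            numbered = []
            for s in rhs:
                if s in nonterminals:
                    counter += 1
                    numbered.append((s, counter))
                else:
                    numbered.append((s, None))
            # One new rule per prediction.
            rules.append((f"{sym}_{num}",
                          [f"{s}_{n}" if n else s for s, n in numbered]))
            stack.extend(reversed(numbered))
        else:
            if tokens[pos] != sym:
                raise SyntaxError(f"expected {sym!r}, got {tokens[pos]!r}")
            pos += 1
    if tokens[pos] != "#":
        raise SyntaxError("input left over after parse")
    return rules


# Hypothetical LL(1) table for S -> a S b | (empty)
TABLE = {
    ("S", "a"): ["a", "S", "b"],
    ("S", "b"): [],
    ("S", "#"): [],
}

RULES = parse_tree_grammar("aabb", TABLE, "S")
for lhs, rhs in RULES:
    print(lhs, "->", " ".join(rhs) or "(empty)")
```

Each numbered non-terminal occurs on the left-hand side of exactly one rule, so the resulting grammar describes a single tree.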
Parse tree grammar from LL(1) (2)
Extended LL(1) grammars
 Some parsers accept Extended LL(1) grammars instead of
ordinary ones
 To accept an Extended LL(1) grammar, the parser must transform
it into an ordinary one without introducing LL(1) conflicts
 An advantage of Extended LL(1) grammars is that they allow
a more efficient implementation in recursive descent parsers
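One way to picture the efficiency advantage is a repetition operator handled by a loop in a recursive descent routine, instead of by recursion through an extra non-terminal. This is a minimal sketch, assuming that "extended" means EBNF-style repetition; the rule list -> idf ( ',' idf )* and all names are hypothetical.

```python
def parse_list(tokens):
    """Recognise  list -> idf ( ',' idf )*  and return the identifiers."""
    pos = 0

    def expect_idf():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isidentifier():
            raise SyntaxError(f"identifier expected at position {pos}")
        pos += 1
        return tokens[pos - 1]

    items = [expect_idf()]
    # The repetition ( ',' idf )* becomes a plain loop: one token of
    # look-ahead (is the next token a comma?) decides whether to continue.
    while pos < len(tokens) and tokens[pos] == ",":
        pos += 1
        items.append(expect_idf())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return items


print(parse_list(["x", ",", "y", ",", "z"]))
```

With an ordinary grammar the same language would need a helper rule such as tail -> ',' idf tail | (empty), and the routine would recurse once per list element.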
Conclusion
 LL(1) is very intuitive: each step is chosen by predicting
from one token
 There are situations where a look-ahead of only one symbol is
not sufficient
 The power of LL parsers can be improved by extending the
look-ahead to
 a bounded length, resulting in LL(k) parsing
 an unbounded length, resulting in LL-regular parsing
 Linear-approximate LL(2) parsing is a convenient and
simplified form of LL(2) parsing
Thank you for your attention