Increasing power of LL(k) parsers
Bc. Jozef Lang (xlangj01)
Bc. Zoltán Zemko (xzemko01)

Outline
- LL(1) parsers
- Why increase the power of LL(k) parsers?
- LL(k) parsers
- Linear-approximate LL(k) parsing
- LL-regular parsing
- Parse tree grammars
- Extended LL(1) grammars
- Conclusion

LL(1) parsing
- Deterministic top-down parsing
- Each prediction is based on a single symbol, so LL(1) is parsing with one symbol of look-ahead
- Constructing the parse table requires the terminal symbols that can start each non-terminal
- FIRST_k set: the set of terminal strings found at the first k positions of the strings a non-terminal can derive (k = 1 for LL(1))
- FOLLOW set: the set of all terminal symbols that can follow a non-terminal in any sentential form derivable from S#

LL(1) versus strong-LL(1)
- If every entry of the parse table has at most one element, the grammar is called strong-LL(1)
- Every LL(1) grammar is also a strong-LL(1) grammar
- A parse table entry with more than one element is an LL(1) conflict
- A parser with an LL(1) conflict is not deterministic and is therefore less efficient

LL(1) versus strong-LL(1) (2)
- LL(1) conflicts can be solved by
  - Left recursion elimination
  - Left factoring
  - Conflict resolvers

Why increase the power of LL(k) parsers?
- Consider the following grammar, where idf produces identifiers:
  [grammar fragment]
- This fragment defines expression elements like x, sin(0.41), T[2,3]
- The first token is common to all alternatives; only the second token distinguishes between them
- A look-ahead of one token is not enough
- It would be handy to increase the power of deterministic LL parsing

LL(k) parsers
- It is sometimes handy to look ahead k symbols, where k > 1
- This requires FIRST_k sets
- Let x be a sentential form. FIRST_k(x) is the set of terminal strings w with |w| = k such that x =>* wy, where y is some sentential form

LL(k) parsers (2)
- Assume the grammar G contains the rule A -> a_1 | a_2 | … | a_n
- G is an LL(k) grammar iff, for every prediction Ax#^k, the sets FIRST_k(a_1 x#^k) … FIRST_k(a_n x#^k) are pairwise disjoint
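The disjointness test above can be tried out on a small grammar. The following is a minimal sketch, not a full LL(k) table generator: the grammar is a hypothetical one modelled on the idf example (the rule names `element` and `args` are assumptions), FIRST_k is computed by a simple fixpoint iteration, and only one prediction (an empty rest x) is checked rather than every prediction.

```python
from itertools import combinations

# Hypothetical grammar modelled on the idf example; rule names are assumptions.
GRAMMAR = {
    "element": [["idf"], ["idf", "(", "args", ")"], ["idf", "[", "args", "]"]],
    "args": [["num"], ["num", ",", "args"]],
}

def concat_k(left, right, k):
    """Concatenate two sets of terminal tuples, truncating results to length k."""
    return {(u + v)[:k] for u in left for v in right}

def first_k_of_form(form, first, k):
    """FIRST_k of a sentential form, given FIRST_k of every non-terminal."""
    result = {()}
    for symbol in form:
        # Non-terminals use their computed set; anything else is a terminal.
        symbol_set = first[symbol] if symbol in first else {(symbol,)}
        result = concat_k(result, symbol_set, k)
    return result

def first_k_sets(grammar, k):
    """Compute FIRST_k for all non-terminals by iterating to a fixpoint."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_k_of_form(alt, first, k)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def conflicts(grammar, nonterminal, rest, k):
    """Index pairs of alternatives whose FIRST_k(alt rest #^k) sets overlap."""
    first = first_k_sets(grammar, k)
    padded = list(rest) + ["#"] * k  # append k end-markers
    sets = [first_k_of_form(alt + padded, first, k)
            for alt in grammar[nonterminal]]
    return [(i, j) for i, j in combinations(range(len(sets)), 2)
            if sets[i] & sets[j]]

print(conflicts(GRAMMAR, "element", [], 1))  # all three alternatives start with idf
print(conflicts(GRAMMAR, "element", [], 2))  # the second token separates them
```

With k = 1 every pair of alternatives of `element` overlaps on idf, while with k = 2 the sets for this prediction are pairwise disjoint, which matches the motivation given on the earlier slide.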
- The symbol #^k stands for k end-markers #, one per look-ahead symbol
- Every LL(k) grammar is also an LL(k+1) grammar; the converse does not hold

LL(k) parsers (3)
- As with LL(1) parsers, a parse table must be produced; for LL(k) parsers this is difficult
- The FOLLOW_k set of a non-terminal A is defined as the union of FIRST_k(x#^k) over all predictions Ax#^k
- As with LL(1) parsers, the parse table is indexed by a pair consisting of a non-terminal symbol and a string of terminals of length k
- If every entry of the parse table has at most one element, the grammar described by the table is strong-LL(k)
- For k > 1 there are grammars that are LL(k) but not strong-LL(k)

LL(k) parsers (4)
- Strong-LL(k) parsers are seldom used in practice
- A similar effect can be obtained by using conflict resolvers

Linear-approximate LL(k) parsing
- The difficult construction of LL(k) parse tables can be avoided by a simple trick
- In addition to the FIRST set, introduce SECOND, THIRD, etc. sets
- The size complexity is reduced from O(t^k) to k tables of O(t), where t is the number of terminal symbols and k is the number of sets
- Linear-approximate LL(k) is weaker than LL(k) because it breaks the relationship between the tokens of a look-ahead
- Assume an LL(2) grammar with the look-ahead sets { ab, cd } and { ad, cb }
- Under linear-approximate LL(2), both alternatives have FIRST set { a, c } and SECOND set { b, d }, so the sets are no longer disjoint

LL-regular parsing
- LL(k) provides bounded look-ahead
- There are grammars in which the discriminating token can be arbitrarily far away
- Unbounded look-ahead is needed
- The unbounded look-ahead forms its own context-free grammar
- A context-free grammar can be approximated by a regular grammar
- There is no algorithm for approximating the context-free grammar, but there are several heuristics

Parse tree grammar from LL(1)
- A straightforward process
- The basic idea is to create a new rule for every prediction
- The non-terminals are numbered by an increasing global counter
- They are then inserted into the prediction stack
- The newly created rules form the parse tree grammar
- Since the parser is deterministic, a parse tree grammar is obtained rather than a parse forest grammar

Parse tree grammar from LL(1) (2)

Extended LL(1) grammars
- Some parsers accept Extended LL(1) grammars instead of ordinary ones
- To accept an Extended LL(1) grammar, the parser must transform it into an ordinary one without introducing LL(1) conflicts
- An advantage of Extended LL(1) grammars is that they allow a more efficient implementation in recursive descent parsers

Conclusion
- LL(1) parsing is very intuitive; it makes each step according to a prediction based on one token
- There are situations where a look-ahead of only one symbol is not sufficient
- The power of LL parsers can be improved by extending the look-ahead to
  - a bounded length, resulting in LL(k) parsing
  - an unbounded length, resulting in LL-regular parsing
- Linear-approximate LL(2) parsing is a convenient and simplified form of LL(2) parsing

Thank you for your attention
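
Backup: linear-approximate LL(2) example
The look-ahead sets { ab, cd } and { ad, cb } from the linear-approximate slide can be checked mechanically. This is a small sketch under the assumption that a linear-approximate parser accepts a look-ahead when each token lies in the set for its position, so two alternatives clash when their sets overlap at every position. The function names are illustrative only.

```python
def positional_sets(lookaheads):
    """Split a set of k-token look-aheads into FIRST, SECOND, ... sets."""
    k = len(next(iter(lookaheads)))
    return [{la[i] for la in lookaheads} for i in range(k)]

def full_conflict(a, b):
    """Full LL(k): alternatives clash if they share a complete look-ahead."""
    return bool(a & b)

def linear_conflict(a, b):
    """Linear-approximate LL(k): clash if the sets overlap at every position."""
    return all(x & y for x, y in zip(positional_sets(a), positional_sets(b)))

alt1 = {"ab", "cd"}
alt2 = {"ad", "cb"}

print(positional_sets(alt1))        # FIRST and SECOND sets of the first alternative
print(full_conflict(alt1, alt2))    # False: the LL(2) look-ahead sets are disjoint
print(linear_conflict(alt1, alt2))  # True: FIRST and SECOND both overlap
```

Looking at each position separately loses the pairing of a with b and c with d, which is why the two alternatives can no longer be told apart.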