- Mandakinee Singh (11CS10026)

What is parsing?
◦ Discovering the derivation of a string, if one exists.
◦ Harder than generating strings.

Two major approaches
◦ Top-down parsing
◦ Bottom-up parsing

A parser is top-down if it discovers a parse tree from top to bottom.
◦ A top-down parse corresponds to a preorder traversal (preorder expansion) of the parse tree.
◦ A leftmost derivation is applied at each derivation step.
Start at the root of the parse tree and grow toward the leaves. Pick a production and try to match the input. A bad "pick" may force the parser to backtrack.

Top-Down Parser: LL(1) Grammar
LL(1) parsers
◦ Left-to-right input
◦ Leftmost derivation
◦ 1 symbol of look-ahead
Grammars that such a parser can handle are called LL(1) grammars.

Preorder expansion: the leftmost non-terminal is always expanded first.
Start with the root of the parse tree.
◦ Root of the tree: node labeled with the start symbol.

Algorithm:
◦ Repeat until the fringe of the parse tree matches the input string.
◦ Declare a pointer that represents the current position of the parser in the input string.
◦ Scan the fringe of the parse tree symbol by symbol, from left to right, and match it against the input string.
If the scanned symbol is:
◦ Terminal: if it matches the input symbol at the pointer, increase the pointer by one.
◦ Non-terminal: choose a production and add a child node for each symbol of the chosen production.
If a terminal symbol is added that doesn't match, backtrack.
Find the next node to be expanded (a non-terminal).
Repeat the process.
Done when:
◦ Leaves of the parse tree match the input string (success)
◦ All productions are exhausted in backtracking (failure)

Grammar
E → E + T (rule 1) | E - T (2) | T (3)
T → T * F (4) | T / F (5) | F (6)
F → number (7) | Id (8)

Input string: x - 2 * y

Rule   Sentential form   Input string
-      E                 x - 2 * y
1      E + T             x - 2 * y
3      T + T             x - 2 * y
6      F + T             x - 2 * y
8      <Id> + T          x - 2 * y
-      <Id,x> + T        x - 2 * y
?

Problem:
◦ Can't match the next terminal: the sentential form expects + but the input has -
◦ We guessed wrong at step 2

Parse tree so far: E( E( T( F( x ) ) ) + T )

Undo all these productions and go for the next production.

Rule   Sentential form   Input string
-      E                 x - 2 * y
2      E - T             x - 2 * y
3      T - T             x - 2 * y
6      F - T             x - 2 * y
8      <Id> - T          x - 2 * y
-      <Id,x> - T        x - 2 * y
6      <Id,x> - F        x - 2 * y
7      <Id,x> - <num>    x - 2 * y

Problem:
◦ More input to read
◦ Another cause of backtracking

Parse tree so far: E( E( T( F( x ) ) ) - T( F( 2 ) ) )

Rule   Sentential form          Input string
-      E                        x - 2 * y
2      E - T                    x - 2 * y
3      T - T                    x - 2 * y
6      F - T                    x - 2 * y
8      <id> - T                 x - 2 * y
-      <id,x> - T               x - 2 * y
4      <id,x> - T * F           x - 2 * y
6      <id,x> - F * F           x - 2 * y
7      <id,x> - <num> * F       x - 2 * y
-      <id,x> - <num,2> * F     x - 2 * y
8      <id,x> - <num,2> * <id>  x - 2 * y

All terminals match, so we are done.

Final parse tree: E( E( T( F( x ) ) ) - T( T( F( 2 ) ) * F( y ) ) )

Looking at it carefully, there is one more possibility:

Rule   Sentential form   Input string
-      E                 x - 2 * y
1      E + T             x - 2 * y
1      E + T + T         x - 2 * y
1      E + T + T + T     x - 2 * y
1      E + T + T + T + T x - 2 * y

Problem: Termination
◦ A wrong choice leads to infinite expansion (more importantly: without consuming any input!)
◦ May not be as obvious as this
◦ Our grammar is left recursive

Formally, a grammar is left recursive if there is a non-terminal A such that
A → A α | β (for some strings of symbols α and β)
Repeatedly expanding the leftmost A never consumes input:
A ⇒ A α ⇒ A α α ⇒ A α α α ⇒ ……… ⇒ β α α …… α α

How to remove it:
A → β A'
A' → α A' | ε

Up Next: Predictive Parser
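The rewriting rule for immediate left recursion shown above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `eliminate_immediate_left_recursion` and the representation of a production as a list of symbols (with the empty list standing for the empty string) are choices made here, not something the slides prescribe. It handles only immediate left recursion and does not treat the degenerate production A → A.

```python
def eliminate_immediate_left_recursion(nonterminal, productions):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon.

    `productions` is a list of right-hand sides, each a list of symbols.
    An empty list represents epsilon. (Hypothetical representation.)
    """
    # The alphas: tails of productions that start with the non-terminal itself.
    alphas = [p[1:] for p in productions if p and p[0] == nonterminal]
    # The betas: every production that does not start with the non-terminal.
    betas = [p for p in productions if not p or p[0] != nonterminal]

    if not alphas:
        # No immediate left recursion: leave the productions unchanged.
        return {nonterminal: productions}

    new_nt = nonterminal + "'"  # fresh helper non-terminal A'
    return {
        nonterminal: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }


# Applied to the E and T rules of the expression grammar from the slides.
e_rules = eliminate_immediate_left_recursion(
    "E", [["E", "+", "T"], ["E", "-", "T"], ["T"]]
)
t_rules = eliminate_immediate_left_recursion(
    "T", [["T", "*", "F"], ["T", "/", "F"], ["F"]]
)

for rules in (e_rules, t_rules):
    for lhs, alternatives in rules.items():
        rhs = " | ".join(" ".join(alt) if alt else "ε" for alt in alternatives)
        print(f"{lhs} → {rhs}")
```

For the grammar used in the example, this yields E → T E' with E' → + T E' | - T E' | ε, and T → F T' with T' → * F T' | / F T' | ε. With these rules every expansion of E' or T' first demands a terminal from the input, so the infinite expansion seen earlier can no longer occur.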