COP4020 Programming Languages Parsing

Prof. Xin Yuan, COP4020 Spring 2014 (5/29/2016)

Overview
- Top-down and bottom-up parsing
- Recursive descent parsing
- Table-driven LL(1) parsing

Parsing
- Parsing is the process of determining whether the start symbol can derive the program.
  - If it succeeds, the program is valid. If it fails, the program is invalid.
- There are two general approaches:
  - Expand from the start symbol to the whole program (top-down).
  - Reduce the whole program to the start symbol (bottom-up).

(Figure: a parse tree in which <expression> expands through <expression> -> <expression> <operator> <expression> to derive identifier * identifier + identifier.)

Top-down parsing
- Builds the parse tree from the root to the leaves (using a leftmost derivation; why?).
- Recursive descent parser
- LL parser
  - First L: left-to-right scan of the input
  - Second L: leftmost derivation

Recursive descent parsing
- Recursive descent parsing associates a procedure with each nonterminal in the grammar; it may require backtracking over the input string.

Example grammar:

  <type>   -> <simple> | ^ id | array [ <simple> ] of <type>
  <simple> -> integer | char | num dotdot num

    /* Assumed to be supplied by the scanner (not shown on the slides):
       the token type, the global lookahead token, nexttoken(), error(),
       and the token codes INTEGER, CHAR, NUM, DOTDOT, ID, ARRAY, OF. */
    void simple(void);
    void match(token t);

    void type(void)
    {
        if (lookahead == INTEGER || lookahead == CHAR || lookahead == NUM)
            simple();                      /* <type> -> <simple> */
        else if (lookahead == '^') {       /* <type> -> ^ id */
            match('^');
            match(ID);
        } else if (lookahead == ARRAY) {   /* <type> -> array [ <simple> ] of <type> */
            match(ARRAY);
            match('[');
            simple();
            match(']');
            match(OF);
            type();
        } else
            error();
    }

    void simple(void)
    {
        if (lookahead == INTEGER)          /* <simple> -> integer */
            match(INTEGER);
        else if (lookahead == CHAR)        /* <simple> -> char */
            match(CHAR);
        else if (lookahead == NUM) {       /* <simple> -> num dotdot num */
            match(NUM);
            match(DOTDOT);
            match(NUM);
        } else
            error();
    }

    /* Consume the expected token t, or report a syntax error. */
    void match(token t)
    {
        if (lookahead == t)
            lookahead = nexttoken();
        else
            error();
    }

Recursive descent parsing may require backtracking over the input string:
- Try out all productions, and backtrack if necessary.
- E.g., S -> cAd, A -> ab | a, with input string cad: after matching c, the parser tries A -> ab, matches a, fails because the next input is d rather than b, backs up, and then succeeds with A -> a.
- A special case of recursive-descent parser that needs no backtracking is called a predictive parser.
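To make the backtracking idea concrete, here is a minimal C sketch of a recursive-descent parser for the grammar S -> cAd, A -> ab | a. It assumes single-character terminals in a NUL-terminated string; the names parse, match_char, input, and pos are hypothetical, and saving and restoring pos is just one simple way to implement backtracking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical globals: the input string and the current scan position. */
static const char *input;
static size_t pos;

/* Consume character c if it is the next input character. */
static bool match_char(char c)
{
    if (input[pos] == c) {
        pos++;
        return true;
    }
    return false;
}

/* A -> ab | a: try each alternative in order, rewinding on failure. */
static bool A(void)
{
    size_t save = pos;
    if (match_char('a') && match_char('b'))
        return true;                 /* alternative A -> ab succeeded */
    pos = save;                      /* backtrack to where A started */
    if (match_char('a'))
        return true;                 /* alternative A -> a succeeded */
    pos = save;
    return false;
}

/* S -> cAd */
static bool S(void)
{
    size_t save = pos;
    if (match_char('c') && A() && match_char('d'))
        return true;
    pos = save;
    return false;
}

/* Accept only if S derives the entire input string. */
bool parse(const char *s)
{
    assert(s != NULL);
    input = s;
    pos = 0;
    return S() && input[pos] == '\0';
}
```

With this sketch, parse("cad") follows the path described above (A -> ab fails, A -> a succeeds), parse("cabd") succeeds through A -> ab, and parse("cab") is rejected. Note that it only backtracks inside A. Once A returns, S never asks it for another alternative. Listing A -> a before A -> ab would therefore make it reject cabd. Full backtracking would have to try every way each nonterminal can match.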
Predictive parsing
- Look at the input string. The parser must predict the right production every time to avoid backtracking (LL(1)).
  - It needs to know which first symbols can be generated by the right side of a production.
  - It looks ahead only one token.

Non-recursive predictive parsing (table-driven LL(1) parsing)
- A predictive parser can be implemented by recursive-descent parsing.
- Requirement: by looking at the first terminal symbol that a nonterminal symbol can derive, we should be able to choose the correct production to expand the nonterminal symbol.
- If the requirement is met, the parser can easily be implemented using a non-recursive scheme by building a parsing table.

A parsing table example

  (1) E  -> TE'
  (2) E' -> +TE'
  (3) E' -> ε
  (4) T  -> FT'
  (5) T' -> *FT'
  (6) T' -> ε
  (7) F  -> (E)
  (8) F  -> id

          id     +      *      (      )      $
  E       (1)                  (1)
  E'             (2)                  (3)    (3)
  T       (4)                  (4)
  T'             (6)    (5)           (6)    (6)
  F       (8)                  (7)

Using the parsing table, the predictive parsing program works with:
- A stack of grammar symbols ($ on the bottom)
- A string of input tokens ($ at the end)
- A parsing table, M[NT, T], of productions

Algorithm: put '$ Start' on the stack, with Start on top ($ marks the end of the input string).
  1) If top == input == $, then accept.
  2) If top == input, then pop the top of the stack, advance to the next input symbol, and go to 1.
  3) If top is a nonterminal: if M[top, input] is a production, then replace top with the right-hand side of that production (its leftmost symbol on top) and go to 1; else error.
  4) Else error.

Example: parsing id+id*id with the grammar and table above (the top of the stack is on the right)

  Stack        Input        Production
  $E           id+id*id$    E->TE'
  $E'T         id+id*id$    T->FT'
  $E'T'F       id+id*id$    F->id
  $E'T'id      id+id*id$
  $E'T'        +id*id$
  ......
  $            $

This produces a leftmost derivation: E => TE' => FT'E' => idT'E' => .... => id+id*id
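The table-driven algorithm can be sketched in C as follows. This is a minimal sketch, assuming tokens have already been scanned into an array of hypothetical symbol codes (ID, PLUS, STAR, LPAREN, RPAREN, with END standing for $) and that EP and TP stand for E' and T'. The names ll1_parse, rhs, and M are illustrative. The table entries and production numbers follow the parsing table for productions (1)-(8), with 0 marking an error entry.

```c
#include <assert.h>

/* Grammar symbols (hypothetical encoding): terminals first, END is $.
   EP and TP stand for the nonterminals E' and T'. */
enum { ID, PLUS, STAR, LPAREN, RPAREN, END,
       E, EP, T, TP, F };
#define NTERMINALS 6
#define IS_TERMINAL(s) ((s) < NTERMINALS)
#define STACK_MAX 256

/* Right-hand sides of productions (1)-(8), left to right, ended by -1.
   Row 0 is unused; epsilon productions have an empty RHS. */
static const int rhs[9][4] = {
    {-1},
    {T, EP, -1},              /* (1) E  -> T E'    */
    {PLUS, T, EP, -1},        /* (2) E' -> + T E'  */
    {-1},                     /* (3) E' -> epsilon */
    {F, TP, -1},              /* (4) T  -> F T'    */
    {STAR, F, TP, -1},        /* (5) T' -> * F T'  */
    {-1},                     /* (6) T' -> epsilon */
    {LPAREN, E, RPAREN, -1},  /* (7) F  -> ( E )   */
    {ID, -1},                 /* (8) F  -> id      */
};

/* M[nonterminal - E][terminal] = production number; 0 is an error entry. */
static const int M[5][NTERMINALS] = {
    /*          id  +  *  (  )  $ */
    /* E  */ {  1, 0, 0, 1, 0, 0 },
    /* E' */ {  0, 2, 0, 0, 3, 3 },
    /* T  */ {  4, 0, 0, 4, 0, 0 },
    /* T' */ {  0, 6, 5, 0, 6, 6 },
    /* F  */ {  8, 0, 0, 7, 0, 0 },
};

/* Parse tokens (terminated by END) and store the production numbers used,
   in order, in out. Returns how many were used, or -1 on a syntax error
   (or if out has no room left). */
int ll1_parse(const int *tokens, int *out, int out_max)
{
    int stack[STACK_MAX];
    int sp = 0, ip = 0, n = 0;
    stack[sp++] = END;                       /* $ on the bottom */
    stack[sp++] = E;                         /* start symbol on top */
    for (;;) {
        int top = stack[sp - 1], a = tokens[ip];
        if (top == END && a == END)
            return n;                        /* 1) accept */
        if (IS_TERMINAL(top)) {
            if (top != a)
                return -1;                   /* 4) error */
            sp--;                            /* 2) pop and advance */
            ip++;
            continue;
        }
        int p = M[top - E][a];               /* 3) nonterminal on top */
        if (p == 0 || n == out_max)
            return -1;
        out[n++] = p;
        sp--;                                /* replace top with the RHS, */
        int len = 0;
        while (rhs[p][len] != -1)
            len++;
        assert(sp + len <= STACK_MAX);
        for (int i = len - 1; i >= 0; i--)   /* pushed right to left so the */
            stack[sp++] = rhs[p][i];         /* leftmost symbol is on top   */
    }
}
```

For the tokens of id+id*id, the sketch records productions 1, 4, 8, 6, 2, 4, 8, 5, 8, 6, 3. In that order they give the leftmost derivation E => TE' => FT'E' => idT'E' => ... from the trace. Pushing each right-hand side in reverse is what keeps the leftmost symbol on top of the stack, so the parser always expands the leftmost nonterminal first.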
