ch3

Chapter 3
Chang Chi-Chung
2015.05.18

The Role of the Parser

[Figure: Source Program -> Lexical Analyzer -> (Token) -> Parser -> (Parse tree) -> Rest of Front End -> intermediate representation. The Parser requests tokens with getNextToken; the phases share the Symbol Table.]

How do we describe the grammar of a programming language?
- Use a context-free grammar (CFG).
- A CFG is a more powerful notation than a regular expression (RE).

Context-Free Grammar

A context-free grammar is a 4-tuple G = <T, N, P, S> where
- T is a finite set of tokens (terminal symbols)
- N is a finite set of nonterminals
- P is a finite set of productions of the form α → β, where α ∈ N and β ∈ (N ∪ T)*
- S ∈ N is a designated start symbol

Derivations

- The one-step derivation is defined by α A β ⇒ α γ β, where A → γ is a production in the grammar.
- In addition, we define:
  - ⇒ is leftmost (written ⇒lm) if α does not contain a nonterminal
  - ⇒ is rightmost (written ⇒rm) if β does not contain a nonterminal
  - Reflexive-transitive closure ⇒* (zero or more steps)
  - Positive closure ⇒+ (one or more steps)

Example of the Derivations

Productions:
    list → list + digit
    list → list - digit
    list → digit
    digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Derivation of 9-5+2:
    list ⇒ list + digit
         ⇒ list - digit + digit
         ⇒ digit - digit + digit
         ⇒ 9 - digit + digit
         ⇒ 9 - 5 + digit
         ⇒ 9 - 5 + 2

- A leftmost derivation replaces the leftmost nonterminal in each step, as the derivation above does.
- A rightmost derivation replaces the rightmost nonterminal in each step.

Example of the Parse Tree

Parse tree of the string 9-5+2 using grammar G:

                list
              /  |   \
          list   +   digit
         / | \         |
     list  -  digit    2
       |        |
     digit      5
       |
       9

The sequence of leaves, read from left to right, is called the yield of the parse tree.

Sentence and Language

- Sentential form: if S ⇒* α in the grammar G, then α is a sentential form of G.
- Sentence: a sentence of G is a sentential form of G that has no nonterminals.
- Language:
  - The language generated by G is its set of sentences.
  - The language generated by G is defined by L(G) = { w ∈ T* | S ⇒* w }.
  - A language that can be generated by a grammar is said to be a context-free language.
  - If two grammars generate the same language, the grammars are said to be equivalent.
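To make the notions of sentential form and sentence concrete, the following Java sketch replays the leftmost derivation of 9-5+2 in the list grammar from the earlier example. The class name LeftmostDerivation and the method derive are hypothetical names chosen for illustration, and the sketch hard-codes this one grammar rather than handling arbitrary CFGs.

```java
import java.util.ArrayList;
import java.util.List;

public class LeftmostDerivation {

    // Returns the sentential forms of the leftmost derivation of `input`
    // in the grammar  list -> list + digit | list - digit | digit.
    // Assumes single-digit operands and no spaces, e.g. "9-5+2".
    static List<String> derive(String input) {
        if (!input.matches("[0-9]([+-][0-9])*")) {
            throw new IllegalArgumentException("not a sentence of the list grammar: " + input);
        }
        List<String> forms = new ArrayList<>();
        forms.add("list");

        // Expand the leading `list` with list -> list op digit.
        // The operators are consumed from right to left because the
        // grammar is left-recursive.
        String tail = "";
        for (int i = input.length() - 2; i >= 1; i -= 2) {
            tail = " " + input.charAt(i) + " digit" + tail;
            forms.add("list" + tail);
        }

        // Finish the leading `list` with list -> digit.
        String form = "digit" + tail;
        forms.add(form);

        // Replace each `digit` from left to right with digit -> 0 | ... | 9.
        for (int i = 0; i < input.length(); i += 2) {
            form = form.replaceFirst("digit", String.valueOf(input.charAt(i)));
            forms.add(form);
        }
        return forms;
    }

    public static void main(String[] args) {
        // Prints one sentential form per line, ending with the sentence itself.
        for (String form : derive("9-5+2")) {
            System.out.println(form);
        }
    }
}
```

Every element of the returned list except the last is a sentential form that still contains a nonterminal; only the last element is a sentence of the grammar.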
An Example

Grammar:
    Expr → ( Expr ) | Expr Op name | name
    Op   → + | - | x | /

Derivation of (a + b) x c:
    Expr ⇒ Expr Op c
         ⇒ ( Expr ) Op c
         ⇒ ( Expr Op b ) Op c
         ⇒ ( a Op b ) Op c
         ⇒ ( a + b ) Op c
         ⇒ ( a + b ) x c

Ambiguity

- A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
- Example: the sentence id + id * id in the grammar
      E → E + E | E * E | ( E ) | id
  has two distinct leftmost derivations:
      E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
      E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

Example

- Consider the following context-free grammar:
      G = <{+,-,0,1,2,3,4,5,6,7,8,9}, {string}, P, string>
      P = string → string + string | string - string | 0 | 1 | … | 9
- This grammar is ambiguous, because more than one parse tree represents the string 9-5+2.

Example

Two parse trees for 9-5+2. The first groups the string as (9-5)+2, the second as 9-(5+2):

              string
            /   |    \
       string   +   string
      /  |  \          |
 string  -  string     2
   |          |
   9          5

       string
     /   |    \
 string  -    string
   |         /  |  \
   9    string  +  string
          |          |
          5          2

Ambiguity

Dangling-else grammar:
    stmt → if expr then stmt
         | if expr then stmt else stmt
         | other

Example statement:
    if E1 then S1 else if E2 then S2 else S3

Eliminating Ambiguity (2)

    if E1 then if E2 then S1 else S2

Under the dangling-else grammar this statement has two parse trees, because the else can be attached to either if.

Parsing

- Parsing is the process of determining whether a string of terminals (tokens) can be generated by a grammar.
- Time complexity:
  - For any CFG there is a parser that takes at most O(n^3) time to parse a string of n terminals.
  - Linear algorithms suffice to parse essentially all languages that arise in practice.
- Two kinds of methods:
  - Top-down: constructs a parse tree from root to leaves
  - Bottom-up: constructs a parse tree from leaves to root

Two Kinds of Parsing Methods

- Top-down parsing, LL(1)
  - Leftmost derivation
  - No left recursion
  - No left factors
  - Unambiguous grammar
- Bottom-up parsing, LR(1)
  - Rightmost derivation
  - No right recursion
  - No right factors
  - Unambiguous grammar

[Figure: diagram of grammar classes, labeled RG, LL(1), LR(1), CFG]

CFG Notational Conventions

- Terminals: a, b, c, … ∈ T
  example: 0, 1, +, *, id, if
- Nonterminals: A, B, C, … ∈ N
  example: expr, term, stmt
- Grammar symbols: X, Y, Z ∈ (N ∪ T)
- Strings of terminals: u, v, w, x, y, z ∈ T*
- Strings of grammar symbols (sentential forms): α, β, γ ∈ (N ∪ T)*
- The head of the first production is the start symbol, unless stated otherwise.
Top-down Parsing

- Recursive-descent parsing
- LL(1): Left-to-right scan, Leftmost derivation
- Creates the nodes of the parse tree in preorder (depth-first)

Grammar:
    E → T + T
    T → ( E )
    T → - E
    T → id

Leftmost derivation:
    E ⇒lm T + T ⇒lm id + T ⇒lm id + id

[Figure: the parse tree for id + id grown top-down in three steps, one per derivation step above]

Recursive Descent Parsing

- Every nonterminal has one (recursive) procedure responsible for parsing the nonterminal's syntactic category of input tokens.
- When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input lookahead information.

Recursive Descent Parsing

    void A() {
        Choose an A-production, A → X1 X2 … Xk;
        for (i = 1 to k) {
            if (Xi is a nonterminal)
                call procedure Xi();
            else if (Xi = current input symbol a)
                advance the input to the next symbol;
            else
                /* an error has occurred */;
        }
    }

Conclusion: Parsing and Translation Scheme

Complete program (translates an infix expression of single digits into postfix):

    import java.io.*;

    class Parser {
        static int lookahead;

        public Parser() throws IOException {
            lookahead = System.in.read();   // read the first input character
        }

        // expr -> term { (+ | -) term }, emitting each operator after its operands
        void expr() throws IOException {
            term();
            while (true) {
                if (lookahead == '+') {
                    match('+'); term(); System.out.write('+');
                    continue;
                } else if (lookahead == '-') {
                    match('-'); term(); System.out.write('-');
                    continue;
                } else return;
            }
        }

        // term -> digit, emitting the digit
        void term() throws IOException {
            if (Character.isDigit((char) lookahead)) {
                System.out.write((char) lookahead);
                match(lookahead);
            } else throw new Error("syntax error");
        }

        // Consume the expected character and read the next one.
        void match(int t) throws IOException {
            if (lookahead == t) lookahead = System.in.read();
            else throw new Error("syntax error");
        }

        // Reads one expression from standard input, e.g. 9-5+2 gives 95-2+
        public static void main(String[] args) throws IOException {
            Parser parser = new Parser();
            parser.expr();
            System.out.write('\n');
            System.out.flush();
        }
    }

LL(1) Grammar

- Predictive parsers, that is, recursive-descent parsers needing no backtracking, can be constructed for a class of grammars called LL(1).
  - The first "L" means scanning the input from left to right.
  - The second "L" means producing a leftmost derivation.
  - The "1" means using one input symbol of lookahead at each step to make parsing action decisions.
- An LL(1) grammar is not left-recursive.
- An LL(1) grammar is not ambiguous.
FIRST and FOLLOW

[Figure: parse tree with root S whose frontier is α A a β, where A derives c γ. Terminal c is in FIRST(A) and terminal a is in FOLLOW(A).]

FIRST and FOLLOW

- The construction of both top-down and bottom-up parsers is aided by two functions, FIRST and FOLLOW, associated with a grammar G.
- During top-down parsing, FIRST and FOLLOW allow us to choose which production to apply.
- During panic-mode error recovery, sets of tokens produced by FOLLOW can be used as synchronizing tokens.

FIRST

- FIRST(α): the set of terminals that begin the strings derived from α
  - FIRST(a) = { a } if a ∈ T
  - FIRST(ε) = { ε }
  - FIRST(A) = ∪ FIRST(α) over all productions (A → α) ∈ P
  - FIRST(X1 X2 … Xk) =
        if ε ∈ FIRST(Xj) for all j = 1, …, i-1
            then add the non-ε elements of FIRST(Xi) to FIRST(X1 X2 … Xk)
        if ε ∈ FIRST(Xj) for all j = 1, …, k
            then add ε to FIRST(X1 X2 … Xk)

FIRST (1)

- By the definition of FIRST, we can compute FIRST(X):
  - If X ∈ T, then FIRST(X) = { X }.
  - If X ∈ N and X → ε, then add ε to FIRST(X).
  - If X ∈ N and X → Y1 Y2 … Yn, then add all non-ε elements of FIRST(Y1) to FIRST(X); if ε ∈ FIRST(Y1), then add all non-ε elements of FIRST(Y2) to FIRST(X); and so on. If ε ∈ FIRST(Yi) for every i = 1, …, n, then add ε to FIRST(X).

FOLLOW

- FOLLOW(A): the set of terminals that can immediately follow nonterminal A in some sentential form
- FOLLOW(A) =
      for all (B → α A β) ∈ P do
          add FIRST(β) - { ε } to FOLLOW(A)
      for all (B → α A β) ∈ P with ε ∈ FIRST(β) do
          add FOLLOW(B) to FOLLOW(A)
      for all (B → α A) ∈ P do
          add FOLLOW(B) to FOLLOW(A)
      if A is the start symbol S then
          add $ to FOLLOW(A)

FOLLOW (1)

- By the definition of FOLLOW, we can compute FOLLOW(X):
  - Put $ into FOLLOW(S).
  - For each A → α B β, add all non-ε elements of FIRST(β) to FOLLOW(B).
  - For each A → α B, or A → α B β where ε ∈ FIRST(β), add all of FOLLOW(A) to FOLLOW(B).
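The rules above can be applied repeatedly until no set changes. The following Java sketch does this for a small expression grammar. The class FirstFollow, its methods, and the markers "eps" (for ε) and "$" (for end of input) are assumptions made for illustration; the sketch treats every symbol without a production as a terminal and is not an optimized implementation.

```java
import java.util.*;

public class FirstFollow {
    static final String EPS = "eps";   // stands for the empty string (epsilon)
    static final String END = "$";     // end-of-input marker

    // head -> list of bodies; an empty body is an epsilon-production
    final Map<String, List<List<String>>> prods = new LinkedHashMap<>();
    final Map<String, Set<String>> first = new HashMap<>();
    final Map<String, Set<String>> follow = new HashMap<>();
    final String start;

    FirstFollow(String start) { this.start = start; }

    void add(String head, String... body) {
        prods.computeIfAbsent(head, k -> new ArrayList<>()).add(Arrays.asList(body));
    }

    // FIRST of a string of grammar symbols, using the FIRST sets found so far.
    Set<String> firstOf(List<String> symbols) {
        Set<String> result = new TreeSet<>();
        for (String x : symbols) {
            if (!prods.containsKey(x)) { result.add(x); return result; } // terminal
            Set<String> fx = first.get(x);
            for (String t : fx) if (!t.equals(EPS)) result.add(t);
            if (!fx.contains(EPS)) return result;
        }
        result.add(EPS); // every symbol can derive epsilon, or the string is empty
        return result;
    }

    void compute() {
        for (String a : prods.keySet()) {
            first.put(a, new TreeSet<>());
            follow.put(a, new TreeSet<>());
        }
        // FIRST: repeat the rules until nothing is added.
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : prods.entrySet())
                for (List<String> body : e.getValue())
                    changed |= first.get(e.getKey()).addAll(firstOf(body));
        }
        // FOLLOW: put $ into FOLLOW(start), then repeat the rules.
        follow.get(start).add(END);
        changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : prods.entrySet()) {
                String head = e.getKey();
                for (List<String> body : e.getValue()) {
                    for (int i = 0; i < body.size(); i++) {
                        String b = body.get(i);
                        if (!prods.containsKey(b)) continue;      // only nonterminals
                        Set<String> rest = firstOf(body.subList(i + 1, body.size()));
                        boolean restNullable = rest.remove(EPS);
                        changed |= follow.get(b).addAll(rest);
                        if (restNullable && !b.equals(head))
                            changed |= follow.get(b).addAll(follow.get(head));
                    }
                }
            }
        }
    }

    // Expression grammar: E -> T E', E' -> + T E' | eps, T -> F T',
    // T' -> * F T' | eps, F -> ( E ) | id
    static FirstFollow exampleGrammar() {
        FirstFollow g = new FirstFollow("E");
        g.add("E", "T", "E'");
        g.add("E'", "+", "T", "E'");
        g.add("E'");
        g.add("T", "F", "T'");
        g.add("T'", "*", "F", "T'");
        g.add("T'");
        g.add("F", "(", "E", ")");
        g.add("F", "id");
        g.compute();
        return g;
    }

    public static void main(String[] args) {
        FirstFollow g = exampleGrammar();
        for (String a : g.prods.keySet())
            System.out.println(a + "  FIRST=" + g.first.get(a) + "  FOLLOW=" + g.follow.get(a));
    }
}
```

For this grammar the sketch finds, for example, that FIRST(E) contains ( and id, and that FOLLOW(F) contains *, +, ) and $.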
For Example

Given a grammar G:
    E  → T E'
    E' → + T E' | ε
    T  → F T'
    T' → * F T' | ε
    F  → ( E ) | id

    Nonterminal   FIRST     FOLLOW
    E             ( id      $ )
    E'            + ε       $ )
    T             ( id      + $ )
    T'            * ε       + $ )
    F             ( id      * + $ )

Using FIRST and FOLLOW to Write a Recursive Descent Parser

Grammar:
    expr → term rest
    rest → + term rest | - term rest | ε
    term → id

    rest() {
        if (lookahead in FIRST(+ term rest)) {
            match('+'); term(); rest()
        }
        else if (lookahead in FIRST(- term rest)) {
            match('-'); term(); rest()
        }
        else if (lookahead in FOLLOW(rest))
            return
        else
            error()
    }

    FIRST(+ term rest) = { + }
    FIRST(- term rest) = { - }
    FOLLOW(rest) = { $ }
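The rest() procedure can be turned into a small runnable recognizer. The Java sketch below assumes the input is already split into tokens ("id", "+", "-") and uses "$" as the end-of-input lookahead; the class PredictiveParser and the helper accepts are hypothetical names, not part of the original slides.

```java
import java.util.Arrays;
import java.util.List;

public class PredictiveParser {
    private final List<String> tokens;
    private int pos = 0;

    PredictiveParser(List<String> tokens) { this.tokens = tokens; }

    // Current lookahead token; "$" is reported once the input is exhausted.
    private String lookahead() {
        return pos < tokens.size() ? tokens.get(pos) : "$";
    }

    private void match(String expected) {
        if (!lookahead().equals(expected)) {
            throw new IllegalStateException(
                "syntax error: expected " + expected + " but saw " + lookahead());
        }
        pos++;
    }

    // expr -> term rest
    void expr() { term(); rest(); }

    // rest -> + term rest | - term rest | epsilon
    void rest() {
        String la = lookahead();
        if (la.equals("+")) {            // FIRST(+ term rest) = { + }
            match("+"); term(); rest();
        } else if (la.equals("-")) {     // FIRST(- term rest) = { - }
            match("-"); term(); rest();
        } else if (la.equals("$")) {     // FOLLOW(rest) = { $ }: choose rest -> epsilon
            return;
        } else {
            throw new IllegalStateException("syntax error at " + la);
        }
    }

    // term -> id
    void term() { match("id"); }

    // True if the token sequence is a sentence of the grammar.
    static boolean accepts(String... input) {
        PredictiveParser p = new PredictiveParser(Arrays.asList(input));
        try {
            p.expr();
            return p.pos == input.length;   // all tokens must be consumed
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("id", "+", "id", "-", "id"));
        System.out.println(accepts("id", "+"));
    }
}
```

Each branch of rest() is selected by a single lookahead token, so the recognizer never backtracks. This works because the three sets FIRST(+ term rest), FIRST(- term rest), and FOLLOW(rest) are pairwise disjoint.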
