Part 4 Syntax Analysis

E.g. A grammar for simple English sentences:
<sentence> → <Subject><Predicate>
<Subject> → <adjective><noun>
<Subject> → <noun>
<Predicate> → <verb><Object>
<Object> → <adjective><noun>
<Object> → <noun>

"young men like pop music" → Lexical Analyzer → "(adjective, ) (noun, ) (verb, ) (adjective, ) (noun, )" → ????

Leftmost derivation (top-down):
<sentence> ⇒ <Subject><Predicate>
⇒ <adjective><noun><Predicate>
⇒ <adjective><noun><verb><Object>
⇒ <adjective><noun><verb><adjective><noun>

Leftmost reduction (bottom-up; the reverse of a rightmost derivation):
<adjective><noun><verb><adjective><noun>
→ <Subject><verb><adjective><noun>
→ <Subject><verb><Object>
→ <Subject><Predicate>
→ <sentence>

How can I design and code the "derivation" or the "reduction"?

0 Approaches to Implement a Syntax Analyzer
1、The syntax description of programming-language constructs
– Context-free grammars
What is the definition of a context-free grammar? Please recall it!
2、Why is a grammar usually used to describe the syntax of a programming language?
– A grammar gives a precise, yet easy-to-understand, syntactic specification of a programming language.
– For certain classes of grammars, we can automatically construct an efficient parser that determines whether a source program is syntactically well formed.
– A properly designed grammar imparts a structure to a programming language. This structure is useful for translating source programs into correct object code and for detecting errors.
– As a language evolves, new constructs can be added to it more easily.
3、Approaches to implement a syntax analyzer
– Manual construction
– Construction by tools

4.1 The Role of the Parser
1、Main tasks
– Obtain a string of tokens from the lexical analyzer.
– Verify that the grammar of the programming language can generate the string.
– Report any syntax errors in an intelligible fashion.
– Recover from commonly occurring errors so that it can continue processing the remainder of its input.
2、Position of the parser in the compiler model
[Figure: source program → lexical analyzer → token → parser → parse tree → rest of front end → intermediate representation. The parser asks the lexical analyzer to "get next token", and all phases share the symbol table.]
3、Parsing methods
(1) Top-down
(2) Bottom-up
4、Syntax error handling
1) Error levels
– Lexical, such as misspelling an identifier, keyword, or operator
– Syntactic, such as an arithmetic expression with unbalanced parentheses
– Semantic, such as an operator applied to an incompatible operand
– Logical, such as an infinitely recursive call
2) Simple-to-state goals of the error handler
– It should report the presence of errors clearly and accurately.
– It should recover from each error quickly enough to be able to detect subsequent errors.
– It should not significantly slow down the processing of correct programs.
3) Error-recovery strategies
– Panic mode
• Discard input symbols one at a time until one of a designated set of synchronizing tokens is found.
– Phrase level
• Replace a prefix of the remaining input by some string that allows the parser to continue.

A Simple Introduction to Top-Down and Bottom-Up Parsing
E.g. Grammar: 1) S → xAy 2) A → ** 3) A → *. Verify "x*y".
Top-down: leftmost derivation S ⇒ xAy ⇒ x*y (rules 1, 3)
Bottom-up: leftmost reduction x*y → xAy → S (rules 3, 1)
Parse tree: the root S has the children x, A, y, and A has the single child *.
How do we code it? The PDA model
[Figure: two pushdown automata reading "x*y#". In the top-down automaton, the stack starts as S on top of #, the controller consults the rules, and the output tape records 1,3. In the bottom-up automaton, the stack starts as #, and the output tape records 3,1.]

T-D PDA controller
(1) If X, the top symbol of the stack, is a nonterminal, choose a production X → … that has not yet been tried for it. Replace X with the right side of the rule and output the rule number. (derivation)
(2) If X, the top symbol of the stack, is the same terminal as the symbol under the reading head, pop X and advance the reading head. (matching)
(3) If (2) fails, backtrack to the configuration before the last derivation and select another rule. (backtracking)
(4) If no untried rule is left, fail.
(5) If only "#" is in the stack and "#" is under the reading head, succeed.

B-U PDA controller
• If the top several symbols in the stack form a handle, reduce it.
• Otherwise, if "#" is under the reading head, fail. If not, shift the symbol under the reading head onto the stack.
• If only #S is in the stack and "#" is under the reading head, succeed.

E.g. Traces on "x*y#"
Top-down (stack top at the left):
Stack | Input | Output | Action
S#    | x*y#  |        | derive by rule 1
xAy#  | x*y#  | 1      | match x
Ay#   | *y#   | 1      | derive by rule 2
**y#  | *y#   | 1,2    | match *
*y#   | y#    | 1,2    | * does not match y: backtrack
Ay#   | *y#   | 1      | derive by rule 3
*y#   | *y#   | 1,3    | match *
y#    | y#    | 1,3    | match y
#     | #     | 1,3    | success

Bottom-up (stack top at the right):
Stack | Input | Output | Action
#     | x*y#  |        | shift
#x    | *y#   |        | shift
#x*   | y#    |        | reduce by rule 3
#xA   | y#    | 3      | shift
#xAy  | #     | 3      | reduce by rule 1
#S    | #     | 3,1    | success

Discussion
Flaws of T-D:
– Left recursion leads to an infinite loop. Remedy: eliminate left recursion.
– Backtracking is inefficient. Remedies: 1. use predictive parsing and eliminate ambiguity; 2. extract left common factors.
Flaws of B-U:
• Next

4.2 TOP-DOWN PARSING
1、Ideas
Find a leftmost derivation for an input string, e.g.
E ⇒ (E) ⇒ (E+E) ⇒ (E*E+E) ⇒ (i*E+E) ⇒ (i*i+E) ⇒ (i*i+i)
Construct a parse tree for the input. Start from the root and create the nodes of the parse tree in preorder.
[Figure: parse tree for (i*i+i). The root E has the children (, E, ). That E has the children E, +, E. Its left E has the children E, *, E, each of which has the leaf i, and its right E has the leaf i.]
2、Main methods
– Predictive parsing (no backtracking)
– Recursive descent (involves backtracking)
3、Recursive descent
– A deductive procedure that constructs a parse tree for the string top-down from S. When there is a mismatch, the program goes back to the nearest nonterminal, selects another production for it, and continues constructing the parse tree.
– If a parse tree is produced at the end, the parse succeeds. Otherwise, it fails.

Grammar for the parsing example
Start → Expr
Expr → Expr + Term
Expr → Expr - Term
Expr → Term
Term → Term * Int
Term → Term / Int
Term → Int
• The set of tokens is { +, -, *, /, Int }, where Int = [0-9][0-9]*

Parsing example (input tokens: Int - Int * Int)
Sentential form  | Remaining input  | Applied production / action
Start            | Int - Int * Int  | Start → Expr
Expr             | Int - Int * Int  | Expr → Expr - Term
Expr - Term      | Int - Int * Int  | Expr → Term
Term - Term      | Int - Int * Int  | Term → Int
Int - Term       | Int - Int * Int  | match input token Int
Int - Term       | - Int * Int      | match input token -
Int - Term       | Int * Int        | Term → Term * Int
Int - Term * Int | Int * Int        | Term → Int
Int - Int * Int  | Int * Int        | match input token Int
Int - Int * Int  | * Int            | match input token *
Int - Int * Int  | Int              | match input token Int
Int - Int * Int  | (empty)          | parse complete!

Backtracking example (same input)
Sentential form  | Remaining input  | Applied production / action
Start            | Int - Int * Int  | Start → Expr
Expr             | Int - Int * Int  | Expr → Expr + Term
Expr + Term      | Int - Int * Int  | Expr → Term
Term + Term      | Int - Int * Int  | Term → Int
Int + Term       | Int - Int * Int  | match input token Int
Int + Term       | - Int * Int      | cannot match "+" against "-": backtrack!
Expr             | Int - Int * Int  | Expr → Expr - Term (next alternative)
Expr - Term      | Int - Int * Int  | Expr → Term
Term - Term      | Int - Int * Int  | Term → Int
Int - Term       | Int - Int * Int  | match input token Int
Int - Term       | - Int * Int      | match input token -
Int - Term       | Int * Int        | continue as in the parsing example

How can we code that? The PDA model
[Figure: input tape "a+b……#", a stack holding S on top of #, a control part that consults the production rules, and an output tape]
Running:
(1) If X, the top symbol of the stack, is a nonterminal, choose a production X → … that has not yet been tried for it. Replace X with the right side of the rule and output the rule number. (derivation)
(2) If X, the top symbol of the stack, is the same terminal as the symbol under the reading head, pop X and advance the reading head. (matching)
(3) If (2) fails, backtrack to the configuration before the last derivation and select another rule. (backtracking)
(4) If no untried rule is left, fail.
(5) If only "#" is in the stack and "#" is under the reading head, succeed.
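You can try out the five running rules with a small Python sketch. It is a minimal illustration under stated assumptions, not the course's own implementation:
– The names RULES, expand and derive are hypothetical.
– Uppercase letters are taken to be nonterminals.
– The end of the string plays the role of "#".
– Alternatives are tried in rule-number order instead of randomly.
A Python generator stands in for rule (3). When a terminal fails to match, nothing is yielded, so control falls back to the most recent nonterminal, which then tries its next rule.

```python
# Grammar from the slides: rule number -> (left side, right side).
# Assumption: uppercase letters are nonterminals; everything else is a terminal.
RULES = {1: ("S", "xAy"), 2: ("A", "**"), 3: ("A", "*")}


def expand(stack, pos, text, output):
    """Yield (input position, rule numbers) for every way the stack contents
    (top at index 0) can derive a prefix of text[pos:]."""
    if not stack:                                # only "#" would be left
        yield pos, output
        return
    top, rest = stack[0], stack[1:]
    if top.isupper():                            # rule (1): derivation
        for number, (lhs, rhs) in RULES.items():  # alternatives in rule order
            if lhs == top:
                yield from expand(rhs + rest, pos, text, output + [number])
    elif pos < len(text) and text[pos] == top:   # rule (2): matching
        yield from expand(rest, pos + 1, text, output)
    # Otherwise nothing is yielded: the caller resumes with its next
    # alternative, which plays the part of rule (3), backtracking.


def derive(text, start="S"):
    """Return the output tape (rule numbers) on success, or None on failure."""
    for pos, output in expand(start, 0, text, []):
        if pos == len(text):                     # rule (5): stack and input exhausted
            return output
    return None                                  # rule (4): no untried rule left
```

With the grammar 1) S → xAy 2) A → ** 3) A → *, derive("x*y") first tries rule 2, fails on y, backtracks and returns [1, 3], the same output tape as the trace. derive("xy") returns None. Like any top-down backtracking parser, this sketch would recurse on a left-recursive rule such as Term → Term * Num until Python's recursion limit is reached.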
E.g. Grammar 1) S → xAy 2) A → ** 3) A → *, input "x*y#" (stack top at the left):
Stack | Input | Output | Action
S#    | x*y#  |        | derive by rule 1
xAy#  | x*y#  | 1      | match x
Ay#   | *y#   | 1      | derive by rule 2
**y#  | *y#   | 1,2    | match *
*y#   | y#    | 1,2    | * does not match y: backtrack
Ay#   | *y#   | 1      | derive by rule 3
*y#   | *y#   | 1,3    | match *
y#    | y#    | 1,3    | match y
#     | #     | 1,3    | success

Left Recursion + Top-Down Parsing = Infinite Loop
• Example production: Term → Term * Num
• Potential parsing steps: Term ⇒ Term * Num ⇒ Term * Num * Num ⇒ …
The leftmost Term is expanded again and again without consuming any input.

Backtracking parsers are not seen frequently, because:
• Backtracking is not very efficient.
Why does backtracking occur?
– A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop.
– An ambiguous grammar can cause backtracking.
– Left common factors can also cause backtracking.

4、Elimination of Left Recursion
1) Basic form of left recursion
A grammar is left recursive if it contains productions of the following kinds:
• P → Pα | β   (immediate left recursion)
or
• P → Aa, A → Pb   (indirect left recursion)
2) Strategy for eliminating left recursion
Convert the left recursion into an equivalent right recursion:
P → Pα | β  ⇒  P → βα*  ⇒  P → βP′, P′ → αP′ | ε
3) Algorithm
(1) Eliminating immediate left recursion
P → Pα | β  ⇒  P → βα*  ⇒  P → βP′, P′ → αP′ | ε
(2) Eliminating indirect left recursion
First convert it into immediate left recursion by substituting productions in a fixed order. Then eliminate the resulting immediate left recursion.
Algorithm:
– (1) Arrange the nonterminals of G in some order P1, P2, …, Pn, and do step (2) for each of them.
– (2)
for (i = 1; i <= n; i++) {
    for (k = 1; k <= i-1; k++) {
        replace each production of the form Pi → Pkγ by Pi → δ1γ | δ2γ | … | δtγ,
        where Pk → δ1 | δ2 | … | δt are all the current Pk-productions;
    }
    change Pi → Piα1 | Piα2 | … | Piαm | β1 | β2 | … | βn
    into   Pi → β1Pi′ | β2Pi′ | … | βnPi′
           Pi′ → α1Pi′ | α2Pi′ | … | αmPi′ | ε   /* eliminate the immediate left recursion */
}
– (3) Simplify the grammar.
E.g. Eliminate all left recursion in the following grammar:
(1) S → Qc | c
(2) Q → Rb | b
(3) R → Sa | a
Answer:
1) Arrange the nonterminals in the order R, Q, S.
2) For R: no action.
   For Q: Q → Rb | b becomes Q → Sab | ab | b.
   For S: S → Qc | c becomes S → Sabc | abc | bc | c. Eliminating its immediate left recursion gives S → (abc | bc | c)S′ and S′ → abcS′ | ε.
3) R and Q are no longer reachable from S, so delete them. The resulting grammar is:
S → (abc | bc | c)S′
S′ → abcS′ | ε

5、Eliminating ambiguity of a grammar
– Rewrite the grammar, e.g.
stmt → if expr then stmt | if expr then stmt else stmt | other
⇒
stmt → matched_stmt | unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
unmatched_stmt → if expr then stmt | if expr then matched_stmt else unmatched_stmt

6、Left factoring
– A grammar transformation that is useful for producing a grammar suitable for predictive parsing.
– Rewrite the productions to defer the decision until enough of the input has been seen to make the right choice.
If the grammar contains productions like
A → αβ1 | αβ2 | … | αβn
change them into
A → αA′
A′ → β1 | β2 | … | βn

7、Predictive parser methods
– Transition-diagram-based predictive parser
– Non-recursive predictive parser

8、Transition-diagram-based predictive parsers
1) Transition diagrams
– Create an initial and a final (return) state.
– For each production A → X1X2…Xn, create a path from the initial to the final state, with edges labeled X1, X2, …, Xn.
Note:
(1) There is one diagram for each nonterminal.
(2) The labels of edges are tokens or nonterminals.
(3) If an edge is labeled by a nonterminal A, the parser instead goes to the start state for A, without moving the input cursor.
(4) When an edge labeled by a nonterminal is followed, a potentially recursive procedure call is made.
2) Transition-diagram-based predictive parsing
• Begin in the start state for the start symbol.
• In state s, suppose there is an edge labeled by terminal a to state t and the next input symbol is a. The parser then moves the input cursor and goes to state t.
• In state s, suppose there is an edge labeled by nonterminal A to state t. The parser then goes to the start state for A, without moving the input cursor.
  If it ever reaches the final state for A, it immediately goes to state t. In effect, it has read A from the input while moving from state s to t.

9、Non-recursive predictive parsing
1) Key problem in predictive parsing
• Determining the production to be applied for a nonterminal
2) Basic idea of the parser
• Table-driven, using an explicit stack
3) Model of a non-recursive predictive parser
[Figure: input buffer "a+b……#"; stack holding S on top of #; the predictive parsing program, which consults parsing table M and writes to the output]
4) Predictive parsing program
X: the symbol on top of the stack; a: the current input symbol.
– If X = a = #, the parser halts and announces successful completion of parsing.
– If X = a ≠ #, the parser pops X off the stack and advances the input pointer to the next input symbol.
– If X is a nonterminal, the program consults entry M[X, a] of the parsing table M. This entry is either an X-production of the grammar or an error entry.
  – For an entry X → Y1Y2…Yk, the parser replaces X on the stack by Yk…Y2Y1 (with Y1 on top) and outputs the production.
  – For an error entry, it calls an error-recovery routine.
– If X is a terminal and X ≠ a, a syntax error has been found.

E.g. Consider the following grammar, and parse the string id+id*id#:
1. E → TE′
2. E′ → +TE′
3. E′ → ε
4. T → FT′
5. T′ → *FT′
6. T′ → ε
7. F → id
8. F → (E)
Parsing table M:
   | id     | +        | *        | (      | )     | #
E  | E→TE′  |          |          | E→TE′  |       |
E′ |        | E′→+TE′  |          |        | E′→ε  | E′→ε
T  | T→FT′  |          |          | T→FT′  |       |
T′ |        | T′→ε     | T′→*FT′  |        | T′→ε  | T′→ε
F  | F→id   |          |          | F→(E)  |       |
[Figure: the parser model with input id+id*id#, stack E#, the predictive parsing program and parsing table M]
Please write down the parsing steps!

10、Construction of a predictive parser
1) FIRST & FOLLOW
FIRST:
• If α is any string of grammar symbols, FIRST(α) is the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).
• That is: for α ∈ V*, FIRST(α) = { a | α ⇒* a…, a ∈ VT }
FOLLOW:
• For a nonterminal A, FOLLOW(A) is the set of terminals a that can appear immediately to the right of A in some sentential form.
• That is: FOLLOW(A) = { a | S ⇒* …Aa…, a ∈ VT }. If S ⇒* …A, then # ∈ FOLLOW(A).
2) Computing FIRST(α)
(1) To compute FIRST(X) for all grammar symbols X:
• If X is a terminal, then FIRST(X) is {X}.
• If X → ε is a production, then add ε to FIRST(X).
• If X → a… is a production, then add a to FIRST(X).
• If X is a nonterminal and X → Y1Y2…Yk is a production, with Yj ∈ (VN ∪ VT), 1 ≤ j ≤ k, then
  { j = 1;
    while (j <= k) {
        FIRST(X) = FIRST(X) ∪ (FIRST(Yj) − {ε});
        if (ε ∉ FIRST(Yj)) break;        /* Yj cannot vanish: stop here */
        j = j + 1;
    }
    if (j > k) FIRST(X) = FIRST(X) ∪ {ε};   /* every Yj can derive ε */
  }
  Apply these rules repeatedly until no FIRST set changes.
(2) To compute FIRST(α) for any string α = X1X2…Xn, Xi ∈ (VN ∪ VT), 1 ≤ i ≤ n:
  { i = 1; FIRST(α) = {};   /* initialize */
    while (i <= n) {
        FIRST(α) = FIRST(α) ∪ (FIRST(Xi) − {ε});
        if (ε ∉ FIRST(Xi)) break;
        i = i + 1;
    }
    if (i > n) FIRST(α) = FIRST(α) ∪ {ε};
  }
3) Computing FOLLOW(A)
(1) Place # in FOLLOW(S), where S is the start symbol and # is the input right endmarker.
(2) If there is a production A → αBβ in G, then add FIRST(β) − {ε} to FOLLOW(B).
(3) If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then add FOLLOW(A) to FOLLOW(B).
Repeat (2) and (3) until nothing can be added to any FOLLOW set.
E.g. Consider the following grammar, and construct FIRST and FOLLOW for each nonterminal.
1. E → TE′ 2. E′ → +TE′ 3. E′ → ε 4. T → FT′ 5. T′ → *FT′ 6. T′ → ε 7. F → i 8. F → (E)
Answer:
FIRST(E) = FIRST(T) = FIRST(F) = {(, i}
FIRST(E′) = {+, ε}
FIRST(T′) = {*, ε}
FOLLOW(E) = FOLLOW(E′) = {), #}
FOLLOW(T) = FOLLOW(T′) = {+, ), #}
FOLLOW(F) = {*, +, ), #}
4) Construction of predictive parsing tables
Main idea: Suppose A → α is a production with a in FIRST(α). Then the parser will expand A by α when the current input symbol is a. If α = ε or α ⇒* ε, the parser should also expand A by α in two cases:
– the current input symbol is in FOLLOW(A);
– the # on the input has been reached and # is in FOLLOW(A).
– Input. Grammar G.
– Output. Parsing table M.
– Method.
1. For each production A → α, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and # is in FOLLOW(A), add A → α to M[A, #].
4. Make each undefined entry of M an error entry.
E.g. Consider the following grammar, and construct the predictive parsing table for it.
1. E → TE′ 2. E′ → +TE′ 3. E′ → ε 4. T → FT′ 5. T′ → *FT′ 6. T′ → ε 7. F → i 8. F → (E)
Answer:
FIRST(E) = FIRST(T) = FIRST(F) = {(, i}
FIRST(E′) = {+, ε}
FIRST(T′) = {*, ε}
FOLLOW(E) = FOLLOW(E′) = {), #}
FOLLOW(T) = FOLLOW(T′) = {+, ), #}
FOLLOW(F) = {*, +, ), #}
   | i      | +        | *        | (      | )     | #
E  | E→TE′  |          |          | E→TE′  |       |
E′ |        | E′→+TE′  |          |        | E′→ε  | E′→ε
T  | T→FT′  |          |          | T→FT′  |       |
T′ |        | T′→ε     | T′→*FT′  |        | T′→ε  | T′→ε
F  | F→i    |          |          | F→(E)  |       |

11、LL(1) Grammars
E.g. Consider the following grammar, and construct the predictive parsing table for it.
S → iEtSS′ | a
S′ → eS | ε
E → b
   | a    | b    | e              | i          | t | #
S  | S→a  |      |                | S→iEtSS′   |   |
S′ |      |      | S′→eS, S′→ε    |            |   | S′→ε
E  |      | E→b  |                |            |   |
Entry M[S′, e] holds two productions (a multiply-defined entry).
1) Definition
A grammar whose parsing table has no multiply-defined entries is said to be LL(1).
The first "L" stands for scanning the input from left to right.
The second "L" stands for producing a leftmost derivation.
The "1" means using one input symbol of lookahead at each step to make parsing-action decisions.
Note:
(1) No ambiguous grammar can be LL(1).
(2) No left-recursive grammar can be LL(1).
(3) A grammar G is LL(1) if and only if the following hold whenever A → α | β are two distinct productions of G:
1) For no terminal a do both α and β derive strings beginning with a.
2) At most one of α and β can derive the empty string.
3) If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).
12、Transforming a grammar into an LL(1) grammar
– Eliminate all left recursion
– Left factoring
13、Error recovery in predictive parsing
– Panic-mode error recovery
– Phrase-level recovery

4.3 BOTTOM-UP PARSING
1、Basic idea of bottom-up parsing: shift-reduce parsing
– Operator-precedence parsing
• An easy-to-implement form
– LR parsing
• A much more general method
• Used in a number of automatic parser generators
2、Basic concepts in shift-reduce parsing
– Handles
– Handle pruning
3、Stack implementation of shift-reduce parsing
[Figure: input buffer "……#", stack starting with #, a parsing program consulting parsing table M, and an output stream]

4.4 Operator-Precedence Parsing
1、The definition of an operator grammar
– The grammar has the property that no production right side is ε or has two adjacent nonterminals.
– E.g.
E → E+E | E-E | E*E | E/E | (E) | i
2、Precedence relations
– Three disjoint precedence relations, ⋖, ≐ and ⋗, are defined between certain pairs of terminals. Consider terminals a and b that appear in the form "…ab…" or "…aQb…", where Q is a nonterminal. The relationship of a and b is one of:
1) a ⋖ b   a yields precedence to b
2) a ≐ b   a has the same precedence as b
3) a ⋗ b   a takes precedence over b
4) For some pairs of terminals, none of these relations holds.
Related grammar: E → E+F | F, F → F*G | G, G → (E) | id
(rows: left terminal a; columns: right terminal b)
   | +  | *  | (  | )  | id | #
+  | ⋗  | ⋖  | ⋖  | ⋗  | ⋖  | ⋗
*  | ⋗  | ⋗  | ⋖  | ⋗  | ⋖  | ⋗
(  | ⋖  | ⋖  | ⋖  | ≐  | ⋖  |
)  | ⋗  | ⋗  |    | ⋗  |    | ⋗
id | ⋗  | ⋗  |    | ⋗  |    | ⋗
#  | ⋖  | ⋖  | ⋖  |    | ⋖  | ≐
3、Using operator-precedence relations
The relations delimit the handle of a right-sentential form:
– ⋖ marks the left end of the handle;
– ≐ appears in the interior of the handle;
– ⋗ marks the right end of the handle.
• Let's analyze id+id+id*id# according to the operator-precedence relations.
4、Operator-precedence parsing algorithm
– Input. An input string w and a table of precedence relations.
– Output. If w is well formed, a skeletal parse tree, with a placeholder nonterminal E labeling all interior nodes. Otherwise, an error indication.
– Method. Initially, the stack contains # and the input buffer contains the string w#.
Algorithm
set ip to point to the first symbol of w#;
while (1) {
    if (# is on top of the stack and ip points to #)   /* success */
        return;
    else {
        let a be the topmost terminal symbol on the stack;
        let b be the symbol pointed to by ip;
        if (a ⋖ b || a ≐ b) {   /* shift */
            push b onto the stack;
            advance ip to the next input symbol;
        }
        else if (a ⋗ b)   /* reduce */
            do
                pop the stack;
            while (the top stack terminal is not related by ⋖ to the terminal most recently popped);
        else error();
    }
}
5、Constructing the operator-precedence relation table
– Construct FIRSTVT and LASTVT for each nonterminal in the grammar.
– Find the relations between each pair of terminals.
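The two construction steps of item 5, together with the parsing loop of item 4, can be sketched in Python. The sketch uses the grammar from item 2: E → E+F | F, F → F*G | G, G → (E) | id. It is a hedged sketch, not the slides' method verbatim, and it rests on these assumptions:
– The names GRAMMAR, vt_sets, relations and op_parse are hypothetical.
– The grammar is augmented with S′ → #E#, mirroring the S′ → #S# used in the slides' later FIRSTVT example, so that the relations involving # come out of the same rules.
– The relations ⋖, ≐, ⋗ are encoded as the strings "<", "=", ">".
FIRSTVT(P) collects each terminal a with P ⇒+ a… or P ⇒+ Qa…. LASTVT(P) is its mirror image at the right end.

```python
# Operator grammar from item 2, augmented with S' -> # E # (an assumption that
# mirrors the S' -> #S# augmentation used in the slides' later example).
GRAMMAR = {
    "S'": [["#", "E", "#"]],
    "E": [["E", "+", "F"], ["F"]],
    "F": [["F", "*", "G"], ["G"]],
    "G": [["(", "E", ")"], ["id"]],
}
NT = set(GRAMMAR)  # nonterminals; every other symbol is a terminal


def vt_sets(first=True):
    """FIRSTVT (first=True) or LASTVT (first=False) of every nonterminal."""
    sets = {p: set() for p in NT}
    changed = True
    while changed:
        changed = False
        for p, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                syms = rhs if first else rhs[::-1]   # LASTVT reads right sides backwards
                new = set()
                if syms[0] not in NT:                # P -> a...
                    new.add(syms[0])
                elif len(syms) > 1:                  # P -> Qa... (operator grammar: a is a terminal)
                    new.add(syms[1])
                if syms[0] in NT:                    # P -> Q...: the set of Q belongs to P
                    new |= sets[syms[0]]
                if not new <= sets[p]:
                    sets[p] |= new
                    changed = True
    return sets


def relations():
    """Return {(a, b): '<' | '=' | '>'} standing for a ⋖ b, a ≐ b, a ⋗ b."""
    fvt, lvt = vt_sets(True), vt_sets(False)
    table = {}

    def put(a, b, rel):
        if table.setdefault((a, b), rel) != rel:
            raise ValueError(f"({a}, {b}) has two relations: not an operator-precedence grammar")

    for alternatives in GRAMMAR.values():
        for rhs in alternatives:
            for i in range(len(rhs) - 1):
                x, y = rhs[i], rhs[i + 1]
                if x not in NT and y not in NT:
                    put(x, y, "=")                   # ...ab...
                if x not in NT and y in NT:
                    for b in fvt[y]:
                        put(x, b, "<")               # ...aP...: a ⋖ FIRSTVT(P)
                    if i + 2 < len(rhs) and rhs[i + 2] not in NT:
                        put(x, rhs[i + 2], "=")      # ...aPb...
                if x in NT and y not in NT:
                    for a in lvt[x]:
                        put(a, y, ">")               # ...Pb...: LASTVT(P) ⋗ b
    return table


def op_parse(tokens, table):
    """Shift/reduce loop of item 4; the stack keeps terminals only (skeletal parse)."""
    stack, inp, ip = ["#"], list(tokens) + ["#"], 0
    while True:
        a, b = stack[-1], inp[ip]
        if a == "#" and b == "#":
            return True                              # success
        rel = table.get((a, b))
        if rel in ("<", "="):                        # shift
            stack.append(b)
            ip += 1
        elif rel == ">":                             # reduce: pop until top ⋖ last popped
            popped = stack.pop()
            while len(stack) > 1 and table.get((stack[-1], popped)) != "<":
                popped = stack.pop()
        else:
            return False                             # error()
```

relations() reproduces the table in item 2, for example + ⋖ *, * ⋗ + and ( ≐ ). op_parse(["id", "+", "id", "+", "id", "*", "id"], relations()) returns True. Note that op_parse(["+", "id"], relations()) also returns True, because the skeletal parse never checks that a popped handle matches a right side. This is one concrete form of the disadvantage that such a parser may not accept exactly the desired language.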
FIRSTVT(P) = { a | P ⇒+ a… or P ⇒+ Qa…, a ∈ VT, Q ∈ VN }
LASTVT(P) = { a | P ⇒+ …a or P ⇒+ …aQ, a ∈ VT, Q ∈ VN }
Construct FIRSTVT(P):
(1) If there is a production P → a… or P → Qa…, then a ∈ FIRSTVT(P).
(2) If a ∈ FIRSTVT(Q) and there is a production P → Q…, then a ∈ FIRSTVT(P).
LASTVT(P) is constructed symmetrically from the right ends of the productions.
Find the relations:
– If a string …aP… appears on the right side of a production, then a ⋖ b for each terminal b in FIRSTVT(P).
– If a string …Pb… appears on the right side of a production, then a ⋗ b for each terminal a in LASTVT(P).
– If a string …aPb… or …ab… appears on the right side of a production, then a ≐ b.
Notes: We assume that the precedence of a unary operator is always higher than that of a binary operator.
E.g. Construct the operator-precedence relation table for:
S → if Eb then E else E
E → E+T | T
T → T*F | F
F → i
Eb → b
Answer: add the production S′ → #S#.
FIRSTVT(S) = {if}   FIRSTVT(E) = {+, *, i}   FIRSTVT(T) = {*, i}   FIRSTVT(F) = {i}   FIRSTVT(Eb) = {b}
LASTVT(S) = {else, +, *, i}   LASTVT(E) = {+, *, i}   LASTVT(T) = {*, i}   LASTVT(F) = {i}   LASTVT(Eb) = {b}
(rows: left terminal a; columns: right terminal b)
     | if | then | else | +  | *  | i  | b  | #
if   |    | ≐    |      |    |    |    | ⋖  |
then |    |      | ≐    | ⋖  | ⋖  | ⋖  |    |
else |    |      |      | ⋖  | ⋖  | ⋖  |    | ⋗
+    |    |      | ⋗    | ⋗  | ⋖  | ⋖  |    | ⋗
*    |    |      | ⋗    | ⋗  | ⋗  | ⋖  |    | ⋗
i    |    |      | ⋗    | ⋗  | ⋗  |    |    | ⋗
b    |    | ⋗    |      |    |    |    |    |
#    | ⋖  |      |      |    |    |    |    | ≐
6、Advantages of operator-precedence parsing
– Simplicity: easy to construct by hand
7、Disadvantages of operator-precedence parsing
– It is hard to handle tokens like unary operators (e.g. a minus sign that may be either unary or binary).
– The relationship between the grammar of the language being parsed and the operator-precedence parser itself is tenuous. As a result, one cannot always be sure that the parser accepts exactly the desired language.
– Only a small class of grammars can be parsed using operator-precedence techniques.
Exercises: 4.14, 4.27

4.5 LR Parsers
1、LR parsers
– An efficient, bottom-up syntax-analysis technique that can be used to parse a large class of context-free grammars
– LR(k)
• L: left-to-right scan of the input
• R: constructing a rightmost derivation in reverse
• k: the number of input symbols of lookahead
2、Advantages of LR parsers
– It can recognize virtually all programming-language constructs for which context-free grammars can be written.
– It is the most general nonbacktracking shift-reduce parsing method.
– It can parse more grammars than predictive parsers can.
– It can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
3、Disadvantages of LR parsers
– It is too much work to construct an LR parser by hand.
– A specialized tool, an LR parser generator such as YACC, is needed to produce one.
4、Three techniques for constructing an LR parsing table
– SLR: simple LR
– LR(1): canonical LR
– LALR: lookahead LR
5、The LR parsing model
[Figure: input buffer "a+b……#"; a stack with state S0 at the bottom; the LR parsing program, which consults the parsing table and writes to the output]
Note:
1) The driver program is the same for all LR parsers. Only the parsing table changes from one parser to another.
2) The parsing program reads characters from an input buffer one at a time.
3) Each Si is a state. Each state symbol summarizes the information contained in the stack below it.
4) The state on top of the stack and the current input symbol are used to index the parsing table and determine the shift-reduce parsing decision.
5) In an implementation, the grammar symbols need not appear on the stack.
6、The parsing table
state | i  | +  | *  | (  | )  | #      | E | T | F
0     | S5 |    |    | S4 |    |        | 1 | 2 | 3
1     |    | S6 |    |    |    | accept |   |   |
2     |    | r2 | S7 |    | r2 | r2     |   |   |
3     |    | r4 | r4 |    | r4 | r4     |   |   |
4     | S5 |    |    | S4 |    |        | 8 | 2 | 3
5     |    | r6 | r6 |    | r6 | r6     |   |   |
(ACTION: columns i to #; GOTO: columns E, T, F; the first six states of the table)
– Action: a parsing-action function
• Action[S, a]: S represents the state currently on top of the stack, and a represents the current input symbol. Action[S, a] is the parsing action for S and a.
Action: a parsing-action function. Its possible values are:
• Shift
– The next input symbol is shifted onto the top of the stack.
– "Shift S", where S is a state.
• Reduce
– The parser knows that the right end of the handle is at the top of the stack. It locates the left end of the handle within the stack and decides which nonterminal replaces the handle.
– Reduce by a grammar production A → β.
• Accept
– The parser announces successful completion of parsing.
• Error
– The parser discovers that a syntax error has occurred and calls an error-recovery routine.
Goto: a goto function that takes a state and a grammar symbol as arguments and produces a state.
E.g. The parsing action and goto functions of an LR parsing table for the following grammar:
(1) E → E+T
(2) E → T
(3) T → T*F
(4) T → F
(5) F → (E)
(6) F → i
state | i  | +  | *  | (  | )   | #      | E | T | F
0     | S5 |    |    | S4 |     |        | 1 | 2 | 3
1     |    | S6 |    |    |     | accept |   |   |
2     |    | r2 | S7 |    | r2  | r2     |   |   |
3     |    | r4 | r4 |    | r4  | r4     |   |   |
4     | S5 |    |    | S4 |     |        | 8 | 2 | 3
5     |    | r6 | r6 |    | r6  | r6     |   |   |
6     | S5 |    |    | S4 |     |        |   | 9 | 3
7     | S5 |    |    | S4 |     |        |   |   | 10
8     |    | S6 |    |    | S11 |        |   |   |
9     |    | r1 | S7 |    | r1  | r1     |   |   |
10    |    | r3 | r3 |    | r3  | r3     |   |   |
11    |    | r5 | r5 |    | r5  | r5     |   |   |
(ACTION: columns i to #; GOTO: columns E, T, F)
1) Sj means shift the input symbol a and stack state j, so the top of the stack becomes (j, a).
2) rj means reduce by the production numbered j.
3) accept means accept.
4) A blank entry means error.
Moves of the LR parser on i*i+i:
State stack | Symbol stack | Input  | Action
0           | #            | i*i+i# | shift
0 5         | #i           | *i+i#  | reduce by 6
0 3         | #F           | *i+i#  | reduce by 4
0 2         | #T           | *i+i#  | shift
0 2 7       | #T*          | i+i#   | shift
0 2 7 5     | #T*i         | +i#    | reduce by 6
0 2 7 10    | #T*F         | +i#    | reduce by 3
0 2         | #T           | +i#    | reduce by 2
0 1         | #E           | +i#    | shift
0 1 6       | #E+          | i#     | shift
0 1 6 5     | #E+i         | #      | reduce by 6
0 1 6 3     | #E+F         | #      | reduce by 4
0 1 6 9     | #E+T         | #      | reduce by 1
0 1         | #E           | #      | accept
Action conflicts
• Shift/reduce conflict
– The parser cannot decide whether to shift or to reduce.
• Reduce/reduce conflict
– The parser cannot decide which of several reductions to make.
Notes: An ambiguous grammar causes conflicts and can never be LR. An example is the if-statement syntax (if expr then stmt [else stmt]).
7、The algorithm
– The parser reads the current input symbol a and the state S on top of the stack, then consults the parsing-action table entry action[S, a]. That entry determines the next move.
– If action[Sm, ai] = shift S′, the parser executes a shift move. It pushes S′ onto the stack, and the next input symbol ai+1 becomes the current symbol.
– If action[Sm, ai] = reduce A → β, the parser executes a reduce move:
  – Let r be the length of β. The parser pops r states from the stack, so that state Sm−r is on top.
  – It then pushes the state S′ = GOTO[Sm−r, A], together with the nonterminal A.
  – The current input symbol does not change.
– If action[Sm, ai] = accept, parsing is completed.
– If action[Sm, ai] = error, the parser has discovered an error and calls an error-recovery routine.
8、LR grammars
– A grammar for which we can construct an LR parsing table is said to be an LR grammar.
9、The difference between LL and LR grammars
– LR grammars can describe more languages than LL grammars.
10、Types of LR grammars
– LR(0), SLR, LR(1), LALR
– Note: the LR parsing algorithm is the same for all of them, but the parsing table is different.
Discussion
• Can we regard a parsing table as a finite automaton (FA)?
• What is the FA doing? What are its states? Its actions?
11、Canonical LR(0) collection
1) LR(0) items
– An LR(0) item of a grammar G is a production of G with a dot at some position of the right side.
• E.g. the production A → XYZ yields four items:
– A → •XYZ: we hope to see a string derivable from XYZ next on the input.
– A → X•YZ: we have just seen on the input a string derivable from X, and we hope to see a string derivable from YZ next.
– A → XY•Z
– A → XYZ•
• The production A → ε generates only one item, A → •.
• The items are used to characterize viable prefixes.
2) Constructing the canonical LR(0) collection
(1) Define an augmented grammar
• If G is a grammar with start symbol S, the augmented grammar G′ is G with a new start symbol S′ and the production S′ → S.
• The augmented grammar tells the parser when to stop parsing and announce acceptance of the input.
(2) The closure operation
• If I is a set of items for a grammar G, then closure(I) is the set of items constructed from I by two rules:
– Initially, every item in I is added to closure(I).
– If A → α•Bβ is in closure(I) and B → γ is a production, then add the item B → •γ to closure(I). Apply this rule until no more new items can be added to closure(I).
(3) The goto operation
• Form: goto(I, X), where I is a set of items and X is a grammar symbol.
• goto(I, X) is defined to be closure(J), where X ∈ (VN ∪ VT) and J = { A → αX•β | A → α•Xβ ∈ I }.
3) The sets-of-items construction
void ITEMSETS_LR0()
{
    C = { closure({S′ → •S}) };   /* initial */
    do {
        for (each set of items I in C and each grammar symbol X)
            if (goto(I, X) is not empty and not in C)
                add goto(I, X) to C;
    } while (C is still growing);
}
E.g. Construct the canonical collection of sets of LR(0) items for the following augmented grammar.
S′ → E
E → aA | bB
A → cA | d
B → cB | d
Answer:
1. The items are:
1. S′ → •E   2. S′ → E•   3. E → •aA   4. E → a•A   5. E → aA•   6. A → •cA
7. A → c•A   8. A → cA•   9. A → •d   10. A → d•   11. E → •bB   12. E → b•B
13. E → bB•   14. B → •cB   15. B → c•B   16. B → cB•   17. B → •d   18. B → d•
2. The sets of items and the goto graph:
I0: S′ → •E, E → •aA, E → •bB
I1: S′ → E•
I2: E → a•A, A → •cA, A → •d
I3: E → b•B, B → •cB, B → •d
I4: A → c•A, A → •cA, A → •d
I5: B → c•B, B → •cB, B → •d
I6: E → aA•
I7: E → bB•
I8: A → cA•
I9: B → cB•
I10: A → d•
I11: B → d•
Transitions:
– I0: E → I1, a → I2, b → I3
– I2: A → I6, c → I4, d → I10
– I3: B → I7, c → I5, d → I11
– I4: A → I8, c → I4, d → I10
– I5: B → I9, c → I5, d → I11
12、SLR parsing table algorithm
– Input. An augmented grammar G′.
– Output. The SLR parsing-table functions action and goto for G′.
– Method.
– (1) Construct C = {I0, I1, …, In}, the collection of sets of LR(0) items for G′.
– (2) State i is constructed from Ii. The parsing actions for state i are determined as follows:
(a) If [A → α•aβ] is in Ii and goto(Ii, a) = Ij, then set ACTION[i, a] = "shift j"; here a must be a terminal.
(b) If [A → α•] is in Ii, then set ACTION[i, a] = rj for all a in FOLLOW(A). Here A may not be S′, and j is the number of the production A → α.
(c) If [S′ → S•] is in Ii, then set ACTION[i, #] = "accept".
– (3) The goto transitions for state i are constructed for all nonterminals A using the rule: if goto(Ii, A) = Ij, then goto[i, A] = j.
– (4) All entries not defined by rules (2) and (3) are made "error".
– (5) The initial state of the parser is the one constructed from the set of items containing [S′ → •S].
– If the above rules generate any conflicting actions, we say the grammar is not SLR(1).
e.g. Construct the SLR(1) table for the following grammar:
0. S′ → E 1. E → E+T 2. E → T 3. T → T*F 4. T → F 5. F → (E) 6. F → i
Sets of items:
I0: S′ → •E, E → •E+T, E → •T, T → •T*F, T → •F, F → •(E), F → •i
I1: S′ → E•, E → E•+T
I2: E → T•, T → T•*F
I3: T → F•
I4: F → (•E), E → •E+T, E → •T, T → •T*F, T → •F, F → •(E), F → •i
I5: F → i•
I6: E → E+•T, T → •T*F, T → •F, F → •(E), F → •i
I7: T → T*•F, F → •(E), F → •i
I8: F → (E•), E → E•+T
I9: E → E+T•, T → T•*F
I10: T → T*F•
I11: F → (E)•
Transitions:
– I0: E → I1, T → I2, F → I3, ( → I4, i → I5
– I1: + → I6
– I2: * → I7
– I4: E → I8, T → I2, F → I3, ( → I4, i → I5
– I6: T → I9, F → I3, ( → I4, i → I5
– I7: F → I10, ( → I4, i → I5
– I8: ) → I11, + → I6
– I9: * → I7
state | i  | +  | *  | (  | )   | #      | E | T | F
0     | S5 |    |    | S4 |     |        | 1 | 2 | 3
1     |    | S6 |    |    |     | accept |   |   |
2     |    | r2 | S7 |    | r2  | r2     |   |   |
3     |    | r4 | r4 |    | r4  | r4     |   |   |
4     | S5 |    |    | S4 |     |        | 8 | 2 | 3
5     |    | r6 | r6 |    | r6  | r6     |   |   |
6     | S5 |    |    | S4 |     |        |   | 9 | 3
7     | S5 |    |    | S4 |     |        |   |   | 10
8     |    | S6 |    |    | S11 |        |   |   |
9     |    | r1 | S7 |    | r1  | r1     |   |   |
10    |    | r3 | r3 |    | r3  | r3     |   |   |
11    |    | r5 | r5 |    | r5  | r5     |   |   |
E.g. 1. S′ → S 2. S → L=R 3. S → R 4. L → *R 5. L → i 6. R → L
Sets of items:
I0: S′ → •S, S → •L=R, S → •R, L → •*R, L → •i, R → •L
I1: S′ → S•
I2: S → L•=R, R → L•
I3: S → R•
I4: L → *•R, R → •L, L → •*R, L → •i
I5: L → i•
I6: S → L=•R, R → •L, L → •*R, L → •i
I7: L → *R•
I8: R → L•
I9: S → L=R•
Transitions:
– I0: S → I1, L → I2, R → I3, * → I4, i → I5
– I2: = → I6
– I4: R → I7, L → I8, * → I4, i → I5
– I6: R → I9, L → I8, * → I4, i → I5
state | =     | i  | *  | #      | S | L | R
0     |       | S5 | S4 |        | 1 | 2 | 3
1     |       |    |    | accept |   |   |
2     | S6/r6 |    |    | r6     |   |   |
3     |       |    |    | r3     |   |   |
4     |       | S5 | S4 |        |   | 8 | 7
5     | r5    |    |    | r5     |   |   |
6     |       | S5 | S4 |        |   | 8 | 9
7     | r4    |    |    | r4     |   |   |
8     | r6    |    |    | r6     |   |   |
9     |       |    |    | r2     |   |   |
Notes: In the above grammar, state 2 has a shift/reduce conflict on "=". The SLR construction is not powerful enough to remember enough left context. As a result, after seeing a string reducible to L, it cannot decide what to do on input "=". No right-sentential form begins with "R=…". So when L is on top of the stack and "=" is the current input symbol, we must not reduce L to R.
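Steps (1) to (5), and the conflict described in the notes, can be checked mechanically. The Python sketch below is a minimal illustration, not a production parser generator. It rests on these assumptions:
– Items are encoded as (production number, dot position) pairs, and production numbers follow the slides (1 is S′ → S).
– "#" is the end marker, and the accept entry is set for the item S′ → S•.
– FIRST/FOLLOW use a shortcut that is valid only because no production here derives ε.
– The names PRODS, closure, goto, canonical_collection, follow_sets and slr_table are hypothetical.
– State numbers come from the exploration order, so they may differ from I0 to I9 above.

```python
# Grammar of the second SLR example; numbers follow the slides (1 = S' -> S).
PRODS = {
    1: ("S'", ("S",)),
    2: ("S", ("L", "=", "R")),
    3: ("S", ("R",)),
    4: ("L", ("*", "R")),
    5: ("L", ("i",)),
    6: ("R", ("L",)),
}
START = "S'"
NT = {lhs for lhs, _ in PRODS.values()}
SYMBOLS = NT | {s for _, rhs in PRODS.values() for s in rhs}


def closure(items):
    """items are (production number, dot position) pairs, i.e. LR(0) items."""
    result, todo = set(items), list(items)
    while todo:
        p, dot = todo.pop()
        rhs = PRODS[p][1]
        if dot < len(rhs) and rhs[dot] in NT:        # A -> α•Bβ: add B -> •γ
            for q, (lhs, _) in PRODS.items():
                if lhs == rhs[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    todo.append((q, 0))
    return frozenset(result)


def goto(items, x):
    moved = {(p, d + 1) for p, d in items
             if d < len(PRODS[p][1]) and PRODS[p][1][d] == x}
    return closure(moved) if moved else None


def canonical_collection():
    states, edges, i = [closure({(1, 0)})], {}, 0
    while i < len(states):
        for x in sorted(SYMBOLS):
            target = goto(states[i], x)
            if target is None:
                continue
            if target not in states:
                states.append(target)
            edges[(i, x)] = states.index(target)
        i += 1
    return states, edges


def follow_sets():
    """FIRST/FOLLOW shortcut that is valid only because no production derives ε."""
    first = {s: ({s} if s not in NT else set()) for s in SYMBOLS}
    follow = {n: set() for n in NT}
    follow[START].add("#")
    while True:
        size = sum(map(len, first.values())) + sum(map(len, follow.values()))
        for lhs, rhs in PRODS.values():
            first[lhs] |= first[rhs[0]]
            for k, b in enumerate(rhs):
                if b in NT:
                    follow[b] |= first[rhs[k + 1]] if k + 1 < len(rhs) else follow[lhs]
        if size == sum(map(len, first.values())) + sum(map(len, follow.values())):
            return follow


def slr_table():
    states, edges = canonical_collection()
    follow = follow_sets()
    action, conflicts = {}, []

    def set_action(key, value):
        old = action.setdefault(key, value)
        if old != value:
            conflicts.append((key, old, value))

    for i, items in enumerate(states):
        for p, dot in items:
            lhs, rhs = PRODS[p]
            if dot < len(rhs) and rhs[dot] not in NT:    # rule (2a): shift
                set_action((i, rhs[dot]), f"S{edges[(i, rhs[dot])]}")
            elif dot == len(rhs) and lhs == START:       # accept on #
                set_action((i, "#"), "acc")
            elif dot == len(rhs):                        # rule (2b): reduce on FOLLOW(A)
                for a in follow[lhs]:
                    set_action((i, a), f"r{p}")
    goto_table = {key: j for key, j in edges.items() if key[1] in NT}   # rule (3)
    return states, action, goto_table, conflicts
```

slr_table() builds 10 states and reports exactly one conflict. It occurs on "=" in the state containing S → L•=R and R → L•, where a shift and r6 collide, matching the S6/r6 entry of the table.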
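• In the SLR method, if I contains A→α• and the current input symbol a ∈ FOLLOW(A), reducing by A→α is still not necessarily correct. When α is on top of the stack, the stack may hold the viable prefix βα, and reducing it to βA is not always allowed when the next symbol is a, since there may be no right-sentential form that begins with βAa.
• For example, "R=" is not a viable prefix.
Method: LR(1)
• Attach look-ahead information to each LR(0) item: the k terminal symbols that may follow the handle.
• Meaning of (A→α•β, a): once the handle αβ has been formed on top of the stack, a is expected as the next input symbol. At this point β has not yet been pushed onto the stack; the item looks one symbol beyond the handle.
• If there is a rightmost derivation S` ⇒*rm δAw ⇒rm δαβw, where γ = δα is a viable prefix of the right-sentential form, and a ∈ FIRST(w) (a = # if w = ε), then the LR(1) item (A→α•β, a) is valid for the viable prefix γ. Note: if b ∉ FIRST(w), the item (A→α•β, b) is not valid, even if b ∈ FOLLOW(A).
13、LR(1) item
• How can invalid reductions be ruled out?
– By splitting states when necessary, we can arrange for each state of an LR parser to indicate exactly which input symbols can follow a handle α for which there is a possible reduction to A.
• Item (A→α•β, a) is an LR(1) item; the "1" refers to the length of the second component, called the look-ahead of the item.
Note: 1) The look-ahead has no effect in an item of the form (A→α•β, a) where β is not ε, but an item of the form (A→α•, a) calls for a reduction by A→α only if the next input symbol is a.
2) The set of such a's is always a subset of FOLLOW(A), and it may be a proper subset. Why?
14、Valid LR(1) item
Formally, we say LR(1) item (A→α•β, a) is valid for a viable prefix γ if there is a derivation S` ⇒*rm δAw ⇒rm δαβw, where
– γ = δα, and
– either a is the first symbol of w, or w is ε and a is #.
15、Construction of the sets of LR(1) items
– Input. An augmented grammar G`.
– Output. The sets of LR(1) items that are valid for one or more viable prefixes of G`.
– Method. The procedures closure and goto and the main routine items construct the sets of items.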
function closure(I);
{  do
   {  for (each item (A→α•Bβ, a) in I, each production B→γ in G`, and each terminal b in FIRST(βa) such that (B→•γ, b) is not in I)
         add (B→•γ, b) to I;
   } while (new items are still being added to I);
   return I;
}
function goto(I, X);
{  let J be the set of items (A→αX•β, a) such that (A→α•Xβ, a) is in I;
   return closure(J);
}
void items(G`);
{  C := {closure({(S`→•S, #)})};
   do
   {  for (each set of items I in C and each grammar symbol X such that goto(I, X) is not empty and not in C)
         add goto(I, X) to C;
   } while (new sets of items are still being added to C);
}
e.g. Compute the items for the following grammar:
S`→S
S→CC
C→cC | d
Answer: the initial set of items is
I0: S`→•S, #
    S→•CC, #
    C→•cC, c/d
    C→•d, c/d
Now we compute goto(I0, X) for the various values of X, and then obtain the goto graph for the grammar (transitions in brackets):
I0: S`→•S, #; S→•CC, #; C→•cC, c/d; C→•d, c/d [S: I1, C: I2, c: I3, d: I4]
I1: S`→S•, #
I2: S→C•C, #; C→•cC, #; C→•d, # [C: I5, c: I6, d: I7]
I3: C→c•C, c/d; C→•cC, c/d; C→•d, c/d [C: I8, c: I3, d: I4]
I4: C→d•, c/d
I5: S→CC•, #
I6: C→c•C, #; C→•cC, #; C→•d, # [C: I9, c: I6, d: I7]
I7: C→d•, #
I8: C→cC•, c/d
I9: C→cC•, #
16、Construction of the canonical LR parsing table
– Input. An augmented grammar G`.
– Output. The canonical LR parsing table functions ACTION and GOTO for G`.
– Method.
(1) Construct C = {I0, I1, …, In}, the collection of sets of LR(1) items for G`.
(2) State i is constructed from Ii. The parsing actions for state i are determined as follows:
a) If [A→α•aβ, b] is in Ii and goto(Ii, a) = Ij, then set ACTION[i, a] = "shift j"; here a must be a terminal.
b) If [A→α•, a] is in Ii and A ≠ S`, then set ACTION[i, a] = rj, where j is the number of the production A→α.
c) If [S`→S•, #] is in Ii, then set ACTION[i, #] = "accept".
(3) The goto transitions for state i are determined as follows: if goto(Ii, A) = Ij, then GOTO[i, A] = j.
(4) All entries not defined by rules (2) and (3) are made "error".
(5) The initial state of the parser is the one constructed from the set of items containing [S`→•S, #].
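The procedures closure, goto and items above translate almost directly into Python. This is an illustrative sketch under stated assumptions: the class LR1 and its method names are hypothetical, and the grammar must have no ε-productions so that FIRST(βa) is decided by the first symbol of βa. It builds the collection for S`→S, S→CC, C→cC | d, which should give the ten sets I0 to I9 (possibly numbered in a different order). It also revisits the SLR conflict grammar S`→S, S→L=R | R, L→*R | i, R→L: in the state reached from I0 on L, the item R→L• carries only the look-ahead #, so = no longer calls for a reduction.

```python
# Sketch: canonical collection of LR(1) item sets; items are (production, dot, look-ahead).
# Assumption: no epsilon-productions, so FIRST(beta a) = FIRST(first symbol of beta a).
END = "#"


class LR1:
    def __init__(self, grammar):
        self.g = grammar
        self.nt = {lhs for lhs, _ in grammar}
        self.symbols = sorted({s for _, rhs in grammar for s in rhs})
        self.first = {s: {s} for s in self.symbols + [END] if s not in self.nt}
        self.first.update({n: set() for n in self.nt})
        changed = True
        while changed:            # FIRST(A) = union of FIRST(first symbol of each A-rule)
            changed = False
            for lhs, rhs in grammar:
                if not self.first[rhs[0]] <= self.first[lhs]:
                    self.first[lhs] |= self.first[rhs[0]]
                    changed = True

    def closure(self, kernel):
        """For each (A -> alpha . B beta, a) add (B -> . gamma, b) for b in FIRST(beta a)."""
        result, work = set(kernel), list(kernel)
        while work:
            p, dot, a = work.pop()
            rhs = self.g[p][1]
            if dot < len(rhs) and rhs[dot] in self.nt:
                beta_a = rhs[dot + 1:] + (a,)
                for b in self.first[beta_a[0]]:
                    for q, (lhs, _) in enumerate(self.g):
                        if lhs == rhs[dot] and (q, 0, b) not in result:
                            result.add((q, 0, b))
                            work.append((q, 0, b))
        return frozenset(result)

    def goto(self, state, x):
        return self.closure({(p, d + 1, a) for p, d, a in state
                             if d < len(self.g[p][1]) and self.g[p][1][d] == x})

    def items(self):
        C, trans, i = [self.closure({(0, 0, END)})], {}, 0
        while i < len(C):
            for x in self.symbols:
                t = self.goto(C[i], x)
                if t:
                    if t not in C:
                        C.append(t)
                    trans[(i, x)] = C.index(t)
            i += 1
        return C, trans


CC = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
LR = [("S'", ("S",)), ("S", ("L", "=", "R")), ("S", ("R",)), ("L", ("*", "R")),
      ("L", ("i",)), ("R", ("L",))]

C, trans = LR1(CC).items()
print(len(C))  # 10

lr = LR1(LR)
I2 = lr.goto(lr.closure({(0, 0, END)}), "L")
print(sorted(a for p, d, a in I2 if (p, d) == (5, 1)))  # ['#']: reduce R -> L only on #
```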
– If rule (2) generates any conflicting actions, we say the grammar is not LR(1).
E.g. Construct the canonical LR parsing table for the following grammar:
0. S`→S 1. S→CC 2. C→cC 3. C→d
Canonical LR(1) parsing table (ACTION columns c d #; GOTO columns S C):
state | c    d    #    | S   C
  0   | S3   S4        | 1   2
  1   |           acc  |
  2   | S6   S7        |     5
  3   | S3   S4        |     8
  4   | r3   r3        |
  5   |           r1   |
  6   | S6   S7        |     9
  7   |           r3   |
  8   | r2   r2        |
  9   |           r2   |
Notes: 1) Every SLR(1) grammar is an LR(1) grammar.
2) For the same grammar, the canonical LR parser may have more states than the SLR parser.
17、LALR (look-ahead LR)
1) Basic idea
Merge the sets of LR(1) items that have the same core.
(1) When merging, the GOTO sub-table can be merged without any conflict, because the GOTO function depends only on the core.
(2) When merging, the ACTION sub-table can be merged without introducing shift/reduce conflicts (provided the LR(1) table had none), but an error entry of one state may be merged with a shift or reduce entry of another; the non-error action is then kept.
(3) After the sets of LR(1) items are merged, an error may be detected later than by the canonical LR parser, but it will still be detected, and in fact before any more input symbols are shifted.
(4) After merging, reduce/reduce conflicts may occur.
2) Sets of LR(1) items having the same core
– Two sets of LR(1) items have the same core when they contain the same LR(0) items (first components) and differ only in their look-ahead symbols.
Note: We may merge these sets with common cores into one set of items.
18、An easy, but space-consuming, LALR table construction
• Input. An augmented grammar G`.
• Output. The LALR parsing table functions ACTION and GOTO for G`.
• Method.
– (1) Construct C = {I0, I1, …, In}, the collection of sets of LR(1) items.
– (2) For each core present among the sets of LR(1) items, find all sets having that core, and replace these sets by their union.
– (3) Let C` = {J0, J1, …, Jm} be the resulting sets of LR(1) items. The parsing actions for state i are constructed from Ji in the same way as for the canonical LR table. If there is a parsing action conflict, the algorithm fails to produce a parser, and the grammar is said not to be LALR(1).
– (4) The GOTO table is constructed as follows.
– If J is the union of one or more sets of LR(1) items, that is, J = I1 ∪ I2 ∪ … ∪ Ik, then the cores of goto(I1, X), goto(I2, X), …, goto(Ik, X) are the same, since I1, I2, …, Ik all have the same core. Let K be the union of all sets of items having the same core as goto(I1, X); then goto(J, X) = K.
If there are no parsing action conflicts, the given grammar is said to be an LALR(1) grammar.
For the grammar S→CC, C→cC | d, the LR(1) states 3 and 6, 4 and 7, and 8 and 9 have the same cores; they are merged into the LALR states 36, 47 and 89.
Parsing the string ccd (not a sentence of this grammar):
– Canonical LR(1) table: 0 shifts c to 3, 3 shifts c to 3, 3 shifts d to 4; ACTION[4, #] is error, so the error is reported before any reduction.
– LALR(1) table: 0 shifts c to 36, 36 shifts c to 36, 36 shifts d to 47; on # state 47 reduces by C→d, then state 89 reduces by C→cC twice, reaching state 2, where ACTION[2, #] is error. The error is detected later, but still before any further input symbol is shifted.

4.6 Using ambiguous grammars
1、Using precedence and associativity to resolve parsing action conflicts
Ambiguous grammar: E→E+E | E*E | (E) | i
Equivalent unambiguous grammar: E→E+T | T, T→T*F | F, F→(E) | i
Example sentence: i+i+i*i+i
Following the LR idea, an ambiguous grammar can be analyzed with the help of extra conditions. Steps:
1、Construct the LR(0) parsing table;
2、If conflicts occur, try to resolve them with SLR;
3、Resolve the remaining conflicts with the extra conditions (precedence and associativity).
E.g. E`→E, E→E+E | E*E | (E) | i, with productions numbered 1. E→E+E 2. E→E*E 3. E→(E) 4. E→i
1) LR(0) parsing table
2) SLR. E.g. I1: E`→E•, E→E•+E, E→E•*E has a reduce/shift conflict in the LR(0) table; SLR removes it, because E`→E is reduced only on #.
3) Other conflicts. E.g. I7: E→E+E•, E→E•+E, E→E•*E has a reduce/shift conflict that SLR cannot remove, because + and * are both in FOLLOW(E).
SLR parsing table (ACTION columns i + * ( ) #; GOTO column E):
state | i       +       *       (       )       #       | E
  0   | S3                      S2                      | 1
  1   |         S4      S5                      acc     |
  2   | S3                      S2                      | 6
  3   |         r4      r4              r4      r4      |
  4   | S3                      S2                      | 7
  5   | S3                      S2                      | 8
  6   |         S4      S5              S9              |
  7   |         r1/S4   S5/r1           r1      r1      |
  8   |         r2/S4   r2/S5           r2      r2      |
  9   |         r3      r3              r3      r3      |
• ACTION[7, *]: reduce or shift? Shift, because * has higher precedence than +.
• ACTION[7, +]: reduce or shift? Reduce (r1), because + is left-associative.
• ACTION[8, +]: reduce (r2), because the * on the left has higher precedence.
• ACTION[8, *]: reduce (r2), because * is left-associative.
After resolution (discarded action in parentheses), rows 7 and 8 become the following; all other rows are unchanged:
  7   |         r1(S4)  S5(r1)          r1      r1      |
  8   |         r2(S4)  r2(S5)          r2      r2      |
2、The "dangling-else" ambiguity
Grammar: S`→S, S→if expr then stmt else stmt | if expr then stmt | other
Abbreviated (i = if expr then, e = else, a = other): S`→S, S→iSeS | iS | a, numbered 1. S→iSeS 2. S→iS 3. S→a
SLR parsing table (ACTION columns i e a #; GOTO column S):
state | i      e      a      #      | S
  0   | S2            S3            | 1
  1   |                      acc    |
  2   | S2            S3            | 4
  3   |        r3            r3     |
  4   |        S5/r2         r2     |
  5   | S2            S3            | 6
  6   |        r1            r1     |
The shift/reduce conflict in state 4 on e is resolved in favor of shift (S5), so that each else is associated with the closest previous unmatched then.

4.7 Parser Generator Yacc
1、Creating an input/output translator with Yacc
Yacc specification translate.y → Yacc compiler → y.tab.c
y.tab.c → C compiler → a.out
input → a.out → output
2、Three parts of a Yacc source program
declarations
%%
translation rules
%%
supporting C routines
Note: A translation rule has the form
<left side> : <alt> {semantic action}

Summary: Syntax Analysis
• Specification: context-free grammar; model: push-down automaton; construction: by hand or with tools.
• Top-down methods (derivation, matching):
– Recursive descent. Advantages: easy to write programs. Disadvantages: backtracking, poor efficiency.
– Predictive (table-driven, LL(1)) analysis: predicts the production to use when a nonterminal A is on top of the analysis stack; the LL(1) parse table places A→α under the terminals in FIRST(α), and under those in FOLLOW(A) when α ⇒* ε. Skills: FIRST, FOLLOW. Disadvantages: more preprocessing (elimination of left recursion, extraction of maximal common left factors).
• Bottom-up methods (shift-reduce):
– Operator precedence analysis. Skills: shift-reduce, FIRSTVT(A), LASTVT(A) and the OP parse table. Disadvantages: strict limitations on the grammar, poor reduction mechanism.
– LR parsing, based on a deterministic finite automaton with two stacks (a state stack and a symbol stack):
(1) SLR(1). Skills: LR(0) items (shift items, reducible items) and FOLLOW(A); item extension: from A→α•Bβ add B→•γ. Disadvantages: cannot resolve all shift-reduce and reduce-reduce conflicts.
(2) Canonical LR (LR(1)). Skills: LR(1) items and look-ahead symbols; item extension: from (A→α•Bβ, a) add (B→•γ, b) for every b ∈ FIRST(βa). Disadvantages: more states.
(3) LALR(1). Skills: merging states with the same core. Disadvantages: merging may cause reduce-reduce conflicts.
[Figures: models of the LL(1), operator-precedence, SLR(1) and LR(1) parsers, each with an input buffer (e.g. b a i … #), stack(s), a controller and a parse table.]

Generation of parse tree
E.g. Construct the parse tree for the string "i+i*i" with the SLR(1) table constructed above for the grammar
0. S`→E 1. E→E+T 2. E→T 3. T→T*F 4. T→F 5. F→(E) 6. F→i
The parser reduces by productions 6, 4, 2, 6, 4, 6, 3, 1 (a rightmost derivation in reverse); each reduction creates one interior node. The resulting parse tree, in bracketed form:
E( E( T( F(i) ) ) + T( T( F(i) ) * F(i) ) )

Exercises
• Construct the LL(1) parsing table for the following grammar:
P→bSd
S→S;A | A
A→B | C
B→a
C→D | DeA
D→EB
E→iFt
F→b
• Determine whether the following operator grammar is an operator precedence grammar by constructing the related parsing table:
S→S;G | G
G→G(T) | H
H→a | (S)
T→T+S | S
• Construct an LR(1) parsing table for each of the following two ambiguous grammars, using the additional conditions:
(1) S→if S else S | if S | S;S | a, where else is matched with the closest previous unmatched if, and ; is left-associative;
(2) C→C and C | C or C | b, where or has higher precedence than and, and both and and or are right-associative.
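Step (2) of the easy LALR construction in 18, replacing all LR(1) sets that share a core by their union, can be sketched in a few lines. The Python below is an illustration with hypothetical names, not the slides' code: it rebuilds the LR(1) collection for S`→S, S→CC, C→cC | d (FIRST sets are written out by hand for this ε-free grammar), groups the sets by core, and remaps the goto transitions, raising an error if goto(J, X) = K were ever ambiguous, which step (4) says cannot happen.

```python
# Sketch: LALR(1) state merging for S' -> S, S -> CC, C -> cC | d.
# LR(1) items are (production index, dot position, look-ahead) triples.
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
END = "#"
NT = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = sorted({s for _, rhs in GRAMMAR for s in rhs})
# FIRST sets written out by hand; the grammar has no epsilon-productions.
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}, "c": {"c"}, "d": {"d"}, END: {END}}


def closure(kernel):
    result, work = set(kernel), list(kernel)
    while work:
        p, dot, a = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NT:
            for b in FIRST[(rhs[dot + 1:] + (a,))[0]]:       # FIRST(beta a)
                for q, (lhs, _) in enumerate(GRAMMAR):
                    if lhs == rhs[dot] and (q, 0, b) not in result:
                        result.add((q, 0, b))
                        work.append((q, 0, b))
    return frozenset(result)


def goto(state, x):
    return closure({(p, d + 1, a) for p, d, a in state
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == x})


def lr1_collection():
    C, trans, i = [closure({(0, 0, END)})], {}, 0
    while i < len(C):
        for x in SYMBOLS:
            t = goto(C[i], x)
            if t:
                if t not in C:
                    C.append(t)
                trans[(i, x)] = C.index(t)
        i += 1
    return C, trans


def merge_by_core(C, trans):
    """Step (2): replace the sets sharing a core by their union; step (4): remap goto."""
    core_of = [frozenset((p, d) for p, d, _ in state) for state in C]
    cores = list(dict.fromkeys(core_of))          # distinct cores, first-seen order
    index = [cores.index(k) for k in core_of]     # LR(1) state -> merged state
    merged = [frozenset().union(*(C[i] for i in range(len(C)) if index[i] == j))
              for j in range(len(cores))]
    lalr_trans = {}
    for (i, x), j in trans.items():
        if lalr_trans.setdefault((index[i], x), index[j]) != index[j]:
            raise ValueError("goto(J, X) would not be unique")
    return merged, lalr_trans


C, trans = lr1_collection()
merged, lalr_trans = merge_by_core(C, trans)
print(len(C), len(merged))  # 10 7
```

In the merged state containing C→d•, the look-aheads c, d and # are combined, which is exactly why the LALR parser reduces on # in the ccd example before reporting the error.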
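The precedence and associativity decisions of 4.6 can be checked by running the resolved table for E→E+E | E*E | (E) | i (productions numbered 1 to 4 as in that example). In this illustrative Python sketch, the function name parse and the token convention are assumptions: each single digit plays the role of i, and each reduction builds a fully parenthesized string, so the output shows how the resolved entries group the operands.

```python
# Sketch: LR driver for the ambiguous grammar E -> E+E | E*E | (E) | i, using the
# conflict-resolved table (state 7 holds E -> E+E., state 8 holds E -> E*E.).
PRODS = {1: 3, 2: 3, 3: 3, 4: 1}            # production number -> length of right side

ACTION = {}
for s in (0, 2, 4, 5):
    ACTION[(s, "i")], ACTION[(s, "(")] = "S3", "S2"
ACTION.update({(1, "+"): "S4", (1, "*"): "S5", (1, "#"): "acc",
               (6, "+"): "S4", (6, "*"): "S5", (6, ")"): "S9",
               (7, "*"): "S5"})             # * has higher precedence than the + on the stack
for state, rule in ((3, 4), (8, 2), (9, 3)):
    for a in "+*)#":
        ACTION[(state, a)] = f"r{rule}"
for a in "+)#":                             # + is left-associative: reduce E+E first
    ACTION[(7, a)] = "r1"
GOTO = {0: 1, 2: 6, 4: 7, 5: 8}             # GOTO[state, E]


def parse(text):
    """Return the input fully parenthesized according to the reductions performed."""
    tokens = ["i" if ch.isdigit() else ch for ch in text] + ["#"]
    values = list(text) + ["#"]
    states, stack, pos = [0], [], 0
    while True:
        act = ACTION.get((states[-1], tokens[pos]))
        if act is None:
            raise SyntaxError(f"unexpected {values[pos]!r} at position {pos}")
        if act == "acc":
            return stack[-1]
        if act[0] == "S":                   # shift: push state and token text
            states.append(int(act[1:]))
            stack.append(values[pos])
            pos += 1
            continue
        rule = int(act[1:])
        n = PRODS[rule]
        rhs = stack[-n:]
        del states[-n:], stack[-n:]
        if rule in (1, 2):                  # E -> E op E
            stack.append(f"({rhs[0]}{rhs[1]}{rhs[2]})")
        elif rule == 3:                     # E -> (E): the inner E is already bracketed
            stack.append(rhs[1])
        else:                               # E -> i
            stack.append(rhs[0])
        states.append(GOTO[states[-1]])


print(parse("2+3*4"), parse("2*3+4"), parse("2+3+4"))  # (2+(3*4)) ((2*3)+4) ((2+3)+4)
```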
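The parse tree for i+i*i can be generated mechanically: each reduction by A→β pops |β| subtrees and pushes a new node labelled A. The Python sketch below hard-codes the SLR(1) table of the expression-grammar example (0. S`→E … 6. F→i); the table encoding, the (label, children) tree representation and the names parse and leaves are assumptions of this illustration.

```python
# Sketch: table-driven SLR(1) parser that builds a parse tree.
# Grammar: 1. E -> E+T  2. E -> T  3. T -> T*F  4. T -> F  5. F -> (E)  6. F -> i
PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {}
for s in (0, 4, 6, 7):                      # states that expect the start of an F
    ACTION[(s, "i")], ACTION[(s, "(")] = "S5", "S4"
ACTION.update({(1, "+"): "S6", (1, "#"): "acc", (2, "*"): "S7", (9, "*"): "S7",
               (8, "+"): "S6", (8, ")"): "S11"})
for state, rule, lookaheads in [(2, 2, "+)#"), (9, 1, "+)#"), (3, 4, "+*)#"),
                                (5, 6, "+*)#"), (10, 3, "+*)#"), (11, 5, "+*)#")]:
    for a in lookaheads:                    # reduce entries on FOLLOW of the left side
        ACTION[(state, a)] = f"r{rule}"
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}


def parse(text):
    """Return (parse tree, list of production numbers used in reductions)."""
    tokens = list(text) + ["#"]
    states, nodes, reductions, pos = [0], [], [], 0
    while True:
        act = ACTION.get((states[-1], tokens[pos]))
        if act is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
        if act == "acc":
            return nodes[-1], reductions
        if act[0] == "S":                   # shift: push the token as a leaf
            states.append(int(act[1:]))
            nodes.append(tokens[pos])
            pos += 1
        else:                               # reduce: the popped subtrees become children
            rule = int(act[1:])
            lhs, length = PRODS[rule]
            children = nodes[-length:]
            del states[-length:], nodes[-length:]
            nodes.append((lhs, children))
            states.append(GOTO[(states[-1], lhs)])
            reductions.append(rule)


def leaves(tree):
    return [tree] if isinstance(tree, str) else [x for child in tree[1] for x in leaves(child)]


tree, reductions = parse("i+i*i")
print(reductions)  # [6, 4, 2, 6, 4, 6, 3, 1]
```

Reading the reductions backwards gives the rightmost derivation E ⇒ E+T ⇒ E+T*F ⇒ … ⇒ i+i*i.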