Syntax Analysis (Parsing)

Parser
● The parser is the phase of the compiler that takes the stream of tokens coming from the lexical analysis phase and groups them into syntactic structures.
● A parser takes input in the form of a sequence of tokens and produces output in the form of a parse tree.
● Parsing is of two types: top-down parsing and bottom-up parsing.

Recursive Descent Parsing
Recursive descent is a top-down parsing technique that constructs the parse tree from the top, with the input read from left to right. It uses one procedure for every terminal and non-terminal entity. The technique recursively parses the input to build a parse tree, which may or may not require back-tracking; a grammar that is not left-factored cannot avoid back-tracking. A form of recursive-descent parsing that does not require any back-tracking is known as predictive parsing.

Back-tracking
Top-down parsers start from the root node (start symbol) and match the input string against the production rules, replacing non-terminals where they match. To understand this, take the following example of a CFG:

S → rXd | rZd
X → oa | ea
Z → ai

For the input string "read", a top-down parser behaves like this: it starts with S and matches its yield against the left-most letter of the input, i.e. 'r'. The first production of S (S → rXd) matches, so the parser advances to the next input letter, 'e'. It tries to expand the non-terminal X, checking its productions from the left (X → oa). This does not match the next input symbol, so the parser back-tracks to obtain the next production of X, (X → ea). Now the parser matches all the remaining input letters in order, and the string is accepted. (A small executable sketch of this appears at the end of this section.)

Predictive Parser
A predictive parser is a recursive descent parser with the capability to predict which production is to be used to replace the input string, so it does not suffer from backtracking. To accomplish this, the predictive parser uses a look-ahead pointer, which points to the next input symbols. To make the parser back-tracking free, the predictive parser puts some constraints on the grammar and accepts only the class of grammars known as LL(k) grammars.

Predictive parsing uses a stack and a parsing table to parse the input and generate a parse tree. Both the stack and the input contain the end symbol $ to denote that the stack is empty and the input is consumed. The parser refers to the parsing table to take its decision for each combination of input symbol and stack top.

In recursive descent parsing, the parser may have more than one production to choose from for a single instance of input, whereas in a predictive parser each step has at most one production to choose. There may be instances where no production matches the input string, making the parsing procedure fail.

LL Parser
An LL parser accepts LL grammar. LL grammar is a subset of context-free grammar, with some restrictions that give a simplified form in order to achieve easy implementation. LL grammar can be implemented by means of two algorithms: recursive descent or table-driven.

An LL parser is denoted LL(k). The first L in LL(k) means the input is parsed from left to right, the second L stands for left-most derivation, and k is the number of look-ahead symbols. Generally k = 1, so LL(k) is also written LL(1).
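As promised above, here is a minimal sketch of back-tracking recursive descent for the grammar S → rXd | rZd, X → oa | ea, Z → ai. The dictionary encoding and the function name parse are illustrative choices, not from the source.

    # Back-tracking recursive descent: try each alternative of a
    # non-terminal in order; on failure, restore the position and retry.
    GRAMMAR = {
        "S": [["r", "X", "d"], ["r", "Z", "d"]],
        "X": [["o", "a"], ["e", "a"]],
        "Z": [["a", "i"]],
    }

    def parse(symbol, s, pos):
        """Match `symbol` starting at s[pos]; return the new position or None."""
        if symbol not in GRAMMAR:                  # terminal symbol
            return pos + 1 if pos < len(s) and s[pos] == symbol else None
        for production in GRAMMAR[symbol]:         # try alternatives left to right
            cur = pos
            for sym in production:
                cur = parse(sym, s, cur)
                if cur is None:                    # mismatch: back-track and
                    break                          # try the next alternative
            else:
                return cur
        return None

    print(parse("S", "read", 0) == len("read"))    # True: "read" is accepted

For "read" the first alternative of X (X → oa) fails on 'e', the parser back-tracks, and X → ea succeeds, exactly as in the hand trace above.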
LL Parsing Algorithm
We may stick to deterministic LL(1) for the parser explanation, as the size of the table grows exponentially with the value of k. Secondly, if a given grammar is not LL(1), then usually it is not LL(k) for any given k. Given below are the FIRST and FOLLOW computations on which LL(1) parsing is based.

FIRST(X) for a grammar symbol X is the set of terminals that begin the strings derivable from X.

Rules to compute the FIRST set:
1. If x is a terminal, then FIRST(x) = { x }.
2. If x -> Є is a production rule, then add Є to FIRST(x).
3. If X -> Y1 Y2 Y3 … Yn is a production:
   a. FIRST(X) = FIRST(Y1)
   b. If FIRST(Y1) contains Є, then FIRST(X) = { FIRST(Y1) – Є } U FIRST(Y2)
   c. If FIRST(Yi) contains Є for all i = 1 to n, then add Є to FIRST(X).

Example 1:
Production rules of the grammar:
E -> TE'
E' -> +TE' | Є
T -> FT'
T' -> *FT' | Є
F -> (E) | id

FIRST sets:
FIRST(E) = FIRST(T) = { ( , id }
FIRST(E') = { +, Є }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(T') = { *, Є }
FIRST(F) = { ( , id }

Example 2:
Production rules of the grammar:
S -> ACB | Cbb | Ba
A -> da | BC
B -> g | Є
C -> h | Є

FIRST sets:
FIRST(S) = FIRST(ACB) U FIRST(Cbb) U FIRST(Ba) = { d, g, h, b, a, Є }
FIRST(A) = { d } U FIRST(BC) = { d, g, h, Є }
FIRST(B) = { g, Є }
FIRST(C) = { h, Є }

FOLLOW(X) is the set of terminals that can appear immediately to the right of the non-terminal X in some sentential form.

Example: S -> Aa | Ac, A -> b. The two parse trees (S over A and a, with A over b; and S over A and c, with A over b) show that a and c can appear immediately to the right of A. Here FOLLOW(A) = { a, c }.

Rules to compute the FOLLOW set:
1. FOLLOW(S) = { $ }, where S is the starting non-terminal.
2. If A -> pBq is a production, where p, B and q are any grammar symbols, then everything in FIRST(q) except Є is in FOLLOW(B).
3. If A -> pB is a production, then everything in FOLLOW(A) is in FOLLOW(B).
4. If A -> pBq is a production and FIRST(q) contains Є, then FOLLOW(B) contains { FIRST(q) – Є } U FOLLOW(A).

Example 1:
Production rules:
E -> TE'
E' -> +TE' | Є
T -> FT'
T' -> *FT' | Є
F -> (E) | id

FIRST sets:
FIRST(E) = FIRST(T) = { ( , id }
FIRST(E') = { +, Є }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(T') = { *, Є }
FIRST(F) = { ( , id }

FOLLOW sets:
FOLLOW(E) = { $, ) }                                          // ')' is there because of the 5th production, F -> (E)
FOLLOW(E') = FOLLOW(E) = { $, ) }                             // see the 1st production rule
FOLLOW(T) = { FIRST(E') – Є } U FOLLOW(E') U FOLLOW(E) = { +, $, ) }
FOLLOW(T') = FOLLOW(T) = { +, $, ) }
FOLLOW(F) = { FIRST(T') – Є } U FOLLOW(T') U FOLLOW(T) = { *, +, $, ) }

Example 2:
Production rules:
S -> aBDh
B -> cC
C -> bC | Є
D -> EF
E -> g | Є
F -> f | Є

FIRST sets:
FIRST(S) = { a }
FIRST(B) = { c }
FIRST(C) = { b, Є }
FIRST(D) = FIRST(E) U FIRST(F) = { g, f, Є }
FIRST(E) = { g, Є }
FIRST(F) = { f, Є }

FOLLOW sets:
FOLLOW(S) = { $ }
FOLLOW(B) = { FIRST(D) – Є } U FIRST(h) = { g, f, h }
FOLLOW(C) = FOLLOW(B) = { g, f, h }
FOLLOW(D) = FIRST(h) = { h }
FOLLOW(E) = { FIRST(F) – Є } U FOLLOW(D) = { f, h }
FOLLOW(F) = FOLLOW(D) = { h }

Example 3:
Production rules:
S -> ACB | Cbb | Ba
A -> da | BC
B -> g | Є
C -> h | Є

FIRST sets:
FIRST(S) = FIRST(A) U FIRST(B) U FIRST(C) = { d, g, h, Є, b, a }
FIRST(A) = { d } U { FIRST(B) – Є } U FIRST(C) = { d, g, h, Є }
FIRST(B) = { g, Є }
FIRST(C) = { h, Є }

FOLLOW sets:
FOLLOW(S) = { $ }
FOLLOW(A) = { h, g, $ }
FOLLOW(B) = { a, $, h, g }
FOLLOW(C) = { b, g, $, h }
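The FIRST computation above can be sketched as a small fixed-point loop. The grammar encoding ("eps" standing for Є) and the names are illustrative, not from the source.

    # FIRST-set fixed-point computation for Example 1.
    GRAMMAR = {
        "E":  [["T", "E'"]],
        "E'": [["+", "T", "E'"], ["eps"]],
        "T":  [["F", "T'"]],
        "T'": [["*", "F", "T'"], ["eps"]],
        "F":  [["(", "E", ")"], ["id"]],
    }
    FIRST = {nt: set() for nt in GRAMMAR}

    def first_of(sym):
        return FIRST[sym] if sym in GRAMMAR else {sym}  # rule 1: terminals

    changed = True
    while changed:                                      # repeat until stable
        changed = False
        for nt, prods in GRAMMAR.items():
            for prod in prods:
                for sym in prod:                        # rule 3: scan Y1 Y2 ... Yn
                    n = len(FIRST[nt])
                    FIRST[nt] |= first_of(sym) - {"eps"}
                    changed |= len(FIRST[nt]) != n
                    if "eps" not in first_of(sym):
                        break
                else:                                   # every Yi derives eps
                    if "eps" not in FIRST[nt]:
                        FIRST[nt].add("eps"); changed = True

    for nt, s in FIRST.items():
        print(nt, sorted(s))   # FIRST(E') = ['+', 'eps'], FIRST(F) = ['(', 'id'], ...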
Bottom-Up Parser / Shift-Reduce Parser / LR(K) Parser

● Working of the LR(K) parser
● Meaning of LR(K):
L: left-to-right scanning of the input string
R: reverse of the rightmost derivation
K: k input symbols are used to choose an alternative production
● Types of LR(K) parser: LR(0), SLR(1), CLR(1), LALR(1); each is treated below.

Top-down:  start symbol -> … -> input string
Bottom-up: input string -> … -> start symbol

● Example of LR(K) parsing
Input: a grammar and an input string (a string of terminal symbols).
Method: find a substring matching the RHS of some production completely; if it matches, replace the matched substring with the LHS symbol. The productions used here can be read off from the reductions below: S' -> S, S -> begin L end | print E, L -> S ; L | ε, E -> num = num.

Input string: begin print num = num ; end

Sentential form                 Action
begin print num = num ; end     Reduce E -> num = num
begin print E ; end             Reduce S -> print E
begin S ; end                   Reduce S' -> S
begin S' ; end                  No successful parsing; back-track and go for another option
begin S ; ε end                 Reduce L -> ε
begin S ; L end                 Reduce L -> S ; L
begin L end                     Reduce S -> begin L end
S                               Reduce S' -> S
S'                              Accepted

Summary of the successful reductions:
begin print E ; end -> begin S ; end -> begin L end -> S -> S'

Shift-Reduce Parsing:
1. E -> T + E
2. E -> T
3. T -> int * T
4. T -> int
5. T -> ( E )

γ = int * int + int; the handle A -> β here is T -> int.

Rightmost derivation (top-down direction):
E -> T + E -> T + T -> T + int -> int * T + int -> int * int + int

Reverse of the rightmost derivation (bottom-up reduction of the input):
int * int + int -> int * T + int -> T + int -> T + T -> T + E -> E

Input string: int * int + int
● Leftmost derivation -> top-down parsing:
E -> T + E -> int * T + E -> int * int + E -> int * int + T -> int * int + int
● Reverse of the rightmost derivation -> bottom-up parsing:
E -> T + E -> T + T -> T + int -> int * T + int -> int * int + int

● Sentential form: the steps in a derivation from the start symbol to the input string.
● Right sentential form: the steps in a rightmost derivation.
● Handle and viable prefix
The handle of a right sentential form γ is a production A -> β together with the position of β in γ, such that replacing β by A in γ produces the previous sentential form in the rightmost derivation of γ.
A viable prefix of a right sentential form is a prefix that contains a handle but no symbol to the right of the handle.

Example: find the handle and viable prefix. Consider the grammar
S -> 0A1 | 0B
A -> 0A | 0
Input string: "00001"

Handle information: to find the handles, derive the string using the rightmost derivation in reverse.
S -> 0A1 -> 00A1 -> 000A1 -> 00001

Right sentential form   Handle
00001$                  A -> 0, at the 0 following the third 0
000A1$                  A -> 0A, following the second 0
00A1$                   A -> 0A, following the first 0
0A1$                    S -> 0A1, preceding $
S$

Viable prefix: use the shift-reduce method.
1. Initially the stack contains $ and the buffer contains the string ending with $. The first step is to shift.
2. If the top of the stack contains a handle, then reduce; otherwise continue to shift until we obtain a handle on top of the stack.
3. At the end, when we obtain the start symbol on the stack and $ in the buffer, the string is accepted.
When we obtain a handle on top of the stack, the stack contents form a viable prefix.

Stack    Buffer    Action                 Viable prefix
$        00001$    Shift 0
$0       0001$     Shift 0
$00      001$      Shift 0
$000     01$       Shift 0
$0000    1$        Reduce with A -> 0     0000
$000A    1$        Reduce with A -> 0A    000A
$00A     1$        Reduce with A -> 0A    00A
$0A      1$        Shift 1
$0A1     $         Reduce with S -> 0A1   0A1
$S       $         Accepted
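The trace above can be checked mechanically. This sketch replays the shift/reduce action sequence from the table for S -> 0A1 | 0B, A -> 0A | 0 on input "00001", printing the viable prefix at each reduction; a real parser would choose the actions via a parsing table rather than taking them as given.

    def replay(input_string, actions):
        stack, buf = "$", input_string + "$"
        for act in actions:
            if act[0] == "shift":
                stack, buf = stack + buf[0], buf[1:]
            else:                                   # ("reduce", lhs, rhs)
                _, lhs, rhs = act
                assert stack.endswith(rhs), "handle must lie on top of the stack"
                print("viable prefix:", stack[1:])  # stack holds the handle on top
                stack = stack[: -len(rhs)] + lhs
            print(stack.ljust(8), buf.ljust(8), act)
        return stack == "$S" and buf == "$"         # accepted?

    print(replay("00001", [
        ("shift",), ("shift",), ("shift",), ("shift",),
        ("reduce", "A", "0"), ("reduce", "A", "0A"), ("reduce", "A", "0A"),
        ("shift",), ("reduce", "S", "0A1"),
    ]))                                             # True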
● Generalized steps of construction of the LR(k) parsing table
1. For the given grammar, take the augmented grammar.
2. Create the canonical collection of LR(k) items.
3. Draw the DFA and prepare the parsing table.

● Augmented grammar
Add one more production S' -> S (the new production). Example:
S' -> S   (new production)
S -> AB
A -> a
B -> b
Language: { ab }

● LR(0) items
A production with a dot (.) at any position in the RHS is called an LR(0) item. Example, for A -> abc:
A -> .abc
A -> a.bc   (indicates viable prefix a)
A -> ab.c   (indicates viable prefix ab)
A -> abc.   (final/complete item; indicates handle abc)

● Canonical collection
C = { I0, I1, I2, …, In } is the canonical collection of item sets. To build it we use two functions:
1. CLOSURE(I): if A -> α.Bβ is in I and B -> γ is in the grammar G, then add B -> .γ to the closure of I. Apply this step recursively until nothing new can be added.
2. GOTO(I, X): the closure of the set of items A -> αX.β such that A -> α.Xβ is in I.

Example:
E -> E + T | T
T -> T * F | F
F -> id

Step 1: Augmented grammar: add one production E' -> E.
Step 2: Create the canonical collection of LR(0) items.

Input string: id + id
E -> E + T -> E + F -> E + id -> T + id -> F + id -> id + id

LR(0) parsing table: in an LR(0) table, each reduce entry is placed in all the columns of terminal symbols. Productions are numbered 1: E -> E + T, 2: E -> T, 3: T -> T * F, 4: T -> F, 5: F -> id.

State   id   +    *       $        E   T   F
I0      S4                         1   2   3
I1           S5           Accept
I2      R2   R2   S6/R2   R2
I3      R4   R4   R4      R4
I4      R5   R5   R5      R5
I5      S4                             7   3
I6      S4                                 8
I7      R1   R1   S6/R1   R1
I8      R3   R3   R3      R3

The table contains shift/reduce conflicts (states I2 and I7 in the '*' column), so the given grammar is not an LR(0) grammar.

Initial configuration: the stack contains $ and the first state (0); the input pointer points to the left-most symbol.

● LR(0) parsing, Example 1. Input string: id

Stack      Input   Action
$0         id$     M[0, id] = S4: shift id onto the stack, shift state 4 onto the stack, increment the input pointer
$0 id 4    $       M[4, $] = R5: reduce by production 5, F -> id; remove the 2 topmost symbols (id, 4); push the LHS symbol F; M[0, F] = 3, push 3
$0 F 3     $       M[3, $] = R4: reduce by production 4, T -> F; remove F, 3; push T; M[0, T] = 2, push 2
$0 T 2     $       M[2, $] = R2: reduce by production 2, E -> T; remove T, 2; push E; M[0, E] = 1, push 1
$0 E 1     $       M[1, $] = Accept: stop, successful parsing

● LR(0) parsing, Example 2. Input string: id + id

Stack             Input      Action
$0                id + id$   M[0, id] = S4: shift id and state 4; increment the input pointer
$0 id 4           + id$      M[4, +] = R5: reduce by production 5, F -> id; remove id, 4; push F; M[0, F] = 3, push 3
$0 F 3            + id$      M[3, +] = R4: reduce by production 4, T -> F; remove F, 3; push T; M[0, T] = 2, push 2
$0 T 2            + id$      M[2, +] = R2: reduce by production 2, E -> T; remove T, 2; push E; M[0, E] = 1, push 1
$0 E 1            + id$      M[1, +] = S5: shift + and state 5; increment the input pointer
$0 E 1 + 5        id$        M[5, id] = S4: shift id and state 4; increment the input pointer
$0 E 1 + 5 id 4   $          M[4, $] = R5: reduce by F -> id; remove id, 4; push F; M[5, F] = 3, push 3
$0 E 1 + 5 F 3    $          M[3, $] = R4: reduce by T -> F; remove F, 3; push T; M[5, T] = 7, push 7
$0 E 1 + 5 T 7    $          M[7, $] = R1: reduce by production 1, E -> E + T; remove E, 1, +, 5, T, 7; push E; M[0, E] = 1, push 1
$0 E 1            $          M[1, $] = Accept
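The hand traces above follow the standard table-driven LR loop. Here is a sketch of that loop using the LR(0) table just built; the two shift/reduce conflicts are resolved in favour of shift here (an assumption made so the table is deterministic; the hand traces never reach them), and the stack keeps only states.

    # Table-driven LR parsing loop for E -> E+T | T, T -> T*F | F, F -> id.
    PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 1)}
    ACTION = {
        (0, "id"): "S4", (1, "+"): "S5", (1, "$"): "acc",
        (2, "+"): "R2", (2, "*"): "S6", (2, "$"): "R2", (2, "id"): "R2",
        (3, "+"): "R4", (3, "*"): "R4", (3, "$"): "R4", (3, "id"): "R4",
        (4, "+"): "R5", (4, "*"): "R5", (4, "$"): "R5", (4, "id"): "R5",
        (5, "id"): "S4", (6, "id"): "S4",
        (7, "+"): "R1", (7, "*"): "S6", (7, "$"): "R1", (7, "id"): "R1",
        (8, "+"): "R3", (8, "*"): "R3", (8, "$"): "R3", (8, "id"): "R3",
    }
    GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
            (5, "T"): 7, (5, "F"): 3, (6, "F"): 8}

    def lr_parse(tokens):
        tokens = tokens + ["$"]
        stack, i = [0], 0                  # stack of states; 0 is the start state
        while True:
            act = ACTION.get((stack[-1], tokens[i]))
            if act is None:  return False  # blank entry: syntax error
            if act == "acc": return True
            if act[0] == "S":              # shift: push next state, advance input
                stack.append(int(act[1:])); i += 1
            else:                          # reduce by production number n
                lhs, rhs_len = PRODS[int(act[1:])]
                del stack[-rhs_len:]       # pop one state per RHS symbol
                stack.append(GOTO[(stack[-1], lhs)])

    print(lr_parse(["id", "+", "id"]))     # True, matching the hand trace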
● LR(K) parsing, Example 2:
Grammar:
S -> ( L ) | a
L -> L , S | S

Answer:
Step 1: add S' -> S, since S is the start symbol. Productions are numbered 1: S -> ( L ), 2: S -> a, 3: L -> L , S, 4: L -> S.

LR(0) parsing table (terminal symbols (, ), ',', a, $; non-terminal symbols S, L):

State   (    )    ,    a    $        S   L
I0      S2             S3            1
I1                          Accept
I2      S2             S3            5   4
I3      R2   R2   R2   R2   R2
I4           S6   S7
I5      R4   R4   R4   R4   R4
I6      R1   R1   R1   R1   R1
I7      S2             S3            8
I8      R3   R3   R3   R3   R3

Input string: ( a )

Stack           Input    Action
$0              ( a )$   M[0, (] = S2: shift ( and state 2; increment the input pointer
$0 ( 2          a )$     M[2, a] = S3: shift a and state 3; increment the input pointer
$0 ( 2 a 3      )$       M[3, )] = R2: reduce by production 2, S -> a; remove the 2 topmost symbols (a, 3); push the LHS symbol S; M[2, S] = 5, push 5
$0 ( 2 S 5      )$       M[5, )] = R4: reduce by production 4, L -> S; remove S, 5; push L; M[2, L] = 4, push 4
$0 ( 2 L 4      )$       M[4, )] = S6: shift ) and state 6; increment the input pointer
$0 ( 2 L 4 ) 6  $        M[6, $] = R1: reduce by production 1, S -> ( L ); remove the 6 topmost symbols; push S; M[0, S] = 1, push 1
$0 S 1          $        Accept

● SLR(1) Parser
All the steps of constructing the parsing table are the same as for LR(0), except for the reduce entries.
Rule for making a reduce entry:
1. Go to each state.
2. Check whether it has a final item.
3. If the final item is A -> α., calculate FOLLOW(A).
4. Add the reduce entry only in the columns of the terminal symbols that are present in FOLLOW(A).

Example 1: Grammar:
E -> E + T | T
T -> T * F | F
F -> id

Step 1: Augmented grammar:
E' -> E
E -> E + T | T
T -> T * F | F
F -> id

Step 2: Canonical collection of LR(0) items (as constructed above).

Step 3: Construct the parsing table.
R2: E -> T.      FOLLOW(E) = { $, + }
R4: T -> F.      FOLLOW(T) = { *, $, + }
R5: F -> id.     FOLLOW(F) = { *, $, + }
R1: E -> E + T.  FOLLOW(E) = { $, + }
R3: T -> T * F.  FOLLOW(T) = { *, $, + }

State   id   +    *    $        E   T   F
I0      S4                      1   2   3
I1           S5        Accept
I2           R2   S6   R2
I3           R4   R4   R4
I4           R5   R5   R5
I5      S4                          7   3
I6      S4                              8
I7           R1   S6   R1
I8           R3   R3   R3

The shift/reduce conflicts of the LR(0) table are gone, so this grammar is SLR(1).

Example 2: Grammar:
S -> Aa | bAc | Bc | bBa
A -> d
B -> d
(productions 1: S -> Aa, 2: S -> bAc, 3: S -> Bc, 4: S -> bBa, 5: A -> d, 6: B -> d)

Answer:
State I5:  R5: A -> d.   FOLLOW(A) = { a, c }
           R6: B -> d.   FOLLOW(B) = { a, c }
State I6:  R1: S -> Aa.  FOLLOW(S) = { $ }
State I9:  R3: S -> Bc.  FOLLOW(S) = { $ }
State I10: R2: S -> bAc. FOLLOW(S) = { $ }
State I11: R4: S -> bBa. FOLLOW(S) = { $ }

SLR(1) parsing table:

State   a       b    c       d    $        S   A   B
I0              S3           S5            1   2   4
I1                                Accept
I2      S6
I3                           S5                7   8
I4                   S9
I5      R5/R6        R5/R6
I6                                R1
I7                   S10
I8      S11
I9                                R3
I10                               R2
I11                               R4

This parser has a reduce/reduce conflict (state I5), so the given grammar is not an SLR(1) grammar.
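Step 4 of the SLR(1) rule above can be sketched directly: a reduce entry for a final item A -> α. goes only under the terminals in FOLLOW(A). The FOLLOW sets and final items below are the ones from Example 1; the data-structure names are illustrative.

    FOLLOW = {"E": {"+", "$"}, "T": {"+", "*", "$"}, "F": {"+", "*", "$"}}
    FINAL_ITEMS = {2: (2, "E"),   # I2 holds E -> T.
                   3: (4, "T"),   # I3 holds T -> F.
                   4: (5, "F"),   # I4 holds F -> id.
                   7: (1, "E"),   # I7 holds E -> E+T.
                   8: (3, "T")}   # I8 holds T -> T*F.

    table = {}
    for state, (prod_no, lhs) in FINAL_ITEMS.items():
        for terminal in FOLLOW[lhs]:          # reduce only on FOLLOW(lhs)
            table[(state, terminal)] = "R" + str(prod_no)

    print(table[(2, "$")])       # R2
    print((2, "*") in table)     # False: '*' is not in FOLLOW(E), so the
                                 # LR(0) shift/reduce conflict disappears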
LR(1) parser
1. It uses LR(1) items: LR(0) item + lookahead symbol = LR(1) item, e.g. [A -> .aA, a|b].
2. It has two types:
a. Canonical LR(1) parser (CLR(1))
b. Lookahead LR(1) parser (LALR(1))

Fig.: Types of LR(1) parser.

Types of conflict:
1. Shift/reduce conflict: a state contains a shift item and a reduce item for the same terminal a:
A -> α.aβ, b   (shift a)
B -> γ., a     (reduce on a)
2. Reduce/reduce conflict: a state contains two reduce items with the same lookahead:
A -> α., a     (reduce, R2)
B -> γ., a     (reduce, R3)

Rules for construction of the CLR(1)/LALR(1) parsing table:
1. Every item of the input set is carried into its closure (add everything from the input to the output).
2. If [A -> α.Bβ, a] is in Ii and B -> γ is in the grammar, then add [B -> .γ, b] to the closure of Ii for every b in FIRST(βa).
3. Repeat step 2 for every newly added LR(1) item.
4. Construct the parsing table. If it is free from shift/reduce and reduce/reduce conflicts, the given grammar is CLR(1)/LALR(1).

Example: Grammar:
S -> AA
A -> aA | b
(productions 1: S -> AA, 2: A -> aA, 3: A -> b; the item sets for this grammar are worked out step by step in the next section)

CLR(1) parsing table:

State   a    b    $        S   A
I0      S3   S4            1   2
I1                Accept
I2      S6   S7                5
I3      S3   S4                8
I4      R3   R3
I5                R1
I6      S6   S7                9
I7                R3
I8      R2   R2
I9                R2

LALR(1) Parser
In CLR(1) we can find states with common LR(0) items that differ only in their lookahead symbols. In this parser, such states are combined, so the number of entries in the parsing table is reduced. The merge for the table above (I3/I6, I4/I7, I8/I9) is carried out in the next section.

LALR Parser (with Examples)
LALR Parser: the LALR parser is a lookahead LR parser. It is a powerful parser that can handle large classes of grammars. The size of the CLR parsing table is quite large compared to other parsing tables, and LALR reduces this size. LALR works similarly to CLR; the only difference is that it combines the similar states of the CLR parsing table into one single state.

The general form of an LR(1) item is [A -> α.B, a], where A -> α.B is a production with a dot and a is a terminal or the right-end marker $.
LR(1) items = LR(0) items + lookahead.

How is the lookahead attached to a production?

CASE 1 – A -> α.BC, a
Suppose this is the 0th production. Since '.' precedes B, we have to write B's productions as well, say
B -> .D   [1st production]
The lookahead of this production is obtained by looking at the previous (0th) production: whatever comes after B, we take FIRST of that value. Here, in the 0th production, C comes after B; assuming FIRST(C) = d, the 1st production becomes
B -> .D, d

CASE 2 – Now if the 0th production is A -> α.B, a, there is nothing after B, so the lookahead of the 0th production becomes the lookahead of the 1st production:
B -> .D, a

CASE 3 – Assume a production A -> a | b:
A -> a, $   [0th production]
A -> b, $   [1st production]
Here the 1st production is a part of the previous production's alternative set, so its lookahead is the same as that of the previous production.

Steps for constructing the LALR parsing table:
1. Write the augmented grammar.
2. Find the LR(1) collection of items.
3. Define the two functions action[list of terminals] and goto[list of non-terminals] in the LALR parsing table.

A sketch of the LR(1) closure rule used in step 2 follows.
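This is a minimal sketch of the LR(1) closure rule for the grammar of the coming example (S' -> S, S -> AA, A -> aA | b): for an item [A -> α.Bβ, a] we add [B -> .γ, b] for every b in FIRST(βa). The tuple encoding and helper names are illustrative, not from the source.

    # LR(1) closure: item = (lhs, rhs, dot position, lookahead)
    GRAMMAR = {"S'": [("S",)], "S": [("A", "A")], "A": [("a", "A"), ("b",)]}
    FIRST = {"S": {"a", "b"}, "A": {"a", "b"}, "a": {"a"}, "b": {"b"}}

    def first_of(beta, lookahead):
        # FIRST(beta a): no symbol in this grammar derives eps, so the first
        # symbol of beta decides; with beta empty it is the inherited lookahead.
        return FIRST[beta[0]] if beta else {lookahead}

    def closure(items):
        items = set(items)
        changed = True
        while changed:
            changed = False
            for (lhs, rhs, dot, la) in list(items):
                if dot < len(rhs) and rhs[dot] in GRAMMAR:   # '.' before a nonterminal
                    for b in first_of(rhs[dot + 1:], la):
                        for gamma in GRAMMAR[rhs[dot]]:
                            item = (rhs[dot], gamma, 0, b)
                            if item not in items:
                                items.add(item); changed = True
        return items

    I0 = closure({("S'", ("S",), 0, "$")})
    for item in sorted(I0, key=str):
        print(item)   # yields A -> .aA and A -> .b with lookaheads a and b,
                      # and S -> .AA with lookahead $, matching STEP 1 below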
EXAMPLE
Construct the CLR parsing table for the given context-free grammar:
S --> AA
A --> aA | b

Solution:

STEP 1 – Find the augmented grammar. Its initial LR(1) items are:
S' --> .S, $       [0th production]
S --> .AA, $       [1st production]
A --> .aA, a|b     [2nd production]
A --> .b, a|b      [3rd production]

Let's apply the lookahead rules to the above productions.
● The initial lookahead is always $.
● The 1st production came into existence because of '.' before S in the 0th production. There is nothing after S, so the lookahead of the 0th production becomes the lookahead of the 1st production, i.e. S --> .AA, $.
● The 2nd production came into existence because of '.' before A in the 1st production. After that A there is another A, and FIRST(A) = a, b. Therefore the lookahead of the 2nd production becomes a|b.
● The 3rd production is a part of the 2nd production's alternative set, so the lookahead is the same.

STEP 2 – Find the LR(1) collection of items. We walk through the collection of item sets one by one.

The terminals of this grammar are {a, b}; the non-terminals are {S, A}.

RULES –
1. If any non-terminal has '.' preceding it, we write all of its productions, with '.' preceding each RHS.
2. From each state to the next state, the '.' shifts one place to the right.

● I0 consists of the closure of the augmented grammar's items.
● I0 goes to I1 when '.' of the 0th production is shifted to the right of S (S' -> S.). This state is the accept state; S has been seen by the compiler. Since I1 is a part of the 0th production, the lookahead is the same, i.e. $.
● I0 goes to I2 when '.' of the 1st production is shifted (S -> A.A); A has been seen. Since I2 is a part of the 1st production, the lookahead is $.
● I0 goes to I3 when '.' of the 2nd production is shifted (A -> a.A); a has been seen. Since I3 is a part of the 2nd production, the lookahead is a|b.
● I0 goes to I4 when '.' of the 3rd production is shifted (A -> b.); b has been seen. Since I4 is a part of the 3rd production, the lookahead is a|b.
● I2 goes to I5 when '.' of the 1st production is shifted (S -> AA.); A has been seen. Since I5 is a part of the 1st production, the lookahead is $.
● I2 goes to I6 when '.' of the 2nd production is shifted (A -> a.A); a has been seen. Since I6 is a part of the 2nd production, the lookahead is $.
● I2 goes to I7 when '.' of the 3rd production is shifted (A -> b.); b has been seen. Since I7 is a part of the 3rd production, the lookahead is $.
● I3 goes to I3 when '.' of the 2nd production is shifted (A -> a.A); a has been seen. Since I3 is a part of the 2nd production, the lookahead stays a|b.
● I3 goes to I8 when '.' of the 2nd production is shifted (A -> aA.); A has been seen. Since I8 is a part of the 2nd production, the lookahead is a|b.
● I6 goes to I9 when '.' of the 2nd production is shifted (A -> aA.); A has been seen. Since I9 is a part of the 2nd production, the lookahead is $.
● I6 goes to I6 when '.' of the 2nd production is shifted (A -> a.A); a has been seen. The lookahead stays $.
● I6 goes to I7 when '.' of the 3rd production is shifted (A -> b.); b has been seen. Since I7 is a part of the 3rd production, the lookahead is $.

STEP 3 – Define the two functions action[list of terminals] and goto[list of non-terminals] in the parsing table. The resulting CLR parsing table is the one constructed in the previous section.

Once we have the CLR parsing table, we can easily make the LALR parsing table from it. In the STEP 2 collection we can see that
● I3 and I6 are similar except for their lookaheads;
● I4 and I7 are similar except for their lookaheads;
● I8 and I9 are similar except for their lookaheads.

In LALR parsing-table construction we merge these similar states:
● wherever there is 3 or 6, make it 36 (combined form);
● wherever there is 4 or 7, make it 47;
● wherever there is 8 or 9, make it 89.

Now we have to remove the duplicated rows:
● the 36 row has the same data twice, so we keep a single row;
● we combine the two 47 rows into one by merging each entry;
● we combine the two 89 rows into one by merging each entry.

The final LALR table looks like this:

State   a     b     $        S   A
I0      S36   S47            1   2
I1                  Accept
I2      S36   S47                5
I36     S36   S47                89
I47     R3    R3    R3
I5                  R1
I89     R2    R2    R2

Classification of Parsers

Operator Precedence Parser:
This is an example of an operator grammar:
E -> E+E | E*E | id
or, equivalently,
E -> EAE | id
A -> + | *
Both generate the same language: id+id, id*id, id+id*id, …

However, the grammar given below is not an operator grammar, because two non-terminals are adjacent to each other:
S -> SAS | a
A -> bSb | b
Language: (ab)*a, a(ba)*
We can convert it into an operator grammar, though, by substituting for A:
S -> SbSbS | SbS | a
Language: (ab)*a, a(ba)*

Operator precedence parser –
An operator precedence parser is a bottom-up parser that interprets an operator grammar. This parser is only used for operator grammars. (Ambiguous grammars are not allowed in any parser except the operator precedence parser.) The parser relies on the following three precedence relations: ⋖, ≐, ⋗
a ⋖ b means a "yields precedence to" b.
a ⋗ b means a "takes precedence over" b.
a ≐ b means a "has the same precedence as" b.

For E -> E+E | E*E | id, with $ having least precedence and id having higher precedence than the operators, the precedence relation table is:

       +    *    id   $
+      ⋗    ⋖    ⋖    ⋗
*      ⋗    ⋗    ⋖    ⋗
id     ⋗    ⋗    --   ⋗
$      ⋖    ⋖    ⋖    Accept

There is no relation between id and id, as id will never be compared with id: two variables cannot come side by side.
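A sketch of the operator-precedence parsing loop for E -> E+E | E*E | id, driven by the relation table above ('<' for yields-precedence, '>' for takes-precedence). It only reports accept/reject rather than building a tree; the handle-popping details follow the usual textbook scheme and the names are illustrative.

    PREC = {  # PREC[a][b]: relation between stack-top terminal a and input b
        "+":  {"+": ">", "*": "<", "id": "<", "$": ">"},
        "*":  {"+": ">", "*": ">", "id": "<", "$": ">"},
        "id": {"+": ">", "*": ">",            "$": ">"},
        "$":  {"+": "<", "*": "<", "id": "<"},
    }

    def op_precedence_parse(tokens):
        stack = ["$"]                     # holds terminals and the nonterminal E
        tokens = tokens + ["$"]
        i = 0
        while True:
            top = next(s for s in reversed(stack) if s in PREC)  # topmost terminal
            if top == "$" and tokens[i] == "$":
                return True
            rel = PREC[top].get(tokens[i])
            if rel is None:               # e.g. id next to id: no relation
                return False
            if rel == "<":                # shift
                stack.append(tokens[i]); i += 1
            else:                         # '>': reduce the handle on top
                while True:
                    popped = stack.pop()
                    if popped in PREC:    # stop once the popped terminal was
                        t = next(s for s in reversed(stack) if s in PREC)
                        if PREC[t].get(popped) == "<":
                            break         # entered via '<': handle fully popped
                if stack[-1] not in PREC: # a left-operand nonterminal belongs
                    stack.pop()           # to the handle too (E op E)
                stack.append("E")

    print(op_precedence_parse(["id", "+", "id", "*", "id"]))  # True
    print(op_precedence_parse(["id", "id"]))                  # False: no relation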
There is also a disadvantage to this table: if we have n operators, the size of the table will be n*n and the space complexity will be O(n²). In order to decrease the size of the table, we use an operator precedence function table.

Operator precedence parsers usually do not store the precedence table with the relations; rather, they are implemented in a special way. They use precedence functions that map terminal symbols to integers, and the precedence relations between the symbols are implemented by numerical comparison. The parsing table can be encoded by two precedence functions f and g that map terminal symbols to integers. We select f and g such that:
1. f(a) < g(b) whenever a yields precedence to b
2. f(a) = g(b) whenever a and b have the same precedence
3. f(a) > g(b) whenever a takes precedence over b

The function values can be read off as longest-path lengths in the precedence graph; for the table above the longest chains are
gid -> f* -> g* -> f+ -> g+ -> f$
fid -> g* -> f+ -> g+ -> f$
and every f and g node has a path down to f$ and g$, which get value 0.

Precedence function table:

       id   +   *   $
f      4    2   4   0
g      5    1   3   0

Semantic Analysis

Grammar + semantic rule / semantic action = semantic analysis (e.g. type checking).

There are two notations for attaching semantic rules:
1. Syntax Directed Definitions: a high-level specification hiding many implementation details (also called attribute grammars).
2. Syntax Directed Translation schemes: more implementation oriented; they indicate the order in which semantic rules are to be evaluated.

Syntax Directed Translation (SDT): Grammar + semantic action = SDT
Syntax Directed Definition (SDD): Grammar + semantic rule = SDD
Semantic rule: assigns a value to a node.
Semantic action: embeds program fragments.

SDD and SDT for infix-to-postfix computation

SDT scheme (syntax directed translation):
E -> E+T { print '+' }
E -> E-T { print '-' }
E -> T
T -> 0 { print '0' }
T -> 1 { print '1' }
…
T -> 9 { print '9' }

Syntax directed definition:
E -> E+T { E.code = E.code || T.code || '+' }
E -> E-T { E.code = E.code || T.code || '-' }
E -> T   { E.code = T.code }
T -> 0   { T.code = '0' }
T -> 1   { T.code = '1' }
…
T -> 9   { T.code = '9' }

· SDT/SDD can be used, apart from parsing:
1. to store type information in the symbol table
2. to build the syntax tree
3. to issue error messages
4. to perform checks such as type checking
5. to generate intermediate code or target code

Annotated parse tree
A parse tree which stores the attribute value at each and every node is called an annotated parse tree.
Example: E -> E+T | T; T -> T*F | F; F -> num. Input string w = 1+2*3.

Classification of attributes
Based on the process of evaluation, attributes are classified into two types:
1. Synthesized attributes
2. Inherited attributes

Synthesized attribute: the value of a node is evaluated in terms of its children nodes.
E.g. A -> XYZ,  A.i = F(X.i | Y.i | Z.i)

Inherited attribute: the value of a node is evaluated in terms of its parent or sibling nodes.
E.g. A -> XYZ,  X.i = F(A.i | Y.i | Z.i),  Y.i = F(A.i | X.i)
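As a concrete illustration of semantic actions, here is a sketch of the infix-to-postfix SDT scheme shown above (print the digit at T -> digit, print the operator after both operands of E -> E+T / E-T). The recursive-descent style with the left recursion turned into a loop is an implementation choice, not from the source.

    def translate(s):
        out, i = [], 0

        def T():
            nonlocal i
            assert s[i].isdigit(), "expected a digit"
            out.append(s[i])            # action of T -> digit: print the digit
            i += 1

        def E():
            nonlocal i
            T()
            while i < len(s) and s[i] in "+-":
                op = s[i]; i += 1
                T()
                out.append(op)          # action of E -> E op T: print op last
        E()
        return "".join(out)

    print(translate("9-5+2"))   # 95-2+

The operator is emitted only after both of its operands, which is exactly the effect of E.code = E.code || T.code || op in the SDD.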
Types of SDD
1. S-attributed definitions
2. L-attributed definitions

● S-Attributed Definitions
Definition: an S-attributed definition is a syntax-directed definition that uses only synthesized attributes.
• Evaluation order: semantic rules in an S-attributed definition can be evaluated by a bottom-up, or post-order, traversal of the parse tree.
• Example: the arithmetic grammar above is an example of an S-attributed definition; its annotated parse tree for the input 3*5+4 is evaluated bottom-up.

Example:
L -> E       { print(E.val) }
E -> E1 + T  { E.val = E1.val + T.val }
E -> T       { E.val = T.val }
T -> T1 * F  { T.val = T1.val * F.val }
T -> F       { T.val = F.val }
F -> id      { F.val = id.lexval }

● L-Attributed Definitions
Definition: an L-attributed definition is a syntax-directed definition that uses synthesized attributes and inherited attributes.

Example of an L-attributed SDD:
D -> TL       { L.in = T.type }
T -> int      { T.type = integer }
T -> real     { T.type = real }
L -> L1, id   { L1.in = L.in; addtype(id.entry, L.in) }
L -> id       { addtype(id.entry, L.in) }

addtype is a function which adds the data type of a variable into the symbol table.

Example: construction of an abstract syntax tree.
mknode(value, left-child pointer, right-child pointer)
mkleaf(num, num.val)

E -> E1 + T   { E.nptr = mknode('+', E1.nptr, T.nptr) }
E -> E1 - T   { E.nptr = mknode('-', E1.nptr, T.nptr) }
E -> T        { E.nptr = T.nptr }
T -> (E)      { T.nptr = E.nptr }
T -> id       { T.nptr = mkleaf(id, id.entry) }
T -> num      { T.nptr = mkleaf(num, num.val) }

mkleaf is a function creating a leaf node with the given value (id.entry or num.val); mknode is a function creating an interior node with two children: 1st argument = value, 2nd argument = left child, 3rd argument = right child.
Example input: 7 + 8 - 6.

Intermediate Code

Need of intermediate code

Example: three-address code that initializes a 10 x 10 identity matrix a. Each element occupies 8 bytes, indices start at 1, and a[t] denotes the location at offset t from the base address of a; the offset of a[i][j] is 8*(10*i + j) - 88, where 88 = 8*10 + 8 folds in the two index offsets. The side annotations show the values for i = 2, j = 2.

1.  i = 1
2.  j = 1
3.  t1 = 10 * i        (t1 = 20)
4.  t2 = t1 + j        (t2 = 20 + 2 = 22)
5.  t3 = 8 * t2        (t3 = 8 * 22 = 176)
6.  t4 = t3 - 88       (t4 = 176 - 88 = 88)
7.  a[t4] = 0.0
8.  j = j + 1
9.  if j <= 10 goto 3
10. i = i + 1
11. if i <= 10 goto 2
12. i = 1
13. t5 = i - 1         (t5 = 0 for i = 1)
14. t6 = 88 * t5
15. a[t6] = 1.0
16. i = i + 1
17. if i <= 10 goto 13

Statements 1-11 set every element to 0.0; statements 12-17 set the diagonal elements a[i][i] to 1.0 (the diagonal element of row i lies at offset 88*(i-1)).

Row-major order, column-major order
int a[2][2]; each int requires 4 bytes and memory is byte-addressable. The declaration gives elements a[0][0] through a[1][1].

Row-major order stores the elements row by row:
a[0][0], a[0][1], a[1][0], a[1][1]

Column-major order stores them column by column:
a[0][0], a[1][0], a[0][1], a[1][1]

Address calculation:
1-D array: address of a[i] = base address + i * size of each element.
E.g. with base address 1000: 1000 + 3*4 = 1000 + 12 = 1012.
2-D array (row-major): address of a[row][col] = base address + ((row * number of columns) + col) * size of each element.
E.g. for int a[2][2] at base 1000: address of a[1][1] = 1000 + ((1*2)+1)*4 = 1000 + 3*4 = 1012.
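The row-major formula and the three-address code above compute the same address; this sketch checks that for the 10 x 10, 8-byte, 1-indexed array of the identity-matrix example (function names are illustrative).

    def address(base, i, j, cols=10, size=8):
        # base + ((i-1)*cols + (j-1)) * size, indices starting at 1
        return base + ((i - 1) * cols + (j - 1)) * size

    def address_tac(base, i, j):
        t1 = 10 * i      # 3. t1 = 10 * i
        t2 = t1 + j      # 4. t2 = t1 + j
        t3 = 8 * t2      # 5. t3 = 8 * t2
        t4 = t3 - 88     # 6. t4 = t3 - 88   (88 = 8*10 + 8 folds both -1s)
        return base + t4

    print(address(1000, 2, 2), address_tac(1000, 2, 2))   # 1088 1088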
Code Optimization

BASIC BLOCKS AND FLOW GRAPHS

Code Optimization Techniques
1. Constant folding
2. Constant propagation
3. Common subexpression elimination
4. Copy/variable propagation
5. Code movement
6. Loop-invariant computation
7. Strength reduction
8. Dead code elimination
Peephole optimization

Data flow analysis
In the code below, c is live inside the while loop and dead outside the loop:

#include <stdio.h>
int main() {
    float k = 20;
    int i = 10, j = 20, sum = 0;   /* sum declared; it was implicit in the original */
    while (i) {
        int c = 10;                /* c is live inside the loop */
        sum = sum + c * i * j;
        i = i - 1;                 /* decrement added so the loop terminates */
    }
    printf("%d", c);               /* error: c is out of scope (dead) outside the loop */
    return 0;
}

Common subexpression candidates: t1 = 4 * i, t2 = i * 5 (here 4*i and i*5 are different expressions, so there is no common subexpression to eliminate).

● Data flow analysis
● Uses of data flow analysis
● Data flow analysis schema
● Reaching definitions problem

The flow edges used below are B1 -> B2, B5 -> B2, B2 -> B3, B3 -> B4, B3 -> B5 and B4 -> B5; definitions 1 and 2 are in B1, definition 3 in B2, 4 in B3, 5 in B4 and 6 in B5. The GEN and KILL sets (read off from the computation below) are:

Node   GEN     KILL
B1     {1,2}   {3,4,5,6}
B2     {3}     {1,6}
B3     {4}     {2,5}
B4     {5}     {2,4}
B5     {6}     {1,3}

Computing IN and OUT
Initially IN = Φ and OUT = GEN:

Node   IN   OUT
B1     Φ    {1,2}
B2     Φ    {3}
B3     Φ    {4}
B4     Φ    {5}
B5     Φ    {6}

First iteration:
OUT(B1) = IN(B1) - KILL(B1) U GEN(B1) = Φ - {3,4,5,6} U {1,2} = {1,2}
IN(B2) = OUT(B1) U OUT(B5) = {1,2} U {6} = {1,2,6}
OUT(B2) = IN(B2) - KILL(B2) U GEN(B2) = {1,2,6} - {1,6} U {3} = {2,3}
IN(B3) = OUT(B2) = {2,3}
OUT(B3) = IN(B3) - KILL(B3) U GEN(B3) = {2,3} - {2,5} U {4} = {3,4}
IN(B4) = OUT(B3) = {3,4}
OUT(B4) = IN(B4) - KILL(B4) U GEN(B4) = {3,4} - {2,4} U {5} = {3,5}
IN(B5) = OUT(B3) U OUT(B4) = {3,4} U {3,5} = {3,4,5}
OUT(B5) = IN(B5) - KILL(B5) U GEN(B5) = {3,4,5} - {1,3} U {6} = {4,5,6}

Node   IN        OUT
B1     Φ         {1,2}
B2     {1,2,6}   {2,3}
B3     {2,3}     {3,4}
B4     {3,4}     {3,5}
B5     {3,4,5}   {4,5,6}

Second iteration:
IN(B2) = {1,2} U {4,5,6} = {1,2,4,5,6};  OUT(B2) = {1,2,4,5,6} - {1,6} U {3} = {2,3,4,5}
IN(B3) = {2,3,4,5};  OUT(B3) = {3,4}
IN(B4) = {3,4};  OUT(B4) = {3,5}
IN(B5) = {3,4,5};  OUT(B5) = {4,5,6}

Node   IN            OUT
B1     Φ             {1,2}
B2     {1,2,4,5,6}   {2,3,4,5}
B3     {2,3,4,5}     {3,4}
B4     {3,4}         {3,5}
B5     {3,4,5}       {4,5,6}

Third iteration: no set changes any further, so the computation has converged; a fourth iteration is not needed.

80/20 principle
Money: 80% of the total money is possessed by 20% of the people across the globe.
Programs: 80% of the total execution time is consumed by 20% of the code (the loops).

Flow graph -> loop; reducible flow graphs.
Loop optimization: loop unrolling, loop-invariant statements.
Typical loop-heavy workloads: sorting techniques, matrix multiplication, longest common subsequence, job scheduling, knapsack problem, rod cutting, searching techniques, travelling salesman, graph colouring, minimum spanning trees, shortest path.

Decision problem example: Input: a graph and two nodes A and B. Output: is there a path between A and B? Answer: yes/no.

Problems:
1. Decidable problems, e.g. sorting, finding a path between two nodes.
2. Undecidable or practically unsolvable problems: either no algorithm exists, or an algorithm exists but its computation cost is too high, so the output cannot be determined in practice. Examples: Hamiltonian cycle; whether a graph with 1,00,000 nodes is 4-colourable; 4/8-queens style search problems; the travelling salesman problem; and checking whether a grammar is ambiguous (this last one is genuinely undecidable).

Register allocation: for a program P, how many registers will be required to execute the code? Registers hold the values of variables, but the code may have 100 variables while the processor (8085/8086, with its own memory) has only a few registers.

Memory hierarchy
Heuristic-based algorithms: A*, greedy best-first search.
Fast look-up increases speed: register ------- cache ------- main memory.
Memory access time: main memory > cache > register.
Instruction pipelining: a jump causes an instruction stall.
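Returning to the reaching-definitions example: the iterative computation above is a small fixed-point loop. This sketch reproduces the IN/OUT tables using the GEN/KILL sets and flow edges of the example; visiting the blocks in order within each pass uses updated OUT values, exactly as the hand computation does.

    # Reaching definitions:
    #   IN[B]  = union of OUT[P] over predecessors P of B
    #   OUT[B] = GEN[B] | (IN[B] - KILL[B])
    PRED = {"B1": [], "B2": ["B1", "B5"], "B3": ["B2"],
            "B4": ["B3"], "B5": ["B3", "B4"]}
    GEN  = {"B1": {1, 2}, "B2": {3}, "B3": {4}, "B4": {5}, "B5": {6}}
    KILL = {"B1": {3, 4, 5, 6}, "B2": {1, 6}, "B3": {2, 5},
            "B4": {2, 4}, "B5": {1, 3}}

    IN  = {b: set() for b in PRED}
    OUT = {b: set(GEN[b]) for b in PRED}        # initially OUT = GEN

    iteration, changed = 0, True
    while changed:
        changed, iteration = False, iteration + 1
        for b in PRED:
            IN[b] = set().union(*[OUT[p] for p in PRED[b]]) if PRED[b] else set()
            new_out = GEN[b] | (IN[b] - KILL[b])
            if new_out != OUT[b]:
                OUT[b], changed = new_out, True
        print("iteration", iteration,
              {b: (sorted(IN[b]), sorted(OUT[b])) for b in PRED})
    # iteration 1 reproduces the first-iteration table above; the sets
    # stabilize by the third pass (e.g. OUT[B2] = {2, 3, 4, 5}).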