
Compiler Notes

Syntax Analysis (Parsing)
Parser
● A parser is the phase of the compiler that structures the stream of tokens coming from the
lexical analysis phase.
● A parser takes input in the form of a sequence of tokens and produces output in the form of a
parse tree.
● Parsing is of two types: top-down parsing and bottom-up parsing.
Recursive Descent Parsing
Recursive descent is a top-down parsing technique that constructs the parse tree from the top, with
the input read from left to right. It uses a procedure for every terminal and non-terminal entity. The
technique recursively parses the input to build the parse tree, which may or may not require
back-tracking: a grammar that is not left-factored cannot avoid back-tracking. A form of
recursive-descent parsing that does not require any back-tracking is known as predictive parsing.
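The idea can be sketched in code. Below is a minimal recursive-descent parser (with back-tracking) for the grammar used in the back-tracking example that follows (S → rXd | rZd, X → oa | ea, Z → ai); each non-terminal gets its own procedure, and the input position is restored when an alternative fails:

```python
# Minimal recursive-descent sketch with back-tracking for:
#   S -> r X d | r Z d ;  X -> o a | e a ;  Z -> a i

def parse(inp):
    pos = 0

    def match(ch):
        nonlocal pos
        if pos < len(inp) and inp[pos] == ch:
            pos += 1
            return True
        return False

    def try_seq(*steps):
        # Try a sequence of parsing steps; restore the position on failure.
        nonlocal pos
        saved = pos
        if all(step() for step in steps):
            return True
        pos = saved          # back-tracking
        return False

    def S():
        return try_seq(lambda: match('r'), X, lambda: match('d')) or \
               try_seq(lambda: match('r'), Z, lambda: match('d'))

    def X():
        return try_seq(lambda: match('o'), lambda: match('a')) or \
               try_seq(lambda: match('e'), lambda: match('a'))

    def Z():
        return try_seq(lambda: match('a'), lambda: match('i'))

    return S() and pos == len(inp)

print(parse("read"))   # True  (S -> rXd with X -> ea)
print(parse("raid"))   # True  (S -> rZd with Z -> ai, after back-tracking)
```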
Back-tracking
Top- down parsers start from the root node (start symbol) and match the input string against the
production rules to replace them (if matched). To understand this, take the following example of CFG:
S → rXd | rZd
X → oa | ea
Z → ai
For an input string: read, a top-down parser, will behave like this:
It will start with S from the production rules and will match its yield to the left-most letter of the
input, i.e. ‘r’. The first production of S (S → rXd) matches it, so the top-down parser advances to
the next input letter (i.e. ‘e’). The parser tries to expand non-terminal ‘X’ and checks its first
production from the left (X → oa). It does not match the next input symbol, so the top-down parser
backtracks to obtain the next production rule of X, (X → ea).
Now the parser matches all the input letters in order. The string is accepted.
Predictive Parser
Predictive parser is a recursive descent parser, which has the capability to predict which production is
to be used to replace the input string. The predictive parser does not suffer from backtracking.
To accomplish its tasks, the predictive parser uses a look-ahead pointer, which points to the next
input symbols. To make the parser back-tracking free, the predictive parser puts some constraints on
the grammar and accepts only a class of grammar known as LL(k) grammar.
Predictive parsing uses a stack and a parsing table to parse the input and generate a parse tree. Both
the stack and the input contains an end symbol $ to denote that the stack is empty and the input is
consumed. The parser refers to the parsing table to take any decision on the input and stack element
combination.
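The stack-and-table procedure just described can be sketched as follows. The parsing table here is a hand-written illustration for the hypothetical grammar S → aB, B → bB | Є (not a grammar from these notes); a real table would be built from FIRST and FOLLOW sets:

```python
# Sketch of the table-driven predictive (LL(1)) parsing loop, for the
# illustrative grammar  S -> a B ;  B -> b B | Є.
table = {
    ('S', 'a'): ['a', 'B'],
    ('B', 'b'): ['b', 'B'],
    ('B', '$'): [],          # B -> Є when the lookahead is $
}

def ll1_parse(tokens):
    tokens = tokens + ['$']          # the input ends with $
    stack = ['$', 'S']               # bottom marker and start symbol
    i = 0
    while stack:
        top = stack.pop()
        look = tokens[i]
        if top == '$' and look == '$':
            return True              # stack empty and input consumed
        if top.islower() or top == '$':      # terminal: must match input
            if top != look:
                return False
            i += 1
        else:                                # non-terminal: consult the table
            if (top, look) not in table:
                return False                 # no production: parsing fails
            stack.extend(reversed(table[(top, look)]))
    return False

print(ll1_parse(['a', 'b', 'b']))   # True
print(ll1_parse(['b']))             # False
```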
In recursive descent parsing, the parser may have more than one production to choose from for a
single instance of input, whereas in predictive parser, each step has at most one production to
choose. There might be instances where there is no production matching the input string, making the
parsing procedure to fail.
LL Parser
An LL Parser accepts LL grammar. LL grammar is a subset of context-free grammar, restricted to a
simplified form in order to achieve easy implementation. LL grammar can be implemented by means
of both algorithms, namely recursive-descent and table-driven parsing.
An LL parser is denoted LL(k). The first L in LL(k) stands for scanning the input from left to right, the
second L stands for a left-most derivation, and k represents the number of look-ahead symbols.
Generally k = 1, so LL(k) is commonly written as LL(1).
LL Parsing Algorithm
We may stick to deterministic LL(1) when explaining the parser, as the size of the table grows
exponentially with the value of k. Secondly, if a given grammar is not LL(1), then usually it is not
LL(k) for any given k.
Given below is an algorithm for LL(1) Parsing:
FIRST(X) for a grammar symbol X is the set of terminals that begin the strings derivable from
X.
Rules to compute FIRST set:
1. If x is a terminal, then FIRST(x) = { ‘x’ }
2. If x -> Є is a production rule, then add Є to FIRST(x).
3. If X->Y1 Y2 Y3….Yn is a production,
1. FIRST(X) = FIRST(Y1)
2. If FIRST(Y1) contains Є then FIRST(X) = { FIRST(Y1) – Є } U { FIRST(Y2) }
3. If FIRST (Yi) contains Є for all i = 1 to n, then add Є to FIRST(X).
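The rules above can be run as a fixed-point computation. A minimal sketch (with 'eps' standing for Є), shown here on the grammar of Example 2 below:

```python
# FIRST sets as a fixed-point computation over the rules above.
EPS = 'eps'

def first_sets(grammar, terminals):
    first = {t: {t} for t in terminals}
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                # Walk Y1 Y2 ... Yn, adding FIRST(Yi) while Yi derives Є.
                add, nullable = set(), True
                for sym in prod:
                    add |= first[sym] - {EPS}
                    if EPS not in first[sym]:
                        nullable = False
                        break
                if nullable:
                    add.add(EPS)       # also covers the Є-production []
                if not add <= first[nt]:
                    first[nt] |= add
                    changed = True
    return first

grammar = {
    'S': [['A', 'C', 'B'], ['C', 'b', 'b'], ['B', 'a']],
    'A': [['d', 'a'], ['B', 'C']],
    'B': [['g'], []],                  # [] is the Є-production
    'C': [['h'], []],
}
f = first_sets(grammar, {'a', 'b', 'd', 'g', 'h'})
print(sorted(f['S']))   # ['a', 'b', 'd', 'eps', 'g', 'h']
```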
Example 1:
Production Rules of Grammar
E  -> T E’
E’ -> + T E’ | Є
T  -> F T’
T’ -> * F T’ | Є
F  -> ( E ) | id
FIRST sets
FIRST(E) = FIRST(T) = { ( , id }
FIRST(E’) = { +, Є }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(T’) = { *, Є }
FIRST(F) = { ( , id }
Example 2:
Production Rules of Grammar
S -> ACB | Cbb | Ba
A -> da | BC
B -> g | Є
C -> h | Є
FIRST sets
FIRST(S) = FIRST(ACB) U FIRST(Cbb) U FIRST(Ba)
= { d, g, h, b, a, Є}
FIRST(A) = { d } U FIRST(BC)
= { d, g, h, Є }
FIRST(B) = { g , Є }
FIRST(C) = { h , Є }
FOLLOW(X) is defined to be the set of terminals that can appear immediately to the right of
non-terminal X in some sentential form.
Example:
S -> Aa | Ac
A -> b

The two parse trees:

    S          S
   / \        / \
  A   a      A   c
  |          |
  b          b

Here, FOLLOW(A) = {a, c}
Rules to compute FOLLOW set:
1) FOLLOW(S) = { $ }
// where S is the starting Non-Terminal
2) If A -> pBq is a production, where p, B and q are any grammar
symbols,
then everything in FIRST(q) except Є is in FOLLOW(B).
3) If A -> pB is a production, then everything in FOLLOW(A) is in FOLLOW(B).
4) If A->pBq is a production and FIRST(q) contains Є,
then FOLLOW(B) contains { FIRST(q) – Є } U FOLLOW(A)
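The FOLLOW rules above can likewise be run to a fixed point. A sketch, using the grammar of Example 2 below (S -> aBDh, etc.) with its FIRST sets written in by hand:

```python
# FOLLOW sets as a fixed-point computation over the rules above.
EPS = 'eps'

def follow_sets(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add('$')                       # rule 1
    changed = True
    while changed:
        changed = False
        for a, prods in grammar.items():
            for prod in prods:
                for i, b in enumerate(prod):
                    if b not in grammar:         # only non-terminals get FOLLOW
                        continue
                    q = prod[i + 1:]
                    # FIRST(q): union symbol FIRSTs while they contain Є
                    fq, nullable = set(), True
                    for sym in q:
                        fq |= first[sym] - {EPS}
                        if EPS not in first[sym]:
                            nullable = False
                            break
                    add = fq                     # rule 2
                    if nullable:
                        add = add | follow[a]    # rules 3 and 4
                    if not add <= follow[b]:
                        follow[b] |= add
                        changed = True
    return follow

first = {t: {t} for t in 'abcfgh'}
first.update({'S': {'a'}, 'B': {'c'}, 'C': {'b', EPS},
              'D': {'g', 'f', EPS}, 'E': {'g', EPS}, 'F': {'f', EPS}})
grammar = {'S': [['a', 'B', 'D', 'h']], 'B': [['c', 'C']],
           'C': [['b', 'C'], []], 'D': [['E', 'F']],
           'E': [['g'], []], 'F': [['f'], []]}
fol = follow_sets(grammar, first, 'S')
print(sorted(fol['B']))   # ['f', 'g', 'h']
```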
Example 1:
Production Rules:
E -> TE’
E’ -> +T E’|Є
T -> F T’
T’ -> *F T’ | Є
F -> (E) | id
FIRST set
FIRST(E) = FIRST(T) = { ( , id }
FIRST(E’) = { +, Є }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(T’) = { *, Є }
FIRST(F) = { ( , id }
FOLLOW Set
FOLLOW(E)  = { $ , ) }    // ‘)’ is there because of the 5th rule, F -> (E)
FOLLOW(E’) = FOLLOW(E) = { $ , ) }    // see the 1st production rule
FOLLOW(T)  = { FIRST(E’) – Є } U FOLLOW(E’) U FOLLOW(E) = { + , $ , ) }
FOLLOW(T’) = FOLLOW(T) = { + , $ , ) }
FOLLOW(F)  = { FIRST(T’) – Є } U FOLLOW(T’) U FOLLOW(T) = { * , + , $ , ) }
Example 2:
Production Rules:
S -> aBDh
B -> cC
C -> bC | Є
D -> EF
E -> g | Є
F -> f | Є
FIRST set
FIRST(S) = { a }
FIRST(B) = { c }
FIRST(C) = { b , Є }
FIRST(D) = FIRST(E) U FIRST(F) = { g, f, Є }
FIRST(E) = { g , Є }
FIRST(F) = { f , Є }
FOLLOW Set
FOLLOW(S) = { $ }
FOLLOW(B) = { FIRST(D) – Є } U FIRST(h) = { g , f , h }
FOLLOW(C) = FOLLOW(B) = { g , f , h }
FOLLOW(D) = FIRST(h) = { h }
FOLLOW(E) = { FIRST(F) – Є } U FOLLOW(D) = { f , h }
FOLLOW(F) = FOLLOW(D) = { h }
Example 3:
Production Rules:
S -> ACB|Cbb|Ba
A -> da|BC
B-> g|Є
C-> h| Є
FIRST set
FIRST(S) = FIRST(A) U FIRST(B) U FIRST(C) = { d, g, h, Є, b, a}
FIRST(A) = { d } U {FIRST(B)-Є} U FIRST(C) = { d, g, h, Є }
FIRST(B) = { g, Є }
FIRST(C) = { h, Є }
FOLLOW Set
FOLLOW(S) = { $ }
FOLLOW(A) = { h, g, $ }
FOLLOW(B) = { a, $, h, g }
FOLLOW(C) = { b, g, $, h }
Bottom Up parser/ Shift- Reduce Parser/ LR( K ) parser
__________________________________________
● Working of LR ( K ) Parser
● Meaning of LR ( K )
L: Left to Right scanning of input string
R : Reverse of Rightmost Derivation
K : k input symbols are used to choose among alternative productions
● Types of LR( K ) parser
Start Symbol -> -> -> Input String ----- Top Down
Input string -> -> -> Start Symbol --------Bottom Up
● Example of LR( K ) parsing
Input: a grammar and an input string (a sequence of terminal symbols).
Find a substring matching completely with the RHS of any production.
If it matches, replace the matched substring with the LHS symbol.
Input String: begin print num = num ; end

Input String                     Action
begin print num = num ; end      Reduce E -> num = num
begin print E ; end              Reduce S -> print E
begin S ; end                    Reduce S’ -> S
begin S’ ; end                   No successful parsing. Go for another option.
begin S ; ε end                  Reduce L -> ε
begin S ; L end                  Reduce L -> S ; L
begin L end                      Reduce S -> begin L end
S                                Reduce S’ -> S
S’                               Accepted

begin print E ; end -> begin S ; end -> begin L end -> S -> S’
Shift Reduce Parsing:
1. E -> T + E
2. E ->T
3. T -> int * T
4. T -> int
5. T-> ( E )
γ = int * int + int
Handle: A -> β, e.g. T -> int

Rightmost derivation:
E -> T + E -> T + T -> T + int -> int * T + int -> int * int + int

Reverse of the rightmost derivation (the order of bottom-up reductions):
int * int + int -> int * T + int -> T + int -> T + T -> T + E -> E
Input String : int * int + int
● Leftmost Derivation -> Top Down Parsing
E -> T + E -> int * T + E -> int * int + E -> int * int + T -> int * int + int
● Reverse of Rightmost Derivation -> Bottom up Parsing
E -> T + E -> T + T -> T + int -> int * T + int -> int * int + int
● Sentential Form
Steps in derivation from start symbol to input string.
● Right sentential form
Steps in rightmost derivation
● Handle and Viable Prefix
The handle of right sentential form γ is a production A ->β with the position of β in γ in such a
way that the replacement of β by A in γ produces the previous sentential form in the rightmost
derivation of γ.
A viable prefix of a right sentential form is that prefix that contains a handle but no symbol to
the right of the handle.
Example:
Find Handle and viable prefix
Consider the grammar
S-> 0A1 | 0B
A -> 0A | 0
Input String “00001”
Handle information:
To find handle derive the string using Rightmost derivation in reverse
S -> 0A1 -> 00A1 -> 000A1 -> 00001
Right sentential form        Handle
00001$                       A -> 0  (following the third ‘0’)
000A1$                       A -> 0A (following the second ‘0’)
00A1$                        A -> 0A (following the first ‘0’)
0A1$                         S -> 0A1 (preceding $)
S$
Viable prefix: Use shift reduce method
1. Initially the stack contains $ and the buffer contains the input string ending with $. The first
step is to shift.
2. If the top of the stack contains a handle, then reduce; otherwise continue to shift until we
obtain a handle on top of the stack.
3. At the end, when the stack contains the start symbol and the buffer contains only $, the string
is accepted.
When we obtain a handle on top of the stack, it is called a viable prefix.
Stack     Action                 Buffer    Viable Prefix
$         Shift 0                00001$
$0        Shift 0                0001$
$00       Shift 0                001$
$000      Shift 0                01$
$0000     Reduce with A -> 0     1$        0000
$000A     Reduce with A -> 0A    1$        000A
$00A      Reduce with A -> 0A    1$        00A
$0A       Shift 1                1$
$0A1      Reduce with S -> 0A1   $         0A1
$S        Accepted               $
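The shift-reduce procedure above can be sketched in code for this grammar (S -> 0A1 | 0B, A -> 0A | 0, input 00001). Handle selection here is hard-wired to reproduce the trace (a real parser would use an LR automaton to find handles; the `len(stack) > 2` guard, which keeps one ‘0’ for the final S -> 0A1 reduction, is an assumption of this sketch):

```python
# Shift-reduce sketch: shift until a handle is on top of the stack, then reduce.
def shift_reduce(inp):
    stack, buf, steps = '', inp, []
    while stack != 'S':
        if stack.endswith('0') and buf[:1] == '1':
            stack = stack[:-1] + 'A'             # reduce A -> 0 (handle found)
            steps.append('reduce A -> 0')
        elif stack.endswith('0A') and len(stack) > 2:
            stack = stack[:-2] + 'A'             # reduce A -> 0A
            steps.append('reduce A -> 0A')
        elif stack.endswith('0A1'):
            stack = stack[:-3] + 'S'             # reduce S -> 0A1
            steps.append('reduce S -> 0A1')
        elif buf:
            stack, buf = stack + buf[0], buf[1:] # no handle on top: shift
            steps.append('shift ' + stack[-1])
        else:
            return False, steps                  # stuck: reject
    return True, steps

ok, steps = shift_reduce('00001')
print(ok)       # True
print(steps)    # same shift/reduce order as the trace above
```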
● Generalized steps of construction of LR(k) parsing table
1. For the given grammar take the augmented grammar.
2. Create a canonical collection of LR(k) items.
3. Draw the DFA and prepare the parsing table.
● Augmented grammar
Add one more production
S’ - > S ------ New production
S-> AB
A ->a
B -> b
{ ab }
● LR(0) items
A production with dot( .) at any point on the RHS is called an LR( 0 ) item.
Example: A -> abc
A -> .abc
A -> a.bc    (indicates viable prefix a)
A -> ab.c    (indicates viable prefix ab)
A -> abc.    (final item; indicates handle abc)
● Canonical collection
C = { I0, I1, I2,…. In} is the canonical collection of items
To get it we consider two functions
1. CLOSURE( I )
If A -> α .Bβ is in I
and B ->γ is in grammar G
then add B -> .γ to the closure of I.
Apply the above steps in a recursive manner.
2. GOTO( I , X)
It is the closure of A -> αX.β such that A -> α.Xβ is in I
Example:
E -> E + T |T
T -> T * F | F
F -> id
Step 1 : Augmented grammar
Add one production
E’ -> E
Step 2: Create Canonical collection of LR(0) items
Input string : id + id
E -> E + T -> E + F -> E + id -> T + id -> F + id -> id + id
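CLOSURE and GOTO as defined above can be sketched directly, using this example grammar (an item is represented as a (lhs, rhs, dot-position) tuple):

```python
# CLOSURE and GOTO for the grammar  E' -> E ; E -> E+T | T ; T -> T*F | F ; F -> id
GRAMMAR = {
    "E'": [('E',)],
    'E': [('E', '+', 'T'), ('T',)],
    'T': [('T', '*', 'F'), ('F',)],
    'F': [('id',)],
}

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(items):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:   # dot before non-terminal B
                for prod in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], prod, 0)           # add B -> .γ
                    if item not in items:
                        items.add(item)
                        changed = True
    return items

def goto(items, x):
    moved = {(lhs, rhs, dot + 1)
             for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == x}        # A -> α.Xβ  =>  A -> αX.β
    return closure(moved)

i0 = closure({("E'", ('E',), 0)})
print(len(i0))              # 6 items in I0
print(len(goto(i0, 'E')))   # 2 items in I1: E' -> E.  and  E -> E.+T
```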
LR (0)Parsing table:
Reduce entry is placed in all the columns of terminal symbols.
Table contains Shift /reduce conflict -> Given grammar is not LR( 0 ) grammar.
Check given grammar is LR(0)
State    id     +      *        $         E    T    F
I0       S4                               1    2    3
I1              S5              Accept
I2       R2     R2     S6/R2    R2
I3       R4     R4     R4       R4
I4       R5     R5     R5       R5
I5       S4                                    7    3
I6       S4                                         8
I7       R1     R1     S6/R1    R1
I8       R3     R3     R3       R3
Initial configuration: the stack contains $ and the first state (0); the input pointer points to the
leftmost symbol.
● LR( 0 ) Parsing Example 1
Input string : id
Stack    Input    Action
$0       id$      M[0,id] = S4 : shift id and state 4 onto the stack, increment the input pointer
$0id4    $        M[4,$] = R5 : reduce by production 5, F -> id; pop the 2 topmost symbols
                  (id, 4); push the LHS symbol F; M[0,F] = 3, push 3
$0F3     $        M[3,$] = R4 : reduce by production 4, T -> F; pop F, 3; push T;
                  M[0,T] = 2, push 2
$0T2     $        M[2,$] = R2 : reduce by production 2, E -> T; pop T, 2; push E;
                  M[0,E] = 1, push 1
$0E1     $        M[1,$] = Accept : stop, successful parsing
● LR( 0 ) Parsing Example 2
Input string : id + id
Stack        Input     Action
$0           id+id$    M[0,id] = S4 : shift id and state 4, increment the input pointer
$0id4        +id$      M[4,+] = R5 : reduce F -> id; pop id, 4; push F; M[0,F] = 3, push 3
$0F3         +id$      M[3,+] = R4 : reduce T -> F; pop F, 3; push T; M[0,T] = 2, push 2
$0T2         +id$      M[2,+] = R2 : reduce E -> T; pop T, 2; push E; M[0,E] = 1, push 1
$0E1         +id$      M[1,+] = S5 : shift + and state 5, increment the input pointer
$0E1+5       id$       M[5,id] = S4 : shift id and state 4, increment the input pointer
$0E1+5id4    $         M[4,$] = R5 : reduce F -> id; pop id, 4; push F; M[5,F] = 3, push 3
$0E1+5F3     $         M[3,$] = R4 : reduce T -> F; pop F, 3; push T; M[5,T] = 7, push 7
$0E1+5T7     $         M[7,$] = R1 : reduce E -> E + T; pop E, 1, +, 5, T, 7; push E;
                       M[0,E] = 1, push 1
$0E1         $         M[1,$] = Accept
● LR(K) Parsing algorithm
Example 2:
Grammar
S -> ( L ) | a
L -> L , S | S
Answer:
Step 1: Augmented grammar — add S’ -> S, since S is the start symbol.
State    (     )     ,     a     $         S    L
I0       S2                S3              1
I1                               Accept
I2       S2                S3              5    4
I3       R2    R2    R2    R2    R2
I4             S6    S7
I5       R4    R4    R4    R4    R4
I6       R1    R1    R1    R1    R1
I7       S2                S3              8
I8       R3    R3    R3    R3    R3
Input String : ( a )

Stack        Input    Action
$0           (a)$     M[0,(] = S2 : shift ( and state 2, increment the input pointer
$0(2         a)$      M[2,a] = S3 : shift a and state 3, increment the input pointer
$0(2a3       )$       M[3,)] = R2 : reduce S -> a; pop a, 3; push S; M[2,S] = 5, push 5
$0(2S5       )$       M[5,)] = R4 : reduce L -> S; pop S, 5; push L; M[2,L] = 4, push 4
$0(2L4       )$       M[4,)] = S6 : shift ) and state 6, increment the input pointer
$0(2L4)6     $        M[6,$] = R1 : reduce S -> ( L ); pop the 6 topmost symbols;
                      push S; M[0,S] = 1, push 1
$0S1         $        Accept
● SLR(1) Parser
All the steps of construction of the parsing table are the same except for the reduced entry.
Rule for making reduce entry:
1. Goto each state
2. Check if it has final item
3. If the final item is A -> α. then calculate FOLLOW( A )
4. Add reduce entry only in the column of terminal symbol which are present in FOLLOW( A )
Example 1:
Grammar:E -> E + T | T
T -> T * F | F
F -> id
Step 1 : Augmented Grammar
E’ -> E
E -> E + T |T
T -> T * F | F
F -> id
Step 2 : Canonical Collection of LR(0) items
Step 3 : Construct the parsing table
R2 :E -> T. Follow( E ) ={ $, +}
R4: T -> F. Follow( T ) = { *, $, + }
R5 : F -> id. Follow( F ) = {*, $, + }
R1 : E -> E + T . Follow( E ) ={ $, +}
R3 : T -> T * F . Follow ( T ) ={ *, $, + }
State    id     +      *      $         E    T    F
I0       S4                             1    2    3
I1              S5            Accept
I2              R2     S6     R2
I3              R4     R4     R4
I4              R5     R5     R5
I5       S4                                  7    3
I6       S4                                       8
I7              R1     S6     R1
I8              R3     R3     R3
Example 2:
Grammar
S-> Aa | bAc | Bc | bBa
A -> d
B -> d
Answer:
State 5:
R5: A ->d. Follow( A )= { a, c}
R6: B ->d. Follow ( B ) = { a, c}
State 6:
R1: S-> Aa. Follow ( S ) = { $ }
State I9:
R3: S-> Bc. Follow ( S ) = { $ }
State I10:
R2: S -> bAc. Follow ( S ) = { $ }
State I11:
R4: S -> bBa. Follow ( S ) = { $ }
State    a        b     c        d     $         S    A    B
I0                S3             S5              1    2    4
I1                                     Accept
I2       S6
I3                               S5                   7    8
I4                      S9
I5       R5/R6          R5/R6
I6                                     R1
I7                      S10
I8       S11
I9                                     R3
I10                                    R2
I11                                    R4
SLR(1) parsing table
The table has a reduce/reduce conflict (state I5), so the given grammar is not an SLR(1) grammar.
LR(1) parser
1. It uses LR(1) items
LR(0) item + lookahead symbol = LR(1) item, e.g. A -> α.β , a
2. It has two types
a. Canonical LR(1) parser (CLR(1))
b. Lookahead LR(1) parser (LALR(1))
Fig. Types of LR(1) Parser
Types of Conflict:
1. Shift/Reduce Conflict
   A -> α.aβ , b    (shift a)
   B -> γ. , a      (reduce B -> γ on a)
2. Reduce/Reduce Conflict
   A -> α. , a      (reduce A -> α on a)
   B -> γ. , a      (reduce B -> γ on a)
Rules for construction of CLR(1)/LALR(1) Parsing Table
1. Add everything from input to output.
2. If A -> α.Bβ , $ is in Ii and B -> γ is in the grammar, then add B -> .γ , FIRST(β$) to the
closure of Ii.
3. Repeat step 2 for every newly added LR(1) item.
4. Construct the parsing table. If it is free from shift/reduce and reduce/reduce conflicts, the
given grammar is CLR(1) or LALR(1).
Example:
Grammar: S -> AA
         A -> aA | b
(productions: 1. S -> AA, 2. A -> aA, 3. A -> b)

CLR(1) Parsing Table:

State    a     b     $         S    A
I0       S3    S4              1    2
I1                   Accept
I2       S6    S7                   5
I3       S3    S4                   8
I4       R3    R3
I5                   R1
I6       S6    S7                   9
I7                   R3
I8       R2    R2
I9                   R2
LALR(1) Parser
In CLR(1) we can find states whose LR(0) items are the same and which differ only in their
lookahead symbols. In the LALR(1) parser such states are combined, so the number of entries in
the parsing table is reduced.
[LALR(1) parsing table with the merged states — the table is garbled in the source; its entries
included the shifts, gotos and the reductions R1–R4 over the columns a, b, $ and the goto
columns for the non-terminals.]
LALR Parser (with Examples)
LALR Parser :
The LALR parser is a lookahead LR parser, and it can handle large classes of grammars. The size
of the CLR parsing table is quite large compared to other parsing tables; LALR reduces the size of
this table. LALR works similarly to CLR — the only difference is that it combines the similar states
of the CLR parsing table into one single state.
The general form of an item becomes [A -> α.B , a]
where A -> α.B is a production and a is a terminal or the right end marker $.
LR(1) items = LR(0) items + lookahead
How to add lookahead with the production?
CASE 1 – A -> α.BC , a
Suppose this is the 0th production. Since ‘.’ precedes B, we have to write B’s productions as well.
B -> .D    [1st production]
The lookahead of this production is found by looking at the previous (0th) production: whatever
follows B, we take FIRST of it, and that is the lookahead of the 1st production. Here C follows B in
the 0th production; assuming FIRST(C) = d, the 1st production becomes:
B -> .D , d
CASE 2 –
Now suppose the 0th production is
A -> α.B , a
Here there is nothing after B, so the lookahead of the 0th production becomes the lookahead of
the 1st production, i.e.:
B -> .D , a
CASE 3 –
Assume a production A -> a | b:
A -> a , $    [0th production]
A -> b , $    [1st production]
Here the 1st production is part of the same alternative set as the previous production, so its
lookahead is the same as that of the previous production.
Steps for constructing the LALR parsing table :
1. Writing augmented grammar
2. LR(1) collection of items to be found
3. Defining the 2 functions: action[list of terminals] and goto[list of non-terminals] in the
LALR parsing table
EXAMPLE
Construct CLR parsing table for the given context free grammar
S-->AA
A-->aA|b
Solution:
STEP1- Find augmented grammar
The augmented grammar of the given grammar is:
S’ --> .S , $      [0th production]
S --> .AA , $      [1st production]
A --> .aA , a|b    [2nd production]
A --> .b , a|b     [3rd production]
Let’s apply the rule of lookahead to the above productions.
● The initial lookahead is always $.
● The 1st production came into existence because of the ‘.’ before ‘S’ in the 0th
production. There is nothing after ‘S’, so the lookahead of the 0th production becomes
the lookahead of the 1st production, i.e. S --> .AA , $
● The 2nd production came into existence because of the ‘.’ before ‘A’ in the 1st
production. After ‘A’ there is another ‘A’, and FIRST(A) = a, b; therefore the lookahead
of the 2nd production becomes a|b.
● The 3rd production is a part of the 2nd production, so its lookahead is the same.
STEP2 – Find LR(0) collection of items
Below is the figure showing the LR(0) collection of items. We will understand everything one by
one.
The terminals of this grammar are {a,b}
The non-terminals of this grammar are {S,A}
RULES –
1. If any non-terminal has ‘.’ preceding it, we have to write all its productions and add ‘.’
before each of them.
2. From each state to the next state, the ‘.’ shifts one place to the right.
● In the figure, I0 consists of the augmented grammar.
● I0 goes to I1 when the ‘.’ of the 0th production is shifted to the right of S (S’->S.). This is
the accept state: S has been seen. Since I1 comes from the 0th production, the
lookahead is the same, i.e. $.
● I0 goes to I2 when the ‘.’ of the 1st production is shifted to the right (S->A.A); A has
been seen. Since I2 comes from the 1st production, the lookahead is $.
● I0 goes to I3 when the ‘.’ of the 2nd production is shifted to the right (A->a.A); a has
been seen. Since I3 comes from the 2nd production, the lookahead is a|b.
● I0 goes to I4 when the ‘.’ of the 3rd production is shifted to the right (A->b.); b has been
seen. Since I4 comes from the 3rd production, the lookahead is a|b.
● I2 goes to I5 when the ‘.’ of the 1st production is shifted to the right (S->AA.); A has
been seen. Since I5 comes from the 1st production, the lookahead is $.
● I2 goes to I6 when the ‘.’ of the 2nd production is shifted to the right (A->a.A); a has
been seen. Since I6 comes from the 2nd production, the lookahead is $.
● I2 goes to I7 when the ‘.’ of the 3rd production is shifted to the right (A->b.); b has been
seen. Since I7 comes from the 3rd production, the lookahead is $.
● I3 goes to I3 when the ‘.’ of the 2nd production is shifted to the right (A->a.A); a has
been seen. The lookahead stays a|b.
● I3 goes to I8 when the ‘.’ of the 2nd production is shifted to the right (A->aA.); A has
been seen. Since I8 comes from the 2nd production, the lookahead is a|b.
● I6 goes to I9 when the ‘.’ of the 2nd production is shifted to the right (A->aA.); A has
been seen. Since I9 comes from the 2nd production, the lookahead is $.
● I6 goes to I6 when the ‘.’ of the 2nd production is shifted to the right (A->a.A); a has
been seen. The lookahead stays $.
● I6 goes to I7 when the ‘.’ of the 3rd production is shifted to the right (A->b.); b has been
seen. Since I7 comes from the 3rd production, the lookahead is $.
STEP 3 –
Defining the 2 functions action[list of terminals] and goto[list of non-terminals] in the parsing
table. Below is the CLR parsing table.
Once we make a CLR parsing table, we can easily make a LALR parsing table from it.
In the step2 diagram, we can see that
● I3 and I6 are similar except their lookaheads.
● I4 and I7 are similar except their lookaheads.
● I8 and I9 are similar except their lookaheads.
In LALR parsing table construction , we merge these similar states.
● Wherever there is 3 or 6, make it 36(combined form)
● Wherever there is 4 or 7, make it 47(combined form)
● Wherever there is 8 or 9, make it 89(combined form)
Below is the LALR parsing table.
Now we have to remove the duplicated rows:
● The 36 row appears twice with the same data, so we keep only one copy.
● We combine the two 47 rows into one by merging their values into a single 47 row.
● We combine the two 89 rows into one by merging their values into a single 89 row.
The final LALR table looks like the below.
Classification of Parsers
Operator Precedence Parser:
This is an example of an operator grammar:
E -> E+E / E*E / id
The grammar E -> EAE / id, A -> + / * generates the same language (id+id, id*id, id+id*id), but it
is not an operator grammar, because the two non-terminals E and A are adjacent.
However, the grammar given below is not an operator grammar because two non-terminals are adjacent to each
other:
S->SAS/a
A->bSb/b
Language: (ab)*a, a(ba)*
We can convert it into an operator grammar, though:
S -> SbSbS / SbS / a
Language: (ab)*a, a(ba)*
Operator precedence parser –
An operator precedence parser is a bottom-up parser that interprets an operator grammar. This parser is only used
for operator grammars. Ambiguous grammars are not allowed in any parser except the operator precedence parser.
This parser relies on the following three precedence relations: ⋖, ≐, ⋗
a ⋖ b This means a “yields precedence to” b.
a ⋗ b This means a “takes precedence over” b.
a ≐ b This means a “has same precedence as” b.
E->E+E/E*E/id
$ has the least precedence; id has higher precedence than the other operators.
       +       *       id      $
+      ⋗       ⋖       ⋖       ⋗
*      ⋗       ⋗       ⋖       ⋗
id     ⋗       ⋗       --      ⋗
$      ⋖       ⋖       ⋖       Accept
There is no relation between id and id, as id will never be compared with id: two operands cannot appear side by
side. A disadvantage of this table is that if we have n operators, the size of the table is n*n and the space
complexity is O(n²). In order to decrease the size of the table, we use the operator function table.
Operator precedence parsers usually do not store the precedence table with the relations; rather they are
implemented in a special way. Operator precedence parsers use precedence functions that map terminal symbols
to integers, and the precedence relations between the symbols are implemented by numerical comparison. The
parsing table can be encoded by two precedence functions f and g that map terminal symbols to integers. We
select f and g such that:
1. f(a) < g(b) whenever a yields precedence to b
2. f(a) = g(b) whenever a and b have the same precedence
3. f(a) > g(b) whenever a takes precedence over b
These relations order the precedence functions into decreasing chains, for example:
f(id) > g(*) > f(+) > g(+) > f($)      g(id) > f(*) > g(*) > f(+) > g(+) > f($)
and every f and g value is greater than f($) and g($).
Function table:

     id    +    *    $
f    4     2    4    0
g    5     1    3    0
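With the f and g values above, the precedence relations reduce to numerical comparisons driving the parsing loop. A sketch (the stack here keeps only terminals; the reduce step's inner loop is an assumption for handles containing more than one terminal):

```python
# Operator-precedence parsing loop using the precedence functions f and g
# for E -> E+E | E*E | id.
f = {'id': 4, '+': 2, '*': 4, '$': 0}
g = {'id': 5, '+': 1, '*': 3, '$': 0}

def op_parse(tokens):
    tokens = tokens + ['$']
    stack = ['$']                    # only terminals are kept on the stack here
    i = 0
    actions = []
    while not (stack == ['$'] and tokens[i] == '$'):
        a, b = stack[-1], tokens[i]
        if f[a] <= g[b]:
            stack.append(b)                      # a yields precedence: shift b
            actions.append('shift ' + b)
            i += 1
        else:
            x = stack.pop()                      # a takes precedence: reduce
            while f[stack[-1]] > g[x]:           # pop the rest of the handle
                x = stack.pop()
            actions.append('reduce handle ending in ' + x)
    return actions

acts = op_parse(['id', '+', 'id', '*', 'id'])
print(acts[-1])   # reduce handle ending in +   (* is reduced before +)
```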
Semantic Analysis
Semantic analysis performs checks such as type checking: Grammar + semantic rules / semantic actions =
semantic analysis.
There are two notations for attaching semantic rules:
1. Syntax Directed Definitions -> a high-level specification hiding many implementation details (also called
Attribute Grammars).
2. Syntax Directed Translation Schemes -> more implementation oriented: they indicate the order in which
semantic rules are to be evaluated.
Syntax Directed Translation(SDT)
Grammar + Semantic Action = SDT
Syntax Directed Definition(SDD)
Grammar + Semantic Rule = SDD
Semantic Rule: Assigns value to node
Semantic Action: Embeds program fragments
SDD and SDT for infix to postfix Computation
SDT Scheme (Syntax Directed Translation):
E -> E+T   { print ‘+’ }
E -> E-T   { print ‘-’ }
E -> T
T -> 0     { print ‘0’ }
T -> 1     { print ‘1’ }
…
T -> 9     { print ‘9’ }

Syntax Directed Definition:
E -> E+T   { E.code = E1.code || T.code || ‘+’ }
E -> E-T   { E.code = E1.code || T.code || ‘-’ }
E -> T     { E.code = T.code }
T -> 0     { T.code = ‘0’ }
T -> 1     { T.code = ‘1’ }
…
T -> 9     { T.code = ‘9’ }
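The SDT scheme above can be embedded in a parser: each { print … } action fires at the matching point of the parse, so an infix digit expression comes out in postfix. A minimal sketch (the left-recursive production E -> E+T is realized as a loop):

```python
# Infix-to-postfix translation driven by the SDT scheme above.
def infix_to_postfix(expr):
    out = []
    pos = 0

    def E():
        nonlocal pos
        T()
        while pos < len(expr) and expr[pos] in '+-':
            op = expr[pos]
            pos += 1
            T()
            out.append(op)      # semantic action { print '+' } / { print '-' }

    def T():
        nonlocal pos
        out.append(expr[pos])   # semantic action { print digit }
        pos += 1

    E()
    return ''.join(out)

print(infix_to_postfix('9-5+2'))   # 95-2+
```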
SDT/SDD can be used, apart from parsing:
1. to store type information in the symbol table
2. to build the syntax tree
3. to issue error messages
4. to perform semantic checks like type checking
5. to generate intermediate code or target code
Annotated parse tree
A parse tree which stores the attribute value at each and every node is called an annotated parse tree.
Example:
E -> E+T | T
T -> T *F | F
F -> num
Input string w= 1+2*3
Classification of attribute
Based on the process of evaluation the attributes are classified into two types:
1. Synthesized attribute
2. Inherited attribute
Synthesized attribute
The value of a node is evaluated in terms of the values of its children nodes.
Eg. A -> XYZ
A.i = F (X.i, Y.i, Z.i)
Inherited attribute
The value of a node is evaluated in terms of the values of its parent and/or sibling nodes.
Eg. A -> XYZ
X.i = F (A.i, Y.i, Z.i)
Y.i = F (A.i, X.i)
Types of SDD
1. S-Attributed Definitions
2. L-attributed Definitions
● S-Attributed Definitions
Definition. An S-Attributed Definition is a Syntax Directed Definition that uses only synthesized attributes.
• Evaluation Order. Semantic rules in an S-Attributed Definition can be evaluated by a bottom-up, or PostOrder,
traversal of the parse tree.
• Example. The arithmetic grammar below is an example of an S-Attributed Definition; its annotated parse tree
for the input 3*5+4 is evaluated bottom-up.
Example:
L -> E        { print(E.val) }
E -> E1 + T   { E.val = E1.val + T.val }
E -> T        { E.val = T.val }
T -> T1 * F   { T.val = T1.val * F.val }
T -> F        { T.val = F.val }
F -> id       { F.val = id.lexval }
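The S-attributed rules above can be run bottom-up: in a recursive-descent sketch, each procedure returns the synthesized .val attribute of its node (id is a single digit here for simplicity):

```python
# Bottom-up (PostOrder) evaluation of the S-attributed definition above.
def evaluate(s):
    pos = 0

    def F():                      # F -> id      { F.val = id.lexval }
        nonlocal pos
        v = int(s[pos])
        pos += 1
        return v

    def T():                      # T -> T1 * F  { T.val = T1.val * F.val }
        nonlocal pos
        v = F()
        while pos < len(s) and s[pos] == '*':
            pos += 1
            v = v * F()
        return v

    def E():                      # E -> E1 + T  { E.val = E1.val + T.val }
        nonlocal pos
        v = T()
        while pos < len(s) and s[pos] == '+':
            pos += 1
            v = v + T()
        return v

    return E()                    # L -> E       { print(E.val) }

print(evaluate('3*5+4'))   # 19
```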
● L-Attributed Definitions
Definition. An L-Attributed Definition is a Syntax Directed Definition that uses synthesized attributes and
inherited attributes, where each inherited attribute may depend only on the parent and on the siblings to its left.
Example of an L-attributed SDD:
D -> T L      { L.in = T.type }
T -> int      { T.type = integer }
T -> real     { T.type = real }
L -> L1 , id  { L1.in = L.in ; addtype(id.entry, L.in) }
L -> id       { addtype(id.entry, L.in) }
addtype is a function which will add datatype of variable into the symbol table
Example for construction of abstract syntax tree:
mknode(value, left-child pointer, right-child pointer)
mkleaf(num, num.val)

E -> E1 + T   { E.nptr = mknode(‘+’, E1.nptr, T.nptr) }
E -> E1 - T   { E.nptr = mknode(‘-’, E1.nptr, T.nptr) }
E -> T        { E.nptr = T.nptr }
T -> (E)      { T.nptr = E.nptr }
T -> id       { T.nptr = mkleaf(id, id.entry) }
T -> num      { T.nptr = mkleaf(num, num.val) }

mkleaf is a function creating a leaf node with the given value (id.entry or num.val);
mknode is a function creating a node with two children:
1st argument = value, 2nd argument = left child, 3rd argument = right child
Example input: 7 + 8 - 6
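Representing mknode/mkleaf as plain tuples, the semantic rules above build the abstract syntax tree for 7 + 8 - 6 like this (a sketch; a real implementation would allocate node objects):

```python
# mknode/mkleaf as tuples, building the AST for 7 + 8 - 6.
def mknode(value, left, right):      # interior node with two children
    return (value, left, right)

def mkleaf(kind, value):             # leaf node, e.g. (num, num.val)
    return (kind, value)

# E -> E1 + T and E -> E1 - T applied left to right for 7 + 8 - 6:
e1 = mknode('+', mkleaf('num', 7), mkleaf('num', 8))   # subtree for 7 + 8
root = mknode('-', e1, mkleaf('num', 6))               # (7 + 8) - 6

print(root)   # ('-', ('+', ('num', 7), ('num', 8)), ('num', 6))
```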
Intermediate Code
Need of Intermediate code
1.  i = 1
2.  j = 1
3.  t1 = 10 * i
4.  t2 = t1 + j        // e.g. for i = 2, j = 2: t1 = 20, t2 = 20 + 2 = 22
5.  t3 = 8 * t2        // t3 = 8 * 22 = 176
6.  t4 = t3 - 88       // t4 = 176 - 88 = 88
7.  a[t4] = 0.0
8.  j = j + 1
9.  if j <= 10 goto 3
10. i = i + 1
11. if i <= 10 goto 2
12. i = 1
13. t5 = i - 1         // t5 = 0
14. t6 = 88 * t5
15. a[t6] = 1.0
16. i = i + 1
17. if i <= 10 goto 13
Initialization of an identity matrix: the offset of a[i][j] from the base address is 8 * (10*i + j) – 88,
where 8 is the size of each element, 10 is the number of columns, and 88 compensates for the
indices starting at 1; 0.0 and 1.0 are the values stored.
Row major order vs column major order:
int a[2][2];   // an int requires 4 bytes; memory is byte-addressable
A declaration a[2][2] gives the elements a[0][0], a[0][1], a[1][0], a[1][1].
Row major order stores them in the order: a[0][0], a[0][1], a[1][0], a[1][1].
Row major order example: values 10, 20, …, 120 for the elements a[0][0] … a[0][9], a[1][0], a[1][1]
of a 10-column array, stored at byte addresses 1000, 1008, 1016, …, 1072 (8-byte elements).
Column major order stores the elements in the order: a[0][0], a[1][0], a[0][1], a[1][1].
1-D array:
address = base address + index * size of each element
e.g. 1000 + 3*4 = 1000 + 12 = 1012
2-D array (row major):
address = base address + ((row number * number of columns) + column number) * size of each element
e.g. 1000 + ((1*2)+1)*4 = 1000 + 3*4 = 1012
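The two address formulas can be written as small functions and checked against the worked numbers:

```python
# Address computation for byte-addressable arrays.
def addr_1d(base, index, elem_size):
    return base + index * elem_size

def addr_2d_row_major(base, row, col, num_cols, elem_size):
    # offset = (row * number of columns + column) elements from the base
    return base + ((row * num_cols) + col) * elem_size

print(addr_1d(1000, 3, 4))                   # 1012
print(addr_2d_row_major(1000, 1, 1, 2, 4))   # 1012
```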
Code Optimization
BASIC BLOCKS AND FLOW GRAPHS
Code Optimization Techniques
1. Constant Folding
2. Constant Propagation
3. Common Subexpression Elimination
4. Copy/Variable Propagation
5. Code Movement / Loop Invariant Computation
6. Strength Reduction
7. Dead Code Elimination
8. Peephole Optimization
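Two of the techniques listed above, constant propagation and constant folding, can be sketched as a single pass over a tiny three-address-code fragment (the tuple encoding of the instructions is an assumption of this sketch):

```python
# Constant propagation + constant folding over three-address code
# encoded as (dest, op, operand_a, operand_b); op None means a plain copy.
def fold_and_propagate(code):
    consts = {}            # variables currently known to hold a constant
    out = []
    for dest, op, a, b in code:
        a = consts.get(a, a)               # propagate known constants
        b = consts.get(b, b)
        if op and isinstance(a, int) and isinstance(b, int):
            val = a + b if op == '+' else a * b    # fold at compile time
            consts[dest] = val
            out.append((dest, None, val, None))
        elif op is None:
            if isinstance(a, int):
                consts[dest] = a
            out.append((dest, None, a, None))
        else:
            out.append((dest, op, a, b))   # operand not constant: keep as is
    return out

code = [
    ('x', None, 10, None),    # x = 10
    ('y', '+', 'x', 5),       # y = x + 5
    ('z', '*', 'y', 2),       # z = y * 2
]
optimized = fold_and_propagate(code)
print(optimized)
# [('x', None, 10, None), ('y', None, 15, None), ('z', None, 30, None)]
```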
In the while loop below, c is live; out of the while loop, c is dead.
Data flow analysis
#include <stdio.h>
int main()
{
    float k = 20;          /* k is never used: dead code */
    int i = 10, j = 20, sum = 0;
    while (i)
    {
        int c = 10;        /* c is live only inside the loop */
        sum = sum + c * i * j;
        i--;               /* decrement so the loop terminates */
    }
    /* c is dead (and out of scope) here; printing c would not compile */
    printf("%d\n", sum);
    return 0;
}
common subexpression
t1 = 4 * i
t2 = i * 5
● Data Flow analysis
● Uses of Data Flow analysis
● Data flow analysis schema
● Reaching Definition problem
Node    GEN      KILL
B1      {1,2}    {3,4,5,6}
B2      {3}      {1,6}
B3      {4}      {2,5}
B4      {5}      {2,4}
B5      {6}      {1,3}
Computing IN and OUT
Initially IN = Φ and OUT = GEN:

Node    IN    OUT
B1      Φ     {1,2}
B2      Φ     {3}
B3      Φ     {4}
B4      Φ     {5}
B5      Φ     {6}
First Iteration
IN(B1)  = Φ
OUT(B1) = IN(B1) - KILL(B1) U GEN(B1) = Φ - {3,4,5,6} U {1,2} = {1,2}
IN(B2)  = OUT(B1) U OUT(B5) = {1,2} U {6} = {1,2,6}
OUT(B2) = IN(B2) - KILL(B2) U GEN(B2) = {1,2,6} - {1,6} U {3} = {2,3}
IN(B3)  = OUT(B2) = {2,3}
OUT(B3) = IN(B3) - KILL(B3) U GEN(B3) = {2,3} - {2,5} U {4} = {3,4}
IN(B4)  = OUT(B3) = {3,4}
OUT(B4) = IN(B4) - KILL(B4) U GEN(B4) = {3,4} - {2,4} U {5} = {3,5}
IN(B5)  = OUT(B3) U OUT(B4) = {3,4} U {3,5} = {3,4,5}
OUT(B5) = IN(B5) - KILL(B5) U GEN(B5) = {3,4,5} - {1,3} U {6} = {4,5,6}

Node    IN         OUT
B1      Φ          {1,2}
B2      {1,2,6}    {2,3}
B3      {2,3}      {3,4}
B4      {3,4}      {3,5}
B5      {3,4,5}    {4,5,6}
Second Iteration
IN(B2)  = OUT(B1) U OUT(B5) = {1,2} U {4,5,6} = {1,2,4,5,6}
OUT(B2) = {1,2,4,5,6} - {1,6} U {3} = {2,3,4,5}
IN(B3)  = OUT(B2) = {2,3,4,5}
OUT(B3) = {2,3,4,5} - {2,5} U {4} = {3,4}
All other IN and OUT sets are unchanged.

Third Iteration
Recomputing every set produces no further changes, so the analysis has converged and the
iteration stops.
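The iterative reaching-definitions computation above can be sketched directly, using the GEN/KILL sets from the example and the control-flow edges implied by the IN equations (B1→B2, B5→B2, B2→B3, B3→B4, B3→B5, B4→B5; the edge set is inferred from the worked iteration, so treat it as an assumption):

```python
# Iterative reaching-definitions solver for the example above.
GEN  = {'B1': {1, 2}, 'B2': {3}, 'B3': {4}, 'B4': {5}, 'B5': {6}}
KILL = {'B1': {3, 4, 5, 6}, 'B2': {1, 6}, 'B3': {2, 5},
        'B4': {2, 4}, 'B5': {1, 3}}
PRED = {'B1': [], 'B2': ['B1', 'B5'], 'B3': ['B2'],
        'B4': ['B3'], 'B5': ['B3', 'B4']}

IN  = {b: set() for b in GEN}                 # initially IN = Φ
OUT = {b: set(GEN[b]) for b in GEN}           # initially OUT = GEN

changed = True
while changed:
    changed = False
    for b in GEN:
        # IN(B) = union of OUT over all predecessors of B
        IN[b] = set().union(*(OUT[p] for p in PRED[b]))
        # OUT(B) = (IN(B) - KILL(B)) U GEN(B)
        new_out = (IN[b] - KILL[b]) | GEN[b]
        if new_out != OUT[b]:
            OUT[b] = new_out
            changed = True

print(sorted(IN['B2']))    # [1, 2, 4, 5, 6]  (converged value)
print(sorted(OUT['B5']))   # [4, 5, 6]
```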
80/20 principle
Money: 80% of the total money is possessed by 20% of the people across the globe.
Programs: 80% of the total execution time is consumed by 20% of the code (the loops).
Flow graph -> loops -> reducible flow graph
Loop optimization:
- loop unrolling
- loop invariant code motion
Classic algorithmic problems: sorting techniques, matrix multiplication, longest common subsequence, job
scheduling, knapsack problem, rod cutting, searching techniques, travelling salesman, graph coloring, minimum
spanning trees, shortest path.
Example decision problem:
Input: a graph and two nodes A and B
Output: whether there is a path between A and B
Answer: Yes/No
Problems:
1. Decidable problems — an algorithm exists.
   e.g. sorting, finding a path between two nodes.
2. Undecidable problems — the output cannot be determined, because either
   a. no algorithm exists — e.g. checking whether a grammar is ambiguous; or
   b. an algorithm is present but its computation cost is too high — e.g. the travelling salesman
      problem, Hamiltonian cycle, graph coloring (is a graph 4-colorable?), the 4/8-queens problems.
Register allocation: for a program P, how many registers will be required to execute the code?
Registers hold the values of variables, but the code may have, say, 100 variables while the processor
(e.g. 8085/8086) has only a few registers of its own.
Memory Hierarchy
Register ------- cache ------- main memory
Memory access time: main memory > cache > register; faster lookup increases speed.
Register allocation is typically solved with heuristic-based algorithms (e.g. A*, greedy best-first search).
Instruction pipelining: a jump causes an instruction stall.