Chapter 1
Language Processor
Introduction
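• Semantic gap: the gap between the application domain and the execution domain
• Solved by introducing a PL
– Design and coding
– PL implementation steps
• The newly introduced PL domain splits the semantic gap into two gaps:
– Specification gap: the semantic gap between two specifications of the same task.
– Execution gap: the gap between the semantics of programs written in different programming languages.
Figure: Application domain → PL domain → Execution domain. The specification gap lies between the application domain and the PL domain, the execution gap between the PL domain and the execution domain. Without the PL domain, the whole span is the semantic gap.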
Language Processor
• Definition: LP is a software which bridges a specification or
execution gap.
• Parts of LP:
– Language translator: bridges an execution gap, e.g. compiler, assembler
– Detranslator
– Preprocessor
– Language migrator
• Interpreter: a language processor which bridges an execution gap without generating a machine language program.
• Problem-oriented language
– Less specification gap, more execution gap
• Procedure-oriented language
– More specification gap, less execution gap
Language processing activities
• Program generation activity
Figure: Application domain → Program generator domain → Target PL domain → Execution domain. The specification gap is the gap between the application domain and the program generator domain.
• Program execution activity:
– Translation and interpretation
• Program Translation
– Translates a program from the source language (SL) to machine language.
• Characteristics
– A program must be translated before it can be executed.
– A translated program may be saved in a file, and the saved program may be executed repeatedly.
– A program must be retranslated following modifications.
• Program Interpretation:
– Reads the source program and stores it in memory.
– Determines its meaning and performs the corresponding actions.
Program interpretation and Execution
• Program Execution
– Fetch the instruction
– Decode the instruction to determine the operation
– Execute the instruction
• Program Interpretation
– Fetch the statement
– Analyze the statement to determine its meaning
– Execute the statement
Comparison
• ?????
Fundamentals of language processing
• LP = Analysis of SP + Synthesis of TP.
• Analysis of SP
– Lexical rules: govern the formation of valid lexical units
– Syntax rules: govern the formation of valid statements
– Semantic rules: associate meaning with valid statements
Phases of LP
Figure: Source program → Analysis phase → IR → Synthesis phase → Target program. Both the analysis phase and the synthesis phase may report errors.
• Forward reference: A forward reference of a program entity is a reference to the entity which precedes its definition in the program.
Ex. struct s
    { struct t *pt; };
    .
    .
    struct t
    { struct s *ps; };
• Issues concerning memory requirements and organization of LP.
Passes of LP
• Language Processor pass: A language processor pass is the
processing of every statement in a source program, or its equivalent
representation, to perform language processing function.
– Pass-I: Perform Analysis of SP.
– Pass-II : Perform synthesis of TP.
Intermediate Representation of Programs
• Intermediate Representation: An intermediate representation (IR) is a representation of a source program which reflects the effect of some, but not all, analysis and synthesis tasks performed during language processing.
Figure: SP → Front End → Intermediate Representation (IR) → Back End → TP
IR and Semantic actions
• Properties of IR
– Ease of Use:
– Processing Efficiency:
– Memory efficiency: compact
• Semantic actions:
– All actions performed by the front end, except lexical and syntax analysis, are called semantic actions. They include:
• Checking semantic validity
• Determining the meaning
• Constructing IR
Toy Compiler
• Gcc or cc compiler- c or c++
• Toy compiler- ???
– Front End
• Lexical analysis
• Syntax analysis
• Semantic analysis
– Back End
• Memory Allocation
• Code Generation
Symbol table generation

Symbol   Type    Length   Address
a        int
b        float
temp     int
Front End
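• Lexical (scanning)
– Ex. a := b + i; is scanned as id#2 op#5 id#3 op#3 id#1 op#10
• Syntax (parsing)
– Ex. a, b : real;
      a := b + i;
Figure: parse trees for the declaration (real with children a, b) and for the assignment.
• Semantic: the IC tree is generated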
Back End
• Memory Allocation:

Symbol   Type    Length   Address
a        int     2        2000
b        float   4        2002
temp     int     2        2006

• Code Generation: generating assembly language.
– Issues:
• Determine where intermediate results (IR) should be kept.
• Determine which instructions should be used for type conversion.
• Determine which addressing mode should be used for accessing variables.
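The addresses in the memory allocation table above follow from adding each symbol's length to the previous address. A minimal C sketch of that step, assuming a hypothetical `struct symbol` and `allocate` function (neither name comes from the slides) and a base address of 2000 as in the table:

```c
#include <assert.h>
#include <stddef.h>

/* One symbol-table entry, mirroring the columns of the table
   (hypothetical layout, for illustration only). */
struct symbol {
    const char *name;
    const char *type;
    int length;   /* size in bytes */
    int address;  /* filled in by allocate() */
};

/* Assign consecutive addresses starting at base.
   Returns the first free address after the last symbol. */
int allocate(struct symbol *table, size_t count, int base)
{
    assert(table != NULL);
    int next = base;
    for (size_t i = 0; i < count; i++) {
        table[i].address = next;
        next += table[i].length;
    }
    return next;
}
```

Calling `allocate` on the entries a (length 2), b (length 4) and temp (length 2) with base 2000 gives the addresses 2000, 2002 and 2006 shown in the table.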
Fundamentals of Language Specification
• Programming language grammars:
– Terminal symbols
• Lowercase letters, punctuation marks, null
• Concatenation (.)
– Nonterminal symbols: names of syntax categories of the language
– Productions: a production, also called a rewriting rule, is a rule of the grammar
• NT = string of T’s and NT’s
• Production form:
<article> = a/an/the
<noun> = boy/apple
<noun phrase> = <article><noun>
Grammar
• Def: A grammar G of a language Lg is a quadruple (∑, SNT, S, P) where
– ∑ is the set of terminals
– SNT is the set of NT’s
– S is the distinguished symbol
– P is the set of productions
• Ex: Derive the sentence “a boy ate an apple”
– <sentence> = <noun phrase><verb phrase>
– <noun phrase> = <article><noun>
– <verb phrase> = <verb><noun phrase>
– <article> = a/an/the
– <noun> = boy/apple
– <verb> = ate
Derivation: <sentence> ⇒ <noun phrase><verb phrase> ⇒ <article><noun><verb phrase> ⇒ a boy <verb phrase> ⇒ a boy <verb><noun phrase> ⇒ a boy ate <article><noun> ⇒ a boy ate an apple
Grammar
• Derive a + b * c / 5 and construct the parse tree (top down).
– <exp> = <exp> + <term> | <term>
– <term> = <term> * <factor> | <factor>
– <factor> = <factor> / <number>
– <number> = 0/1/2/3/../9
• Classification of grammars:
– Type-0: phrase structure grammar
– Type-1: context sensitive grammar
– Type-2: context free grammar
– Type-3: linear grammar or regular grammar
Binding
• Definition: A binding is the association of an attribute of a program entity with a value.
– Static binding: a binding performed before the execution of a program begins.
– Dynamic binding: a binding performed after the execution of a program begins.
Chapter - 3
Unit-2
Scanning and Parsing
Role of lexical Analyzer
Scanning
• Definition: Scanning is the process of recognizing the lexical components in a source string.
• Type-3 grammar or regular grammar
• A regular grammar is used to identify identifiers.
• Regular languages are obtained from the operations or, concatenation and Kleene star (*).
• Ex. Write a regular expression which identifies strings ending with abb.
– (a+b)*abb
• Ex. Write a regular expression which recognizes identifiers.
– R.E. = (letter)(letter/digit)*
– Digit = 0/1/2/…/9
– Letter = a/b/c/…/z
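The identifier expression (letter)(letter/digit)* can be checked with a short hand-written scanner routine. The sketch below is one possible C rendering; the function names `is_letter`, `is_digit` and `is_identifier` are made up for illustration, and letters are limited to a..z exactly as in the definition above.

```c
#include <assert.h>
#include <stddef.h>

/* Letter = a/b/c/.../z, as defined on the slide. */
static int is_letter(char c) { return c >= 'a' && c <= 'z'; }

/* Digit = 0/1/2/.../9. */
static int is_digit(char c) { return c >= '0' && c <= '9'; }

/* Returns 1 if s matches (letter)(letter/digit)*, otherwise 0. */
int is_identifier(const char *s)
{
    assert(s != NULL);
    if (!is_letter(s[0]))          /* must start with a letter */
        return 0;
    for (const char *p = s + 1; *p != '\0'; p++)
        if (!is_letter(*p) && !is_digit(*p))
            return 0;              /* any other character ends the match */
    return 1;
}
```

With these assumptions `is_identifier("temp1")` returns 1, while `is_identifier("1temp")` and `is_identifier("")` return 0.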
Regular Expression and Meaning

Regular expression   Meaning
r                    String r
s                    String s
r.s and rs           Concatenation of r and s
(r)                  Same meaning as r
r/s or (r/s)         Alternation (r or s)
(r)/(s)              Alternation
[r]                  An optional occurrence of r
(r)*                 0 or more occurrences of string r
(r)+                 1 or more occurrences of string r
Examples of regular expression
• Integer: [+/-](d)+
• Real: [+/-](d)+.(d)+
• Real with optional fraction: [+/-](d)+.(d)*
• Identifier: l(l/d)*
Example of Regular expression
• String ending with 0: (0+1)*0
• String ending with 11: (0+1)*11
• String with 0 EVEN and 1 ODD: (0+1)*(01*01*)*11*0*
• The language of all strings containing exactly two 0’s: 1*01*01*
• The language of all strings that do not end with 01: Λ + 1 + (0+1)*0 + (0+1)*11, where Λ is the empty string
Finite state automaton
• FSA: a triple (S, ∑, T) where
– S is a finite set of states,
– ∑ is the alphabet of source symbols,
– T is a finite set of state transitions.
• Types of FSA: DFA and NFA
DFA from Regular Expression
• (0+1)*0
• (11+10)*
• (0+1)*(1+00)(0+1)*
Transition table from DFA
• (0+1)*0
• (11+10)*

States/input   0    1
q0             q1   q0
q1             q1   q0
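A transition table like the one above can drive a DFA directly. The following C sketch simulates it under two assumptions that the table itself does not state: q0 is the start state and q1 is the only accepting state, which is the reading under which the table recognizes (0+1)*0. The names `delta` and `dfa_accepts` are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Transition table: row = current state (q0, q1),
   column = input symbol (0, 1). */
static const int delta[2][2] = {
    /* q0 */ {1, 0},
    /* q1 */ {1, 0},
};

/* Returns 1 if the DFA accepts w, 0 if it rejects,
   and -1 if w contains a symbol other than 0 or 1. */
int dfa_accepts(const char *w)
{
    assert(w != NULL);
    int state = 0;                      /* assumed start state q0 */
    for (; *w != '\0'; w++) {
        if (*w != '0' && *w != '1')
            return -1;
        state = delta[state][*w - '0']; /* follow one transition */
    }
    return state == 1;                  /* assumed accepting state q1 */
}
```

Under these assumptions the DFA accepts exactly the strings that end with 0, for example 10 and 1100, and rejects 01 and the empty string.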
Transition Table
(0+1)*(1+00) (0+1)*
DFA and its transition diagram
Check for the given string aabab
Types of Parser
• Top-down parser
– Backtracking
– Predictive parser
• Bottom-up parser
– Shift-reduce parser
– LR parser: SLR, LR, LALR
Example
• Expression grammar (with precedence)

#   Production rule
1   expr   → expr + term
2          | expr - term
3          | term
4   term   → term * factor
5          | term / factor
6          | factor
7   factor → number
8          | identifier

• Input string: x - 2 * y
Example

Rule   Sentential form    Input string
-      expr               ↑ x - 2 * y
1      expr + term        ↑ x - 2 * y
3      term + term        ↑ x - 2 * y
6      factor + term      ↑ x - 2 * y
8      <id> + term        x ↑ - 2 * y
-      <id,x> + term      x ↑ - 2 * y

↑ marks the current position in the input stream.
Figure: partial parse tree (expr → expr + term, with the left expr → term → fact → x).
• Problem:
– Can’t match the next terminal
– We guessed wrong at step 2 (choosing expr + term)
Backtracking

Rule   Sentential form    Input string
-      expr               ↑ x - 2 * y
1      expr + term        ↑ x - 2 * y
3      term + term        ↑ x - 2 * y
6      factor + term      ↑ x - 2 * y
8      <id> + term        x ↑ - 2 * y
?      <id,x> + term      x ↑ - 2 * y

Undo all these productions:
• Roll back the productions
• Choose a different production for expr
• Continue
Retrying

Rule   Sentential form    Input string
-      expr               ↑ x - 2 * y
2      expr - term        ↑ x - 2 * y
3      term - term        ↑ x - 2 * y
6      factor - term      ↑ x - 2 * y
8      <id> - term        x ↑ - 2 * y
-      <id,x> - term      x - ↑ 2 * y
6      <id,x> - factor    x - ↑ 2 * y
7      <id,x> - <num>     x - 2 ↑ * y

Figure: parse tree (expr → expr - term; left expr → term → fact → x; right term → fact → 2).
• Problem:
– More input to read
– Another cause of backtracking
Successful Parse

Rule   Sentential form             Input string
-      expr                        ↑ x - 2 * y
2      expr - term                 ↑ x - 2 * y
3      term - term                 ↑ x - 2 * y
6      factor - term               ↑ x - 2 * y
8      <id> - term                 x ↑ - 2 * y
-      <id,x> - term               x - ↑ 2 * y
4      <id,x> - term * fact        x - ↑ 2 * y
6      <id,x> - fact * fact        x - ↑ 2 * y
7      <id,x> - <num> * fact       x - 2 ↑ * y
-      <id,x> - <num,2> * fact     x - 2 * ↑ y
8      <id,x> - <num,2> * <id>     x - 2 * y ↑

• All terminals match: we’re finished
Figure: parse tree (expr → expr - term; left expr → term → fact → x; right term → term * fact, with term → fact → 2 and fact → y).
Problems in Top down Parsing
• Backtracking( we have seen)
• Left recursion
• Left Factoring
Left Recursion

Rule   Sentential form                      Input string
-      expr                                 ↑ x - 2 * y
1      expr + term                          ↑ x - 2 * y
1      expr + term + term                   ↑ x - 2 * y
1      expr + term + term + term            ↑ x - 2 * y
1      expr + term + term + term + term     ↑ x - 2 * y

• Problem: termination
– A wrong choice leads to infinite expansion
(More importantly: without consuming any input!)
– May not be as obvious as this
– Our grammar is left recursive
Rules for Left Recursion
• If A → Aa1 | Aa2 | … | Aan | b1 | b2 | … | bm
• then after removal of left recursion:
A → b1A’ | b2A’ | … | bmA’
A’ → a1A’ | a2A’ | … | anA’ | є
• Ex. Apply the rule to
• A → Aa | Ab | c | d
• A → Ac | Aad | bd | є
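The rule above is mechanical enough to code. The C sketch below applies it to the alternatives of a single non-terminal. It is illustrative only: `remove_left_recursion` is a hypothetical helper, symbols are single characters, the empty string stands for є, and the result is written as text with `eps` for є.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Rewrites the alternatives alts[0..n-1] of non-terminal nt.
   Alternatives starting with nt are the A -> A alpha cases,
   all others are the A -> beta cases. */
void remove_left_recursion(char nt, const char *alts[], int n,
                           char *out, size_t cap)
{
    char head[256] = "";   /* new alternatives of A:  beta A'  */
    char tail[256] = "";   /* new alternatives of A': alpha A' */
    size_t h = 0, t = 0;

    assert(alts != NULL && out != NULL);
    for (int i = 0; i < n; i++) {
        if (alts[i][0] == nt) {
            /* A -> A alpha  becomes  A' -> alpha A' */
            t += (size_t)snprintf(tail + t, sizeof tail - t,
                                  "%s%c' | ", alts[i] + 1, nt);
            assert(t < sizeof tail);
        } else {
            /* A -> beta  becomes  A -> beta A' */
            h += (size_t)snprintf(head + h, sizeof head - h,
                                  "%s%s%c'", h ? " | " : "", alts[i], nt);
            assert(h < sizeof head);
        }
    }
    /* A' always gets the extra epsilon alternative. */
    snprintf(out, cap, "%c -> %s\n%c' -> %seps", nt, head, nt, tail);
}
```

For the first exercise, the alternatives Aa, Ab, c, d give A → cA’ | dA’ and A’ → aA’ | bA’ | є. For the second, Ac, Aad, bd, є give A → bdA’ | A’ and A’ → cA’ | adA’ | є.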
Removing Left Recursion
• Two cases of left recursion:

#   Production rule
1   expr → expr + term
2        | expr - term
3        | term
4   term → term * factor
5        | term / factor
6        | factor

• Transform as follows

#   Production rule
1   expr  → term expr2
2   expr2 → + term expr2
3         | - term expr2
4         | ε
5   term  → factor term2
6   term2 → * factor term2
7         | / factor term2
8         | ε
Left Factoring
• When the choice between two productions is not clear, we may be able to rewrite the productions to defer the decision. This is called left factoring.
Ex. Stmt → if expr then stmt else stmt | if expr then stmt
becomes
Stmt → if expr then stmt S’
S’ → else stmt | є
• Rule: if A → ab1 | ab2 then
A → aA’
A’ → b1 | b2
Some examples for Left factoring
• S-> Assig_stmt/call_stmt/other
– Assig_stmt-> id=exp
– call_stmt->id(exp_list)
Recursive Descent Parsing
• Example
Rule 1: S → a S b
Rule 2: S → b S a
Rule 3: S → B
Rule 4: B → b B
Rule 5: B → ε
• Parse: a a b b b
– Has to use R1: S → a S b
– Again has to use R1: a S b → a a S b b
– Now has to use Rule 2 or 3, follow the order (always R2 first):
a a S b b → a a b S a b b → a a b b S a a b b → a a b b b S a a a b b
– Now cannot use Rule 2 any more: → a a b b b B a a a b b → incorrect, backtrack
– After some backtracking, finally tried
a S b → a a S b b → a a b B b b → a a b b b → worked
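The trial-and-error search in the trace can be made concrete in C. The sketch below is a variation rather than a literal backtracking parser: each recursive routine returns the set of every input position where its non-terminal could end, so all alternatives are explored in one pass. The names `ends_S`, `ends_B` and `parses` are illustrative, and inputs are assumed to be shorter than 31 characters so that a position set fits in one `unsigned`.

```c
#include <assert.h>
#include <string.h>

/* Bit e of the result is set if B can derive s[i..e). */
static unsigned ends_B(const char *s, int i)
{
    unsigned ends = 1u << i;               /* Rule 5: B -> epsilon */
    if (s[i] == 'b')
        ends |= ends_B(s, i + 1);          /* Rule 4: B -> b B */
    return ends;
}

/* Bit e of the result is set if S can derive s[i..e). */
static unsigned ends_S(const char *s, int n, int i)
{
    unsigned ends = ends_B(s, i);          /* Rule 3: S -> B */
    if (s[i] == 'a' || s[i] == 'b') {
        /* Rule 1: S -> a S b, or Rule 2: S -> b S a */
        char close = (s[i] == 'a') ? 'b' : 'a';
        unsigned inner = ends_S(s, n, i + 1);
        for (int e = i + 1; e < n; e++)
            if (((inner >> e) & 1u) && s[e] == close)
                ends |= 1u << (e + 1);     /* inner S ends at e, then close */
    }
    return ends;
}

/* Returns 1 if the whole string s can be derived from S. */
int parses(const char *s)
{
    assert(s != NULL);
    int n = (int)strlen(s);
    assert(n < 31);
    return (int)((ends_S(s, n, 0) >> n) & 1u);
}
```

`parses("aabbb")` returns 1, matching the derivation that finally worked in the trace, while `parses("aab")` returns 0.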
Predictive Parsing
• Need to immediately know which rule to apply when seeing the next input character
– If for every non-terminal X
• We know what would be the first terminal of each of X’s productions
• And the first terminal of each of X’s productions is different
– Then
• When the current leftmost non-terminal is X
• And we can look at the next input character
• ⇒ We know exactly which production should be used next to expand X
Predictive Parsing
– Example
Rule 1: S → a S b (first terminal is a)
Rule 2: S → b S a (first terminal is b)
Rule 3: S → B
Rule 4: B → b B
Rule 5: B → ε
If next input is a, use R1
If next input is b, use R2
But R3’s first terminal is also b
Won’t work!!!
Predictive Parsing
– What grammar does not satisfy the above condition?
• If two productions of the same non-terminal have the same first symbol (N or T), you can see immediately that it won’t work
– S → b S a | b B
– S → B a | B C
• If the grammar is left recursive, then it won’t work
– S → S a | b B, B → b B | c
– The left recursive rule of S can generate all terminals that the other productions of S can generate
» S → b B can generate b, so S → S a can also generate b
Predictive Parsing
• Need to rewrite the grammar
– Left recursion elimination
• This is required even for the recursive descent parsing algorithm
– Left factoring
• Remove the leftmost common factors
First(α)
• First(α) = { t | α ⇒* tβ }
– Consider all possible terminal strings derived from α
– The set of the first terminals of those strings
• For all terminals t ∈ T
– First(t) = {t}
First(α)
• For all non-terminals X ∈ N
– If X → ε, add ε to First(X)
– If X → α1 α2 … αn
• αi is either a terminal or a non-terminal (not a string as usual)
• Add all terminals in First(α1) to First(X)
– Exclude ε
• If ε ∈ First(α1) ∧ … ∧ ε ∈ First(αi-1) then add all terminals in First(αi) to First(X)
• If ε ∈ First(α1) ∧ … ∧ ε ∈ First(αn) then add ε to First(X)
• Apply the rules until nothing more can be added
• For adding t or ε: add only if t is not in the set yet
First(α)
• Grammar
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id | num
• First
First(*) = {*}, First(+) = {+}, …
First(F) = {(, id, num}
First(T’) = {*, ε}
First(T) = First(F) = {(, id, num}
First(E’) = {+, ε}
First(E) = First(T) = {(, id, num}
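The "apply the rules until nothing more can be added" step is a fixed-point iteration. The C sketch below computes First for the expression grammar above; sets are bit masks, and the encoding (enum names, `E1` for E’, `T1` for T’, the `END` marker, `compute_first`) is a made-up illustration rather than anything prescribed by the slides.

```c
#include <assert.h>
#include <stddef.h>

enum { PLUS, STAR, LPAR, RPAR, ID, NUM, EPS,   /* terminals and epsilon */
       E, E1, T, T1, F, NSYM };                /* non-terminals */

#define END (-1)                               /* end of a right-hand side */

struct production { int lhs; int rhs[4]; };

static const struct production grammar[] = {
    {E,  {T, E1, END}},
    {E1, {PLUS, T, E1, END}},
    {E1, {END}},                               /* E' -> epsilon */
    {T,  {F, T1, END}},
    {T1, {STAR, F, T1, END}},
    {T1, {END}},                               /* T' -> epsilon */
    {F,  {LPAR, E, RPAR, END}},
    {F,  {ID, END}},
    {F,  {NUM, END}},
};

/* first[X] becomes a bit set over PLUS..EPS. */
void compute_first(unsigned first[NSYM])
{
    assert(first != NULL);
    for (int s = 0; s < NSYM; s++)
        first[s] = (s < EPS) ? 1u << s : 0u;   /* First(t) = {t} */

    int changed = 1;
    while (changed) {                          /* until nothing can be added */
        changed = 0;
        for (size_t p = 0; p < sizeof grammar / sizeof grammar[0]; p++) {
            const int *rhs = grammar[p].rhs;
            unsigned add = 0;
            int i = 0;
            /* Walk the right-hand side while every earlier symbol
               can derive epsilon. */
            while (rhs[i] != END) {
                unsigned f = first[rhs[i]];
                add |= f & ~(1u << EPS);
                if (!(f & (1u << EPS)))
                    break;
                i++;
            }
            if (rhs[i] == END)
                add |= 1u << EPS;              /* whole right side nullable */
            unsigned updated = first[grammar[p].lhs] | add;
            if (updated != first[grammar[p].lhs]) {
                first[grammar[p].lhs] = updated;
                changed = 1;
            }
        }
    }
}
```

After `compute_first`, the masks for F, T and E contain (, id and num, and the masks for E’ and T’ contain {+, ε} and {*, ε}, in line with the sets listed above.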
First(α)
• Grammar
S → AB
A → aA | ε
B → bB | ε
• First
First(A) = {a, ε}
First(B) = {b, ε}
First(S) = First(A) = {a, ε}
Is this complete?
First(α)
• Grammar
S → AB | B (R1 | R2)
A → aA | c (R3 | R4)
B → bB | d (R5 | R6)
• First
First(A) = {a, c}
First(B) = {b, d}
First(S) = First(A) ∪ First(B) = {a, b, c, d}
• Productions
– First(R1) = {a, c}, First(R2) = {b, d}
– First(R3) = {a}, First(R4) = {c}
– First(R5) = {b}, First(R6) = {d}

                   If we see a   If we see b   If we see c   If we see d
When expanding S   Use R1        Use R2        Use R1        Use R2
When expanding A   Use R3        -             Use R4        -
When expanding B   -             Use R5        -             Use R6

Input: acbd
Expands S, seeing a, use R1: S → AB
Expands A, seeing a, use R3: AB → aAB
Expands A, seeing c, use R4: aAB → acB
Expands B, seeing b, use R5: acB → acbB
Expands B, seeing d, use R6: acbB → acbd
First(α)
• Grammar
S → AB (R1)
A → aA | ε (R2 | R3)
B → bB | ε (R4 | R5)
• First
First(A) = {a, ε}
First(B) = {b, ε}
First(S) = First(A) ∪ First(B) = {a, b, ε}
• Productions
– First(R1) = {a, b, ε}
– First(R2) = {a}, First(R3) = {ε}
– First(R4) = {b}, First(R5) = {ε}

                   If we see a   If we see b   If we see ε
When expanding S   Use R1        Use R1        Use R1
When expanding A   Use R2        -             Use R3
When expanding B   -             Use R4        Use R5

Input: aabb
Use R1: S → AB
Expands A, seeing a, use R2: AB → aAB
Expands A, seeing a, use R2: aAB → aaAB
Expands A, seeing b: what to do? Not in table!
Follow(α)
• Follow(α) = { t | S ⇒* αtβ }
– Consider all strings that may follow α
– The set of the first terminals of those strings
• Assumptions
– There is a $ at the end of every input string
– S is the starting symbol
• For all non-terminals only
– Add $ into Follow(S)
– If A → αBβ, add First(β) – {ε} into Follow(B)
– If A → αB, or A → αBβ and ε ∈ First(β), add Follow(A) into Follow(B)
Follow(α)
• Grammar
S → AB (R1)
A → aA | ε (R2 | R3)
B → bB | ε (R4 | R5)
• First
First(A) = {a, ε}
First(B) = {b, ε}
First(S) = First(A) ∪ First(B) = {a, b, ε}
• Productions
– First(R1) = {a, b, ε}
– First(R2) = {a}, First(R3) = {ε}
– First(R4) = {b}, First(R5) = {ε}
• Table built from First alone:

                   If we see a   If we see b
When expanding S   Use R1        Use R1
When expanding A   Use R2        ?
When expanding B   -             Use R4

• Follow
– Follow(S) = {$}
– Follow(B) = Follow(S) = {$}
– Follow(A) = (First(B) – {ε}) ∪ Follow(S) = {b, $}
• Since ε ∈ First(B), Follow(S) should be in Follow(A)
• Table built from First and Follow:

                   If we see a   If we see b   If we see $
When expanding S   Use R1        Use R1        Use R1
When expanding A   Use R2        Use R3        Use R3
When expanding B   -             Use R4        Use R5
Construct a Parse Table
• Construct a parse table M[N, T ∪ {$}]
– Non-terminals in the rows and terminals in the columns
• For each production A → α
– For each terminal a ∈ First(α), add A → α to M[A, a]
• Meaning: when at A and seeing input a, A → α should be used
– If ε ∈ First(α), then for each terminal a ∈ Follow(A), add A → α to M[A, a]
• Meaning: when at A and seeing input a, A → α should be used
• In order to continue expansion to ε
• Ex. X → AC, A → B, B → b | ε, C → cc
– If ε ∈ First(α) and $ ∈ Follow(A), add A → α to M[A, $]
• Same as the above
First() and Follow(): another example
• Grammar
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id | num
• First
– First(*) = {*}
– First(F) = {(, id, num}
– First(T’) = {*, ε}
– First(T) = First(F) = {(, id, num}
– First(E’) = {+, ε}
– First(E) = First(T) = {(, id, num}
• Follow
– Follow(E) = {$, )}
– Follow(E’) = Follow(E) = {$, )}
– Follow(T) = {$, ), +}
• Since we have TE’ from the first two rules and E’ can be ε:
Follow(T) = (First(E’) – {ε}) ∪ Follow(E’)
– Follow(T’) = Follow(T) = {$, ), +}
– Follow(F) = {*, $, ), +}
• Follow(F) = (First(T’) – {ε}) ∪ Follow(T’)
Construct a Parse Table
• Grammar
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id | num
• First and Follow
First(*) = {*}
First(F) = {(, id, num}
First(T’) = {*, ε}
First(T) = {(, id, num}
First(E’) = {+, ε}
First(E) = {(, id, num}
Follow(E) = {$, )}
Follow(E’) = {$, )}
Follow(T) = {$, ), +}
Follow(T’) = {$, ), +}
Follow(F) = {*, $, ), +}
• Entries contributed by each production
E → TE’: First(TE’) = {(, id, num}
E’ → +TE’: First(+TE’) = {+}
E’ → ε: Follow(E’) = {$, )}
T → FT’: First(FT’) = {(, id, num}
T’ → *FT’: First(*FT’) = {*}
T’ → ε: Follow(T’) = {$, ), +}
• Parse table

      id         num        +            *            (          )         $
E     E → TE’    E → TE’                              E → TE’
E’                          E’ → +TE’                            E’ → ε    E’ → ε
T     T → FT’    T → FT’                              T → FT’
T’                          T’ → ε       T’ → *FT’               T’ → ε    T’ → ε
F     F → id     F → num                              F → (E)

• Parsing id + num * id $

Stack         Input              Action
E $           id + num * id $    E → TE’
T E’ $        id + num * id $    T → FT’
F T’ E’ $     id + num * id $    F → id
T’ E’ $       + num * id $       T’ → ε
E’ $          + num * id $       E’ → +TE’
T E’ $        num * id $         T → FT’
F T’ E’ $     num * id $         F → num
T’ E’ $       * id $             T’ → *FT’
F T’ E’ $     id $               F → id
T’ E’ $       $                  T’ → ε
E’ $          $                  E’ → ε
$             $                  Accept

– F → id: pop F from the stack, remove id from the input
– T’ → ε: pop T’ from the stack, input unchanged
– E’ → +TE’: only TE’ stays on the stack, remove + from the input
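The stack-and-table procedure traced above fits in a few dozen lines of C. The sketch below hard-codes the parse table for the expression grammar. The encoding is an assumption made for brevity: tokens are single characters (`i` for id, `n` for num), `e` and `t` stand for E’ and T’, and the names `table`, `in_set` and `ll1_parse` are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* 1 if c is a non-NUL member of set. */
static int in_set(char c, const char *set)
{
    return c != '\0' && strchr(set, c) != NULL;
}

/* M[nt, look]: the right-hand side to expand with, "" for an
   epsilon production, NULL for an empty (error) entry. */
static const char *table(char nt, char look)
{
    switch (nt) {
    case 'E': return in_set(look, "in(") ? "Te" : NULL;
    case 'e': if (look == '+') return "+Te";
              return in_set(look, ")$") ? "" : NULL;
    case 'T': return in_set(look, "in(") ? "Ft" : NULL;
    case 't': if (look == '*') return "*Ft";
              return in_set(look, "+)$") ? "" : NULL;
    case 'F': if (look == '(') return "(E)";
              if (look == 'i') return "i";
              if (look == 'n') return "n";
              return NULL;
    default:  return NULL;
    }
}

/* Returns 1 if input (which must end with '$') is accepted, else 0. */
int ll1_parse(const char *input)
{
    assert(input != NULL);
    char stack[128];
    int top = 0;
    stack[top++] = '$';
    stack[top++] = 'E';                       /* start symbol on top */
    while (top > 0) {
        char x = stack[--top];
        if (strchr("EeTtF", x) != NULL) {     /* non-terminal: expand */
            const char *rhs = table(x, *input);
            if (rhs == NULL)
                return 0;
            int len = (int)strlen(rhs);
            if (top + len > (int)sizeof stack)
                return 0;                     /* sketch: fixed stack size */
            for (int k = len - 1; k >= 0; k--)
                stack[top++] = rhs[k];        /* push right side reversed */
        } else {                              /* terminal: must match */
            if (x != *input)
                return 0;
            if (x == '$')
                return 1;                     /* accept */
            input++;
        }
    }
    return 0;
}
```

`ll1_parse("i+n*i$")` follows the same expansions as the trace for id + num * id $ and returns 1, whereas `ll1_parse("i+$")` stops at an empty table entry and returns 0.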
More about LL Grammar
• What grammar is not LL(1)?
S → A | B
A → aaA | ε
B → abB | b
• First(A) = {a, ε}, First(B) = {a, b}, First(S) = {a, b, ε}
• Follow(S) = {$}, Follow(A) = {$}, Follow(B) = {$}

      a               b        $
S     S → A, S → B    S → B    S → A
A     A → aaA                  A → ε
B     B → abB         B → b

– But this grammar is LL(2)
• If we look ahead 2 input characters, predictive parsing is possible
• First2(A) = {aa, ε}, First2(B) = {ab, b$}, First2(S) = {aa, ab, b$, ε}

      aa         ab         b$       $        ba, bb, a$
S     S → A      S → B      S → B    S → A
A     A → aaA                        A → ε
B                B → abB    B → b
A Shift-Reduce Parser
E → E+T | T
T → T*F | F
F → (E) | id
Right-most derivation of id+id*id:
E ⇒ E+T ⇒ E+T*F ⇒ E+T*id ⇒ E+F*id ⇒ E+id*id ⇒ T+id*id ⇒ F+id*id ⇒ id+id*id

Right-most sentential form   Reducing production
id+id*id                     F → id
F+id*id                      T → F
T+id*id                      E → T
E+id*id                      F → id
E+F*id                       T → F
E+T*id                       F → id
E+T*F                        T → T*F
E+T                          E → E+T
E

The handle in each right-sentential form is the substring that the reducing production replaces.
A Stack Implementation of A Shift-Reduce Parser
• There are four possible actions of a shift-reduce parser:
1. Shift: The next input symbol is shifted onto the top of the stack.
2. Reduce: Replace the handle on the top of the stack by the non-terminal.
3. Accept: Successful completion of parsing.
4. Error: The parser discovers a syntax error and calls an error recovery routine.
• The initial stack contains only the end-marker $.
• The end of the input string is marked by the end-marker $.
A Stack Implementation of A Shift-Reduce Parser

Stack       Input        Action
$           id+id*id$    shift
$id         +id*id$      reduce by F → id
$F          +id*id$      reduce by T → F
$T          +id*id$      reduce by E → T
$E          +id*id$      shift
$E+         id*id$       shift
$E+id       *id$         reduce by F → id
$E+F        *id$         reduce by T → F
$E+T        *id$         shift
$E+T*       id$          shift
$E+T*id     $            reduce by F → id
$E+T*F      $            reduce by T → T*F
$E+T        $            reduce by E → E+T
$E          $            accept
Operator-Precedence Parser
• Operator grammar
– small, but an important class of grammars
– we may have an efficient operator precedence parser (a shift-reduce
parser) for an operator grammar.
• In an operator grammar, no production rule can have:
– ε at the right side
– two adjacent non-terminals at the right side.
• Ex:
E → AB, A → a, B → b: not an operator grammar
E → E+E | E*E | E/E | id: operator grammar
Precedence Relations
• In operator-precedence parsing, we define three disjoint precedence relations between certain pairs of terminals.
a <. b    b has higher precedence than a
a =· b    b has the same precedence as a
a .> b    b has lower precedence than a
• The determination of correct precedence relations between terminals is based on the traditional notions of associativity and precedence of operators. (Unary minus causes a problem.)
Using Operator-Precedence Relations
• The intention of the precedence relations is to find the handle of a right-sentential form, with
<. marking the left end,
=· appearing in the interior of the handle, and
.> marking the right end.
• In our input string $a1a2...an$, we insert the precedence relation between each pair of adjacent terminals (the precedence relation that holds between the terminals in that pair).
Using Operator-Precedence Relations
E → E+E | E-E | E*E | E/E | E^E | (E) | -E | id
The partial operator-precedence table for this grammar:

       id    +     *     $
id           .>    .>    .>
+      <.    .>    <.    .>
*      <.    .>    .>    .>
$      <.    <.    <.

• Then the input string id+id*id with the precedence relations inserted will be:
$ <. id .> + <. id .> * <. id .> $
To Find The Handles
1. Scan the string from the left end until the first .> is encountered.
2. Then scan backwards (to the left) over any =· until a <. is encountered.
3. The handle contains everything to the left of the first .> and to the right of the <. that was encountered.

Sentential form       With precedence relations               Reduce by
$ id + id * id $      $ <. id .> + <. id .> * <. id .> $      E → id
$ E + id * id $       $ <. + <. id .> * <. id .> $            E → id
$ E + E * id $        $ <. + <. * <. id .> $                  E → id
$ E + E * E $         $ <. + <. * .> $                        E → E*E
$ E + E $             $ <. + .> $                             E → E+E
$ E $                 $ $
Operator-Precedence Parsing Algorithm: Example

stack     input        action
$         id+id*id$    $ <. id    shift
$id       +id*id$      id .> +    reduce E → id
$         +id*id$      shift
$+        id*id$       shift
$+id      *id$         id .> *    reduce E → id
$+        *id$         shift
$+*       id$          shift
$+*id     $            id .> $    reduce E → id
$+*       $            * .> $     reduce E → E*E
$+        $            + .> $     reduce E → E+E
$         $            accept
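The trace above keeps only terminals on the stack and consults the precedence table at every step. A C sketch of that loop follows. The single-character token `i` for id and the names `prec` and `op_precedence_parse` are assumptions. The operand counter is an addition that the slides do not show: it makes the sketch reject inputs such as `i+$`, which the terminal-only stack would otherwise accept.

```c
#include <assert.h>
#include <string.h>

/* Relation between terminal a (top of stack) and b (next input):
   '<', '>' or ' ' for an empty table entry. 'i' stands for id. */
static char prec(char a, char b)
{
    static const char sym[] = "i+*$";
    static const char rel[4][5] = {
        /*          i + * $ */
        /* i */ " >>>",
        /* + */ "<><>",
        /* * */ "<>>>",
        /* $ */ "<<< ",
    };
    const char *pa = a ? strchr(sym, a) : NULL;
    const char *pb = b ? strchr(sym, b) : NULL;
    if (pa == NULL || pb == NULL)
        return ' ';
    return rel[pa - sym][pb - sym];
}

/* Returns the number of reductions on success, -1 on a syntax
   error. The input must end with '$'. */
int op_precedence_parse(const char *input)
{
    assert(input != NULL);
    char stack[64];
    int top = 0, operands = 0, reductions = 0;
    stack[top++] = '$';
    for (;;) {
        char a = stack[top - 1], b = *input;
        if (a == '$' && b == '$')
            return operands == 1 ? reductions : -1;
        char r = prec(a, b);
        if (r == '<') {                        /* shift */
            if (top == (int)sizeof stack)
                return -1;
            stack[top++] = b;
            input++;
        } else if (r == '>') {                 /* reduce the handle */
            char popped;
            do {
                popped = stack[--top];
            } while (prec(stack[top - 1], popped) != '<');
            if (popped == 'i') {
                operands++;                    /* E -> id */
            } else {
                if (operands < 2)
                    return -1;                 /* E -> E+E or E -> E*E */
                operands--;
            }
            reductions++;
        } else {
            return -1;                         /* empty table entry */
        }
    }
}
```

For `i+i*i$` the function performs the same five reductions as the trace (three E → id, then E → E*E, then E → E+E) and returns 5.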
Chapter - 6
Unit-6
Introduction to Compiler
Aspects of Compilation
• A compiler bridges the semantic gap between a PL domain and an execution domain.
• Two aspects of compilation are:
– Generate code to implement the meaning of a source program in the execution domain.
– Provide diagnostics for violations of PL semantics in a source program.
• PL features:
– Data types
– Data structures
– Scope rules
– Control structures
Three address Code
• In three-address code, there is at most one operator on the right side of an instruction; that is, no built-up arithmetic expressions are permitted.
• Example: A source-language expression x+y*z might be translated into the sequence of three-address instructions below, where t1 and t2 are compiler-generated temporary names.
t1 = y * z
t2 = x + t1
• Generate code for x = a+b+c+d
• Generate the code for x = -a * b + -a * b
Quadruple
t1 = uminus a
t2 = t1 * b
t3 = uminus a
t4 = t3 * b
t5 = t2 + t4
x = t5

       Op       Arg1   Arg2   Result
(0)    uminus   a             t1
(1)    *        t1     b      t2
(2)    uminus   a             t3
(3)    *        t3     b      t4
(4)    +        t2     t4     t5
(5)    =        t5            x
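A quadruple maps naturally onto a four-field record. The C sketch below stores the six quadruples above in an array and executes them one by one, as a way of checking that they compute x = -a*b + -a*b. The `struct quad` layout, the enum names and `run_quads` are illustrative assumptions; operands are indices into a value array that stands in for symbol-table entries and temporaries.

```c
#include <assert.h>
#include <stddef.h>

enum op { UMINUS, MUL, ADD, ASSIGN };

/* Names double as indices into the value array. NONE marks an unused field. */
enum { A, B, X, T1, T2, T3, T4, T5, NVARS, NONE = -1 };

struct quad { enum op op; int arg1, arg2, result; };

/* The quadruple table for x = -a * b + -a * b. */
static const struct quad code[] = {
    {UMINUS, A,  NONE, T1},
    {MUL,    T1, B,    T2},
    {UMINUS, A,  NONE, T3},
    {MUL,    T3, B,    T4},
    {ADD,    T2, T4,   T5},
    {ASSIGN, T5, NONE, X},
};

/* Executes n quadruples in order, reading and writing v[]. */
void run_quads(const struct quad *q, size_t n, int v[NVARS])
{
    assert(q != NULL && v != NULL);
    for (size_t i = 0; i < n; i++) {
        switch (q[i].op) {
        case UMINUS: v[q[i].result] = -v[q[i].arg1];               break;
        case MUL:    v[q[i].result] = v[q[i].arg1] * v[q[i].arg2]; break;
        case ADD:    v[q[i].result] = v[q[i].arg1] + v[q[i].arg2]; break;
        case ASSIGN: v[q[i].result] = v[q[i].arg1];                break;
        }
    }
}
```

With a = 3 and b = 4, running the six quadruples leaves -24 in x, which equals -3*4 + -3*4.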
Triple
t1 = uminus a
t2 = t1 * b
t3 = uminus a
t4 = t3 * b
t5 = t2 + t4
x = t5

       Op       Arg1   Arg2
(0)    uminus   a
(1)    *        (0)    b
(2)    uminus   a
(3)    *        (2)    b
(4)    +        (1)    (3)
(5)    =        x      (4)
Indirect Triples
t1 = uminus a
t2 = t1 * b
t3 = uminus a
t4 = t3 * b
t5 = t2 + t4
x = t5

       Op       Arg1   Arg2
(0)    uminus   a
(1)    *        (0)    b
(2)    uminus   a
(3)    *        (2)    b
(4)    +        (1)    (3)
(5)    =        x      (4)

Statement
(0)    (11)
(1)    (12)
(2)    (13)
(3)    (14)
(4)    (15)
(5)    (16)
Example
• Construct Quadruple , Triple , Indirect Triple Representations of
a=b*-c+b*-c
Example
(c) Indirect triple
Aspects of compilation
• A compiler bridges the semantic gap between a PL domain and an execution domain.
• Generate code to implement the meaning of a source program in the execution domain.
• Provide diagnostics for violations of PL semantics in a source program.
– PL features are:
• Data types: specification of legal values for variables of the type
• Data structures:
• Scope rules: accessibility of variables declared in different blocks of a program
• Control structures:
Memory Allocation
• Memory binding: an association between the ‘memory address’ attribute of a data item and the address of a memory area.
– Static memory allocation:
• Memory is allocated before execution begins
– Dynamic memory allocation:
• Memory is allocated during execution
Static memory Allocation
• Program consists of three units A, B, C.
Procedure A
  Data(A)
  B()
Procedure B
  Data(B)
  C()
Procedure C
  Data(C)
Memory: Code(A) | Data(A) | Code(B) | Data(B) | Code(C) | Data(C)
• Advantage??????
Dynamic memory Allocation
• Program consists of three units A, B, C.
Procedure A
  Data(A)
  B()
Procedure B
  Data(B)
  C()
Procedure C
  Data(C)
• Procedure A is active: Data(A) is allocated.
Memory: Code(A) | Code(B) | Code(C); allocated: Data(A)
• Procedure A calls B: Data(B) gets allocated.
Memory: Code(A) | Code(B) | Code(C); newly allocated: Data(B)
• Procedure B calls C: Data(C) gets allocated.
Memory: Code(A) | Code(B) | Code(C); newly allocated: Data(C)
Dynamic memory Allocation
• Different scenario:
Procedure A
  Data(A)
  B()
  C()
Procedure B
  Data(B)
Procedure C
  Data(C)
Memory in each state (Code(A), Code(B) and Code(C) are present throughout):
(a) Data(A)
(b) Data(A) | Data(B)
(c) Data(A) | Data(C)
• Memory allocation in block-structured languages (same as above).
• Advantage??????
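Stack
• Last In, First Out (LIFO) data structure
void a(int m);
void b(int n);
void c(int o);
void d(int p);

int main(void)
{ a(0);          /* each call pushes one frame on the stack */
  return 0;
}
void a(int m)
{ b(1);
}
void b(int n)
{ c(2);
}
void c(int o)
{ d(3);
}
void d(int p)
{
}
Figure: the stack after each call. The stack pointer moves with every frame pushed for main, a, b, c and d; the stack grows down.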
• Activation records:
• Also called frames
• Information (memory) needed by a single execution of a procedure
• A general activation record:

Field                    Purpose
Return value             Stores the result of the function call
Actual parameters        Information about the actual parameters
Optional control link    Points to the calling procedure
Optional access link     Non-local data of other procedures
Machine status           Information such as the program counter
Local variables          Store local data
Temporaries              Store temporary values
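Activation Record for Factorial Program
int factorial(int n);

int main(void)
{
    int f;
    f = factorial(3);   /* creates activation records for n = 3, 2, 1 */
    (void)f;
    return 0;
}
int factorial(int n)
{
    if (n <= 1)         /* base case; also terminates for n < 1 */
    {
        return 1;
    }
    else {
        return (n * factorial(n - 1));
    }
}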
Parameter passing
• The method used to associate actual parameters with formal parameters.
• The parameter passing method affects the code generated.
• Call-by-value:
– The actual parameters are evaluated and their r-values are passed to the called procedure.
– Implementation:
» A formal parameter is treated like a local name, so the storage for the formals is in the activation record of the called procedure.
» The caller evaluates the actual parameters and places their r-values in the storage for the formals.
• Call-by-reference:
– Also called call-by-address or call-by-location.
– The caller passes to the called procedure a pointer to the storage address of each actual parameter.
– The actual parameter must have an address, so only variables make sense. For an expression, the location of the temporary that holds its result is passed.
• Copy-restore:
– A hybrid between call-by-value and call-by-reference.
– The actual parameters are evaluated and their r-values are passed to the called procedure, as in call-by-value.
– When control returns, the r-values of the formal parameters are copied back into the l-values of the actuals.
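The three methods can be contrasted in C, with the caveat that C itself only has call-by-value: call-by-reference is expressed here by passing addresses, and copy-restore is simulated by hand. All function names below are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Call-by-value: the callee changes only its own copy of n. */
void incr_by_value(int n)
{
    n = n + 1;
}

/* Call-by-reference, expressed in C by passing the address. */
void incr_by_reference(int *n)
{
    assert(n != NULL);
    *n = *n + 1;
}

/* By reference: when called as (&a, &a) both formals alias a,
   so a is incremented twice. */
void bump_both_ref(int *x, int *y)
{
    *x += 1;
    *y += 1;
}

/* Copy-restore simulated by hand: copy in, run the body on
   local copies, copy the results back on return. */
void bump_both_copy_restore(int *actual_x, int *actual_y)
{
    int x = *actual_x, y = *actual_y;   /* copy in the r-values */
    x += 1;                             /* body works on the formals */
    y += 1;
    *actual_x = x;                      /* copy back into the l-values */
    *actual_y = y;
}
```

The aliased call shows where the methods differ: starting from a = 1, `bump_both_ref(&a, &a)` leaves 3 in a, while `bump_both_copy_restore(&a, &a)` leaves 2, because each formal was incremented on its own copy before being written back.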