PL&C Lab, DongGuk University

Compiler Lecture Note, Intermediate Language
Introduction to Compilers
Chapter 9
Intermediate Language
Contents
• Introduction
• Polish Notation
• Three Address Code
• Tree Structured Code
• Abstract Machine Code
• Concluding Remarks
Introduction
• Compiler Model
      Source Program
        → Lexical Analyzer              (tokens)
        → Syntax Analyzer               (AST)
        → Semantic Analyzer
        → Intermediate Code Generator   (IL)
        → Code Optimizer                (IC)
        → Target Code Generator
        → Object Program
  Front-End - language dependent part
  Back-End - machine dependent part
• Why an IL is needed
– Modular Construction
– Automatic Construction
– Easy Translation
– Portability
– Optimization
– Bootstrapping
• Classification of ILs
– Polish Notation --- Postfix, IR
– Three Address Code --- Quadruple, Triple, Indirect triple
– Tree Structured Code --- PT, AST, TCOL
– Abstract Machine Code --- P-code, EM-code, U-code, Bytecode
• Two level Code Generation
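      Source → Front-End → ILS → ILS-ILT → ILT → Back-End → Target
• ILS
– A form that can be obtained from the source by automation.
– Source language dependent and high level.
• ILT
– A form that an automated back end can translate to the target machine very easily.
– Target machine dependent and low level.
• ILS to ILT
– Translating ILS to ILT is the main task.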
Polish Notation
☞ The Polish mathematician Łukasiewicz invented the parenthesis-free notation.
• Postfix(Suffix) Polish Notation
– earliest IL
– popular for interpreted languages - SNOBOL, BASIC
– general form :
      e1 e2 ... ek OP   (k ≥ 1)
  where, OP : k-ary operator
         ei : any postfix expression (1 ≤ i ≤ k)
– example :
      if a then if c-d then a+c else a*c else a+b
  ⇒   a L1 BZ c d - L2 BZ a c + L3 BR
      L2: a c * L3 BR L1: a b + L3:
– note
  1) high level:
       source to IL - fast & easy translation
       IL to target - difficult
  2) easy evaluation - operand stack
  3) unsuitable for optimization - translation to another IL is needed
  4) parentheses free notation - arithmetic expression
– well suited to interpretive languages
      Source → Translator → Postfix → Evaluator → Result
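The operand-stack evaluation of postfix code can be sketched in a few lines of Python. This is a minimal sketch, not the evaluator of any particular system. It assumes that a token ending in ':' defines a label, that BZ pops a label and a condition and branches when the condition is zero, and that BR pops a label and branches unconditionally. The function name eval_postfix and the variable values are hypothetical.

```python
def eval_postfix(code, env):
    """Evaluate a list of postfix tokens with an operand stack."""
    # Map each label name to the index of the token that follows its definition.
    labels = {tok[:-1]: i + 1 for i, tok in enumerate(code) if tok.endswith(":")}
    binary = {"+": lambda x, y: x + y,
              "-": lambda x, y: x - y,
              "*": lambda x, y: x * y}
    stack, pc = [], 0
    while pc < len(code):
        tok = code[pc]
        pc += 1
        if tok.endswith(":"):          # label definition: nothing to execute
            continue
        if tok in binary:              # pop two operands, push the result
            y, x = stack.pop(), stack.pop()
            stack.append(binary[tok](x, y))
        elif tok == "BZ":              # operands: condition, label
            target, cond = stack.pop(), stack.pop()
            if cond == 0:
                pc = labels[target]
        elif tok == "BR":              # operand: label
            pc = labels[stack.pop()]
        elif tok in labels:            # label used as a branch operand
            stack.append(tok)
        else:                          # variable: push its value
            stack.append(env[tok])
    return stack.pop()


# Postfix form of: if a then if c-d then a+c else a*c else a+b
code = "a L1 BZ c d - L2 BZ a c + L3 BR L2: a c * L3 BR L1: a b + L3:".split()
print(eval_postfix(code, {"a": 1, "b": 2, "c": 5, "d": 3}))  # a+c branch, prints 6
```

No parsing step is needed: every operator finds its operands already on the stack, which is why postfix suits interpreters.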
• Internal Representation(IR)
– low-level prefix polish notation - addressing structure of target machine
– compiler-compiler IL - table driven code generation
– IR program - a sequence of root-level IR expressions
– IR expression:
      OP e1 e2 ... ek   (k ≥ 1)
  where, OP : k-ary operator - 1-1 correspondence with target machine instruction.
           ┌─ root-level operator - does not appear in an operand
           │      ⇒ root-level IR expression.
           └─ internal operator - appears in an operand
                  ⇒ internal IR expression.
         ei : operand --- single symbol or internal IR expression.
– example
      D := E   ⇔   := + d r ↑ + e r
  where,
      r    : local base register
      d, e : location of variable D and E
      +    : additive operator
      ↑    : unary operator giving the value of the location
      :=   : assignment operator (root-level)
– example
      FOR D := E TO F DO Loop body;

  Source-level expansion              IR
         D := E;                      := + d r ↑+ e r
         TEMP := F;                   := + temp r ↑+ f r
         GOTO 2                       j L2
      1: Loop body                    :L1 Loop body
         D := D + 1;                  := + d r + ↑+ d r 1
      2: IF D <= TEMP THEN GOTO 1;    :L2 <= L1 ? ↑+ d r ↑+ temp r
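Because every IR operator has a fixed number of operands, a prefix IR expression can be parsed without parentheses by reading the operator first and then exactly k operands. The following Python sketch illustrates this for the three operators of the D := E example. The arity table, the function names, the register value and the memory contents are all assumptions made for the illustration, and the tokens are separated by blanks (so ↑+ is written as ↑ +).

```python
ARITY = {":=": 2, "+": 2, "↑": 1}      # assumed operator arities


def parse_prefix(tokens, pos=0):
    """Parse one prefix IR expression; return (tree, next position)."""
    tok = tokens[pos]
    pos += 1
    if tok not in ARITY:               # single-symbol operand
        return tok, pos
    args = []
    for _ in range(ARITY[tok]):        # a k-ary operator takes exactly k operands
        arg, pos = parse_prefix(tokens, pos)
        args.append(arg)
    return (tok, *args), pos


def run(tree, symbols, memory):
    """Evaluate a parsed expression against symbol values and a memory dict."""
    if isinstance(tree, str):
        return symbols[tree]
    op, *args = tree
    vals = [run(a, symbols, memory) for a in args]
    if op == "+":
        return vals[0] + vals[1]
    if op == "↑":                      # value stored at the location
        return memory[vals[0]]
    if op == ":=":                     # root-level operator: store value at location
        memory[vals[0]] = vals[1]
        return None
    raise ValueError(op)


tree, _ = parse_prefix(":= + d r ↑ + e r".split())
symbols = {"r": 100, "d": 0, "e": 1}   # hypothetical base register and offsets
memory = {100: 7, 101: 42}             # D lives at 100, E at 101
run(tree, symbols, memory)
print(tree)
print(memory[100])                     # D now holds the value of E
```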
– Note
  1) Shift-reduce parser --- prefix : fewer states than postfix
  2) Several addressing modes
       ┌─ prefix : decided by looking at the operator alone (no backup)
       └─ postfix : backup required
     ex) assumption: first operand computed in register r.
            r.1 ::= (/ d.1 r.2)
            r.1 ::= (+ r.1 r.2)
         ┌ prefix - [r -> / . d r]
         │     first operand changed to d and continue
         └ postfix - [r -> . d r /] [r -> . r r +]
               shift r, shift r and block ([r -> r r . +]) ⇒ backup
  3) Easy translation
       IR to target - easy
       source to IR - difficult
Three Address Code
• most popular IL, optimizing compiler
• General form:
A := B op C
where,
A : result address
B, C : operand addresses
op : operator
(1) Quadruple - 4-tuple notation
<operator>,<operand1>,<operand2>,<result>
(2) Triple - 3-tuple notation
<operator>,<operand1>,<operand2>
(3) Indirect triple - execution order table & triples
– example
      A ← B + C * D / E
      F ← C * D

  Quadruple          Triple               Indirect Triple
                                          Operations   Triple
  *  C   D   T1      (1)  *  C    D       1. (1)       (1)  *  C    D
  /  T1  E   T2      (2)  /  (1)  E       2. (2)       (2)  /  (1)  E
  +  B   T2  T3      (3)  +  B    (2)     3. (3)       (3)  +  B    (2)
  ←  T3      A       (4)  ←  A    (3)     4. (4)       (4)  ←  A    (3)
  *  C   D   T4      (5)  *  C    D       5. (1)       (5)  ←  F    (1)
  ←  T4      F       (6)  ←  F    (5)     6. (5)
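The three forms of three address code can be generated from the same expression trees. The Python sketch below is one possible way to do it, under these assumptions: expressions are nested tuples (op, left, right), the assignment operator is written ":=", triple references are 1-based integers, and the indirect form shares an identical triple whenever it already exists (which is safe here because C and D are not modified between the two statements). All function names are hypothetical.

```python
def gen_quadruples(stmts):
    """Emit (op, arg1, arg2, result); every operation gets a fresh temporary."""
    quads, n = [], 0

    def walk(e):
        nonlocal n
        if isinstance(e, str):
            return e
        op, left, right = e
        a, b = walk(left), walk(right)
        n += 1
        quads.append((op, a, b, f"T{n}"))
        return f"T{n}"

    for target, expr in stmts:
        quads.append((":=", walk(expr), None, target))
    return quads


def gen_triples(stmts):
    """Emit (op, arg1, arg2); a result is named by the triple's position."""
    triples = []

    def walk(e):
        if isinstance(e, str):
            return e
        op, left, right = e
        a, b = walk(left), walk(right)
        triples.append((op, a, b))
        return len(triples)            # 1-based reference such as (1)

    for target, expr in stmts:
        triples.append((":=", target, walk(expr)))
    return triples


def gen_indirect_triples(stmts):
    """Return (execution order table, triple table); equal triples are shared."""
    triples, order = [], []

    def emit(t):
        if t not in triples:
            triples.append(t)
        idx = triples.index(t) + 1
        order.append(idx)
        return idx

    def walk(e):
        if isinstance(e, str):
            return e
        op, left, right = e
        a, b = walk(left), walk(right)
        return emit((op, a, b))

    for target, expr in stmts:
        emit((":=", target, walk(expr)))
    return order, triples


# A := B + C * D / E ;  F := C * D
stmts = [("A", ("+", "B", ("/", ("*", "C", "D"), "E"))),
         ("F", ("*", "C", "D"))]
quads = gen_quadruples(stmts)
triples = gen_triples(stmts)
order, table = gen_indirect_triples(stmts)
for q in quads:
    print(q)
print(order)
```

The execution order table is what makes rearrangement cheap in the indirect form: reordering instructions only permutes the table, while the triples and their mutual references stay in place.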
• Note
• Quadruple vs. Triple
– quadruple - easy to optimize
– triple - removal of temporary addresses
  ⇒ Indirect Triple
• extensive code optimization is easy
– the IL can be rearranged (except triples)
• easy translation - source to IL
• difficult to generate good code
– quadruple to two-address machine
– triple to three-address machine
Tree Structured Code
• Abstract Syntax Tree
– removes redundant information from the parse tree.
  ┌ leaf node     --- variable name, constant
  └ internal node --- operator
– [Example 8] --- Text p.377
{ x = 0;
y = z + 2 * y;
while ((x<n) and (v[x] != z)) x = x+1;
return x;
}
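One way to picture the AST of this block is as nested tuples, with operators at the internal nodes and variable names or constants at the leaves. The Python sketch below is only an illustration: the operator names (block, assign, add, mul, and, lt, ne, index, while, return) are assumptions, not the node names used in the text.

```python
# Internal nodes are tuples (operator, child, ...); leaves are names or constants.
ast = ("block",
       ("assign", "x", 0),
       ("assign", "y", ("add", "z", ("mul", 2, "y"))),
       ("while",
        ("and", ("lt", "x", "n"), ("ne", ("index", "v", "x"), "z")),
        ("assign", "x", ("add", "x", 1))),
       ("return", "x"))


def leaves(node):
    """Collect the leaf nodes (variable names and constants) left to right."""
    if not isinstance(node, tuple):
        return [node]
    return [leaf for child in node[1:] for leaf in leaves(child)]


def operators(node):
    """Collect the internal nodes (operators) in preorder."""
    if not isinstance(node, tuple):
        return []
    return [node[0]] + [op for child in node[1:] for op in operators(child)]


print(leaves(ast))
print(operators(ast))
```

Braces, parentheses and semicolons of the source text do not appear anywhere in the tree; the nesting alone carries that information.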
• Tree Structured Common Language(TCOL)
– Variants of AST - containing the result of semantic analysis.
– TCOL operator - type & context specific operator
– Context
  ┌ value     --- rhs of assignment statement
  ├ location  --- lhs of assignment statement
  ├ boolean   --- conditional control statement
  └ statement --- statement
  ex) .     : operand --- location             result --- value
      while : operand --- boolean, statement   result --- statement
Example)
      int a; float b;
      ...
      b = a + 1;

  AST:                      TCOL:
      assign                    assign
      ├ b                       ├ b
      └ add                     └ float
        ├ a                       └ addi
        └ 1                         ├ .
                                    │ └ a
                                    └ 1

– Representation ----- graph orientation
  ┌ internal notation ----- efficient
  └ external notation ----- debug, interface; linear graph notation
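The step from AST to TCOL for b = a + 1 can be sketched as a small tree rewrite that uses the declared types: a variable in a value context is dereferenced with ".", the integer addition becomes the type-specific addi, and the integer result is coerced with float before the assignment. This is a hedged Python sketch, not the PQCC algorithm. The function to_tcol, the type table and the operator name addf are assumptions introduced for the illustration.

```python
def to_tcol(node, types, want="value"):
    """Rewrite an AST node into a TCOL-like tree; return (tree, type)."""
    if isinstance(node, int):
        return node, "int"
    if isinstance(node, str):                       # variable name
        if want == "location":                      # lhs: keep the location
            return node, types[node]
        return (".", node), types[node]             # rhs: dereference to a value
    op, *kids = node
    if op == "assign":
        lhs, lhs_type = to_tcol(kids[0], types, "location")
        rhs, rhs_type = to_tcol(kids[1], types, "value")
        if lhs_type == "float" and rhs_type == "int":
            rhs = ("float", rhs)                    # coercion int -> float
        return ("assign", lhs, rhs), lhs_type
    if op == "add":
        left, lt = to_tcol(kids[0], types)
        right, rt = to_tcol(kids[1], types)
        if lt == rt == "int":
            return ("addi", left, right), "int"     # type-specific operator
        left = ("float", left) if lt == "int" else left
        right = ("float", right) if rt == "int" else right
        return ("addf", left, right), "float"       # hypothetical float add
    raise ValueError(op)


ast = ("assign", "b", ("add", "a", 1))
tcol, _ = to_tcol(ast, {"a": "int", "b": "float"})
print(tcol)
```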
• Note
– AST ----- automatic AST generation (output of parser)
      Parser Generator ┌ leaf node specification
                       └ operator node specification
– TCOL ----- automatic code generation : PQCC
  (1) intermediate level:
        high level --- parse tree like notation, control structure
        low level --- data access
  (2) semantic specification: dereferencing, coercion, type specific operator,
        dynamic subscript and type checking
  (3) loop optimization ---- high level control structure, easy reconstruction
  (4) extensibility ----- define new TCOL operator
Abstract Machine Code
• Motivation
  ┌ rapid development of machine architectures
  └ proliferation of programming languages
– portable & adaptable compiler design --- P_CODE
  • porting --- rewriting only back-end
– compiler building system --- EM_CODE
  • M front-ends + N back-ends ⇒ M compilers for N target machines
• Model
      source program → front-end → abstract machine code → back-end → target code → target machine
      abstract machine code → abstract machine interpreter
  (the abstract machine code is the interface between front-end and back-end)
• Pascal-P Code
– Pascal P Compiler --- portable compiler producing P_CODE for an abstract machine (P_Machine).
– P_Machine ----- hypothetical stack machine designed for the Pascal language.
  (1) Instruction --- closely related to the PASCAL language.
  (2) Registers
      ┌ PC --- program counter
      │ NP --- new pointer
      │ SP --- stack pointer
      └ MP --- mark pointer
  (3) Memory
      ┌ CODE --- instruction part
      └ STORE --- data part (constant area, stack, heap)
  CODE   ← PC
  STORE
      stack           ← MP (current activation record), SP
      heap            ← NP
      constant area
Ucode
• Ucode
– the intermediate form used by the Stanford Portable Pascal compiler.
– stack-based and defined in terms of a hypothetical stack machine.
– Ucode Interpreter : Appendix B.
• Addressing
– stack addressing ===> a tuple : (B, O)
    B : the block number containing the address
    O : the offset in words from the beginning of the block; offsets start at 1.
• label
– any Ucode instruction can be labeled through its label field.
– All targets of jumps and procedures must be labeled.
– All labels must be unique for the entire program.
• Example :
– Consider the following skeleton :
      program main
        procedure P
          procedure Q
            var i : integer;
                j : integer;
– block number
      main : 1
      P    : 2
      Q    : 3
– variable addressing
      i : (3,1)
      j : (3,2)
• Ucode Operations (35 in total)
– Unary                      --- notop, neg
– Binary                     --- add, sub, mult, divop, modop, swp,
                                 andop, orop, gt, lt, ge, le, eq, ne
– Stack Operations           --- lod, str, ldr, ldp
– Immediate Operation        --- ldc
– Control Flow               --- ujp, tjp, fjp, cal, ret
– Range Checking             --- chkh, chkl
– Indirect Addressing        --- ixa, sta
– Procedure Specification    --- proc, endop
– Program Specification      --- bgn
– Procedure Calling Sequence --- cal
– Symbol Table Information   --- sym
• Example :
– x = a + b * c;
      lod 1 1      /* a */
      lod 1 2      /* b */
      lod 1 3      /* c */
      mult
      add
      str 1 4      /* x */
– if (a>b) a = a + b;
      lod 1 1      /* a */
      lod 1 2      /* b */
      gt
      fjp next
      lod 1 1      /* a */
      lod 1 2      /* b */
      add
      str 1 1      /* a */
  next
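The two examples can be traced with a very small stack-machine loop. The Python sketch below is not the ucodei interpreter of Appendix B; it only handles the opcodes that occur in the two examples, keeps variables in a dictionary keyed by (block, offset), and represents each instruction as a tuple (label, opcode, operands...). The function name run_ucode and the nop instruction that carries the label next are assumptions made so that the label has an instruction to sit on.

```python
def run_ucode(program, memory):
    """Execute (label, opcode, operands...) tuples on an operand stack."""
    labels = {ins[0]: i for i, ins in enumerate(program) if ins[0]}
    stack, pc = [], 0
    while pc < len(program):
        _, op, *args = program[pc]
        pc += 1
        if op == "lod":                          # push the value at (B, O)
            stack.append(memory[tuple(args)])
        elif op == "str":                        # pop and store at (B, O)
            memory[tuple(args)] = stack.pop()
        elif op in ("add", "mult", "gt"):        # binary: pop two, push one
            y, x = stack.pop(), stack.pop()
            stack.append({"add": x + y, "mult": x * y, "gt": int(x > y)}[op])
        elif op == "fjp":                        # jump if the popped value is false
            if not stack.pop():
                pc = labels[args[0]]
        elif op == "nop":                        # placeholder carrying a label
            pass
        else:
            raise ValueError(op)
    return memory


# x = a + b * c;   with a, b, c, x at (1,1) .. (1,4)
assign = [("", "lod", 1, 1), ("", "lod", 1, 2), ("", "lod", 1, 3),
          ("", "mult"), ("", "add"), ("", "str", 1, 4)]
print(run_ucode(assign, {(1, 1): 2, (1, 2): 3, (1, 3): 4})[(1, 4)])  # 2 + 3 * 4 = 14

# if (a>b) a = a + b;
cond = [("", "lod", 1, 1), ("", "lod", 1, 2), ("", "gt"), ("", "fjp", "next"),
        ("", "lod", 1, 1), ("", "lod", 1, 2), ("", "add"), ("", "str", 1, 1),
        ("next", "nop")]
print(run_ucode(cond, {(1, 1): 5, (1, 2): 3})[(1, 1)])               # 5 > 3, so a = 8
```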
• Indirect Addressing
– is used to access both array elements and var parameters.
• ixa --- indirect load
– replaces stacktop by the value of the item at location stacktop.
– to retrieve A[i] :
      lod i        /* actually (Bi, Oi) */
      ldr A        /* also (block number, offset) */
      add          /* effective address */
      ixa          /* indirect load gets contents of A[i] */
– to retrieve var parameter x :
      lod x        /* loads address of actual - since x is var */
      ixa          /* indirect load */
• sta --- indirect store
– sta stores stacktop into the address at stack[stacktop-1];
  both items are popped.
– A[i] = j;
      lod i
      ldr A
      add
      lod j
      sta
– x := y, where x is a var parameter
      lod x
      lod y
      sta
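The effect of ixa and sta on the stack can be followed with a toy model. In this Python sketch the memory is a flat list, addr is a hypothetical table mapping each variable name to an absolute address (standing in for the (block number, offset) pairs), and array A is assumed to be indexed from 0. None of these layout choices come from the Ucode definition.

```python
def run(program, memory, addr):
    """Execute a straight-line instruction list; return the final stack."""
    stack = []
    for op, *args in program:
        if op == "lod":                 # push the value of a variable
            stack.append(memory[addr[args[0]]])
        elif op == "ldr":               # push the address of a variable
            stack.append(addr[args[0]])
        elif op == "add":
            y, x = stack.pop(), stack.pop()
            stack.append(x + y)
        elif op == "ixa":               # replace stacktop by the value it points at
            stack.append(memory[stack.pop()])
        elif op == "sta":               # store stacktop at the address below it
            value, target = stack.pop(), stack.pop()
            memory[target] = value
        else:
            raise ValueError(op)
    return stack


# i at 0, j at 1, A[0..3] at 2..5, var parameter x at 6 (holds the address of j)
addr = {"i": 0, "j": 1, "A": 2, "x": 6}
memory = [2, 99, 10, 11, 12, 13, 1]

load_ai = run([("lod", "i"), ("ldr", "A"), ("add",), ("ixa",)], memory, addr)
load_x = run([("lod", "x"), ("ixa",)], memory, addr)
print(load_ai, load_x)                  # A[2] and the actual behind x

run([("lod", "i"), ("ldr", "A"), ("add",), ("lod", "j"), ("sta",)], memory, addr)
print(memory[4])                        # A[2] after A[i] = j
```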
• Procedure Calling Sequence
– procedure definition :
      procedure A(var a : integer; b,c : integer);
– procedure call :
      A(x, expr1, expr2);
– calling sequence :
      ldp
      ldr x        /* load the address of actual for var parameter */
      …            /* code to evaluate expr1 --- left on the stack */
      …            /* code to evaluate expr2 --- left on the stack */
      cal A
• Ucode Interpreter
– The Ucode interpreter is called ucodei; its source is on plac.dongguk.ac.kr.
– The interpreter uses the following files :
      *.ucode : file containing the Ucode program.
      *.lst   : Ucode listing and output from the program.
• Ucode format
      field            columns
      label-field      1-10
      op-code          12-m
      operand-field    m+2 onward
– m is exactly enough to hold the opcode.
– label field --- a 10 character label (make sure it is 10 characters; pad with blanks)
– op-code --- starts at column 12.
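A code generator that emits Ucode text has to respect this column layout. The helper below is a small Python sketch of one way to do that; the function name format_ucode is hypothetical, and it assumes that column 11 is left blank and that operands are separated by single blanks.

```python
def format_ucode(label, opcode, *operands):
    """Lay out one Ucode line: label in columns 1-10, opcode from column 12."""
    if len(label) > 10:
        raise ValueError("label longer than 10 characters")
    line = label.ljust(10) + " " + opcode        # pad label, leave column 11 blank
    if operands:                                 # operands start at column m+2
        line += " " + " ".join(str(o) for o in operands)
    return line


for ins in [("", "lod", 1, 1), ("", "fjp", "next"), ("next", "nop")]:
    print(format_ucode(*ins))
```

Padding the label with ljust covers the unlabeled case as well, since an empty label becomes ten blanks.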
Programming Assignment #3
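• Install the Ucode interpreter listed in Appendix B on your own PC and
  write a Ucode program that finds the prime numbers
  up to 100.
– You may also write and submit a program for a different problem.
– Submit the output listing of the Ucode interpreter.
• Reference :
– #1 : recursive-descent parser
– #2 : MiniPascal LR parser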
Concluding Remarks
• IL criteria
– intermediate level
    input language --- high level
    output machine --- low level
– efficient processing
    translation --- source to IL, IL to target
    interpretation
    optimization
– extensibility
– external representation
– clean separation
    language dependence & machine dependence
• Comparison of ILs   (A : good, B : average, C : poor)

                                Polish Notation    Three Address Code    Tree Structured Code   Abstract
  IL Criteria                   Post     IR        Quadruple   Triple    AST      TCOL          Machine Code
  intermediate level
  efficient processing
     source to IL translation
     IL to target translation
     interpretation
     optimization
  external representation       A        A         A           A         C        B             A
  extensibility                 A        A         A           A         A        A             B
  clean separation              C        B         B           B         C        A             A

  Ratings for the intermediate level and efficient processing rows, in source order:
      B C A B B B A B C A B B C A A B B B B C C A C B A C A A B
      C B B A C C