Compiler Lecture Note, Intermediate Language
Introduction to Compilers, Chapter 9: Intermediate Language
PL&C Lab, DongGuk University

Contents
• Introduction
• Polish Notation
• Three Address Code
• Tree Structured Code
• Abstract Machine Code
• Concluding Remarks

Introduction
• Compiler Model
    Source Program → Lexical Analyzer → (tokens) → Syntax Analyzer → (AST) → Semantic Analyzer → Intermediate Code Generator → (IL) → Code Optimizer → (IC) → Target Code Generator → Object Program
  – Front-End (through the Intermediate Code Generator) --- language dependent part
  – Back-End (Code Optimizer, Target Code Generator) --- machine dependent part
• Why an IL is needed
  – Modular Construction
  – Automatic Construction
  – Easy Translation
  – Portability
  – Optimization
  – Bootstrapping
• Classification of ILs
  – Polish Notation --- Postfix, IR
  – Three Address Code --- Quadruple, Triple, Indirect triple
  – Tree Structured Code --- PT, AST, TCOL
  – Abstract Machine Code --- P-code, EM-code, U-code, Bytecode
• Two level Code Generation
    Source → Front-End → ILS → ILS-ILT → ILT → Back-End → Target
  – ILS
      a form that can be obtained from the source automatically
      source-language dependent and high level
  – ILT
      a form that an automated back-end can translate to the target machine very easily
      target-machine dependent and low level
  – ILS to ILT
      the translation from ILS to ILT is the main task

Polish Notation
The Polish mathematician Łukasiewicz invented the parenthesis-free notation.
• Postfix (Suffix) Polish Notation
  – earliest IL
  – popular for interpreted languages --- SNOBOL, BASIC
  – general form : e1 e2 ... ek OP (k ≥ 1)
      where OP : k-ary operator
            ei : any postfix expression (1 ≤ i ≤ k)
  – example (BZ: branch on zero, BR: unconditional branch) :
      if a then if c-d then a+c else a*c else a+b
      ⇒      a L1 BZ c d - L2 BZ a c + L3 BR
         L2: a c * L3 BR
         L1: a b +
         L3:
  – note
      1) high level: source to IL --- fast & easy translation
                     IL to target --- difficult
      2) easy evaluation --- operand stack
      3) unsuitable for optimization --- translation to another IL is needed
      4) parentheses free notation --- arithmetic expression
  – suited to interpretive languages
      Source → Translator → Postfix → Evaluator → Result
• Internal Representation (IR)
  – low-level prefix Polish notation --- addressing structure of target machine
  – compiler-compiler IL --- table driven code generation
  – IR program --- a sequence of root-level IR expressions
  – IR expression: OP e1 e2 ... ek (k ≥ 1)
      where OP : k-ary operator, in 1-1 correspondence with a target machine instruction
                 root-level operator --- does not appear in an operand ⇒ root-level IR expression
                 internal operator --- appears in an operand ⇒ internal IR expression
            ei : operand --- single symbol or internal IR expression
  – example
      D := E ⇔ := + d r ↑ + e r
      where r : local base register
            d, e : locations of variables D and E
            + : additive operator
            ↑ : unary operator giving the value of the location
            := : assignment operator (root-level)
  – example
      FOR D := E TO F DO Loop body;
      ⇔
             D := E;
             TEMP := F;
             GOTO 2;
          1: Loop body
             D := D + 1;
          2: IF D <= TEMP THEN GOTO 1;
      IR:
             := + d r ↑ + e r
             := + temp r ↑ + f r
             j L2
         :L1
             Loop body
             := + d r + ↑ + d r 1
         :L2
             <= L1 ? ↑ + d r ↑ + temp r
  – Note
      1) Shift-reduce parser --- prefix : fewer states than postfix
      2) Several addressing modes
           prefix : decided by looking at the operator alone (no backup)
           postfix : backup needed
         ex) assumption: first operand computed in register r.
               r.1 ::= (/ d.1 r.2)
               r.1 ::= (+ r.1 r.2)
             prefix --- [r -> / . d r]
                        first operand changed to d and continue
             postfix --- [r -> . d r /] [r -> . r r +]
                         shift r, shift r and block ([r -> r r . +]) ⇒ backup
      3) Easy translation: IR to target --- easy
                           source to IR --- difficult

Three Address Code
• most popular IL, optimizing compiler
• General form: A := B op C
    where A : result address
          B, C : operand addresses
          op : operator
  (1) Quadruple --- 4-tuple notation
        <operator>,<operand1>,<operand2>,<result>
  (2) Triple --- 3-tuple notation
        <operator>,<operand1>,<operand2>
  (3) Indirect triple --- execution order table & triples
  – example
      A ← B + C * D / E
      F ← C * D

      Quadruple            Triple               Indirect Triple
                                                Operations   Triples
      *  C   D   T1        (1)  *  C    D       1. (1)       (1)  *  C    D
      /  T1  E   T2        (2)  /  (1)  E       2. (2)       (2)  /  (1)  E
      +  B   T2  T3        (3)  +  B    (2)     3. (3)       (3)  +  B    (2)
      ←  T3      A         (4)  ←  A    (3)     4. (4)       (4)  ←  A    (3)
      *  C   D   T4        (5)  *  C    D       5. (1)       (5)  ←  F    (1)
      ←  T4      F         (6)  ←  F    (5)     6. (5)
• Note
  – Quadruple vs. Triple
      quadruple --- easy to optimize
      triple --- removal of temporary addresses ⇒ Indirect Triple
  – extensive code optimization is easy
      the IL can be rearranged (except for triples)
  – easy translation --- source to IL
  – difficult to generate good code
      quadruple to two-address machine
      triple to three-address machine

Tree Structured Code
• Abstract Syntax Tree
  – removes the redundant information from the parse tree.
  – leaf node --- variable name, constant
    internal node --- operator
  – [Example 8] --- Text p.377
      {
        x = 0;
        y = z + 2 * y;
        while ((x<n) and (v[x] != z)) x = x+1;
        return x;
      }
• Tree Structured Common Language (TCOL)
  – Variant of AST --- contains the result of semantic analysis.
  – TCOL operator --- type & context specific operator
  – Context
      value --- rhs of assignment statement
      location --- lhs of assignment statement
      boolean --- conditional control statement
      statement --- statement
    ex) .     : operand --- location
                result --- value
        while : operand --- boolean, statement
                result --- statement
  – Example)
      int a; float b;
      ...
      b = a + 1;

      AST:  assign(b, add(a, 1))
      TCOL: assign(b, float(addi(.(a), 1)))
  – Representation --- graph orientation
      internal notation --- efficient
      external notation --- debug, interface (linear graph notation)
• Note
  – AST --- automatic AST generation (output of parser)
      Parser Generator: leaf node specification
                        operator node specification
  – TCOL --- automatic code generation : PQCC
      (1) intermediate level: high level --- parse tree like notation, control structure
                              low level --- data access
      (2) semantic specification: dereferencing, coercion, type specific operator,
                                  dynamic subscript and type checking
      (3) loop optimization --- high level control structure, easy reconstruction
      (4) extensibility --- define new TCOL operator

Abstract Machine Code
• Motivation
  – rapid development of machine architectures
  – proliferation of programming languages
  – portable & adaptable compiler design --- P_CODE
      porting --- rewriting only the back-end
  – compiler building system --- EM_CODE
      M front-ends + N back-ends ⇒ compilers for M languages on N target machines
• Model
    source program → front-end → abstract machine code (interface) → back-end → target code → target machine
    abstract machine code → abstract machine interpreter
• Pascal-P Code
  – Pascal P Compiler --- portable compiler producing P_CODE for an abstract machine (P_Machine).
  – P_Machine --- hypothetical stack machine designed for the Pascal language.
      (1) Instruction --- closely related to the Pascal language.
      (2) Registers
            PC --- program counter
            NP --- new pointer
            SP --- stack pointer
            MP --- mark pointer
      (3) Memory
            CODE --- instruction part
            STORE --- data part (constant area, stack, heap)
  [Figure: P_Machine memory layout. PC points into CODE; STORE holds the stack (MP marks the current activation record, SP the stack top), the heap (NP), and the constant area.]

Ucode
• Ucode is the intermediate form used by the Stanford Portable Pascal compiler.
• It is stack-based and is defined in terms of a hypothetical stack machine.
• Ucode Interpreter : Appendix B.
• Addressing
  – stack addressing ===> a tuple : (B, O)
      B : the block number containing the address
      O : the offset in words from the beginning of the block; offsets start at 1.
  – label --- labels any Ucode instruction through its label field.
      All targets of jumps and procedures must be labeled.
      All labels must be unique for the entire program.
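The two addressing rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `build_label_table`, `Store`, and the `(label, opcode, operands)` tuple layout are hypothetical names chosen for this note and are not taken from the ucodei sources.

```python
# Hypothetical sketch; not taken from the ucodei sources.

def build_label_table(program):
    """Map each label to the index of the instruction carrying it.

    `program` is a list of (label, opcode, operands) tuples; an empty
    label means the instruction is unlabeled.
    """
    labels = {}
    for index, (label, _opcode, _operands) in enumerate(program):
        if not label:
            continue
        if label in labels:
            # Labels must be unique for the entire program.
            raise ValueError(f"duplicate label: {label}")
        labels[label] = index
    return labels


class Store:
    """Data store addressed by (block, offset) tuples; offsets start at 1."""

    def __init__(self):
        self.cells = {}

    def write(self, block, offset, value):
        if offset < 1:
            raise ValueError("offsets start at 1")
        self.cells[(block, offset)] = value

    def read(self, block, offset):
        return self.cells[(block, offset)]


program = [
    ("", "lod", (1, 1)),
    ("", "fjp", ("next",)),   # jump target must be a labeled instruction
    ("next", "ret", ()),
]
labels = build_label_table(program)

store = Store()
store.write(3, 1, 10)   # a variable at block 3, offset 1
store.write(3, 2, 20)   # a variable at block 3, offset 2
```

A real interpreter would presumably map each block number to a base address in a contiguous store; the dictionary keyed by `(block, offset)` is used here only to keep the tuple form visible.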
• Example : Consider the following skeleton :
    program main
      procedure P
        procedure Q
          var i : integer;
              j : integer;

    block number          variable addressing
      main : 1              i : (3,1)
      P    : 2              j : (3,2)
      Q    : 3

Ucode Operations (35 in total)
    Unary --- notop, neg
    Binary --- add, sub, mult, divop, modop, swp, andop, orop, gt, lt, ge, le, eq, ne
    Stack Operations --- lod, str, ldr, ldp
    Immediate Operation --- ldc
    Control Flow --- ujp, tjp, fjp, cal, ret
    Range Checking --- chkh, chkl
    Indirect Addressing --- ixa, sta
    Procedure Specification --- proc, endop
    Program Specification --- bgn
    Procedure Calling Sequence --- cal
    Symbol Table Information --- sym
• Example :
    x = a + b * c;
        lod 1 1      /* a */
        lod 1 2      /* b */
        lod 1 3      /* c */
        mult
        add
        str 1 4      /* x */

    if (a>b) a = a + b;
        lod 1 1      /* a */
        lod 1 2      /* b */
        gt
        fjp next
        lod 1 1      /* a */
        lod 1 2      /* b */
        add
        str 1 1      /* a */
    next
• Indirect Addressing is used to access both array elements and var parameters.
• ixa --- indirect load
  – replaces stacktop by the value of the item at location stacktop.
  – to retrieve A[i] :
        lod i        /* actually (Bi, Oi) */
        ldr A        /* also (block number, offset) */
        add          /* effective address */
        ixa          /* indirect load gets contents of A[i] */
  – to retrieve var parameter x :
        lod x        /* loads address of actual - since x is var */
        ixa          /* indirect load */
• sta --- indirect store
  – sta stores stacktop into the address at stack[stacktop-1]; both items are popped.
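The stack effect of ixa and sta described above can be sketched as follows. This is a simplified model, not the interpreter's code: it assumes memory is a flat Python list indexed by integer addresses, whereas Ucode operands are really (block, offset) pairs, and the function names simply reuse the opcode names.

```python
# Hypothetical sketch; memory is modelled as a flat list of integer cells.

def ixa(stack, memory):
    """Indirect load: replace the stack top by the value stored at that address."""
    address = stack.pop()
    stack.append(memory[address])


def sta(stack, memory):
    """Indirect store: pop a value and an address, then store the value there."""
    value = stack.pop()      # stacktop
    address = stack.pop()    # stack[stacktop-1]
    memory[address] = value


memory = [0] * 8
memory[5] = 42

stack = [5]            # an address is on top of the stack
ixa(stack, memory)     # the address is replaced by the value found there

stack.extend([3, 7])   # push an address, then the value to store
sta(stack, memory)     # memory[3] receives 7; both items are popped
```

Note the operand order for sta: the address is pushed first and the value last, which is why the value is popped first.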
  – A[i] = j;
        lod i
        ldr A
        add
        lod j
        sta
  – x := y, where x is a var parameter
        lod x
        lod y
        sta

Procedure Calling Sequence
• procedure definition :
    procedure A(var a : integer; b,c : integer);
• procedure call :
    A(x, expr1, expr2);
• calling sequence :
        ldp
        ldr x        /* load the address of actual for var parameter */
        …            /* code to evaluate expr1 --- left on the stack */
        …            /* code to evaluate expr2 --- left on the stack */
        cal A

Ucode Interpreter
• The Ucode interpreter is called ucodei; its source is on plac.dongguk.ac.kr.
• The interpreter uses the following files :
    *.ucode : file containing the Ucode program.
    *.lst : Ucode listing and output from the program.
• Ucode format
    label-field     columns 1-10
    op-code         columns 12-m
    operand-field   from column m+2
    (m is exactly enough to hold the opcode.)
  – label field --- a 10-character label (make sure it is 10 characters; pad with blanks)
  – op-code --- starts at column 12.

Programming Assignment #3
• Install the Ucode interpreter listed in Appendix B on your own PC and write a Ucode program that finds the prime numbers up to 100.
  – You may instead write and submit a program for a different problem.
  – Submit the output listing of the Ucode interpreter.
• For reference :
  – #1 : recursive-descent parser
  – #2 : MiniPascal LR parser

Concluding Remarks
• IL criteria
  – intermediate level
      input language --- high level
      output machine --- low level
  – efficient processing
      translation --- source to IL, IL to target
      interpretation
      optimization
  – extensibility
  – external representation
  – clean separation
      language dependence & machine dependence
• IL criteria by IL (A : good, B : fair, C : poor)

                             Polish Notation   Three Address Code   Tree Structured Code   Abstract
                             Post    IR        Quadruple  Triple    AST     TCOL           Machine Code
    external representation   A      A          A          A        C       B              A
    extensibility             A      A          A          A        A       A              B
    clean separation          C      B          B          B        C       A              A

  Grades for the remaining criteria (intermediate level; efficient processing: source to IL translation, IL to target translation, interpretation, optimization), as listed :
    B C A B B B A B C A B B C A A B B B B C C A C B A C A A B
    C B B A C C A