6.035 Fall 2005
Lecture 15: Putting it all together
From parsing to code generation
Saman Amarasinghe
6.035 ©MIT Fall 1998

How to make the computer understand?
• Write a program using a programming language
• Microprocessors talk in assembly language
Program written in a programming language → Compiler → Assembly language translation

Example (input program)

    int expr(int n)
    {
        int d;
        d = 4 * n * n * (n + 1) * (n + 1);
        return d;
    }

Example (output assembly code)

Unoptimized Code

    expr:
    .LFB2:
        pushq   %rbp
    .LCFI0:
        movq    %rsp, %rbp
    .LCFI1:
        movl    %edi, -4(%rbp)
        movl    -4(%rbp), %eax
        movl    %eax, %edx
        imull   -4(%rbp), %edx
        movl    -4(%rbp), %eax
        incl    %eax
        imull   %eax, %edx
        movl    -4(%rbp), %eax
        incl    %eax
        imull   %edx, %eax
        sall    $2, %eax
        movl    %eax, -8(%rbp)
        movl    -8(%rbp), %eax
        leave
        ret

Optimized Code

    expr:
    .LFB2:
        movl    %edi, %eax
        imull   %edi, %eax
        incl    %edi
        imull   %edi, %eax
        imull   %edi, %eax
        sall    $2, %eax
        ret

Anatomy of a Computer
Program (character stream)
→ Lexical Analyzer (Scanner) → Token Stream
→ Syntax Analyzer (Parser) → Parse Tree
→ Semantic Analyzer → Intermediate Representation
→ Code Optimizer → Optimized Intermediate Representation
→ Code Generator → Assembly code

Lexical Analysis
• The lexical analyzer creates tokens out of a text stream
• Tokens are defined using regular expressions

Lexical Analysis: Examples of Regular Expressions

Regular Expression                  Strings matched
a                                   “a”
a·b                                 “ab”
a|b                                 “a” “b”
ε                                   “”
a*                                  “” “a” “aa” “aaa” …
(a | ε) · b                         “ab” “b”
num = 0|1|2|3|4|5|6|7|8|9           “0” “1” “2” “3” …
posint = num · num*                 “8” “6035” …
int = (ε | -) · posint              “-42” “1024” …
real = int · (ε | (. · posint))     “-12.56” “12” “1.414” …
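The definitions of posint, int, and real above can be recognized by a few lines of hand-written scanning code. The following is a minimal sketch in C, not code from the lecture: the function names match_posint and match_real are hypothetical, and the recognizer is written by hand rather than generated from the regular expressions by a tool.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: length of the longest prefix of s that matches
   posint = num · num*, or 0 when s does not start with a digit. */
static size_t match_posint(const char *s)
{
    size_t i = 0;
    while (isdigit((unsigned char)s[i]))
        i++;
    return i;
}

/* Hypothetical helper: length of the longest prefix of s that matches
   real = int · (ε | (. · posint)), where int = (ε | -) · posint.
   Returns 0 when no prefix matches. */
size_t match_real(const char *s)
{
    assert(s != NULL);
    size_t i = 0;

    if (s[i] == '-')                 /* optional sign: (ε | -) */
        i++;

    size_t digits = match_posint(s + i);
    if (digits == 0)                 /* posint is mandatory */
        return 0;
    i += digits;

    if (s[i] == '.') {               /* optional fraction: (. · posint) */
        size_t frac = match_posint(s + i + 1);
        if (frac > 0)                /* a bare '.' is not consumed */
            i += 1 + frac;
    }
    return i;
}
```

With this sketch, match_real("-12.56") returns 6, while match_real("12.") returns 2 because the trailing "." is not followed by a posint and so is left for the next token.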
Lexical Analysis
• The lexical analyzer creates tokens out of a text stream
• Tokens are defined using regular expressions
• Regular expressions can be mapped to Nondeterministic Finite Automata (NFA)
  – by simple construction
• The NFA is transformed to a DFA
  – Transformation algorithm
  – Executing a DFA is straightforward

Syntax Analysis (parsing)
• Defining a language using context-free grammars

Example: A CFG for expressions

    <expr> → <expr> <op> <expr>
    <expr> → ( <expr> )
    <expr> → - <expr>
    <expr> → num
    <op>   → +
    <op>   → *

Parse Tree Example
Input: num ‘*’ ‘(’ num ‘+’ num ‘)’

    <expr>
      <expr>
        num
      <op>
        *
      <expr>
        (
        <expr>
          <expr>
            num
          <op>
            +
          <expr>
            num
        )

Syntax Analysis (parsing)
• Defining a language using context-free grammars
• Classification of Grammars
  – Context free
  – unambiguous
  – LR(k)
  – LR(1)
  – LALR(1)
  – SLR(1)
  – LR(0)
  – regular

LR(k) Parser Engine
• Components: Symbol Stack, State Stack, Current Symbol, Parser Action
• Parse table:

    State   ACTION                                      Goto
            (             )             $               X
    s0      shift to s2   error         error           goto s1
    s1      error         error         accept
    s2      shift to s2   shift to s5   error           goto s3
    s3      error         shift to s4   error
    s4      reduce (2)    reduce (2)    reduce (2)
    s5      reduce (3)    reduce (3)    reduce (3)

Figure by MIT OCW.
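The parser engine above is a loop that consults the ACTION and Goto tables using the state on top of the stack and the current symbol. The following C sketch shows one way such an engine could look for the six-state table on the slide. Several things are assumptions and not taken from the lecture: the productions are taken to be (2) <X> → ( <X> ) and (3) <X> → ( ), the Goto entries are placed in rows s0 and s2, and all identifiers (lr_parse, ACTION, GOTO_X, and so on) are hypothetical. The sketch keeps only the state stack, since the symbol stack is not needed to decide acceptance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What the engine does for a (state, token) pair. */
enum kind { ERR, SHIFT, REDUCE, ACCEPT };
struct action { enum kind kind; int arg; };  /* arg: target state or production */

enum { T_LPAREN, T_RPAREN, T_END, NUM_TOKENS };

#define SH(s) { SHIFT, (s) }
#define RD(p) { REDUCE, (p) }
#define ER    { ERR, 0 }
#define AC    { ACCEPT, 0 }

/* ACTION table: rows s0..s5, columns ( ) $. */
static const struct action ACTION[6][NUM_TOKENS] = {
    /* s0 */ { SH(2), ER,    ER    },
    /* s1 */ { ER,    ER,    AC    },
    /* s2 */ { SH(2), SH(5), ER    },
    /* s3 */ { ER,    SH(4), ER    },
    /* s4 */ { RD(2), RD(2), RD(2) },
    /* s5 */ { RD(3), RD(3), RD(3) },
};

/* Goto on <X>; -1 means no entry. Row placement is an assumption. */
static const int GOTO_X[6] = { 1, -1, 3, -1, -1, -1 };

/* Assumed right-hand-side lengths: (2) <X> -> ( <X> ), (3) <X> -> ( ). */
static int rhs_length(int production)
{
    return production == 2 ? 3 : 2;
}

static int token_of(char c)
{
    if (c == '(')  return T_LPAREN;
    if (c == ')')  return T_RPAREN;
    if (c == '\0') return T_END;
    return -1;                           /* not in the alphabet */
}

/* Returns true when the table accepts the input string. */
bool lr_parse(const char *input)
{
    assert(input != NULL);
    int stack[256];                      /* state stack */
    size_t top = 0;
    size_t pos = 0;
    stack[top] = 0;                      /* start in s0 */

    for (;;) {
        int tok = token_of(input[pos]);
        if (tok < 0)
            return false;
        struct action a = ACTION[stack[top]][tok];
        switch (a.kind) {
        case SHIFT:
            if (top + 1 >= sizeof stack / sizeof stack[0])
                return false;            /* nesting too deep for this sketch */
            stack[++top] = a.arg;        /* push the new state, consume token */
            pos++;
            break;
        case REDUCE: {
            top -= (size_t)rhs_length(a.arg);  /* pop one state per RHS symbol */
            int next = GOTO_X[stack[top]];
            if (next < 0)
                return false;
            stack[++top] = next;         /* push the Goto state for <X> */
            break;
        }
        case ACCEPT:
            return true;
        default:
            return false;                /* ERR entry */
        }
    }
}
```

Under these assumptions, lr_parse("(())") walks s0, s2, s2, s5, reduces by (3), shifts the closing parenthesis into s4, reduces by (2), and accepts in s1; lr_parse("())") fails at the error entry for ")" in s1.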
Item sets:

    s0: <S> → • <X> $
        <X> → • <Y>
        <X> → • (
        <Y> → • ( <Y> )
        <Y> → •
    s1: <X> → ( •
        <Y> → ( • <Y> )
        <Y> → • ( <Y> )
        <Y> → •
    s2: <Y> → ( • <Y> )
        <Y> → • ( <Y> )
        <Y> → •
    s3: <Y> → ( <Y> • )
    s4: <Y> → ( <Y> ) •
    s5: <S> → <X> • $
    s6: <X> → <Y> •

Parse table:

    State   ACTION                                          Goto
            (             )             $                   X         Y
    s0      shift to s1   reduce (5)    reduce (5)          goto s5   goto s6
    s1      shift to s2   reduce (5)    reduce (3 or 5)               goto s3
    s2      shift to s2   reduce (5)    reduce (5)                    goto s3
    s3      error         shift to s4   error
    s4      error         reduce (4)    reduce (4)
    s5      error         error         accept
    s6      error         error         reduce (2)

Semantic Analysis
• Building a symbol table
• Static Checking
  – Flow-of-control checks
  – Uniqueness checks
  – Type checking
• Dynamic Checking
  – Array bounds check
  – Null pointer dereference check

Translation to Intermediate Format
• Goal: Remain Largely Machine Independent But Move Closer to Standard Machine Model
  – From high-level IR to a low-level IR
• Eliminate Structured Flow of Control
• Convert to a Flat Address Space

Code Optimizations
• Generate code as good as hand-crafted by a good assembly programmer
• Have stable, robust performance
• Abstract the architecture away from the programmer
  – Exploit architectural strengths
  – Hide architectural weaknesses

Code Optimizations
• Algebraic simplification
• Common subexpression elimination
• Copy propagation
• Constant propagation
• Dead-code elimination
• Register allocation
• Instruction Scheduling

Compiler Project!
• You guys build a full-blown compiler from the ground up!!!
• From Decaf to working code

Compiler Derby
• Who has the fastest compiler in the east???
• Will give you the program 12 hours in advance
  – Test and make all the optimizations work
  – DO NOT ADD PROGRAM SPECIFIC HACKS!
• Wednesday, December 14th at 11:00 AM, location TBA
  – refreshments provided

How will you use 6.035 knowledge?
• As an informed programmer
• As a designer of simple languages to aid other programming tasks
• As an engineer dealing with new computer architectures
• As a true compiler hacker

1. Informed Programmer
• Now you know what the compiler is doing
  – don’t treat it as a black box
  – don’t trust it to do the right thing!
• Implications
  – performance
  – debugging
  – correctness

1. Informed Programmer
• What did you learn in 6.035?
  – How optimizations work or why they did not work
  – How to read and understand optimized code

2. Language Extensions
• In many applications and systems, you may need to:
  – implement a simple language
    • handle input
    • define an interface
    • command and control
  – extend a language
    • add new functionality
    • modify semantics
    • help with optimizations

2. Language Extensions
• What you learned in 6.035
  – define tokens and languages using regular expressions and CFGs
  – use tools such as jlex, lex, javacup, yacc
  – build intermediate representations
  – perform simple transformations on the IR

3. Computer Architectures
• Many special purpose processors
  – in your cell phone, car engine, watch, etc.
• Designing new architectures
• Adapting compiler back-ends for new architectures

3. Designing New Architectures
• Great advances in VLSI technology
  – very fast and small transistors
  – scaling up to a billion transistors
  – but slow and limited wires and I/O
• A computer architecture is a combination of hardware and compiler
  – need to know what a compiler can do and what hardware needs to do
  – If the compiler can do it, don’t waste hardware resources.

3. Designing New Architectures
• What did you learn in 6.035?
  – Capabilities of a compiler: what is simple and what is hard to do
  – How to think like a compiler writer

3. Back-end support
• Every new architecture needs a new back-end
  – Even if the ISA is the same, the resource constraints differ
  – How to handle new features
• Instruction scheduling

3. Back-end support
• What did you learn in 6.035?
  – Intermediate representations
  – Transforming/optimizing the IR
  – Process of generating assembly from a high-level IR
  – Assembly interface issues (e.g., calling conventions)
  – Register allocation issues
  – Code scheduling issues

4. Compiler Hacking
• Theory
• Algorithms
• Implementation

4. Compiler Hacking
• Theory:
  – Develop general, abstract concepts
  – Prove correctness, optimality, etc.
• Examples
  – parse theory
  – lattices and data-flow
  – abstract interpretation
  – The language ML

4. Compiler Hacking
• Algorithms:
  – Design a solution to a given problem (mostly new optimizations)
  – Use many techniques such as graph theory, number theory, etc.
  – May have to limit the scope and find good heuristics
• Examples
  – partial redundancy elimination
  – register allocation by graph coloring
  – using multi-granular (MMX) operations
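Register allocation by graph coloring, listed among the examples above, is a case where the scope is limited and a heuristic is used. The C sketch below illustrates the idea under stated assumptions: it uses a simple greedy coloring in index order, not the allocator taught in the course or any particular published algorithm, and the names interference_graph, add_interference, color_registers, and MAX_VREGS are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VREGS 16   /* hypothetical limit for this sketch */

/* interferes[i][j] is true when virtual registers i and j are live at the
   same time and therefore cannot share a physical register. */
struct interference_graph {
    int n;                                   /* number of virtual registers */
    bool interferes[MAX_VREGS][MAX_VREGS];
};

void add_interference(struct interference_graph *g, int a, int b)
{
    assert(a >= 0 && a < g->n && b >= 0 && b < g->n && a != b);
    g->interferes[a][b] = true;
    g->interferes[b][a] = true;
}

/* Greedy coloring: visit virtual registers in index order and give each one
   the lowest-numbered physical register not used by an already-colored
   neighbor. assignment[v] receives that register, or -1 when v would have
   to be spilled. Returns the number of spilled virtual registers. */
int color_registers(const struct interference_graph *g, int num_physical,
                    int assignment[MAX_VREGS])
{
    assert(g->n >= 0 && g->n <= MAX_VREGS && num_physical > 0);
    int spills = 0;

    for (int v = 0; v < g->n; v++) {
        bool taken[MAX_VREGS] = { false };

        /* Mark the registers held by neighbors colored earlier. */
        for (int u = 0; u < v; u++)
            if (g->interferes[v][u] && assignment[u] >= 0)
                taken[assignment[u]] = true;

        assignment[v] = -1;
        for (int r = 0; r < num_physical && r < MAX_VREGS; r++) {
            if (!taken[r]) {
                assignment[v] = r;
                break;
            }
        }
        if (assignment[v] < 0)
            spills++;                        /* no free register: spill v */
    }
    return spills;
}
```

For a graph where registers 0, 1, and 2 interfere pairwise and register 3 interferes only with 2, three physical registers suffice with no spills, while two physical registers force register 2 to be spilled. The visiting order is the heuristic here; a production allocator would choose it more carefully.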
4. Compiler Hacking
• Implementation:
  – Develop a new compiler
  – Issues in designing very complex software
  – Putting theory and algorithms into practice
• Examples
  – A JIT for Java
  – A query optimization engine for SQL
  – A rasterizer for PostScript

Where to Look for Current Research?
• PLDI – Programming Language Design and Implementation Conference
• Code Generation / Machine specific
  – MICRO Conference
  – ASPLOS – Architectural Support for Programming Languages and Operating Systems
  – CGO – Code Generation and Optimization
• Language Theory
  – POPL – Principles of Programming Languages
  – OOPSLA – Object-Oriented Programming, Systems, Languages, and Applications
• Program Analysis
  – SAS – Static Analysis Symposium
  – PPoPP – Principles and Practice of Parallel Programming

• What did you learn in 6.035?