Code generation Course "Software Language Engineering" University of Koblenz-Landau Department of Computer Science Ralf Lämmel Software Languages Team © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 1 Motivation © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 2 2! 1! (Intermediate) code generation Intermediate Code Generation! •! Facilitates retargeting: enables attaching a back end for the new machine to an existing front end! Lexical Analysis Intermediate" and" Front end! Back end! code! Lexical Analyzer Generators! Target" machine" code! •! Enables3! machine-independent code optimization! Chapter Intermediate code (representation; IR) facilitates retargeting (addressing a new back end) and machine-independent optimization. COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 3 1! Compiler architecture 2! Position of a Code Generator in the Compiler Model! Lexical Analysis and" Code" Lexical Analyzer Front-End!Generators! Optimizer! Source" program! Intermediate" code! Intermediate" code! Code" Generator! Target" program! Chapter 3! Lexical error" Syntax error" Semantic error! Symbol Table! COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 4 Different options for target code Absolute machine code (directly executable code) Relocatable machine code (subject to linking step) Assembly code (which is suitable for debugging) Bytecode (P code, intermediate representation) © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle The focus in this course 5 Obvious requirements Intermediate and final code must be correct. That is, code generation must be semantics-preserving. Code must be fast and resource-friendly. We need a cost model of the code languages. We may need weights in code generation. We may use optimizations to reduce costs. © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 6 1! Intermediate representations targeted by code generation 3! Intermediate Representations! •! Graphical representations Lexical Analysis and" (e.g. AST)! •! Postfix notation: operations on values stored Lexical Analyzer Generators! on operand stack (similar to JVM bytecode)! •! Three-address code: (e.g. triples and quads)" ! x := y op z ! Chapter 3! •! Two-address code:" ! x := op y" which is the same as x := x op y! COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 7 7! 1! Postfix Notation! a := b * -c + b * -c! Lexical Analysis and" a b c uminus * b c uminus * + assign! Bytecode (for example)! Lexical Analyzer Generators! Postfix notation represents! operations on a stack! Chapter 3! Pro: !easy to generate! Cons: !stack operations are more" ! difficult to optimize! iload 2 iload 3 ineg imul iload 2 iload 3 ineg imul iadd istore 1 // // // // // // // // // // push b push c uminus * push b push c uminus * + store a COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 8 8! 1! Three-Address Code! a := b * -c + b * -c! Lexical Analysis and" Lexical Analyzer Generators! t1 t2 t3 t4 t5 a := := := := := := - c b * t1 - c b * t3 t2 + t4 t5 Chapter 3! Linearized representation" of a syntax tree! t1 t2 t5 a := := := := - c b * t1 t2 + t2 t5 Linearized representation" of a syntax DAG! COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 9 Big questions of code generation What is the “language” of intermediate representations? How “context-free” is the language? What is the distance between high-level and IR language? How to define a mapping between the languages? What sort of optimizations are possible on IR? What backends are available for IR? ... © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 10 Syntax-directed translation re h ig h g in p p a m f o s n a e m l a A typic level to lower-level code © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 11 1! 9! Three-Address Statements! •! Assignment statements: x := y op z, x := op y! •! Indexed assignments: x := y[i], x[i] := y! •! Pointer assignments: x := &y, x := *y, *x := y! •! Copy statements: x := y! •! Unconditional jumps: goto lab! Chapter 3! •! Conditional jumps: if x relop y goto lab! •! Function calls: param x… call p, n" return y! Various such “P code machines” exist ... Lexical Analysis and" Lexical Analyzer Generators! COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 12 1! 10! Syntax-Directed Translation into Three-Address Code! Productions" S ! id := E" | while E do S" E ! E + E" | E * E" | - E" 3! | (Chapter E )" | id! | num" Synthesized attributes:" S.code ! !three-address code for S! S.begin! !label to start of S or nil! S.after ! !label to end of S or nil! E.code! !three-address code for E! E.place! !a name holding the value of E" Lexical Analysis and" Lexical Analyzer Generators! gen(E.place ‘:=’ E1.place ‘+’ E2.place)! Code generation! t3 := t1 + t2 COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 13 1! 11! Syntax-Directed Translation into Three-Address Code (cont’d)! Productions" S ! id := E! S ! while E" do S1" E ! E1 + E2" Semantic rules! S.code := E.code || gen(id.place ‘:=’ E.place); S.begin := S.after := nil" (see next slide)! Lexical Analysis and" Lexical Analyzer Generators! E ! E1 * E2" E ! - E1" E.place := newtemp();" E.code := E1.code || E2.code || gen(E.place ‘:=’ E1.place ‘+’ E2.place)" E.place := newtemp();" E.code := E1.code || E2.code || gen(E.place ‘:=’ E1.place ‘*’ E2.place)! E.place := newtemp();" E.code := E1.code || gen(E.place ‘:=’ ‘uminus’ E1.place)! E.place := E1.place" E.code := E1.code! E.place := id.name" E.code := ‘’" E.place := newtemp();" E.code := gen(E.place ‘:=’ num.value)! Chapter 3! E ! ( E1 )" E ! id! E ! num! COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 14 1! 12! Syntax-Directed Translation into Three-Address Code (cont’d)! Lexical Analysis and" S.begin:! Production" S ! while E do S ! Lexical Analyzer Generators! Semantic rule" 1 E.code! if E.place = 0 goto S.after! S.code! goto S.begin! S.begin := newlabel()" S.after := newlabel()! S.after:! …! S.code := gen(S.begin ‘:’) ||" ! E.code ||" ! gen(‘if’ E.place ‘=‘ ‘0’ ‘goto’ S.after) ||" ! S1.code ||" ! gen(‘goto’ S.begin) ||" ! gen(S.after ‘:’)! Chapter 3! COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 15 1! 13! Example! i := 2 * n + k! while i do! i := i - k" Lexical Analysis and" Lexical Analyzer Generators! Chapter 3! t1 := 2 t2 := t1 * n t3 := t2 + k i := t3 L1: if i = 0 goto L2 t4 := i - k i := t4 goto L1 L2: COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 16 18! 1! Symbol Tables for Scoping! struct S { int a; int b; } s; We need a symbol table" for the fields of struct S! Need symbol table" for global variables" and functions! Lexical Analysis and" void swap(int& a, int& b) { int t; t = a; Lexical Analyzer Generators! a = b; b = t; } Chapter 3! void somefunc() { … swap(s.a, s.b); … } Need symbol table for arguments" and locals for each function! Check: s is global and has fields a and b" Using symbol tables we can generate" code to access s and its fields! COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 17 Additional aspects Register allocation Array operations Procedure calls Exceptions No details provided on this matter. See any textbook on compiler construction. This would be another course. ... © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 18 Portable code machines Important kind of target for code generation © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 19 Examples of P code machines Java Virtual Machine MATLAB precompiled code The machine of UCSD Pascal © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 20 Source versus P code public static int gcd(int, int); public static int gcd(int x, int y) { while (x != y) { if (x > y) x = x - y; else y = y - x; } return x; } 0: iload_0 1: iload_1 2: if_icmpeq 24 5: iload_0 6: iload_1 7: if_icmple 17 10: iload_0 11: iload_1 12: isub 13: istore_0 14: goto 0 © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 17: 18: 19: 20: 21: 24: 25: iload_1 iload_0 isub istore_1 goto 0 iload_0 ireturn 21 Characteristics of P code Portability of code generation Simplicity of code generation Simplicity of interpretation Compact size of P code Debugging of P code Efficiency by JITing or hardware implementation © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 22 Variation points in P code machines The extent of using registers Available machine registers (such as SP) The amount of stacks Availability of heap Supported datatypes Calling conventions ... © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 23 A quick look at compiler optimization This is a huge area making up for the bigg er part of compiler construction which we cannot cove r here in any useful way. Some context is provided for m otivation. © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 24 Major classification Peephole optimization (pretty local) Optimizations requiring global program analysis Optimizations requiring IR in “normal” form © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 25 Peephole optimization Perform optimization over small window of code. Typical examples: Constant folding (evaluate constant subexpressions) Strength reduction (replace slow by fast operations) Common sub-expression elimination (simple forms thereof) Remove redundant operations Algebraic transformations Special case operations (inc vs. add 1) ... © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 26 An example for strength reduction for Java Bytecode ... aload 1 aload 1 mul ... ... aload 1 dup mul ... [http://en.wikipedia.org/wiki/Peephole_optimization, 4 Dec 2012] © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 27 An example for removing redundant operations Source Bytecode a = b + c; d = a + e; MOV ADD MOV MOV ADD MOV b, R0 c, R0 R0, a a, R0 e, R0 R0, d # # # # # # Copy Add Copy Copy Add Copy b to the register c to the register the register to a a to the register e to the register the register to d Eliminate past regular generation [http://en.wikipedia.org/wiki/Peephole_optimization, 4 Dec 2012] © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 28 An example for removing redundant operations for Z80 :-) PUSH AF PUSH BC PUSH DE PUSH HL CALL _ADDR1 POP HL POP DE POP BC POP AF PUSH AF PUSH BC PUSH DE PUSH HL CALL _ADDR2 POP HL POP DE POP BC POP AF PUSH AF PUSH BC PUSH DE PUSH HL CALL _ADDR1 POP HL POP DE POP BC PUSH BC PUSH DE PUSH HL CALL _ADDR2 POP HL POP DE POP BC POP AF PUSH AF PUSH BC PUSH DE PUSH HL CALL _ADDR1 POP HL POP DE PUSH DE PUSH HL CALL _ADDR2 POP HL POP DE POP BC POP AF PUSH AF PUSH BC PUSH DE PUSH HL CALL _ADDR1 POP HL PUSH HL CALL _ADDR2 POP HL POP DE POP BC POP AF PUSH AF PUSH BC PUSH DE PUSH HL CALL _ADDR1 CALL _ADDR2 POP HL POP DE POP BC POP AF [http://en.wikipedia.org/wiki/Peephole_optimization, 4 Dec 2012] © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 29 15! 1! Transformations! An exampleAlgebraic for an algebraic transformation on three-address statements •! Change arithmetic operations to transform blocks to algebraic equivalent forms! Lexical Analysis and" Lexical Analyzer Generators! t1 := a - a t2 := b + t1 t3 := 2 * t2 t1 := 0 t2 := b t3 := t2 << 1 Chapter 3! Bit shift being cheaper than multiplication with 2. COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 30 SSA form: a key concept in compiler optimization © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 31 Static single assignment form (SSA form) A property of IR: each variable is assigned exactly once! SSA enables or enhances compiler optimizations Regular SSA y := 1 y := 2 x := y y1 := 1 y2 := 2 x1 := y2 It requires some analysis to see that the first assignment is “dead” and that the third assignment relies on the second. These facts are more obvious in SSA form. See also: http://en.wikipedia.org/wiki/Static_single_assignment_form © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 32 SSA conversion Replace the target of each assignment with a new variable. Replace each reference with the right “version”. Regular y := 1 y := 2 x := y SSA y1 := 1 y2 := 2 x1 := y2 © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 33 Optimizations benefiting from SSA ik n.w //e rg ia.o iped k /wi global value numbering or m strength reduction f ent_ gnm assi partial redundancy elimination e_ ingl c_s tati i/S dead code elimination : http sparse conditional constant propagation : also value range propagation See constant propagation register allocation © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 34 Control-flow graphs © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 35 2! 1! Flow Graphs! •! A flow graph is a graphical depiction of a sequence of instructions with control flow Lexicaledges! Analysis and" Lexical Analyzer Generators! •! A flow graph can be defined at the intermediate code level or target code level! ChapterMOV3!1,R0 MOV n,R1 JMP L2 L1: MUL 2,R0 SUB 1,R1 L2: JMPNZ R1,L1 MOV 0,R0 MOV n,R1 JMP L2 L1: MUL 2,R0 SUB 1,R1 L2: JMPNZ R1,L1 COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 36 1! 3! Basic Blocks! •! A basic block is a sequence of consecutive with exactly one entry point and Lexical instructions Analysis and" one exit point (with natural flow or a branch Lexical Analyzer Generators! instruction)! MOV 1,R0 MOV n,R1 JMP L2 L1: MUL 2,R0 SUB 1,R1 L2: JMPNZ R1,L1 Chapter 3! MOV 1,R0 MOV n,R1 JMP L2 L1: MUL 2,R0 SUB 1,R1 L2: JMPNZ R1,L1 COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 37 1! 4! Basic Blocks and Control Flow Graphs! •! A control flow graph (CFG) is a directed graph with basic blocks Bi as vertices and Lexical Analysis and" with edges Bi ! Bj iff Bj can be executed Lexical Analyzer Generators! immediately after Bi! MOV 1,R0 MOV n,R1 JMP L2 L1: MUL 2,R0 SUB 1,R1 L2: JMPNZ R1,L1 Chapter 3! MOV 1,R0 MOV n,R1 JMP L2 L1: MUL 2,R0 SUB 1,R1 L2: JMPNZ R1,L1 COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 38 1! 5! Successor and Predecessor Blocks! •! Suppose the CFG has an edge B1 ! B2! –! Basic block Band" is a predecessor of B Lexical Analysis –! Basic block B is a successor of B Lexical Analyzer Generators! 1 2 2 Chapter 3! ! 1! MOV 1,R0 MOV n,R1 JMP L2 L1: MUL 2,R0 SUB 1,R1 L2: JMPNZ R1,L1 COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 39 1! 6! Partition Algorithm for Basic Blocks! Input: !A sequence of three-address statements! Output: A list of basic blocks with each three-address statement" in exactly one block! Lexical Analysis and" Lexical Analyzer Generators! 1.! Determine the set of leaders, the first statements if basic blocks! a)! The first statement is the leader! b)! Any statement that is the target of a goto is a leader! c)! Any statement that immediately follows a goto is a leader! 2.! For each leader, its basic block consist of the leader and all" statements up to but not including the next leader or the end" of the program ! Chapter 3! COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 40 1! 7! Loops! •! A loop is a collection of basic blocks, such Lexical Analysis and" that! Lexical Analyzer Generators! –! All blocks in the collection are strongly connected! Chapter 3! –! The collection has a unique entry, and the only way to reach a block in the loop is through the entry! COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 41 1! 8! Loops (Example)! B1:! MOV 1,R0 MOV n,R1 JMP L2 Strongly connected" components:! Lexical Analysis and" Lexical Analyzer Generators! SCC={!{B2,B3}," ! {B4} }! B2:! L1: MUL 2,R0 SUB 1,R1 B3:! L2: JMPNZ R1,L1 Chapter 3! B4:! L3: ADD 2,R2 SUB 1,R0 JMPNZ R0,L3 Entries:! B3, B4! COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 42 1! 10! Transformations on Basic Blocks! •! A code-improving transformation is a code optimization to improve speed or reduce code size! •! Global transformations are performed across basic blocks! •! Local transformations are only performed on single basic blocks! Chapter 3! •! Transformations must be safe and preserve the meaning of the code! Lexical Analysis and" Lexical Analyzer Generators! –! A local transformation is safe if the transformed basic block is guaranteed to be equivalent to its original form! COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 43 1! 11! Common-Subexpression Elimination! •! Remove redundant computations! Lexical Analysis and" Lexical Analyzer Generators! a b c d := := := := b a b a + + - c d c d a b c d := := := := b + c a - d b + c b Chapter 3! t1 t2 t3 t4 := := := := b * c a - t1 b * c t2 + t3 t1 := b * c t2 := a - t1 t4 := t2 + t1 COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 44 12! 1! Dead Code Elimination! •! Remove unused statements! Lexical Analysis b := aand" + 1 b := a + 1 a := b + c … Lexical Analyzer …Generators! Assuming a is dead (not used)! Chapter 3! if true goto L2 b := x + y … Remove unreachable code! COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 45 1! 13! Renaming Temporary Variables! •! Temporary variables that are dead at the end Lexicalof Analysis and" a block can be safely renamed! Lexical Analyzer Generators! t1 := b + c t2 := a - t1 t1 := t1 * d d := t2 + t1 Chapter 3! t1 := b + c t2 := a - t1 t3 := t1 * d d := t2 + t3 Normal-form block! COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 46 1! 14! Interchange of Statements! •! Independent statements can be reordered! Lexical Analysis and" Lexical Analyzer Generators! t1 := b + c t1 := b + c t2 := a - t1 t3 := t1 * d d := t2 + t3 Chapter 3! t3 := t1 * d t2 := a - t1 d := t2 + t3 Note that normal-form blocks permit all" statement interchanges that are possible! COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 47 1! 15! Algebraic Transformations! Lexical Analysis and"operations to transform •! Change arithmetic blocks to algebraic equivalent forms! Lexical Analyzer Generators! Chapter 3! t1 := a - a t2 := b + t1 t3 := 2 * t2 t1 := 0 t2 := b t3 := t2 << 1 COP5621 Compiler Construction" Copyright Robert van Engelen, Florida State University, 2007-2009! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 48 Alias analysis © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 49 Alias analysis -- illustration p.foo = 1; q.foo = 2; i = p.foo + 3; Knowing whether or not two pointer variables alias each other may heavily influence optimizations. There are three possible alias cases here: 1. The variables p and q cannot alias. 2. The variables p and q must alias. 3. It cannot be conclusively determined at compile time if p and q alias or not. If p and q cannot alias, then i = p.foo + 3; can be changed to i = 4. If p and q must alias, then i = p.foo + 3; can be changed to i = 5. In both cases, we are able to perform optimizations from the alias knowledge. On the other hand, if it is not known if p and q alias or not, then no optimizations can be performed and the whole of the code must be executed to get the result. Two memory references are said to have a may-alias relation if their aliasing is unknown. [http://en.wikipedia.org/wiki/Alias_analysis, 4 Dec 2012] © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 50 Alias analysis Alias analysis is a technique in compiler theory, used to determine if a storage location may be accessed in more than one way. Two pointers are said to be aliased if they point to the same location. Alias analysis techniques are usually classified by flow-sensitivity and context-sensitivity. They may determine may-alias or must-alias information. The termalias analysis is often used interchangeably with term points-to analysis, a specific case. [http://en.wikipedia.org/wiki/Alias_analysis, 4 Dec 2012] © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 51 Concluding remarks Basic concepts: P code, SSA, CFG, alias analysis Typical optimizations: constant folding, common subexpression elimination Compiler architecture: lexer, parser, AST, IR, optimization, code generation Compiler implementation frameworks: LLVM © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 52 Code generation and intermediate representations in practice -- The LLVM experience. (Overview and Demo) © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 53 LLVM overview [The LLVM Compiler Framework and Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 54 !!"#$%&'()*+,$-./0+'$ • !!"#$1$!&2$!+3+*$"),045*$#567)8+$$ • 97+$!!"#$%&'()*+,$:8;,5/0,4604,+$ – <,&3)=+/$,+4/5>*+$6&'(&8+80/$;&,$>4)*=)8?$6&'()*+,/$ – @+=46+$07+$A'+B6&/0$0&$>4)*=$5$8+2$6&'()*+,$ – C4)*=$/05A6$6&'()*+,/D$E:9/D$0,56+F>5/+=$&(A')G+,/D$HHH$ • 97+$!!"#$%&'()*+,$I,5'+2&,J$ – K8=F0&F+8=$6&'()*+,/$4/)8?$07+$!!"#$)8;,5/0,4604,+$ – %$58=$%LL$5,+$,&>4/0$58=$5??,+//)3+M$ • E535D$-67+'+$58=$&07+,/$5,+$)8$=+3+*&('+80$ – K')0$%$6&=+$&,$85A3+$6&=+$;&,$NOPD$-(5,6D$<&2+,<%$ %7,)/$!5Q8+,$R$lattner@cs.uiuc.edu © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 55 !"#$$%&#'()#*%++,-%./(&/0$012% • !"$%++,-%!"#$%&'()*+$#%,-.*(/0$( – !"$%./((/0%3)045)4$6%)07%1)#4$16'07$&$07$01%89% – 801$#0)3%:89;%)07%$<1$#0)3%:&$#2'21$01;%#$&#$2$01)=/0% • >%./33$.=/0%/?%@$336'01$4#)1$7%3'A#)#'$2% – >0)3*2$2B%/&=('C)=/02B%./7$%4$0$#)1/#2B%D8!%./(&'3$#B% 4)#A)4$%./33$.=/0%25&&/#1B%&#/E3'04B%F% • >%./33$.=/0%/?%1//32%A5'31%?#/(%1"$%3'A#)#'$2% – >22$(A3$#2B%)51/()=.%7$A544$#B%3'0G$#B%./7$%4$0$#)1/#B% ./(&'3$#%7#'H$#B%(/753)#%/&=('C$#B%F% I"#'2%+)J0$#%K%lattner@cs.uiuc.edu © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 56 !"#$%%&'$()(**$(+,-./#0$ • 10+,$2"#$".3"$/#4#/5$.2$.6$7$62789709$:+,-./#0;$ – (+,-7<=/#$>.2"$62789709$,7?#@/#6$ – A6#6$B(($CDE$($789$(**$-706#0$ C file llvmgcc .o file C++ file llvmg++ .o file Compile Time llvm linker executable Link Time • F.6<83G.6".83$H#72G0#6;$ – A6#6$%%&'$+-<,.I#065$8+2$B(($+-<,.I#06$ – D+$@/#6$:+827.8$%%&'$JK)=L2#:+9#5$8+2$,7:".8#$:+9#$ – MN#:G27=/#$:78$=#$=L2#:+9#$OPJ! 9Q$+0$,7:".8#$:+9#$ ("0.6$%7R8#0$S$lattner@cs.uiuc.edu © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 57 !"#$%%&'$()(**$(+,-./#0$12+345$ 64738708$2+,-./#0$+0973.:7;+3<$=".2"$>?#?$%%&'$7?$ ,.8/#@#/$ABC$ D$%739>79#$?-#2.E2$F0+34G#38$/+=#0?$2+8#$4+$%%&'$AB$ D$%739>79#)4709#4$.38#-#38#34$+-;,.:#0?$.,-0+@#$2+8#$ D$(+8#$9#3#074+0$2+3@#04?$%%&'$2+8#$4+$4709#4$1#H9H$ AIJK5$2+8#$ ("0.?$%7L3#0$D$lattner@cs.uiuc.edu © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 58 !"##$#%&'()*+,'-&).%&+./*/0/#& 1/#2$3'.&"2'&/4&567.'4'.'#8'&+).)*'9'.2-& !"#$%&''(()%*"+#$!"#$,-.$/$ $$$0(#10"$-234$ 5$ !"#$%&''(0).$/$ $$$0(#10"$%&''(()6.4$ 5$ ?($@&"#A$ !"#$%&''(()!"#$-.$/$ $$$0(#10"$-234" 5$ !"#$%&''(0).$/$ $$$0(#10"$%&''(()6.4$ 5$ %*89!'(+$#*$ !"#$%&''(()%*"+#$!"#$7-.$/$ $$$0(#10"$7-234$$$!!"#$#%&'"(%)*" 5$ !"#$%&''(0).$/$ $$$!"#$#894$$$$$$$$$&!!"+,)-."%/0$-,$ $$$#89$:$64$$$$$$$$&!!"#$#%&'"+,%&$" $$$0(#10"$%&''((),#89.4$ 5$ ! ;'!8!"&#(<$'*&<$!"$%&''(($ ! ;'!8!"&#(<$+#*0($!"$%&''(0$ ! ;'!8!"&#(<$+#&%=$+'*#$>*0$ #89 $ 1:.$2&;)<#'.&=&lattner@cs.uiuc.edu © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 59 !"#$%&$'"%&$"()*+$ • ,-./%)-&$%0'-)1)23-*/)(4$(0(4#&%&5$ – 6/&'$3"(07-$'"-$1)2'2'#1-$28$'"-$3(44--$ – 6/&'$/1*('-$(44$3(44$&%'-&$!$9-$:/&'$!"#$$(44$3(44-)&$ – !"('$(;2/'$3(44-)&$2/'&%*-$'"-$')(0&4(<20$/0%'+$ • ,-./%)-&$(4%(&$(0(4#&%&5$ – ,-8-)-03-$32/4*$(4%(&$2'"-)$12%0'-)&$%0$3(44--$ – 6/&'$=029$'"('$42(*-*$>(4/-$*2-&0 '$3"(07-$8)2:$ 8/03<20$-0')#$'2$'"-$42(*$ – 6/&'$=029$'"-$12%0'-)$%&$02'$;-%07$&'2)-*$'")2/7"$ • ,-8-)-03-$:%7"'$02'$;-$'2$($&'(3=$2;?-3'@$ A")%&$B(C0-)$D$lattner@cs.uiuc.edu © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 60 !""#$%&'$%("')*)%(+',('-"./$0)12.)' 3'40)' 00*.&--' 3'("'!!=7' M9"%()%8' --O ' !!=7'>?' E,9+)9' 5"'40)' 366'40)' 00*.&66' 5"'40)' 3"./$0)12.)' J/2.$K)9' 366'("'!!=7' M9"%()%8' 3"./$0)12.)' J/2.$K)9' &--,+ ' --O/0P+ ' &--,+ ' !!=7' =)9$4)9' 7"8$4)8'*)9+$"%'":';33' <.$(+'!!=7'>?',+'()@('40)' !"A)9+'3'BCD'("'!!=7' FG'!!=7'B%,0H+$+'I' J/2.$K,2"%'E,++)+' !!=7'5L-' M$0)'N9$()9' 7"8$4)8'*)9+$"%'":';66' <.$(+'!!=7'>?',+'()@('40)' !"A)9+'366'BCD'("'!!=7' Q),8';0"L,0'<0$.$%,2"%R'>E'3"%+(,%('E9"/,&,2"%R'Q),8'B9&P.)%(' <0$.$%,2"%R'>%0$%$%&R'?),++"-$,2"%R'!>37R'!""/'J/(+R'7)."9H' E9"."2"%R'Q),8'C("9)'<0$.$%,2"%R'BQ3<R'ST' 3U9$+'!,V%)9'W'lattner@cs.uiuc.edu © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 61 !""#$%&'$%("')*)%(+',('-$%#./0)' 1"'2-)' 1"'2-)' 1"'2-)' 1"'2-)' !!@A' !$%#)3' --*0'-$%#)3' )4)56(,7-)' !$%#./0)' B</0$C)3' HI'!!@A'J%,-K+$+'L' B</0$C,/"%'M,++)+' B</"%,--K' $%()3%,-$C)+ N' 0,3#+'0"+('D6%5/"%+',+' $%()3%,-O'("'$0<3"*)'FMB' M)3D)5('<-,5)'D"3',3&60)%(' <3"0"/"%'"</0$C,/"%P' 175'2-)'D"3'!!@A'EFG' 8,/*)'9":)' ;,5#)%:' --5 ' 9'9":)' ;,5#)%:' --5'=0,35>?5 ' 8,/*)')4)56(,7-)' 9'9"0<$-)3' &55 ' 8,/*)' )4)56(,7-)' !$%#'$%'%,/*)'1"'2-)+' ,%:'-$73,3$)+'>)3)' 9>3$+'!,Q%)3'='lattner@cs.uiuc.edu © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 62 © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 63 © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 64 © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 65 © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 66 © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 67 © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 68 © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 69 !"#$%&'()*%+$,-().*%/$0-$1123$ !"#$%&''(()%*"+#$!"#$,-.$/$ $$$0(#10"$,-234$$$!!"#$%&" 5$ !"#$%&''(0).$/$ $$$!"#$64$$$$$$$$$$!!"$'"()%*+$ $$$6$7$84$$$$$$$$$!!"()$,-" $$$0(#10"$%&''(()96.4$ 5$ !"#(0"&'$!"#$:%&''(()!"#,$:-.$/$ $$$:#;<=3$7$'*&>$!"#,$:-$ $$$:#;<=?$7$&>>$!"#$:#;<=3@$3$ $$$0(#$!"#$:#;<=?" 5$ !"#$:%&''(0).$/$ $$$:6$7$&''*%&$!"#$ $$$+#*0($!"#$8@$!"#,$:6$ $$$:#;<=A$7$%&''$!"#$:%&''(()!"#,$:6.$ $$$0(#$!"#$:#;<=A$ 5$ 1.75%#$ 9**$*-'/8:80-#%8$'#%$ .70%#7'*.<%8 $ 40',5$'**-,'6-7$.8$%&)*.,.0$ (-80$="7,6-78$.7$(-80$ %&)*.,.0$.7$0;%$1123$ .7$1123$ #%)#%8%70'6-7$ ,'8%8$ >;#.8$1'?7%#$@$lattner@cs.uiuc.edu © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 70 !"#$%&'()*%+$,%-.#%,$/#'0-12#('320$ !"#$%"&'(!"#()*&''$$+!"#=(),/(0( ((()#12-7(4(':&5(!"#=(),( ((()#12-3(4(&55(!"#()#12-76(7( (((%$#(!"#()#12-3! 8( !"#()*&''$%+/(0( ((()9(4(&'':*&(!"#( (((;#:%$(!"#(<6(!"#=()9( ((()#12->(4(*&''(!"#()*&''$$+!"#=()9/( (((%$#(!"#()#12->( 8( !/5%#$/#'0-12#('320$ :),'/%$'**$8'**$-./%-$21$ 45'06%$/5%$)#2/2/7)%$12#$ 90-%#/$*2',$.0-/#"8320-$ ;<(%(=#%6>$8*%'0-$")$ 8'**%% $ .0/2$'**$8'**%#-$ /5%$1"08320$ /5%$#%-/$ !"#$%"&'(!"#()*&''$$+!"#(),-.&'/(0( ((()#12-3(4(&55(!"#(),-.&'6(7( (((%$#(!"#()#12-3! 8( !"#()*&''$%+/(0( ((()9(4(&'':*&(!"#( (((;#:%$(!"#(<6(!"#=()9( ((()#12-7(4(':&5(!"#=()9( ((()#12->(4(*&''(!"#()*&''$$+)#12-7/( (((%$#(!"#()#12->( 8( !"#()*&''$%+/(0( ((()#12->(4(*&''(!"#()*&''$$+!"#(</( (((%$#(!"#()#12->( 8( 45#.-$?'@0%#$A$lattner@cs.uiuc.edu © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 71 © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 72 © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 73 © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 74 © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 75 © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 76 © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 77 © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 78 © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 79 © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 80 © 2012 Ralf Framework Lämmel, Softwareand Language Engineering http://softlang.wikidot.com/course:sle [The LLVM Compiler Infrastructure, LCPC Tutorial, 2004 by Chris Lattner] 81 Various omissions Exception handling, string literals Tools, applications, benchmarks ... © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 82 LLVM The LLVM Compiler Infrastructure (Homepage) Getting Started with the LLVM System (How to install ...?) Kaleidoscope: Implementing a Language with LLVM llvm/build/examples/Kaleidoscope © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 83 Implementing a language with LLVM Sample code (e xtracted from LL VM user guide) https://github. available online com/slecourse/ : slecourse/tree/ master/sources /llv © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle m 84 Kaleidoscope: Implementing a Language with LLVM LLVM Tutorial: Table of Contents Implementing a Parser and AST Implementing Code Generation to LLVM IR Adding JIT and Optimizer Support Extending the language: control flow Extending the language: user-defined operators ml .ht dex /in : ce rial ur uto So cs/t do g/ .o r vm /ll p:/ htt Tutorial Introduction and the Lexer Extending the language: mutable variables / SSA construction Conclusion and other useful LLVM tidbits © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 85 Kaleidoscope © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 86 Recursive Fibonacci # Compute the x'th fibonacci number. def fib(x) if x < 3 then 1 else fib(x-1)+fib(x-2) # This expression will compute the 40th number. fib(40) © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 87 Iterative Fibonacci def binary : 1 (x y) y; User-defined sequencing operator def fibi(x) var a = 1, b = 1, c in (for i = 3, i < x in c=a+b: a=b: Use of mutable variables. b = c) : b; © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 88 A simple parser for Kaleidoscope © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 89 A simple parser for Kaleidoscope A simple lexer A recursive descent parser Plain old C++ objects for AST © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 90 A simple lexer if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ std::string NumStr; do { NumStr += LastChar; LastChar = getchar(); } while (isdigit(LastChar) || LastChar == '.'); NumVal = strtod(NumStr.c_str(), 0); return tok_number; } © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 91 A recursive descent parser /// primary /// ::= identifierexpr /// ::= numberexpr /// ::= parenexpr static ExprAST *ParsePrimary() { switch (CurTok) { default: return Error("unknown token when expecting an expression"); case tok_identifier: return ParseIdentifierExpr(); case tok_number: return ParseNumberExpr(); case '(': return ParseParenExpr(); } } © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 92 Plain old C++ objects for AST /// NumberExprAST - Expression class for numeric literals like "1.0". class NumberExprAST : public ExprAST { double Val; public: NumberExprAST(double val) : Val(val) {} }; /// numberexpr ::= number static ExprAST *ParseNumberExpr() { ExprAST *Result = new NumberExprAST(NumVal); getNextToken(); // consume the number return Result; } © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 93 A simple code generator for Kaleidoscope © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 94 A simple code generator for Kaleidoscope Add virtual code generation methods to AST classes Expressions return SSA registers (“Value” in LLVM) Use some house-keeping data structures Use uniqued IR, e.g., for literals Use of name hints for intermediate results Functions return, well, functions (“Function” in LLVM) Use an error method to report, for example, missing decls © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 95 Add virtual code generation methods to AST classes /// ExprAST - Base class for all expression nodes. class ExprAST { public: virtual ~ExprAST() {} virtual Value *Codegen() = 0; }; /// NumberExprAST - Expression class for numeric literals like "1.0". class NumberExprAST : public ExprAST { double Val; public: NumberExprAST(double val) : Val(val) {} virtual Value *Codegen(); }; ... © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 96 Use some house-keeping data structures Scope for functions The builder used for code generation static Module *TheModule; static IRBuilder<> Builder(getGlobalContext()); static std::map<std::string, Value*> NamedValues; Remember values for variables (parameters) © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 97 Floating point literals Arbitrary precision numbers Value *NumberExprAST::Codegen() { return ConstantFP::get(getGlobalContext(), APFloat(Val)); } Uniqued IR for numbers; no “new”! © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 98 Variable references Value *VariableExprAST::Codegen() { // Look this variable up in the function. Value *V = NamedValues[Name]; return V ? V : ErrorV("Unknown variable name"); } Mapping to be initialized by code for function declaration. © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 99 Binary expressions Value *BinaryExprAST::Codegen() { Value *L = LHS->Codegen(); Value *R = RHS->Codegen(); if (L == 0 || R == 0) return 0; Use of name hints for intermediate results switch (Op) { case '+': return Builder.CreateFAdd(L, R, "addtmp"); case '-': return Builder.CreateFSub(L, R, "subtmp"); case '*': return Builder.CreateFMul(L, R, "multmp"); case '<': L = Builder.CreateFCmpULT(L, R, "cmptmp"); // Convert bool 0/1 to double 0.0 or 1.0 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), "booltmp"); default: return ErrorV("invalid binary operator"); } } © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 100 Function calls Value *CallExprAST::Codegen() { // Look up the name in the global module table. Function *CalleeF = TheModule->getFunction(Callee); if (CalleeF == 0) return ErrorV("Unknown function referenced"); Get // If argument mismatch error. if (CalleeF->arg_size() != Args.size()) return ErrorV("Incorrect # arguments passed"); pointer to function std::vector<Value*> ArgsV; for (unsigned i = 0, e = Args.size(); i != e; ++i) { ArgsV.push_back(Args[i]->Codegen()); if (ArgsV.back() == 0) return 0; Generate code } for arguments return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); } © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 101 Function prototypes Function *PrototypeAST::Codegen() { // Make the function type: double(double,double) etc. std::vector<Type*> Doubles(Args.size(), Type::getDoubleTy(getGlobalContext())); FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), Doubles, false); Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); ... error handling omitted ... // Set names for all arguments. unsigned Idx = 0; for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); ++AI, ++Idx) { AI->setName(Args[Idx]); // Add arguments to variable symbol table. NamedValues[Args[Idx]] = AI; } return F; } © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 102 Function definitions Function *FunctionAST::Codegen() { NamedValues.clear(); Function *TheFunction = Proto->Codegen(); if (TheFunction == 0) return 0; BasicBlock *BB = BasicBlock::Create( getGlobalContext(), "entry", TheFunction); Builder.SetInsertPoint(BB); Create a new block if (Value *RetVal = Body->Codegen()) { // Finish off the function. Builder.CreateRet(RetVal); verifyFunction(*TheFunction); return TheFunction; } TheFunction->eraseFromParent(); return 0; } Generate code for body Perform verification Error in body. Remove function © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 103 Some optimizations for Kaleidoscope © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 104 Trivial constant folding Kaleidoscope def test(x) 1+2+x; Relies on basic optimization during IR construction IR define double @test(double %x) { entry: %addtmp = fadd double 3.000000e+00, %x ret double %addtmp } © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 105 Missing constant folding Kaleidoscope def test(x) (1+2+x)*(x+(1+2)); IR define double @test(double %x) { entry: %addtmp = fadd double 3.000000e+00, %x %addtmp1 = fadd double %x, 3.000000e+00 %multmp = fmul double %addtmp, %addtmp1 ret double %multmp } © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle x+3 computed twice! 106 Missing constant folding Kaleidoscope def test(x) (1+2+x)*(x+(1+2)); IR define double @test(double %x) { entry: %addtmp = fadd double 3.000000e+00, %x %multmp = fmul double %addtmp, %addtmp ret double %multmp } © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle Only computed once! 107 Register per-functions optimizations FunctionPassManager OurFPM(TheModule); // Basic setup logic. OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout())); // Provide basic AliasAnalysis support for GVN. OurFPM.add(createBasicAliasAnalysisPass()); // Do simple "peephole" optimizations and bit-twiddling optzns. OurFPM.add(createInstructionCombiningPass()); // Reassociate expressions. OurFPM.add(createReassociatePass()); // Eliminate Common SubExpressions. OurFPM.add(createGVNPass()); // Simplify the control flow graph (deleting unreachable blocks, etc). OurFPM.add(createCFGSimplificationPass()); OurFPM.doInitialization(); TheFPM = &OurFPM; // Inform code generation via global. MainLoop(); // Run the main "interpreter loop" now. © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 108 Revised function definitions if (Value *RetVal = Body->Codegen()) { // Finish off the function. Builder.CreateRet(RetVal); // Validate the generated code, checking for consistency. verifyFunction(*TheFunction); // Optimize the function. TheFPM->run(*TheFunction); return TheFunction; } © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 109 JIT-based execution/evaluation for Kaleidoscope © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 110 Wanted! $ ./toy ready> 4+5; ready> Read top-level expression: define double @0() { entry: ret double 9.000000e+00 } Evaluated to 9.000000 ready> © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 111 Create an IR interpreter static ExecutionEngine *TheExecutionEngine; ... int main() { .. // Create the JIT. This takes ownership of the module. TheExecutionEngine = EngineBuilder(TheModule).create(); .. } © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 112 Original main loop /// top ::= definition | external | expression | ';' static void MainLoop() { while (1) { fprintf(stderr, "ready> "); switch (CurTok) { case tok_eof: return; case ';': getNextToken(); break; // ignore top-level “;”. case tok_def: HandleDefinition(); break; case tok_extern: HandleExtern(); break; default: HandleTopLevelExpression(); break; } } } © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 113 Eval top-level expression past parsing static void HandleTopLevelExpression() { // Evaluate a top-level expression into an anonymous function. if (FunctionAST *F = ParseTopLevelExpr()) { if (Function *LF = F->Codegen()) { LF->dump(); // Dump the function for exposition purposes. // JIT the function, returning a function pointer. void *FPtr = TheExecutionEngine->getPointerToFunction(LF); // Cast it to the right type (no args, returns a double). // Thus, call it as a native function. double (*FP)() = (double (*)())(intptr_t)FPtr; fprintf(stderr, "Evaluated to %f\n", FP()); } ... continue with parsing ... © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 114 Control flow for Kaleidoscope © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 115 Wanted! IR define double @fib(double %x) { entry: %cmptmp = fcmp ult double %x, 3.000000e+00 br i1 %cmptmp, label %ifcont, label %else def fib(x) if x < 3 then 1 else fib(x-1)+fib(x-2) else: ; preds = %entry %subtmp = fadd double -1.000000e+00, %x %calltmp = call double @fib(double %subtmp) %subtmp1 = fadd double -2.000000e+00, %x %calltmp2 = call double @fib(double %subtmp1) %addtmp = fadd double %calltmp, %calltmp2 br label %ifcont Kaleidoscope ifcont: ; preds = %entry, %else %iftmp = phi double [ %addtmp, %else ], [ 1.000000e+00, %entry ] ret double %iftmp } © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 116 Extend lexer (trivial) tok_if = -6, tok_then = -7, tok_else = -8, ... if (IdentifierStr == "def") return tok_def; if (IdentifierStr == "extern") return tok_extern; if (IdentifierStr == "if") return tok_if; if (IdentifierStr == "then") return tok_then; if (IdentifierStr == "else") return tok_else; return tok_identifier; © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 117 Extend AST (trivial) /// IfExprAST - Expression class for if/then/else. class IfExprAST : public ExprAST { ExprAST *Cond, *Then, *Else; public: IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) : Cond(cond), Then(then), Else(_else) {} virtual Value *Codegen(); }; © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 118 Extend parser (trivial) /// ifexpr ::= 'if' expression 'then' expression 'else' expression static ExprAST *ParseIfExpr() { getNextToken(); // eat the if. ExprAST *Cond = ParseExpression(); if (!Cond) return 0; if (CurTok != tok_then) return Error("expected then"); getNextToken(); // eat the then ExprAST *Then = ParseExpression(); if (Then == 0) return 0; if (CurTok != tok_else) return Error("expected else"); getNextToken(); ExprAST *Else = ParseExpression(); if (!Else) return 0; return new IfExprAST(Cond, Then, Else); } © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 119 Extend parser (trivial) static ExprAST *ParsePrimary() { switch (CurTok) { default: return Error("unknown token when expecting an expression"); case tok_identifier: return ParseIdentifierExpr(); case tok_number: return ParseNumberExpr(); case '(': case tok_if: return ParseParenExpr(); return ParseIfExpr(); } } © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 120 Code generation for if-then-else 1/5 Value *IfExprAST::Codegen() { Value *CondV = Cond->Codegen(); if (CondV == 0) return 0; // Convert condition to a bool by comparing equal to 0.0. CondV = Builder.CreateFCmpONE(CondV, ConstantFP::get(getGlobalContext(), APFloat(0.0)), "ifcond"); © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 121 Code generation for if-then-else 2/5 Function *TheFunction = Builder.GetInsertBlock()->getParent(); // Create blocks for the then and else cases. // Insert the 'then' block at the end of the function. BasicBlock *ThenBB = BasicBlock::Create( getGlobalContext(), "then", TheFunction); BasicBlock *ElseBB = BasicBlock::Create( getGlobalContext(), "else"); BasicBlock *MergeBB = BasicBlock::Create( getGlobalContext(), "ifcont"); Builder.CreateCondBr(CondV, ThenBB, ElseBB); © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 122 Code generation for if-then-else 3/5 // Emit then value. Builder.SetInsertPoint(ThenBB); Value *ThenV = Then->Codegen(); if (ThenV == 0) return 0; Builder.CreateBr(MergeBB); // Codegen of 'Then' can change the current block. // Update ThenBB for the PHI. ThenBB = Builder.GetInsertBlock(); © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 123 Code generation for if-then-else 4/5 // Emit else block. TheFunction->getBasicBlockList().push_back(ElseBB); Builder.SetInsertPoint(ElseBB); Value *ElseV = Else->Codegen(); if (ElseV == 0) return 0; Builder.CreateBr(MergeBB); // Codegen of 'Else' can change the current block. // Update ElseBB for the PHI. ElseBB = Builder.GetInsertBlock(); © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 124 Code generation for if-then-else 5/5 // Emit merge block. TheFunction->getBasicBlockList().push_back(MergeBB); Builder.SetInsertPoint(MergeBB); PHINode *PN = Builder.CreatePHI( Type::getDoubleTy(getGlobalContext()), 2, "iftmp"); PN->addIncoming(ThenV, ThenBB); PN->addIncoming(ElseV, ElseBB); return PN; } © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 125 User-defined operators for Kaleidoscope © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 126 User-defined operators Treat operators like functions Use of precedence parsing Use of external functions Demo http://llvm.org/docs/tutorial/LangImpl6.html © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 127 Mutable variables for for Kaleidoscope © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 128 Wanted! Iterative Fibonacci def binary : 1 (x y) y; User-defined sequencing operator def fibi(x) var a = 1, b = 1, c in (for i = 3, i < x in c=a+b: a=b: Use of mutable variables. b = c) : b; © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 129 SSA revisited We need SSA for optimizations. LLVM IR supports SSA. Functional Kaleidoscope implied SSA trivially. Let’s add mutable variables and see how IR is retained. © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 130 Illustrative C example int G, H; int test(_Bool Condition) { int X; if (Condition) X = G; The value of X obviously else depends on the executed path. X = H; return X; } © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 131 Desired SSA @G = weak global i32 0 @H = weak global i32 0 ; type of @G is i32* ; type of @H is i32* define i32 @test(i1 %Condition) { entry: br i1 %Condition, label %cond_true, label %cond_false cond_true: %X.0 = load i32* @G br label %cond_next cond_false: %X.1 = load i32* @H br label %cond_next Extra merge step: select the right value to use based on where control flow is coming from. cond_next: %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] ret i32 %X.2 } © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 132 Desired SSA SSA required (by LLVM) for registers. Too difficult to generate in a frontend. SSA not required (by LLVM) for memory: Model mutable variables via stack. Rely on optimization mem2reg. © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 133 Mutable variables based on stack alloca @G = weak global i32 0 @H = weak global i32 0 ; type of @G is i32* ; type of @H is i32* define i32 @test(i1 %Condition) { entry: %X = alloca i32 ; type of %X is i32*. br i1 %Condition, label %cond_true, label %cond_false cond_true: %X.0 = load i32* @G store i32 %X.0, i32* %X br label %cond_next cond_false: %X.1 = load i32* @H store i32 %X.1, i32* %X br label %cond_next cond_next: %X.2 = load i32* %X ret i32 %X.2 } ; Update X 1. 2. 3. 4. Each mutable variable becomes a stack allocation. Each read of the variable becomes a load from the stack. Each update of the variable becomes a store to the stack. The variable represents for the stack address. ; Update X ; Read X © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 134 Revision static std::map<std::string, Value*> NamedValues; static std::map<std::string, AllocaInst*> NamedValues; Add allocations at the entry block of each function. Codegen for variables reference needs to load from stack. Codegen for assginments needs to store on stack. © 2012 Ralf Lämmel, Software Language Engineering http://softlang.wikidot.com/course:sle 135