COS 320: Compiling Techniques
David Walker

People

David Walker (Professor)
  412 Computer Science Building
  dpw@cs.princeton.edu
  office hours: after each class
Limin Jia, Jay Ligatti (TAs)
  418a Computer Science Building
  ljia@cs.princeton.edu, jligatti@cs.princeton.edu
  office hours: Mondays & Wednesdays (we’ll send email to the email list)

Information

Web site: www.cs.princeton.edu/courses/archive/spring05/cos320/index.htm
Mailing list:
  To subscribe: cos320-request@lists.cs.princeton.edu
  To post to this list, send your email to: cos320@lists.cs.princeton.edu

Books

Modern Compiler Implementation in ML, Andrew Appel
A reference manual for SML:
  best choice: online references (see course web site)
  several hardcopy books, e.g. Elements of ML Programming, Jeffrey D. Ullman

Work

Assignments (40%): build your own compiler
  approximately a module/week
  late penalty: 20%/day. Don’t be late!
  ask questions of me, the TAs, and friends on the course mailing list
  turn in your own work
In-class midterm: 25%
Final during exam period: 35%

Assignment 0

Write your name and other information on the sheet circulating
Find, skim and bookmark the course web pages
Subscribe to the course e-mail list
Begin assignment 1:
  Figure out how to install, run & use SML
  Due next Thursday February 16
If you’ve never used a functional language like ML, this might be a difficult assignment. Start early!

onward!

What is a compiler?

A compiler is a program that translates a source language into an equivalent target language.

What is a compiler?

  while (i > 3) {
    a[i] = b[i];
    i++;
  }
  (C program)
compiler does this:
  mov eax, ebx
  add eax, 1
  cmp eax, 3
  jcc eax, edx
  (assembly program)

What is a compiler?

  class foo {
    int bar;
    ...
  }
  (Java program)
compiler does this:
  struct foo {
    int bar;
    ...
  }
  (C program)

What is a compiler?

  class foo {
    int bar;
    ...
  }
  (Java program)
compiler does this:
  ........
  .........
  ........
  (Java virtual machine program)

What is a compiler?

  \newcommand{ .... }
  (LaTeX program)
compiler does this:
  \sfd\sf\fadg
  (TeX program)

What is a compiler?

  \newcommand{ ....
  }
  (TeX program)
compiler does this:
  \sfd\sf\fadg
  (PostScript program)

What is a compiler?

Other places:
  Web scripts are compiled into HTML
  assembly language is compiled into machine language
  hardware description language is compiled into a hardware circuit
  ...

Compilers are complex

front-end: text file to abstract syntax
  lexing; parsing
middle-end: abstract syntax to intermediate form (IR)
  type checking; analysis; optimizations
back-end: IR to machine code
  code generation; data layout; register allocation; more optimization

Course project

front-end: Fun Source Language
  a simple imperative language
middle-end: only 1 IR (the initial abstract syntax generated by the parser)
  type checking; high-level optimizations
back-end: code generation
  instruction selection algorithms; register allocation via graph coloring

Standard ML

Standard ML is a general-purpose language, but it is so well suited to building compilers that it works almost like a domain-specific language for the job. It supports:
  complex data structures (abstract syntax, compiler intermediate forms)
  automatic memory management (garbage collection), like Java
  large projects with many modules
  an advanced type system for error detection

Introduction to ML

You will be responsible for learning ML on your own. Today I will cover some basics.
Resources:
  Robert Harper’s online book "an introduction to ML" is a good place to start
  see the course webpage for pointers and info about how to get the software

Preliminaries

Start sml in Unix by typing sml at a prompt:

  tux% sml
  Standard ML of New Jersey, Version 110.0.7, September 28, 2000 [CM; autoload enabled]
  - (* quit SML by pressing ctrl-D; ctrl-Z sometimes...
  *)
  (* just so you know, comments can be (* nested *) *)

Preliminaries: Read – Eval – Print – Loop

  - 3 + 2;
  > 5 : int
  - it + 7;
  > 12 : int
  - it - 3;
  > 9 : int
  - 4 + true;
  stdIn:17.1-17.9 Error: operator and operand don't agree [literal]
    operator domain: int * int
    operand:         int * bool
    in expression:
      4 + true

Preliminaries: Read – Eval – Print – Loop

  - 3 div 0;
  Failure : Div

This is a run-time error.

Basic Values

  - ();
  > () : unit
  - true;
  > true : bool
  - false;
  > false : bool
  - if it then 3+2 else 7;
  > 7 : int
  - false andalso loop_Forever;
  > false : bool

() is like "void" in C (sort of): the uninteresting value/type.
The "else" clause is always necessary.
andalso and orelse use short-circuit evaluation.

Basic Values

Integers
  - 3 + 2;
  > 5 : int
  - 3 + (if not true then 5 else 7);
  > 10 : int
Strings
  - "Dave" ^ " " ^ "Walker";
  > "Dave Walker" : string
  - print "foo\n";
  foo
  > () : unit
Reals
  - 3.14;
  > 3.14 : real

There is no division between expressions and statements.

Using SML/NJ

Interactive mode is a good way to start learning and to debug programs, but...
Type a series of declarations into a ".sml" file and load it:

  - use "foo.sml";
  [opening foo.sml]
  ... list of declarations, with their types ...

Larger Projects

SML has its own built-in interactive "make".
Pros:
  it automatically does the dependency analysis for you
  no crazy makefile syntax to learn
Cons:
  it may be more difficult to interact with other languages or tools

Compilation Manager

sources.cm:
  Group is
    a.sig
    b.sml
    c.sml

  % sml
  - OS.FileSys.chDir "~/courses/510/a2";
  - CM.make();
  [compiling...]
  [wrote...]
  - CM.make' "myproj/"();

CM.make() looks for "sources.cm" and analyzes dependencies; it then compiles the files in the group ([compiling...]) and saves binaries in ./CM/ ([wrote...]). CM.make' lets you specify the directory.
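To try the basic values above outside the interactive loop, you could put declarations like the following into a ".sml" file and load it with use. This is a minimal sketch: the file name basics.sml and the names shortCircuit, alsoShort, ten, fullName and safeDiv are hypothetical, chosen for illustration. It relies only on standard features: andalso/orelse, ^, print, and the built-in Div exception that div raises on division by zero.

```sml
(* basics.sml: a hypothetical file name; load it with  use "basics.sml";  *)

(* andalso is short-circuit: the right operand is never evaluated,
   so the raise below never happens *)
val shortCircuit : bool = false andalso (raise Fail "not evaluated")

(* orelse is short-circuit too *)
val alsoShort : bool = true orelse (raise Fail "not evaluated")

(* if-then-else is an expression, so the else branch is required *)
val ten : int = 3 + (if not true then 5 else 7)

(* string concatenation with ^ *)
val fullName : string = "Dave" ^ " " ^ "Walker"

(* print returns unit, so the result can be matched against () *)
val () = print "foo\n"

(* 3 div 0 raises the built-in exception Div; handle recovers from it *)
fun safeDiv (a : int, b : int) : int option =
  SOME (a div b) handle Div => NONE

val bad = safeDiv (3, 0)
val good = safeDiv (7, 2)
```

Loading this file prints foo, and safeDiv (3, 0) evaluates to NONE instead of stopping with the Div failure shown earlier.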
Up next: structured values

ML has a rich set of structured values:
  Tuples: (17, true, "stuff")
  Records: {name = "Dave", ssn = 332177}
  Lists: 3::4::5::nil or [3,4]@[5]
  Datatypes
  Functions
  And more!
Rather than list all the details, we will write a couple of programs.

An interpreter

Interpreters are usually implemented as a series of transformers:
  stream of characters (concrete syntax)
    -> lexing/parsing ->
  abstract syntax
    -> evaluate ->
  abstract value
    -> print ->
  stream of characters

A little language (LL)

An arithmetic expression e is
  a boolean value
  an if expression (if e1 then e2 else e3)
  an integer
  an add operation
  a test for zero (isZero e)

LL abstract syntax in ML

  datatype term =
      Bool of bool
    | If of term * term * term
    | Num of int
    | Add of term * term
    | IsZero of term

The vertical bar separates alternatives. By convention, constructors are capitalized. Each constructor can take a single argument of a particular type; for If, that type is a tuple type, in this case a triple of 3 term objects.

This one declaration creates:
  a new type (called term)
  a new set of functions for creating terms (Bool, If, Num, Add, IsZero)
  a new set of patterns you can use in case expressions (like C’s "switch") to check what sort of term object you have

LL abstract syntax in ML

In your program, writing

  Add (Num 2, Num 3)

makes an object tagged with Add containing 2 sub-objects tagged with Num. It represents the expression "2 + 3".

LL abstract syntax in ML

  If (Bool true, Num 0, Add (Num 2, Num 3))

represents "if true then 0 else 2 + 3": an object tagged with If whose three sub-objects are Bool true, Num 0, and the Add object above.

Function declarations

  fun isValue (t:term) : bool =
    case t of
      Num n => true
    | Bool b => true
    | _ => false
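The declarations so far can be collected into one file. A minimal sketch that assumes nothing beyond the slides' datatype: it binds a few structured values, builds the two example terms, and defines isValue. The helper termSize is hypothetical (not from the slides); it counts constructors and is there to show a case expression that matches every alternative of term. The names tuple, record, l1, l2, twoPlusThree and example are illustrative.

```sml
(* Structured values from the slides *)
val tuple : int * bool * string = (17, true, "stuff")
val record = {name = "Dave", ssn = 332177}
val l1 = 3 :: 4 :: 5 :: nil
val l2 = [3, 4] @ [5]

(* The LL abstract syntax *)
datatype term =
    Bool of bool
  | If of term * term * term
  | Num of int
  | Add of term * term
  | IsZero of term

(* "2 + 3" *)
val twoPlusThree : term = Add (Num 2, Num 3)

(* "if true then 0 else 2 + 3" *)
val example : term = If (Bool true, Num 0, Add (Num 2, Num 3))

fun isValue (t : term) : bool =
  case t of
    Num _ => true
  | Bool _ => true
  | _ => false

(* Hypothetical helper: count the constructors in a term.
   Every alternative of term has its own rule, so the match is exhaustive. *)
fun termSize (t : term) : int =
  case t of
    Bool _ => 1
  | Num _ => 1
  | If (t1, t2, t3) => 1 + termSize t1 + termSize t2 + termSize t3
  | Add (t1, t2) => 1 + termSize t1 + termSize t2
  | IsZero t1 => 1 + termSize t1
```

Here l1 and l2 are the same list [3,4,5], termSize twoPlusThree is 3, and isValue example is false because an If term is not a value.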
Anatomy of the isValue declaration above:
  isValue is the function name
  t is the function parameter, with type term
  Num n, Bool b and _ are patterns
  _ is the default pattern; it matches anything
  bool is the function result type

Function declarations

  fun isValue t =
    case t of
      Num n => true
    | Bool b => true
    | _ => false

ML type inference can infer the types of parameters and results.

A type error

  fun isValue t =
    case t of
      Num n => n
    | _ => false

  ex.sml:22.3-24.15 Error: types of rules don't agree [literal]
    earlier rule(s): term -> int
    this rule: term -> bool
    in rule:
      _ => false

A type error

Sometimes, ML will give you several errors in a row:

  ex.sml:22.3-25.15 Error: types of rules don't agree [literal]
    earlier rule(s): term -> int
    this rule: term -> bool
    in rule:
      _ => true
  ex.sml:22.3-25.15 Error: types of rules don't agree [literal]
    earlier rule(s): term -> int
    this rule: term -> bool
    in rule:
      _ => false

A very subtle error

  fun isValue t =
    case t of
      num => true
    | _ => false

The code above type checks. But when we test it, the function always returns true. What has gone wrong?
  num is not capitalized (and has no argument)
  so ML treats it like a variable pattern, which matches anything!

Exceptions

  exception Error of string
  fun debug s : unit = raise (Error s)

In the SML interpreter:

  - debug "hello";
  uncaught exception Error
    raised at: ex.sml:15.28-15.35

Evaluator

  fun isValue t = ...

  exception NoRule

  fun eval t =
    case t of
      Bool _ | Num _ => t
    | ...

Evaluator

A let expression remembers temporary results:

  ...
  fun eval t =
    case t of
      Bool _ | Num _ => t
    | If (t1,t2,t3) =>
        let val v = eval t1
        in
          case v of
            Bool b => if b then (eval t2) else (eval t3)
          | _ => raise NoRule
        end

Evaluator

  exception NoRule

  fun eval1 t =
    case t of
      Bool _ | Num _ => ...
    | ...
    | Add (t1,t2) =>
        (case (eval1 t1, eval1 t2) of
           (Num n1, Num n2) => Num (n1 + n2)
         | (_,_) => raise NoRule)

Finishing the Evaluator

  fun eval1 t =
    case t of
      ...
    | ...
    | Add (t1,t2) => ...
    | IsZero t => ...

Be sure your case is exhaustive.

Finishing the Evaluator

What if we forgot a case?

  fun eval1 t =
    case t of
      ...
    | ...
    | Add (t1,t2) => ...

  ex.sml:25.2-35.12 Warning: match nonexhaustive
      (Bool _ | Num _) => ...
      If (t1,t2,t3) => ...
      Add (t1,t2) => ...

Summary

All ML expressions produce values that have a particular type.
ML data types are super-cool. One datatype declaration gives you:
  a new type name (term)
  new constructors (Bool, If, Num, ...)
  new patterns (Bool b, If (x,y,_), Num _, ...)
ML doesn’t have "statements".
ML can do type inference (and give you hard-to-decrypt error messages).
ML has a "top-level loop" to execute commands, and a compilation manager:
  type CM.make(); to load and compile a project
  edit sources.cm to add new files

Last Things

Learning to program in SML can be tricky at first.
But once you get used to it, you will never want to go back to imperative languages.
Check out the reference materials listed on the course homepage.
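Putting the evaluator slides together gives a complete, exhaustive evaluator for LL. This is a minimal sketch under stated assumptions: IsZero e evaluates to Bool true exactly when e evaluates to Num 0, ill-formed terms such as Add (Bool true, Num 1) raise NoRule, and the names eval and run are our own choices rather than the slides'. Each nested case is parenthesized; without the parentheses, the rules that follow an inner case would be parsed as part of it.

```sml
datatype term =
    Bool of bool
  | If of term * term * term
  | Num of int
  | Add of term * term
  | IsZero of term

(* Raised when no evaluation rule applies, e.g. Add (Bool true, Num 1) *)
exception NoRule

(* Every alternative of term has a rule, so there is no
   "match nonexhaustive" warning. *)
fun eval (t : term) : term =
  case t of
    Bool _ => t
  | Num _ => t
  | If (t1, t2, t3) =>
      (case eval t1 of
         Bool b => if b then eval t2 else eval t3
       | _ => raise NoRule)
  | Add (t1, t2) =>
      (case (eval t1, eval t2) of
         (Num n1, Num n2) => Num (n1 + n2)
       | (_, _) => raise NoRule)
  | IsZero t1 =>
      (case eval t1 of
         Num n => Bool (n = 0)
       | _ => raise NoRule)

(* Evaluate and print the result as a string; the handler covers the
   whole case expression, so a stuck term yields a message instead of
   an uncaught exception. *)
fun run (t : term) : string =
  (case eval t of
     Num n => Int.toString n
   | Bool b => Bool.toString b
   | _ => "not a value")
  handle NoRule => "error: no rule applies"
```

For example, run (Add (Num 2, Num 3)) gives "5", while run (Add (Bool true, Num 1)) gives "error: no rule applies".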