Language definition by interpreter
Prof. Fateman CS 164 Lecture 14

Routes to defining a language

• Formal mathematics
  – Context free grammar for syntax
  – Mathematical semantics (axioms, theorems, proofs)
  – Rarely used alone (the downfall of Algol 68)
  – Can be used to verify/prove properties (with difficulty)
• Informal textual
  – CFG + natural language (Algol 60, Java Language Spec)
  – Just natural language (Visual Basic for Dummies) -- examples
  – Almost universally used
• Operational
  – Here’s a program that does the job
  – Metacircular evaluator for Scheme, Lisp
  – Evaluator/interpreter for MiniJava

Typical compiler structure

[Diagram: Source program → Lex, parse → AST → Typecheck, cleanup → Intermediate form → Assembly lang → Object code → Machine; the machine takes input and produces output.]

“MiniJava run” structure

[Diagram: Source program → Lex, parse → AST → Typecheck, cleanup → Intermediate form → interpreter; the interpreter takes input and produces output. Except that MJ has no input…]

• What language is the interpreter written in?
• What machine does it run on?

Interpreter structure: advantages

• The interpreter is written in a higher level language: the source language derives its semantics from the interpreter program and from the semantics of the language the interpreter is written in (e.g. whatever it is that “+” does in Lisp).
• What does EACH STATEMENT MEAN?
• Exhaustive case analysis
• What are the boundaries of legal semantics?
• What exactly is the model of scope, etc.

Interpreter structure: more advantages

• Prototyping / debugging is easier (compare to machine level).
• Portable intermediate form; here the AST in Lisp as text; could be byte code.
• The intermediate form may be compact.
• Security may be more easily enforced by restricting the interpreter or machine model [e.g. if Lisp were “safe”, the interpreter would be safe too].
• In modern scripting applications, most time is spent in library subroutines, so speed is not usually an issue.
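The point that the source language borrows its semantics from the interpreter's own language can be sketched with a toy evaluator. This is only an illustration under stated assumptions: toy-eval and its three-construct language are hypothetical, not part of the MiniJava interpreter; only the node names IntegerLiteral, Plus and Times are borrowed from the MJ AST.

```lisp
;; Hypothetical toy language: integer literals, Plus, and Times.
;; The node names mirror MiniJava AST tags, but TOY-EVAL is only a sketch.
(defun toy-eval (ast)
  "Evaluate a toy arithmetic AST; each COND clause defines one construct."
  (cond ((atom ast) (error "Unexpected atomic expression: ~S" ast))
        ((eq (car ast) 'IntegerLiteral) (second ast))
        ;; The meaning of Plus is whatever the host's + does. In Lisp that
        ;; means arbitrary-precision integers; a Java int would wrap around.
        ((eq (car ast) 'Plus)
         (+ (toy-eval (second ast)) (toy-eval (third ast))))
        ((eq (car ast) 'Times)
         (* (toy-eval (second ast)) (toy-eval (third ast))))
        ;; Exhaustive case analysis: anything else is outside the language.
        (t (error "Not a toy expression: ~S" ast))))

;; 2 + 3 * 4
(toy-eval '(Plus (IntegerLiteral 2)
                 (Times (IntegerLiteral 3) (IntegerLiteral 4))))

;; No overflow here, because Lisp's * does not overflow.
(toy-eval '(Times (IntegerLiteral 2147483647) (IntegerLiteral 2)))
```

The second call returns 4294967294 rather than a wrapped 32-bit value, so an interpreter that wanted Java's int behavior would have to define its own addition and multiplication instead of inheriting Lisp's.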

Interpreter structure: disadvantages

• Typically unable to reach full machine speed. Repeatedly checking stuff.
• Difficult to transcend the limits of the underlying language implementation (no full access to the machine: if the interpreter is in Lisp, then “interrupts” are viewed through Lisp’s eyes. If the interpreter is in Java, then the Java VM presents restrictions.)
• Code depends on the presence of infrastructure (all of Lisp??), so even a tiny program starts out “big”. [Historically, this was more of an issue.]
• (if meta-circular) bootstrapping… (digression on first PL)

An interpreter compromise (e.g. Java VM)

• “Compile” to a hypothetical byte-code stack machine appropriate for Java and maybe some other languages, easily simulated on any real machine.
• Implement this virtual byte-code stack machine on every machine of interest.
• When speed is an issue, try Just In Time compiling: convert sections of code to machine language for a specific machine. Or translate Java to C or another target.

IMPORTANT OBSERVATION

• Much of the work that you do interpreting has a corresponding kind of activity that you do in typechecking or compiling.
• This is why I prefer teaching CS164 by first showing a detailed interpreter for the target language.

Interpreter to TypeChecker / Static Analysis is a small step

• Modest modification of an interpreter program can result in a new program which is a typechecker. An interpreter has to figure out VALUES and COMPUTE with them. A typechecker has to figure out TYPES and check their validity.
• For example:
  – Interpreter: To evaluate a sequence {s1, s2}, evaluate s1 then evaluate s2, returning the last of these.
  – Typechecker: To TC a sequence {s1, s2}, TC s1 then TC s2, returning the type for s2.
  – Interpreter: To evaluate a sum (+ A B), evaluate A, evaluate B and add.
  – Typechecker: To TC a sum (+ A B), TC A to an int, TC B to an int, then return the type, i.e. Int.
• Program structure is a walk over the AST.

How large is MJ typechecker / semantics?

• Environment setup code for MJ is about 318 lines of code.
• Simple interpreter, including all environment setup, is an additional 290 lines of code, including comments.
• Add to this file the code needed for type checking and you end up with an extra 326 lines of code.
• Environment setup would be smaller if we didn’t anticipate type checking.

Interpreter to Compiler is a small step

• Modest modification of an interpreter program can result in a new program which is a compiler.
• For example:
  – Interpreter: To evaluate a sequence {s1, s2}, evaluate s1 then evaluate s2, returning the last of these.
  – Compiler: To compile a sequence {s1, s2}, compile s1 then compile s2, returning the concatenation of the code for s1 and the code for s2.
  – Interpreter: To evaluate a sum (+ A B), evaluate A, evaluate B and add.
  – Compiler: To compile a sum (+ A B), compile A, compile B, and concatenate the code sequences. Then compile + to “add the results of the two previous sections of code”, and concatenate that.
• Program structure is a walk over the intermediate code AST.

AST for MJ program

• The AST is a data structure presenting the Program and all subparts in a big tree reflecting ALL parsed constructs of the system.
• Here’s fact.java’s AST with some parts abbreviated with #.

    (Program
     (MainClass (id Factorial 1) (id a 2)
      (Print (Call (NewObject #) (id ComputeFac 3) (ExpList #))))
     (ClassDecls
      (ClassDecl (id Fac 7) (extends nil) (VarDecls)
       (MethodDecls (MethodDecl IntType # # # # #)))))

Start of the interpreter

    (defun mj-run (ast)
      "Interpret a MiniJava program from the AST"
      (mj-statement (fourth (second ast)) ;; the body
                    (setup-mj-env ast)    ;; set up env.
                    ))

For reference, the AST again; (fourth (second ast)) picks out the Print statement of the main class:

    (Program
     (MainClass (id Factorial 1) (id a 2)
      (Print (Call (NewObject #) (id ComputeFac 3) (ExpList #))))
     (ClassDecls
      (ClassDecl (id Fac 7) (extends nil) (VarDecls)
       (MethodDecls (MethodDecl IntType # # # # #))))) ; define ComputeFac

MJ statements

    (defun mj-statement (ast env)
      "Execute a MiniJava statement"
      ;; we do a few of these in-line, defer others to subroutines.
      ;; They could all be subroutines.
      (cond ((eq (car ast) 'If)
             (if (eq (mj-exp-eval (cadr ast) env) 'true)
                 (mj-statement (caddr ast) env)        ;; then
                 (mj-statement (cadddr ast) env)))     ;; else
            ((eq (car ast) 'While) (mj-while ast env)) ;; um, do this on another slide
            ((eq (car ast) 'Print)
             (format t "~A~%" (mj-exp-eval (cadr ast) env)))
            ((eq (car ast) 'Assign)
             (mj-set-value (id-name (second ast))      ;; look at this later
                           (mj-exp-eval (caddr ast) env)
                           env))
            ((eq (car ast) 'ArrayAssign)
             ;; etc. (body elided on this slide)
             )
            ((eq (car ast) 'Block)
             (dolist (s (cdadr ast)) (mj-statement s env)))
            ;;;; add other statement types in here, if we extend MJ
            (t (pprint ast) (error "Unexpected statement"))))

What else is needed?

• All the functions (mj-while etc. etc.) [simple]
• All the supporting data structures (symbol table entries and their organization) [tricky]

if COMPARE….

The version used in mj-statement:

    ((eq (car ast) 'If)
     (if (eq (mj-exp-eval (cadr ast) env) 'true)
         (mj-statement (caddr ast) env)      ;; then
         (mj-statement (cadddr ast) env)))   ;; else

A tempting alternative:

    (mj-if (mj-exp-eval (cadr ast) env)
           (mj-statement (caddr ast) env)    ;; then
           (mj-statement (cadddr ast) env))  ;; else

    (defun mj-if (a b c) (if a b c))

2 problems:
• In Lisp, (if X Y Z) evaluates X. If X is non-nil, it returns the value of Y. MJ’s false is non-nil.
• Also, mj-if is an ordinary function, so by the time it is called we have already evaluated both branches. We lose.

Loops: While is very much like Lisp’s while

    ((eq (car ast) 'While)
     (while (eq (mj-exp-eval (cadr ast) env) 'true)
       (mj-statement (caddr ast) env)))

(Standard Common Lisp has no while macro; this code relies on one supplied by the implementation. The portable spelling is (loop while test do body).)
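
The two problems with a function-style mj-if can be observed directly. The sketch below is an illustration, not code from the interpreter: *trace*, branch, run-wrong and run-right are hypothetical helpers, and the MJ booleans are assumed to be the symbols true and false, as the (eq ... 'true) tests above suggest.

```lisp
;; Hypothetical helpers for observing which branches actually run.
(defvar *trace* '() "Names of the branches that were executed, newest first.")

(defun branch (name)
  "Stand-in for executing a statement: record NAME and return it."
  (push name *trace*)
  name)

(defun mj-if-function (a b c)
  "The tempting but wrong definition: an ordinary function."
  (if a b c))

(defun run-wrong (test)
  "Both arguments are evaluated before MJ-IF-FUNCTION is entered."
  (setf *trace* '())
  (mj-if-function test (branch 'then) (branch 'else)))

(defun run-right (test)
  "Compare against the symbol TRUE and let the special form IF pick a branch."
  (setf *trace* '())
  (if (eq test 'true) (branch 'then) (branch 'else)))

(run-wrong 'false)   ; returns THEN, and *trace* is (ELSE THEN)
(run-right 'false)   ; returns ELSE, and *trace* is (ELSE)
```

With the function version, a false test still selects the then branch because the symbol false is not nil, and both branches have already executed by the time the choice is made.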

Also part of the main interpreter: exp-eval

    (defun mj-exp-eval (ast env)
      "Evaluate a Mini-Java expression subtree"
      (labels ((c (v) (eq (car ast) v))              ;; some shorthands (DSB idea!)
               (e1 () (mj-exp-eval (second ast) env))
               (e2 () (mj-exp-eval (third ast) env)))
        (cond ((eq ast 'this) (mj-this env))
              ((atom ast) (error "Unexpected atomic expression"))
              ((c 'Not) (mj-not (e1)))
              ;; And gets its operands unevaluated, presumably so mj-and can short-circuit
              ((c 'And) (mj-and (second ast) (third ast) env))
              ((c 'Plus) (mj-+ (e1) (e2)))            ; also Times, Minus, LessThan
              ((c 'IntegerLiteral) (second ast))      ; also BooleanLiteral
              ((c 'ArrayLookup) (elt (e1) (e2)))
              ((c 'ArrayLength) (length (e1)))
              ((c 'NewArray) (make-array `(,(e1)) :initial-element 0))
              ((c 'NewObject) (mj-new-object (id-name (second (second ast))) env))
              ((c 'Call) (mj-call ast env))           ;; REALLY IMPORTANT
              ((c 'IdentifierExp) (mj-get-value (id-name (second ast)) env)))))

Revisit the statement interpreter

The same shorthand idea applies to mj-statement; here (e i) evaluates the i-th element of the node as an expression and (s i) executes it as a statement:

    (labels ((c (v) (eq (car ast) v))
             (e (i) (mj-exp-eval (nth i ast) env))
             (s (i) (mj-statement (nth i ast) env)))
      (cond
        ((c 'Assign)
         (mj-set-value (id-name (second ast)) (e 2) env))
        ((c 'ArrayAssign)
         (setf (elt (mj-get-value (id-name (second ast)) env) (e 2))
               (e 3)))
        ;; ... remaining statement types
        ))

(looking at code for simple-interp)

• [no more slides of this]
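
Returning to the earlier observation that a typechecker and a compiler are each a small step from an interpreter, the three walks can be put side by side. This is a minimal sketch under stated assumptions: the Seq node, the function names walk-eval, walk-tc and walk-compile, and the stack-machine instructions push and add are all hypothetical, chosen only to mirror the sequence and sum examples from the slides.

```lisp
;; Hypothetical mini-AST: (IntegerLiteral n), (Plus a b), (Seq s1 s2).
;; ECASE signals an error for any other node type.

(defun walk-eval (ast)
  "Interpreter: figure out VALUES and compute with them."
  (ecase (car ast)
    (IntegerLiteral (second ast))
    (Plus (+ (walk-eval (second ast)) (walk-eval (third ast))))
    (Seq  (walk-eval (second ast))       ; evaluate s1
          (walk-eval (third ast)))))     ; then s2, returning the last

(defun walk-tc (ast)
  "Typechecker: figure out TYPES and check their validity."
  (ecase (car ast)
    (IntegerLiteral 'int)
    (Plus (unless (and (eq (walk-tc (second ast)) 'int)
                       (eq (walk-tc (third ast)) 'int))
            (error "Plus needs two int operands: ~S" ast))
          'int)
    (Seq  (walk-tc (second ast))         ; TC s1
          (walk-tc (third ast)))))       ; then s2, returning its type

(defun walk-compile (ast)
  "Compiler: return a list of instructions for an invented stack machine."
  (ecase (car ast)
    (IntegerLiteral (list (list 'push (second ast))))
    (Plus (append (walk-compile (second ast))
                  (walk-compile (third ast))
                  '((add))))             ; add the results of the two sections
    ;; Plain concatenation, as on the slide; a real stack machine would
    ;; also have to discard whatever s1 left behind.
    (Seq  (append (walk-compile (second ast))
                  (walk-compile (third ast))))))

(defparameter *example*
  '(Seq (IntegerLiteral 1)
        (Plus (IntegerLiteral 2) (IntegerLiteral 3))))

(walk-eval *example*)      ; a value: 5
(walk-tc *example*)        ; a type: INT
(walk-compile *example*)   ; code: ((PUSH 1) (PUSH 2) (PUSH 3) (ADD))
```

All three functions have the same shape, a case analysis on the node type with recursive calls on the children; only what is returned at each node changes.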