Comp 511 Notes Inscribed by Anthony Castanares, Seth Fogarty, and Kristin Y. Rozier March 2, 2005 1 Jumbo We begin today’s discussion with Jumbo. Jumbo is a compiler for a twolevel version of Java. In their paper, Jumbo: Run-time Code Generation for Java and Its Applications, Kamin, Clausen, and Jarvis describe how Jumbo provides constructs for run-time code generation such as brackets and backquotes. Brackets, represented as $<>$, signal the begining of a value of type Code. The single back-quote character, ‘, is used to splice-in existing code fragments into larger ones. Jumbo has no explicit lift operator, but the semantics of lift can be achieved by using certain syntactic categories when splicing-in code. For example, the authors list various categories that reify Java values so that they may be used generated programs. These categories include String, Bool, Float, Int, Char. In Jumbo, classes can be declared inside brackets. Classes declared in this manner are converted into class files at compile time when the create method is called with this code fragment as an argument. The create method invokes another method, generate, which converts the argument of create to an intermediate form, Java Virtual Machine code (JVM code), and finally returns a class file representing this code fragment. It is important to note that code declared within brackets is not statically typed in Jumbo. Code is only checked before it is run at runtime. Although MetaOCaml performs type checking statically, it is very difficult to do so. As a side note, it would be a good experiment to develop a version of MetaOCaml that did not perform any static checking of generated code. Users would then be able to see what kind of interesting things could happen when code is 1 stitched together like strings without any static checking. Type checking at runtime is difficult. The authors of this paper indicate that runtime checking is performed on generated code, but it is difficult to image how this is done since the compiler usually discards the AST representation of a program before runtime. Finally, we as a class will pose the following question to Sam Kamin: Is type checking on generated code performed on byte code or source code? The Jumbo compiler is a compositional compiler. A compositional compiler defines the meaning of a statement in terms of its subterms. For example, upon evaluation of an if-statement, code is generated for it in a compositional manner whereby code is generated for the branches of the conditional statement but filled in at some later time with values. On to the topic of syntactic categories: when splicing-in code into an existing code fragment, the back-quote character must be followed by a special syntactic category. A syntactic category is similar to a tag that wraps an expression. During discussion, the class was trying to figure out why these categories must be explicitly specified. One suggestion was that perhaps special functions are associated with different kinds of expressions, for example, one for statements, one for identifiers, etc, and these functions can only be invoked when a particular syntactic category is encountered. But upon inspection we found that the authors gave only one explanation for the use of syntactic categories: they are there to allow for parsing of the surrounding code. An interesting note on syntactic categories: you can do things in Jumbo with categories that can not be done in MetaOCaml. Namely, you can splice in identifiers. This would make static type checking very hard. No other language we have seen in class thus far provides such an ability. One rational reason for providing this ability is because you can write an entire program that is entirely full of quotes. Finally, in contrast to MetaOCaml, a programmer can quote types in Jumbo. 1.1 Piecing Code Together In jumbo you are allowed to create case statements out of branches using MonoLists (pg 5). MonoList is a collection class interface included in the Jumbo API. At runtime, a program can throw a bunch of pieces of code together and create a case statement out of them of arbitrary length. This seems like an odd feature; perhaps its purpose is just to avoid the explicit use 2 of labels. Using MonoLists, one could quote a bunch of switch statements and piece them together. MonoLists allow a programmer to take pairs of conditions and statements and put them in a sequence such that the program will look for the first true condition and then execute it’s expression. However, they do not provide a really clean way of piecing them together in a statically-typed way. Although, if one is dealing with arbitrary conditions it would be difficult statically typing them in any language so there is no point in making an attempt! (This property is independent of the number of possible staging levels of the language.) The end of the paper was less satisfying than the others we have covered; their examples are not clearly useful experiments. This should be a lesson in writing research papers: when you write a paper make sure that you are very clear about what your contributions are, state them up front, and demonstrate them clearly in the text. 1.2 Syntax Categories Do we really need all of the syntax categories provided by Jumbo? All of them seem to be instances of lift. It seems that the last seven syntax categories in Table 2, starting from Char, are dispensable. They could be replaced with a more simple lift-equivalent, which we know how to deal with. In principle, Jumbo expression statements can be merged in the same way that they are merged in MetaOCaml without causing a problem. Essentially, what we need to do to get rid of the ‘syntax-category() construct is to join the expressions. However, that would require changing the base language, which we might not be able to do in this case. There is no example of the Name construct in the paper so its necessity is not obvious. The Type category may really be needed in the language but could probably be replaced with a construct more similar to the brackets and escapes of MetaOCaml. It is unclear what exactly the role of splicing and constructing types is in an object-oriented language. Ultimately, with a sufficiently expressive type system, what they call Types could be merged into expressions. Now we are left with Fields and Methods. These two constructs can be composed without having to know which one of these is being dealt with. Method is a declaration like you would include in a body of a class. 3 2 Tempo The difference between partial evaluation and 2-level languages is that partial evaluation is accomplished by the following steps: 1. The binding-time analysis process takes as inputs a program and binding time annotations. It produces a 2-level program in a language like ‘C or Cyclone or MetaOCaml which is an annotated version of the basic program that it took as input. 2. The specialization process takes as input the 2-level program and the static inputs. It outputs the specialized program. These two major blocks are the major steps of partial evaluation. The formal link between hand-staging and what partial evaluators do in binding time analysis is a little bit unclear because of the hand-waving over CPS and other functions that partial evaluation programmers add during the bindingtime analysis phase. 2.1 Partial Evaluation In Practice We use MSP because it gives us better control. The principle with partial evaluation is that the programmer deals with the specialized program as a black box. In practice, a programmer has to tweak a source program many many times to arrive at just the right specialized program so ultimately, the programmer has to know exactly what’s going on anyway. The authors don’t mention the amount of work involved in partial evaluation in practice. MetaOCaml allows a programmer to do his or her own binding-time analysis for a program. If there are any binding-time improvements that make it easier to annotate to yield the right result, such as CPS, monads, etc. the programmer can make those changes whereas a partial evaluator will not do both binding-time analysis and binding-time code improvements. It is also not clear that it will do the right binding-time improvements automatically. A programmer has to know what the resulting code should look like, formalize this vision, write a 2-level program, and then throw away the annotations in order to give the program to Tempo to put them back in. This seems backward. Especially when abstract interpretation is involved, it is very hard to image automatic optimizers being adept at binding-time improvement analysis. The way a program is written is so closely tied to how 4 it is staged and annotated that the programmer might as well do that part too. It would be really nice if MetaOCaml did automatic annotations as an initial offer to the programmer. This feature should be added to the language. 2.2 Polyvariant Specialization Polyvariant specialization is when you want to binding-time annotate your program with different assumptions about what is late and what is early. This is not a pressing problem because the nature of the problem usually dictates a typical scenario of what inputs will be late or early. For example, typically we only stage interpreters in one way. When it is necessary to deal with polyvaliant specialization, abstract interpretation helps (as in the familiar FFT example). The overhead of polyvariant specialization is not too bad and this feature is definitely advantageous when the program contains a recursive function or one that needs a fixed-point computation. Then it would be helpful to have two interpretations: one with the recursion inside brackets and one with the brackets on the outside. This is because the nature of the recursion depends on whether a certain input is received early or late. Here termination analysis becomes necessary, which is the Achilles heal of binding-time analysis. How well you do binding-time analysis is a variation of how well you do termination analysis. 3 Return to Tempo One point of confusion was in the use of the terminology “Compile time specalization” verus “run time specialization”. Whey they say “compile time specialization”, they mean specialization in which you generate the source code and then compile the completed source code. The implication here is that the first and second compile times are very close together, and that the specialization depends on a small set of completely static data. The second compile time occurs “at compile time”, not “at run time”. What they call “runtime specialization: specialization” is where you generate machien code at compile time and then assemble it in a certain fashion at run time. The problem with this is that we can generate source code at 5 run time and compile it, and this falls under neither category. You should leave the decision of what to generate at run time up to the programmer 4 Evolution We have ’C fiddling with the compiler to do run time code generation. Then Tempo comes and say “We can grab gcc and manipulate the binaries to be able to compile the stuff into binary template, and stitch them together at run time.” But Tempo has no garunatee of correctness, it’s a hack around the compilers. Cyclone is an attempt to formalize this trick of compiling early on incomplete programs. The main issue in Tempo is keeping optomizations from operating across template barriers. Cyclone fixes this, but then attempts to bring back safe optimizations across templates. The problem here is compiling early on incomplete programs: it is not obvious that you can do any optimizations when you don’t know how the templates will be assembled. Cyclone seeks to justify these optimizations as correct by justifying flow analysis on incomplete programs. Jumbo takes a slightly different path. They write own compiler from scratch, like ’C, but the compiler work son the full language and is compositional. This is their main point: “If you want to be able to compile programs with holes in them, make your compiler compositional.” The reason Tempo fails in some cases is that gcc is not a compositional compiler, it generates code based on ad-hoc combinations of nodes. One note on Jumbo: t is somewhere between run-time code generation and source-code generation because jvm is so close to Java. And a brief note on writing papers: examples and diagrams are very useful. Remember this when it comes time to write your own. 5 Homework Download Tempo and stage the power function. 6