Polyglot An Extensible Compiler Framework for Java Nathaniel Nystrom Michael R. Clarkson Andrew C. Myers Cornell University Language extension • Language designers often create extensions to existing languages • e.g., C++, PolyJ, GJ, Pizza, AspectJ, Jif, ArchJava, ESCJava, Polyphonic C#, ... • Want to reuse existing compiler infrastructure as much as possible • Polyglot is a framework for writing compiler extensions for Java 2 Requirements • Language extension • Modify both syntax and semantics of the base language • Not necessarily backward compatible • Goals: • Easy to build and maintain extensions • Extensibility should be scalable • No code duplication • Compilers for language extensions should be open to further extension 3 Rejected approaches • In-place modification base compiler 1.0 copy & modify bug fixes & upgrades base compiler 2.0 extension compiler 1.0 bug fixes & upgrades (again) copy & modify (again) extension compiler 2.0 • Macro languages • Limited to syntax extensions • Semantic checks after macro expansion 4 Polyglot • Base compiler is a complete Java front end • 25K lines of Java • Name resolution, inner class support, type checking, exception checking, uninitialized variable analysis, unreachable code analysis, ... • Can reuse and extend through inheritance 5 Scalable extensibility Changes to the compiler should be proportional to changes in the language. • Most compiler passes are sparse: AST Nodes + Passes if x e.f = name resolution type checking exception checking constant folding 6 Non-scalable approaches Easy to add or modify Using Passes AST nodes Visitors pass as AST node method (“naive OO”) Polyglot 7 Polyglot architecture Base Polyglot compiler Java source Java parser AST rewriting passes Code generator Java target Ext source Ext parser AST rewriting passes Code generator Java target Ext2 Ext2 parser AST rewriting passes Code generator Java target source 8 Architecture details • Parser written using PPG • Adds grammar inheritance to Java CUP • AST nodes constructed using a node factory • Decouples node types from implementation • AST rewriting passes: • Each pass lazily creates a new AST • From naive OO: traverse AST invoking a method at each node • From visitors: AST traversal factored out 9 Example: PAO • Primitive types as subclasses of Object • Changes type system, relaxes Java syntax • Implementation: insert boxing and unboxing code where needed HashMap m; m.put(“two”, 2); int v = (int) m.get(“two”); HashMap m; m.put(“two”, new Integer(2)); int v = ((Integer) m.get(“two”)).intValue(); 10 PAO implementation • Modify parser and type-checking pass to permit e instanceof int • Parser changes with PPG: include “java.cup” drop { rel_expr ::= rel_expr INSTANCEOF ref_type } extend rel_expr ::= rel_expr:a INSTANCEOF type:b {: RESULT = node_factory.Instanceof(a, b); :} • Add one new pass to insert boxing and unboxing code 11 Implementing a new pass • Want to extend Node interface with rewrite() method • Default implementation: identity translation • Specialized implementations: boxing and unboxing • Mixin extensibility: extensions to a base class should be inherited by subclasses Node typeCheck() codeGen() If cond then else typeCheck() codeGen() typeCheck() codeGen() rewrite() Add lhs rhs typeCheck() codeGen() cond then else typeCheck() codeGen() rewrite() lhs rhs typeCheck() codeGen() rewrite() 12 Inheritance is inadequate Node typeCheck() codeGen() If Add cond then else typeCheck() codeGen() lhs rhs typeCheck() codeGen() PaoIf cond then else typeCheck() codeGen() rewrite() PaoNode typeCheck() codeGen() rewrite() PaoAdd lhs rhs typeCheck() codeGen() rewrite() 13 Inheritance is inadequate Node typeCheck() codeGen() typeCheck() codeGen() PaoNode typeCheck() codeGen() rewrite() typeCheck() codeGen() typeCheck() codeGen() rewrite() typeCheck() typeCheck() codeGen() typeCheck() codeGen() typeCheck() codeGen() typeCheck() codeGen() codeGen() typeCheck() typeCheck() codeGen() typeCheck() codeGen() rewrite() typeCheck() codeGen() rewrite() typeCheck() codeGen() rewrite() codeGen() rewrite() rewrite() typeCheck() codeGen() typeCheck() codeGen() rewrite() typeCheck() typeCheck() codeGen() typeCheck() codeGen() typeCheck() codeGen() typeCheck() codeGen() codeGen() typeCheck() typeCheck() codeGen() typeCheck() codeGen() rewrite() typeCheck() codeGen() rewrite() typeCheck() codeGen() rewrite() codeGen() rewrite() rewrite() typeCheck() typeCheck() codeGen() typeCheck() codeGen() typeCheck() codeGen() typeCheck() codeGen() codeGen() typeCheck() typeCheck() codeGen() typeCheck() codeGen() rewrite() typeCheck() codeGen() rewrite() typeCheck() codeGen() rewrite() codeGen() rewrite() rewrite() typeCheck() codeGen() rewrite() typeCheck() typeCheck() codeGen() typeCheck() codeGen() typeCheck() codeGen() typeCheck() codeGen() codeGen() typeCheck() typeCheck() codeGen() typeCheck() codeGen() rewrite() typeCheck() codeGen() rewrite() typeCheck() codeGen() rewrite() codeGen() rewrite() rewrite() typeCheck() typeCheck() codeGen() typeCheck() codeGen() typeCheck() codeGen() typeCheck() codeGen() codeGen() typeCheck() typeCheck() codeGen() typeCheck() codeGen() rewrite() typeCheck() codeGen() rewrite() typeCheck() codeGen() rewrite() codeGen() rewrite() rewrite() 14 Extension objects Use composition to mixin methods and fields into AST node classes Node ext typeCheck() codeGen() If ext cond then else typeCheck() codeGen() PaoExt ext rewrite() null Add ext lhs rhs typeCheck() codeGen() PAO extension objects; installed into all nodes by node factory 15 Extension objects Extension objects have their own ext field to leave extension open Node ext typeCheck() codeGen() If ext cond then else typeCheck() codeGen() PaoExt ext rewrite() Add ext lhs rhs typeCheck() codeGen() ext typeCheck() ext_type_info null 16 Method invocation • A method may be implemented in the node or in any one of several extension objects. • Extension should call node.ext.ext.typeCheck() • Base compiler should call: node.typeCheck() • Cannot hardcode the calls Node ext typeCheck() codeGen() PaoExt ext rewrite() ext typeCheck() ext_type_info null 17 Delegate objects • Each node & extension object has a del field • Delegate object implements same interface as node or ext • Directs call to appropriate method implementation • Ex: node.del.typeCheck() • Ex: node.ext.del.rewrite() • Run-time overhead < 2% JavaDel typeCheck() { node.ext.ext.typeCheck() } codeGen() { node.codeGen() } Node del ext typeCheck() codeGen() PaoExt del ext rewrite() del ext typeCheck() ext_type_info null 18 Scalable extensibility • To add a new pass: • Use an extension object to mixin default implementation of the pass for the Node base class • Use extension objects to mixin specialized implementations as needed • To change the implementation of an existing pass • Use delegate object to redirect to method providing new implementation • To create an AST node type: • Create a new subclass of Node • Or, mixin new fields to existing node using an extension object 19 Polyglot family tree Polyglot base (Java) PAO parameterized types Coffer PolyJ JMatch covariant return Jif Jif/split 20 Results • Can build small extensions in hours or days • 10% of base code is interfaces and factories Extension # Tokens % of Base Polyglot base (Java) 166K 100 Jif 129K 78 JMatch 108K 65 Jif/split 99K 60 PolyJ 79K 48 Coffer 24K 14 PAO 6.1K 3.6 parameterized types 3.2K 2 covariant return 1.6K 1 javac 1.1 132K 80 21 Related work • Other extensible compilers • e.g., CoSy, SUIF • e.g., JastAdd, JaCo • Macros • e.g., EPP, Java Syntax Extender, Jakarta • e.g., Maya • Visitors • e.g., staggered visitors, extensible visitors 22 Conclusions • Several Java extensions have been implemented with Polyglot • Programmer effort scales well with size of difference with Java • Extension objects and delegate objects provide scalable extensibility • Download from: http://www.cs.cornell.edu/projects/polyglot 23 Acknowledgments Brandon Bray JMatch Michael Brukman PPG Steve Chong Jif, Jif/split, covariant return Matt Harren JMatch Aleksey Kliger JLtools, PolyJ Jed Liu JMatch Naveen Sastry JLtools Dan Spoonhower JLtools Steve Zdancewic Jif, Jif/split Lantian Zheng Jif, Jif/split http://www.cs.cornell.edu/projects/polyglot 24 Questions? Mixin extensibility Inheritance does not provide mixin extensibility: when a base class is extended, subclasses should inherit the changes Node typeCheck() codeGen() If cond then else typeCheck() codeGen() typeCheck() codeGen() rewrite() Add lhs rhs typeCheck() codeGen() cond then else typeCheck() codeGen() rewrite() lhs rhs typeCheck() codeGen() rewrite() 26 Other Polyglot features • Quasi-quoting library • Useful for translation from extension language AST to base language or intermediate language AST qqStmt(“if (%e.relabelsTo(%e)) %s; else %s;”, new Object[] { L, Li, then_body, else_body }); • Automatic separate compilation • Serialize type information and store in the AST • Encoded into the class file via javac • Extracted from class file using reflection • Data-flow analysis framework 27 PAO rewriting • rewrite(ts) called for each AST node: class PaoExt extends Ext { Node rewrite(PaoTypeSystem ts) { return node(); } } class PaoInstanceofExt extends PaoExt { Node rewrite(PaoTypeSystem ts) { Instanceof e = (Instanceof) node(); Type rtype = e.compareType(); // e.g., “e instanceof int” “e instanceof Integer” if (rtype.isPrimitive()) return e.compareType(ts.boxedType(rtype)); else return n; } } 28 Node factories • Each extension has a node factory (nf) • To create a node of type T, call method nf.T() • T() may return an extension-specific subclass of T • T() attaches the extension and delegate objects to T via a call to extT() • Mixin extensibility: if T is a subclass of S, then extT() will call extS() 29 Results • Can build small extensions in hours, days • 10% of base compiler code is interfaces and factories Extension javac 1.1 (excl. bytecode asm) Polyglot base Jif JMatch Lexe Parse Total r r % of Base 119 K 2.7K 11K 166 K 2.7K 5.4K 129 K 2.7K 14K 108 K 72 100 78 65 30 Why not output bytecode? • Wanted to be able to read the output • The symmetry is satisfying • Limitations of Java as a target language • Scoping rules sometimes make it difficult to output Java code, especially with inner classes • Lack of goto can make generated control flow inefficient 31 Name resolution Semantic checking Translation 32