Maya: Multiple-Dispatch Syntax Extension in Java University of Utah ABSTRACT

advertisement
Maya: Multiple-Dispatch Syntax Extension in Java
Jason Baker and Wilson C. Hsieh
University of Utah
ABSTRACT
Extension Library
Extension Source
We have designed and implemented Maya, a version of Java that allows programmers to extend and reinterpret its syntax. Maya generalizes macro systems by treating grammar productions as generic
functions, and semantic actions on productions as multimethods
on the corresponding generic functions. Programmers can write
new generic functions (i.e., grammar productions) and new multimethods (i.e., semantic actions), through which they can extend the
grammar of the language and change the semantics of its syntactic
constructs, respectively. Maya’s multimethods are compile-time
metaprograms that transform abstract syntax: they execute at program compile-time, because they are semantic actions executed by
the parser. Maya’s multimethods can be dispatched on the syntactic structure of the input, as well as the static, source-level types of
expressions in the input.
In this paper we describe what Maya can do and how it works.
We describe how its novel parsing techniques work and how Maya
can statically detect certain kinds of errors, such as code that generates references to free variables. Finally, to demonstrate Maya’s
expressiveness, we describe how Maya can be used to implement
the MultiJava language, which was described by Clifton et al. at
OOPSLA 2000.
mayac
Compiled Extensions
Application Source
mayac
Compiled Application
Figure 1: Compiling language extensions and extended programs with the Maya compiler, mayac
its host language with database query syntax. Syntax extension
can also be used to add language features when they are found to
be necessary. For example, design patterns [19] can be viewed as
work-arounds for specialized features missing from general-purpose
languages: the visitor pattern implements multiple dispatch in a
single-dispatch language. Some language designers have chosen to
specialize their languages to support certain patterns: the C# [25]
language includes explicit support for the state/observer pattern and
delegation. However, unless we are willing to wait for a new language each time a new design pattern is identified, such an approach is unsatisfactory. Instead, a language should admit programmer-defined syntax extensions.
Macro systems [13, 27, 30] support a limited form of syntax extension. In most systems, a macro call consists of the macro name
followed by zero or more arguments. Such systems do not allow
macros to define infix operators. In addition, macros cannot change
the meaning of the base language’s syntax. Even in Scheme, which
has an extremely powerful macro system, a macro cannot redefine
the procedure application syntax.
Other kinds of systems allow more sophisticated forms of syntax
rewriting than simple macro systems. These systems range from
aspect-oriented languages [31] to compile-time metaobject protocols (MOPs) [10, 29]. Compile-time MOPs allow a metaclass to
rewrite syntax in the base language. However, these systems typically have limited facilities for defining new syntax.
This paper describes an extensible version of Java called Maya,
which supports both the extension of its syntax and the extension
of its base semantics. Figure 1 shows how Maya is used. Extensions, which are called Mayans, are written as code that generates
abstract syntax trees (ASTs). After Mayans have been compiled,
they can be loaded into the Maya compiler and used while compiling applications. Mayans are dispatched from the parser at application compile-time; they can reinterpret or extend Maya syntax by
expanding it to other Maya syntax.
Categories and Subject Descriptors
D.3.3 [Programming Languages]: Language Constructs and Features
General Terms
Languages
Keywords
Java, metaprogramming, macros, generative programming
1. INTRODUCTION
Syntax extension can be used to embed a domain-specific language
within an existing language. For example, embedded SQL extends
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
PLDI’02, June 17-19, 2002, Berlin, Germany.
Copyright 2002 ACM 1-58113-463-0/02/0006 ...$5.00.
1
Maya has the following combination of features:
3. Maya supports hygiene and referential transparency in templates through a compile-time renaming step.1 This renaming is possible because binding constructs must be explicitly declared as such in Maya’s grammar. Maya’s implementation of hygiene detects most references to free variables
when templates are compiled, but it does not support Dylanlike implicit parameters.
• Mayans operate on abstract syntax. Maya can ensure that
Mayans produce valid ASTs, because AST nodes are well
typed. Since Mayans do not operate on flat token streams,
they are not subject to precedence errors that occur in typical macro systems. For example, the Java Syntactic Extender
(JSE) [2] allows macros to be defined with a case statement
that matches concrete syntax against patterns. Because JSE
macros operate on concrete syntax, they can generate many
of the same parse and precedence errors that C macros generate.
The rest of this paper is organized as follows. Section 2 reviews
some of the systems that we compare Maya against. Section 3 introduces the basics of the Maya language with an extended example. Section 4 describes the high-level design and implementation
of the Maya compiler, including how Mayans are compiled and
integrated into the Maya parser, how lazy parsing and type checking work, how templates are parsed, and how hygiene and dispatch
work. Section 5 sketches how Maya can be used to implement
an interesting language extension: namely, open classes and multimethods as defined in MultiJava by Clifton et al. [12]. Section 6
describes related work, and Section 7 summarizes our conclusions.
• Maya treats grammar productions as generic functions, and
semantic actions (Mayans) as multimethods on those generic
functions. Mayans can be dispatched on a rich set of parameter specializers: AST node types, the static types of expressions, the concrete values of tokens, and the syntactic
structure of AST nodes. Multiple dispatch allows users to
extend the semantics of the language by overriding Maya’s
base semantic actions.
2.
• Maya allows programmers to generate ASTs with templates,
a facility like quasiquote in Lisp [28]. Maya templates can be
used to build arbitrary pieces of abstract syntax. Most other
systems provide less general template mechanisms (<bigwig> [6] is an exception). For example, JTS [5] is a framework for building Java preprocessors that provides limited
support for templates, in that it only defines template syntax
corresponding to a fixed subset of the JTS grammar’s nonterminals.
BACKGROUND
In this section we describe three systems that are closely related to
Maya, and to which we compare Maya in the rest of the paper: JSE,
JTS, and OpenJava.
JSE [2] is a port of Functional Objects’ procedural Dylan macros
to Java. JSE macros are defined using Dylan patterns, but a macro’s
expansion is computed by JSE code rather than by pattern substitution. JSE macros must follow one of two syntactic forms: methodlike and statement-like macros. JSE recognizes macro keywords
through a class naming convention.
JSE provides a quasiquote mechanism to build macro return values. However, because macros operate on unparsed trees of tokens
and matching delimiters, JSE cannot statically check that quasiquotes produce syntactically correct output. In addition, macro expansion does not honor precedence.
JTS [5] is a framework for writing Java preprocessors that operate on ASTs. JTS and Maya approach the problem of extensible
languages from opposite directions, and make different tradeoffs
between flexibility and expressiveness. Whereas Maya can be used
to implement macros, it is impractical to define a simple macro by
building a JTS preprocessor.
JTS language extensions can define new productions, AST node
types, and grammar symbols. JTS extensions are also free to mutate
syntax trees, since typechecking is performed by a standard Java
compiler, after extensions have run to completion. JTS provides
great flexibility at the syntactic level but ignores static semantics.
OpenJava [29] is a compile-time metaobject protocol for Java. It
allows metaclasses to be associated with classes in the base program. A class declares its metaclass with an instantiates
clause, which appears after implements in a class declaration.
Metaclasses inherit introspection methods similar to the Java reflection API, and control translation to Java through the visitor pattern. OpenJava provides for two kinds of macro expansion: callerside translation allows a metaclass to expand expressions involving
its instance types through visit methods on various expression and
declaration forms, and callee-side translation allows a metaclass to
• Like Java classes, Mayans are lexically scoped. Local Mayan declarations can capture the state of enclosing instances.
In addition, Mayan definitions are separate from imports.
Imported Mayans are only applicable within the scope of
their import. These features allow great flexibility in the way
that syntax transformers share state and are exposed to the
base code. In comparison, compile-time metaobject protocols such as OpenJava [29] typically provide fixed relationships between transformers, state, and the lexical structure of
base code.
To support these features, Maya makes use of three new implementation techniques:
1. To support dispatch on static types, Maya interleaves lazy
type checking with lazy parsing. That is, types and abstract
syntax trees are computed on demand. Lazy type checking allows a Mayan to dispatch on the static types of some
arguments, and create variable bindings that are visible to
other arguments. The latter arguments must have their types
checked lazily, after the bindings are created. Lazy parsing
allows Mayans to be imported at any point in a program.
Syntax that follows an imported Mayan must be parsed lazily,
after the Mayan defines any new productions. The lazy evaluation of syntax trees is exposed explicitly to the Maya programmer, so that the effects of laziness can be controlled.
1
Hygiene means that variable references in Mayans cannot by default capture variables in scope at the use of a Mayan; referential transparency means that variable references in Mayans capture
variables in scope at the definition of a Mayan [13]. As is common
in the literature, we use the single term “hygiene” to mean both
hygiene and referential transparency.
2. Maya uses a novel parsing technique that we call pattern
parsing to statically check the bodies of templates for syntactic correctness. The pattern parsing technique allows a
programmer to use quasiquote to generate any valid AST.
2
modify instance class declarations. OpenJava also permits limited
extensions to the syntax of instantiates and type names.
OpenJava controls syntax translation based on the static types of
expressions, but imposes some limitations. Metaclasses must be
explicitly associated with base-level classes through the instantiates clause. As a result, primitive and array types cannot be
used to dispatch syntax expanders. Additionally, caller-side expanders override visit methods defined on a subset of Java syntax.
OpenJava lacks some features that make compile-time metaprograms robust: Its macros can generate illegal pieces of syntax, because they allow metaprograms to convert arbitrary strings to syntax. In addition, OpenJava metaclasses inspect nodes through accessor methods rather than pattern matching. Finally, OpenJava
does not support hygiene.
This code can avoid both object allocation and method calls because maya.util.Vector exposes its underlying object array
by overriding Vector.getElementData().
To support the optimization of foreach on a vector’s elements,
Maya allows semantic actions (called Mayans) to be dispatched
on both the structure and static types of arguments. For this example, the left-hand side of the specialized foreach must be a
call to elements() and the receiver of the call must have type
maya.util.Vector. Although macro systems such as Scheme
syntax-rules [21] and JSE support syntactic pattern matching, and although compile-time MOPs such as OpenJava dispatch
on static types, Maya is the first compile-time metaprogramming
system to unify these features. Maya’s multiple-dispatch model
provides benefits over the case statements of syntax-case [17]
and JSE in that the behavior of a syntactic form can be extended
without modifying the original definition.
This example shows the central challenge in providing flexible
macro dispatching. The foreach statement’s expansion depends
on the static type of v.elements(), yet the loop body cannot
be type checked until the foreach statement is expanded and the
variable st is declared. Some code cannot even be parsed until surrounding Mayans have been expanded. Maya addresses this challenge through lazy parsing and lazy type checking.
3. MAYA OVERVIEW
Maya can be used to implement simple macros as well as language
extensions such as MultiJava and aspects [4]. Maya provides a
macro library that includes features such as assertions, printf-style
string formatting, comprehension syntax for building arrays and
collections, and foreach syntax for walking them. This section
describes the features that a foreach macro should have, and the
way that these features can be implemented in Maya. In the examples that follow, we use bold text for keywords, and italic text for
binding instance names.
Given a Hashtable variable h, the following use of foreach:
3.1
Production Declarations
In Maya, a syntax extension is defined in two parts. First, a new
LALR(1) production may need to be added, if the new syntax is
not accepted by the existing grammar. Second, Mayans define semantic actions for a production, and are dispatched based on values
on the production’s right-hand side.
Before Mayans can be defined to implement foreach, we must
extend the grammar to accept the foreach syntax. The following
production would suffice (the concrete Maya syntax for describing
this production is given in the next paragraph):
h.keys().foreach(String st) {
System.err.println(st + " = " + h.get(st));
}
should expand to:
for (Enumeration enumVar$ = h.keys();
enumVar$.hasMoreElements(); ) {
String st;
st = (String) enumVar$.nextElement();
System.err.println(st + " = " + h.get(st));
}
Statement
→
MethodName ( Formal ) lazy-block
We choose this syntax to avoid making foreach a reserved word
in this context. The MethodName nonterminal matches everything
left of ‘(’ in a method invocation. In particular, MethodName
matches the ‘Expression . Identifier’ sequence accepted by foreach. In this example, lazy-block matches a Java block that is
lazily parsed: it is not parsed or type checked until its syntax tree is
needed.
The production above is declared in Maya by the following code:
where the identifier emumVar$ does not appear in source program.
Many macro systems support exactly this sort of macro. Although
JSE cannot express our chosen concrete syntax, it allows a similar
macro to be written. OpenC++ [10] includes specific support for
member statements such as foreach.
Macro overloading is useful with statements such as foreach.
We might want a version of foreach that works on arrays. Alternatively, we might want overloading for optimization. For instance,
the following code:
import maya.tree.*;
import maya.grammar.*;
import java.util.*;
maya.util.Vector v;
v.elements().foreach(String st) {
System.err.println(st);
}
abstract Statement
syntax(MethodName(Formal)
lazy(BraceTree, BlockStmts));
The production is introduced through the abstract and syntax keywords. The syntax keyword indicates that a grammar
production or Mayan is being defined. (Recall that a Mayan is a semantic action in the grammar.) The abstract keyword indicates
that a production is being defined. Intuitively, a production is like
an abstract method in that it defines an interface; a corresponding
Mayan is a concrete method that “implements” the interface.
Statement is the return type of the production (i.e., the lefthand side); it is also the return type of any corresponding Mayans
(semantic actions). The arguments to the production are the righthand side of the production. The lazy keyword indicates lazy
could be expanded to more efficient code with a specialized version
of foreach:
maya.util.Vector v;
{
Vector v$ = v;
int len$ = v$.size();
Object[] arr$ = v$.getElementData();
for (int i$ = 0; i$ < len$; i++) {
String st = (String) arr$[i$];
System.err.println(st);
}
}
3
1
2
3
4
5
6
7
The body of a Mayan is Maya code that generates an AST. For
example, EForEach’s body consists of a local variable declaration
and a return statement. The return value is computed using a template expression that builds ASTs. Template can be used to build
the ASTs from concrete syntax. For example, a template containing ‘1 + 2 * 3’ builds the corresponding tree. A template may
also contain expressions unquoted with ‘$’: the values of these expressions are substituted into the resulting AST when the template
is evaluated. Templates are statically parsed to ensure syntactic
correctness.
Maya’s templates provide automatic hygiene for lexically scoped
names; programmers are given mechanisms to explicitly break hygiene and to explicitly generate fresh names if they so desire. In the
case of EForEach, hygiene ensures two things: first, that the loop
variable will not interfere with references to other variables called
enumVar in the loop body; second, that the loop variable will have
the specific type java.util.Enumeration, regardless of the
calling context.
Like OpenJava metaclasses, Mayans have access to a variant of
the Java reflection API. References to Type objects are available through methods such as VarDeclaration.getType()
and Expression.getStaticType(). Type objects support
java.lang.Class’s introspection API and a limited form of
intercession that allows member declarations to be added to a class
body.
Mayans can use the reflection API to insulate themselves from
some details of Maya’s AST representation. For example, the abstract syntax trees for String[] args and String args[]
have different shapes, but both declare variables of the same type.
EForEach uses the reflection API in two places. First, line 7 of
Figure 2 builds a StrictTypeName node from the object that
represents the type of a variable. Second, line 13 generates a reference to a local variable directly, rather than generating an occurrence of the variable’s name. Reference.makeExpr also
allows fields to be referenced when they are shadowed by local
variables. Line 12 translates a formal parameter declaration into a
form that may be used in a statement context.
Statement syntax
EForEach(Expression:Enumeration enumExp
\. foreach(Formal var)
lazy(BraceTree, BlockStmts) body)
{
final StrictTypeName castType
= StrictTypeName.make(var.getType());
8
return new Statement {
for (Enumeration enumVar = $enumExp;
enumVar.hasMoreElements(); ) {
$(DeclStmt.make(var))
$(Reference.makeExpr(var.getLocation()))
= ($castType) enumVar.nextElement();
$body
}
};
9
10
11
12
13
14
15
16
17
18
}
Figure 2: The Mayan that implements foreach on Enumerations
parsing. The particular use of lazy above means that a BraceTree (a portion of code surrounded by braces) is to be lazily parsed
into a BlockStmts (a block of statements).
Most symbols in the Maya grammar are AST node types such
as Statement. Productions and Mayans may only be defined on
node-type symbols. Maya also supports several kinds of parameterized grammar symbols that are used to define repetition and
control lazy parsing. lazy(BraceTree, BlockStmts), is
an example of the use of a parameterized grammar symbol.
3.2 Mayan Declarations
After we define the production for foreach, we can declare Mayans to translate various kinds of foreach statements to standard
Maya syntax. Note that if no Mayans are declared on a new production (that is, no semantic actions are present on the production),
an error is signaled on input causes the production to reduce.
A Mayan declaration differs from a production declaration in
three ways: Mayan parameters have specializers and names; Mayan declarations have bodies; and Mayans do not begin with the
abstract keyword. Figure 2 shows the EForEach Mayan that
implements foreach for Enumerations (not the optimized
version).
The EForEach Mayan is defined on the LALR(1) production
described earlier, which takes the left-hand side of a method invocation followed by a formal parameter and a block. Maya determines
that this production corresponds to EForEach when the EForEach parameter list is parsed. A Mayan parameter list serves two
purposes. First, it determines which occurrences of syntax a Mayan can be applied to. Second, it binds formal parameters to actual
arguments and their substructure. Mayan parameter lists and case
patterns in functional languages serve similar purposes. In fact,
Maya’s pattern matching facility is made available through a syntax case statement as well as through Mayan dispatch.
Parameter specializers are used to narrow Mayan applicability:
EForEach only applies to MethodName nodes that contain an
explicit receiver expression. The receiver expression is bound to
enumExp and must have the static type Enumeration. The final
identifier in the MethodName syntax is also specialized to a particular token value: namely, foreach. Maya’s ability to dispatch
on the values of identifiers such as foreach allows macros to be
defined without introducing reserved words. EForEach binds the
loop variable and loop body to var and body respectively.
3.3
Using Mayans
Maya decouples Mayan definition from use: a Mayan is not implicitly loaded into the compiler at the point of declaration, but must be
loaded explicitly. This feature allows a Mayan to be compiled independent of its uses, and allows local Mayans to use any value in
their environments.
A Mayan declaration, such as EForEach in Figure 2, is compiled to a class that implements MetaProgram. An instance of
the class is allocated when a Mayan is imported. A programmer
uses the use directive to import MetaProgram instances into a
lexical scope; the argument to use can be any class that implements MetaProgram. For example, EForEach can be used in a
method body as follows:
void showEm(Enumeration e) {
use EForEach;
e.foreach(Object o) { System.out.println(o); }
}
Imports of Mayans are lexically scoped. In this example, the scope
of the translation defined by EForEach consists only of the method body of showEm.
The use directive is also valid in class bodies and at the top
level. Additionally, Maya provides a -use command line option
that allows a programmer to compile a file using different Mayan
implementations.
4
file names
abstract Statement
syntax(typedef(Identifier = StrictClassName)
lazy(BraceTree, BlockStmts));
file names
file
names
public Statement syntax
Typedef(typedef(Identifier var=StrictClassName val)
lazy(BraceTree, BlockStmts) body)
{
file
reader
class declarations
class
declarations
class
shaper
class
declarations
class
compiler
class
files
file names
// a local Mayan
StrictName syntax Subst(Identifier id) {
if (id.getName() == var.getName())
// substitute the type variable
return val;
else
// resolve this name normally
return nextRewrite();
}
stream
lexer
token trees
lists of nodes
parser
pattern
parser
single nodes
Mayan
dispatcher
// Construct a statement node that exposes
// an instance of Subst to the typedef body
return new UseStmt(new Subst(), body);
Figure 4: Overview of Maya’s internal architecture
}
Figure 3: Local Mayans, as well as local classes, may refer to
arguments of an enclosing Mayan or method.
satisfies these constraints by parsing and type checking lazily, i.e.,
computing syntax trees and their types on demand.
Figure 4 shows the major components of our Maya compiler,
mayac. The file reader reads class declarations from source files.
Our compiler then processes each class declaration in two additional stages. The class shaper parses the class body and computes
member types; the class compiler parses member initializers, including method and constructor bodies. The parser is invoked in
all three steps to incrementally refine a declaration’s AST. To compute the shape of a class C, all super-types of C and the types of
all members of C must be found. Similarly, to compile C, the
shapes of all types referred to by C’s code must be known. Maya
provides class-processing hooks that execute user-defined code as
a class declaration leaves the shaper.
The stream lexer enables lazy parsing by generating a tree of tokens rather than a flat stream. Specifically, the stream lexer creates
a subtree for each pair of matching delimiters: parentheses, braces,
and brackets. These subtrees are called lexers since they can provide input to the parser. The stream lexer resembles a Lisp reader
in that it builds trees from a simple context-free language. This tree
structure allows the Maya compiler to quickly find the end of a method body or field initializer without full parsing. Maya processes
toplevel declarations and class bodies in phases so that forward references can be resolved and typechecking can be performed within
the parser.
Unless otherwise noted, all arcs coming into the parser are lexers and all arcs going out are ASTs. The parser builds ASTs with
the help of the Mayan dispatcher: on each reduction, the dispatcher
executes the appropriate Mayan to build an AST node. Mayan dispatch may involve recursive parsing, as shown in Figure 4.
The pattern parser is used when compiling Mayans and templates. It takes a stream of both terminal and nonterminal input
symbols, and returns a partial parse tree.
The remainder of this section describes several of Maya’s features. Section 4.1 discusses Maya’s lazy grammar and how new
productions can be written to extend it. Section 4.2 discusses pattern matching in Mayan parameter lists, AST templates, and the
parsing techniques used to implement these language features. Section 4.3 describes Maya’s static approach to hygiene. Section 4.4
discusses Maya’s dispatching rules in detail. Finally, Section 4.5
compares Maya’s features to those of related Java extensions.
Since Mayans are typically small units of code, they can be aggregated into larger metaprograms. An instance of such a class
is allocated, and its run method is called to update the environment. For example, the class maya.util.ForEach defines a
single run method, which instantiates and runs each built-in foreach Mayan in turn. As a result, a programmer need only use the
maya.util.ForEach class to import all of the built-in foreach Mayans.
Like class declarations, Mayan declarations can appear within
method bodies. Local Mayan instances are closed over their lexical environment in the same way as local class instances. Because
a local Mayan definition can use any value in its environment, it
cannot be imported until these values exist. As a result, local Mayans can never be imported with use; however, local Mayans can
be instantiated and run by other metaprograms. The advantage of
local Mayans is that one Mayan can expose state to other Mayans
without resorting to templates that define Mayans. As a result, nontrivial metaprograms can be structured as a group of classes and a
few small Mayans, rather than as a series of macro definitions.
Figure 3 demonstrates the use of local Mayans. The Typedef Mayan defines an alternate name for a class within a block
of statements. Type variables are substituted by a local Mayan
that compares each identifier in the block against the argument to
Typedef. A local Mayan type, Subst, is defined within Typedef. Like a local class, this type can only be refered to in its
lexical scope. Typedef allocates an instance of Subst, and exposes that instance to body through a UseStmt node. (The code
‘use EForEach; ...’ in the previous example is translated
into a UseStmt node.) UseStmt nodes contain the metaprogram
that is imported and the list of statements in which it is visible.
4. DESIGN AND IMPLEMENTATION
Translating Maya source code to a fully expanded syntax tree involves dispatching and running Mayans. Dispatch may require type
checking, while executing Mayans may change the types of subtrees. This mutual recursion precludes parsing and type checking
in one pass; Mayan dispatch requires that they be interleaved. Maya
5
Statement
MethodName
Expression
.
(Formal)
Recall that EForEach in Figure 2 is defined on the production
given in Section 3.1. The pattern parser must infer the structure
of EForEach’s argument list, which is shown in Figure 5. Since
EForEach is a semantic action, it takes three actual arguments:
a MethodName, a parenthesized Formal, and a lazily parsed
block. However, the first argument does not explicitly appear in
EForEach’s formal parameter list. The pattern parser infers the
structure of the first argument by parsing the symbols in the argument list.
The pattern parser also parses template bodies. Maya guarantees
that a template is syntactically correct by parsing its body when
the template is compiled. The pattern parser generates a parse tree
from a sequence of tokens that may be interleaved with expressions
unquoted with ‘$’. An unquote expression’s grammar symbol is
determined by its static type or an explicit coercion operator.
The template parse tree is compiled into code that performs the
same sequence of shifts and reductions the parser would have performed on the template body. Templates honor laziness: sub-templates that correspond to lazy syntax are compiled into local thunk
classes that are expanded when the corresponding syntax would be
parsed.
While some systems have developed ad hoc approaches for template parsing [5, 10, 26, 29, 30], Maya opts for a more general
solution. Our pattern-parsing algorithm allows Mayans to be defined on any nonterminal node type, and templates can generate
both nonterminal node types and parameterized symbols. A proof
of this algorithm’s correctness is available elsewhere [3].
lazy(BraceTree, BlockSmts)
Identifier
Figure 5: The structure of EForEach’s formal parameters
4.1 Parsing
Maya productions are written in a high-level metagrammar that explicitly supports laziness. Users can define general-purpose productions on any builtin nonterminal of the Maya grammar. Userdefined productions are indistinguishable from those built into Maya. In addition, parameterized grammar symbols may implicitly
define productions on new nonterminals.
Production arguments (right-hand sides) consist of token literals,
node types, matching-delimiter subtrees, or parameterized symbols. (lazy(BraceTree, BlockStmts) is an example of a
parameterized symbol.) The former two symbol types are used directly by the LALR(1) grammar, while the latter two require special
handling. When a subtree or parameterized symbol is encountered,
Maya ensures that the corresponding production is defined in the
LALR(1) grammar, and uses the left-hand side in the outer production.
For instance, the production used by foreach includes both a
subtree and a parameterized symbol:
The Parsing Algorithm. The description of the pattern parsing algorithm uses the function names and lexical conventions of
Aho et al. [1, §4.4]: upper-case letters are nonterminal symbols,
lower-case Greek letters are strings of terminals and nonterminals,
lower case Roman letters are strings of terminals, and FIRST maps
a string of symbols to the set of all terminals that appear first in
some derivation.
The pattern parser uses parse tables in much the same way as a
normal LALR(1) parser. Terminal input symbols are shifted onto
the stack or trigger reductions normally. However, nonterminal
symbols require some special handling. Figure 6 provides concrete
examples of how nonterminals can be parsed. When the pattern
parser encounters a syntactically valid input Xγ, there must be a
production Y → αXβ in the grammar, and the parser must be in
some state s0 such that one of the following cases holds:
abstract Statement
syntax(MethodName(Formal)
lazy(BraceTree, BlockStmts));
It is translated to the set of productions below:
Statement
G0
G1
→
→
→
MethodName G0 G1
ParenTree
BraceTree
Semantic actions (which are not shown) on G0 and G1 produce
AST nodes from unstructured subtrees. The semantic action for
G0 recursively parses the ParenTree to a Formal, which is mentioned explicitly in the production. The action for G1 delays the
parsing of the BraceTree, as specified by the lazy symbol. If
the productions and actions already exist in the grammar, they are
not added again. For example, the production and action for G0 are
used to parse both foreach and catch clauses; those for G1 are
used throughout the Maya grammar.
A production is valid if it does not introduce conflicts into the
grammar. The Maya parser generator attempts to resolve conflicts
with operator precedence relations. Unlike YACC, Maya does not
resolve shift/reduce conflicts in favor of shifts or reduce/reduce
conflicts based on the lexical order of productions. The parser generator rejects grammars that contain unresolved LALR(1) conflicts.
1. s0 contains an item Y → α.Xβ: actions on FIRST(X) are all
shifts, and s0 contains a goto for X that leads to some state
sn . In this case, X is shifted onto the stack, and the goto is
followed. That is, the current parsing state will “accept” an
X because there is an entry in the goto table.
Figure 6(b) illustrates this case, given the grammar in Figure 6(a). In this example, the metavariable X corresponds to
A; Y corresponds to S; α corresponds to D e; and β and γ
correspond to .
4.2 Pattern Parsing
2. s0 contains an item Z → δ., where Z is a nonterminal such
* ζZ, and the actions on FIRST(Xγ) all reduce the
that α ⇒
same rule Z → δ. In this case, the stack is reduced, which
leads to a state s1 in which one of the above conditions holds.
That is, the current parsing state will “accept” an X because
we can perform a reduction on the input before X.
6(c) illustrates this case, again given the grammar in Figure 6(a). In this example, X corresponds to A; Y corresponds to S; α and Z correspond to F ; and β, γ, and ζ
correspond to .
Maya uses patterns in two ways: first, to establish bindings in Mayan parameter lists and case clauses; and second, to compile templates that dynamically construct AST nodes. Although these uses
are very different, they involve the same data structure: a partial
parse tree built from a sequence of both terminal and nonterminal
input symbols. The pattern parser differs from a standard parser in
that its input may include nonterminals as well as tokens. Just as
the pattern parser accepts nonterminal input symbols, it also generates parse trees that may contain nonterminal leaves.
6
i n p u t: d e . A
s ta te = 5 6
A
D
F
S
→
|
|
→
→
→
|
a
b
c
d
f
DeA
FA
(a) A simple grammar
i npu t: d eA .
i n p u t: f.A
i nput: f. A
state = 6 74
state = 67
state = 33
sta c k:
stac k:
ac
a c ti o n s f o r 5 6
stac k:
stac k:
a ct io n s o n 6 7
a s h i f t 57
57
b s
hift 5
8
shift
58
A
a rredu
e d u ce F
e
c s hift 5 9
e
b r e duce
d u ce F
D
A g ot o 67 4
D
c re
edu
d u ce F
f
(b) A goto can be followed if present
F
(c) Otherwise, FIRST(A) serves as lookahead
Figure 6: Pattern parsing example
Statement
If neither case holds, the input must not be valid. Note that X could
be invalid in the second case, and that the pattern parser may not
detect the error until it has performed some reductions.
MethodName
CallExpr .
4.3 Hygiene
Maya supports compile-time determination of variable hygiene in
templates, unlike most macro systems. Maya’s static approach to
hygiene detects references to unbound variables when a template is
compiled, rather than when it is executed. In general hygiene decisions can only be made when the syntactic role of each identifier is
known. In most systems, this information is only available after all
macros have been expanded. The key to Maya’s hygiene rules is
that a Mayan writer must make explicit which identifiers are bound
and which are not: productions that establish lexically scoped bindings must use special nonterminals such as UnboundLocal.
Maya examines trees produced by the pattern parser to decide
where hygienic and referentially transparent renaming should occur. Referential transparency in Mayan parameter lists ensures that
a class name matches nodes that denote the same fully qualified
name, and that class names in templates refer to the appropriate
classes. Maya implements referential transparency by generating
nodes that refer to classes directly.
Maya’s implementation of hygiene relies on direct generation of
bytecode. Maya assigns fresh names to local variable declarations
generated by Mayans. The names that Maya generates contain the
character ‘$’ and are guaranteed to be unique within a compilation
unit. In a source-to-source translator, hygiene would have to be
implemented through a more complicated mechanism.
As an aside, implementing referential transparency in a translation to Java source code would be difficult, since class names
can be completely inaccessible in certain packages. For instance,
java.lang.System cannot be referenced inside the package
below, due to Java’s name rules:
MethodName
Expression
.
(Formal)
lazy(BraceTree, BlockSmts)
Identifier
list(Expression,‘,‘)
Identifier
Figure 7: The structure of VForEach’s formal parameters
4.4
Dispatch
Mayan dispatching is at the core of Maya. Each time a production
is reduced, the parser dispatches to the appropriate Mayan. This
Mayan is selected by first finding all Mayans applicable to the production’s right-hand side, then choosing the most applicable Mayan
from this set. While multiple dispatch is present in many languages
such as Cecil [9], CLOS [28], and Dylan [27], the exact formulation of Maya’s rules is fairly unique.
Mayan dispatching is based on the runtime types of AST nodes,
but also considers secondary parameter attributes. As described
earlier, the pattern parser builds Mayan parameter trees from unstructured lists of tokens and named parameters. A Mayan parameter consists of a node type and an optional secondary attribute. This
attribute may contain subparameters, a token value, a static expression type, or a class literal type. For example, in Figure 5, EForEach specializes MethodName on its structure, Expression
on its static type, and Identifier on the token value foreach.
Secondary attributes are used to compare formal parameters with
identical node types. Static expression types are compared using
subtype relationships; substructure is compared recursively; class
types and token values must match exactly. For example, the following optimized foreach definition
package p;
class java { }
class System { }
Statement syntax
VForEach(Expression:maya.util.Vector v
\.elements()\.foreach(Formal var)
lazy(BraceTree, BlockStmts) body)
{ /* ... */ }
Maya provides mechanisms both to explicitly generate unique
names and to explicitly break hygiene. Environment.makeId() generates a fresh Identifier. Mayans can violate hygiene by unquoting Identifier-valued expressions in templates,
rather than embedding literal names. Referential transparency can
also be bypassed through the introspection API. Maya’s hygiene
rules do not support implicit parameters that are only shared between a Mayan and its point of use: names are either fully hygienic
or fully unhygienic.
overrides EForEach for certain input values. The structure of
VForEach’s formal parameters is shown in Figure 7. VForEach
is more specific than EForEach because the foreach “receiver”
is specialized on the CallExpr node type, while EForEach does
not specialize the receiver node type.
7
5.
Mayan dispatching is symmetric. If two Mayans are more specific than each other on different arguments, neither is more specific
overall, and an error is signaled. This approach avoids the surprising behavior described by Ducournau et al. [16], and is consistent
with Java’s static overloading semantics.
Maya supports an unusual form of lexical tie breaking among
Mayans that are equally specific: Mayans that are imported later
override earlier Mayans. Lexical tie-breaking allows several Mayans to be imported with identical argument lists. Where several
Mayans are equal according to the argument lists, the last Mayan
imported with use is considered most applicable. For example,
many of the Mayans that we have written do not specialize their
arguments. Such Mayans are more applicable than the built-in Mayans only because the built-in Mayans are imported first.
Maya provides an operator that is necessary to support the layering of macros. The nextRewrite operator calls the next-mostapplicable Mayan, and is analogous to super calls in methods.
This operator may only be used within Mayan bodies, whereas
other Maya features such as templates may be used in any Maya
code.
To evaluate Maya, we implemented MultiJava, an extension to Java
that Clifton et al. have specified [12]. The first author’s thesis [3]
contains several smaller examples of how Maya can be used; it also
discusses the MultiJava implementation in greater detail.
MultiJava adds generic functions to Java through two new constructs: open classes and multimethods. Open classes allow methods to be declared at the top level, and multimethods allow dispatch
based on the runtime types of all method arguments. MultiJava’s
formulation of open classes and multimethods supports separate
compilation.
5.1
MultiJava Overview
Open classes in MultiJava offer an alternative to the visitor design
pattern. Rather than maintaining a visit method on each class in a
hierarchy and a separate hierarchy of visitor classes, one can simply
augment the visited class hierarchy with externally defined methods. External methods can be added without recompiling the visited class hierarchy, but a class can override external methods that
it inherits.
Within an external method, this is bound to the receiver instance. However, external methods in MultiJava may not access
private fields of this, nor can they access nonpublic fields of
this if the receiver class is defined in another package.
MultiJava defines a compilation strategy and type checking rules
that together allow external methods and their receiver classes to
be compiled separately. Essentially, an external virtual function is
compiled to a dispatcher class, and calls to an external method are
replaced with calls to a method in the dispatcher class.
In addition to open classes, MultiJava supports multimethods.
Multimethods allow a class to define several methods with the same
name and signature, but distinct parameter specializers. Each virtual function is treated as a generic function, so that multiple dispatch and static overloading can be used together. A super call
in a multimethod (to the same generic function) selects the next
applicable method, rather than the method defined by a superclass.
MultiJava imposes some restrictions on parameter specializers
to support separate compilation. First, only class-typed parameters may be specialized, and they may only be specialized by subclasses. Second, a concrete class must define or inherit multimethods for all argument types (including abstract types). These restrictions allow MultiJava to statically ensure that there is exactly one
most applicable method for every possible call to a generic function.
4.5 Discussion
Maya’s expressiveness can be compared to that of the other languages discussed in Section 2. Neither JSE, JTS, nor OpenJava
support automatic hygiene. JSE and JTS do not provide static type
information, while JSE and OpenJava provide limited facilities for
defining new syntax.
JTS allows extensions to define arbitrary new productions, while
Maya only allows general-purpose productions to be defined on a
fixed set of left-hand sides. JTS extensions are also free to extend
the abstract syntax tree format. Since Maya must be able assign
standard Java meaning to every syntax tree, only a few specific
node types may be subclassed by users.2
Finally, JTS extensions may walk syntax trees in any order and
perform mutation. In contrast, Mayans are executed by the parser, and can only mutate class declaration by adding and removing
members when certain dynamic conditions are met. This restriction ensures that class updates cannot change the results of previous
type checking and Mayan dispatch.
OpenJava and Maya both allow some form of syntax extension,
dispatch based on static information, and the association of syntax transformation code with the lexical structure of the program.
Maya provides more flexibility in these areas.
OpenJava allows three extensions to declaration syntax: new
modifiers may be defined, clauses may be added to the class declaration syntax, and suffixes may be added to the type in a variable
declaration. In all cases, a keyword signals the presence of extended syntax, and the metaclass must provide a parsing routine. In
contrast, Maya allows new productions to be defined on any grammar symbol, and does not require that keywords be used.
OpenJava dispatches caller-side visit methods to transform an
input program based on its static types. However, visit methods
are only defined for a few expression and declaration forms. In
contrast, Mayans may be defined on all productions in the base
grammar, as well as user-defined productions.
OpenJava associates syntax transformation methods with class
declarations through the instantiates clause. A metaclass is
free to examine and modify the bodies of its instance classes. Although Maya does not provide a similar mechanism, local Mayan
definitions and imports allow user-defined metaobjects to be created easily. These metaobjects need not be associated with classes.
2
MULTIJAVA
5.2
Maya Implementation
We implemented MultiJava with a mixture of class and Mayan definitions. The majority of code is defined in ordinary methods, rather
than in Mayan bodies. Maya provides several features that make a
MultiJava implementation practical:
• The full power of Maya’s extensible LALR(1) grammar is
used to extend Java’s concrete syntax in two ways. First,
MultiJava allows methods to be declared outside of their receiver class. These methods declarations include the receiver
class name immediately after their return type.
abstract Declaration
syntax(list(Modifier) LazyTypeName
QualifiedName\.Identifier (FormalList)
Throws lazy(BraceTree, BlockStmts));
Second, MultiJava allows method parameters to be declared
with both a base type that determines the method’s virtual
This restriction is not currently enforced in the implementation.
8
function and a type specializer that narrows the method’s applicability at runtime. A parameter specializer appears after
an ‘@’ in a parameter declaration.
Expression dispatchArg
(VarDeclaration[] vars, // dispatcher params
Vector applicable,
// applicable methods
int n)
// argument to dispatch
{
if (n == vars.length || applicable.size() == 1){
// Applicable methods are sorted from most to
// least specific.
MultiMethod m =
(MultiMethod) applicable.elementAt(0);
return dispatchCall(vars, m);
}
abstract Formal
syntax(list(Modifier)
LazyTypeName @ LazyTypeName
LocalDeclarator);
• Maya’s lexical tie-breaking rule lets MultiJava transparently
change the translation of base Java syntax. Our MultiJava
implementation examines every ordinary method declaration
to determine whether its parameters include specializers, and
whether it overrides an external method. Maya dispatches
to these Mayans rather than built-in Mayans because of the
dispatcher’s lexical tie-breaking rule.
// For each type specializer on the nth
// argument, find all methods that may be
// applicable when that type is encountered.
// Sort the array of specializers and methods so
// that subtypes come before supertypes. This
// gives us a valid order in which to perform
// instanceof tests.
Map.Entry[] ents = sortOnArg(applicable, n);
• Maya provides standard Java type information to Mayans,
which MultiJava needs to enforce its type checking rules.
• Semantic actions are implemented as local Mayans, which
allows them to share state.
// Generate dispatch code from right to left,
// i.e., generate superclass cases first.
int specInd = ents.length - 1;
Vector v = (Vector) ents[specInd].getValue();
Expression ret = dispatchArg(vars, v, n + 1);
for (specInd-- ; specInd >= 0; specInd--) {
Type t = (Type) ents[specInd].getKey();
Vector app = (Vector)ents[specInd].getValue();
ret = new Expression {
($(as Expression Reference.make(vars[n]))
instanceof $(StrictTypeName.make(t)))
? $(dispatchArg(vars, app, n + 1))
: $ret
};
}
return ret;
• The lexical scoping of imported Mayans is used to limit the
scope of translations.
Local Mayan declarations and lexically scoped imports allow
our MultiJava implementation to be structured hierarchically into
high-level constructs. For example, our implementation instantiates classes named MultiMethod and GenericFunction to
keep track of MultiJava language constructs. These objects are similar to instances of metaclasses in OpenJava; unlike metaclasses,
our classes are not a fixed part of the Maya language. MultiMethod and GenericFunction store information that is used
to ensure that generic function definitions cannot produce dispatch
errors, and to compute the method of super sends from multimethods. The actual translation of super sends is performed by
a method-local Mayan defined in MultiMethod. This Mayan
relies on its enclosing MultiMethod instance, and is exported
locally within a multimethod body.
GenericFunction generates code to dispatch multimethods.
For example, if D is a subclass of C,
}
Figure 8: Simplified version of code to that generates mulitmethod dispatching expressions.
50,000 lines in kjc. In contrast, our MultiJava implementation is
less than 2,500 noncomment, nonblank lines of code. It should be
noted that Clifton’s MultiJava implementation involved fixing bugs
and implementing standard Java features in kjc.
One might consider implementing MultiJava in the languages
described in Section 2. It is hard to imagine a MultiJava implementation in a macro system such as JSE, because JSE macros
cannot transparently expand method declarations. JTS is a more
viable option: since JTS allows the Java grammar to be extended
and reinterpreted, it is certainly powerful enough to express MultiJava. However, one would have to implement Java’s standard type
checking rules along with MultiJava’s rules, since JTS relies on
javac for its type checking.
OpenJava provides some of the features that one would need.
OpenJava allows the translation of a class to be changed when its
declaration is annotated with an instantiates clause. A simple extension to OpenJava might allow the default metaclass to be
changed transparently. OpenJava also exposes static type information to metaprograms, and supports lexical scoping through calleeside translation. However, OpenJava supports very a limited form
of syntax extension that cannot express new kinds of declarations
such as external methods. OpenJava’s model of lexically scoped
macros is also a poor match for MultiJava, since OpenJava supports class, but not method, metaobjects.
class D extends C {
int m(C c)
{ return 0; }
// a multimethod: executes if c is dynamically
// a D
int m(C@D c) { return 1; }
}
D will be translated as follows, with the Java method m generated
by GenericFunction:
class D extends C {
private int m$1(C c) { return 0; }
private int m$2(D c) { return 1; }
int m(C c) {
return c instanceof D ? m$2((D) c) : m$1(c);
}
}
The body of m is generated by a recursive method, GenericFunction.dispatchArg(). A simplified version of this method, which does not perform error checking or support void methods, is shown in Figure 8.
5.3 Discussion
Clifton [11] implemented MultiJava by modifying the Kopi [15]
Java compiler, kjc. He added or materially altered 20,000 of the
9
6. RELATED WORK
return values are statically checked for both syntactic correctness
and type safety. In contrast, Maya checks the syntactic structure
of a template at compile-time, but only type checks the resulting
tree when a template is expanded. MacroML supports three macro
forms. Function macros consist of the macro name followed by one
or more arguments, and may not establish bindings. Lambda-like
macros consist of the macro name, a variable name, and a body
in which the variable is bound. Finally, let-style macros consist of
the let keyword, the macro name, a variable name and initializer,
and a body in which the variable is bound. Since MacroML makes
bindings explicit in a macro’s form, it can make hygiene decisions
statically.
Cardelli et al. [8] define an extensible LL(1) parsing package
with macro-like features. A semantic action may not contain arbitrary code. Instead, it consists of calls to a predefined set of AST
node constructors and references to parameters on the production’s
right hand side. Semantic actions in a language extension can also
be defined through templates written in the base language syntax.
This system takes an unusual approach to hygiene and referential
transparency: hygienic names must be explicitly declared as local, but referential transparency is provided automatically.
<bigwig> [6] includes an unusual pattern-based macro facility in which macro expansion is guaranteed to terminate. <bigwig> macros define new productions in an extended LL(1) grammar. Macro productions consist of a macro keyword followed by
zero or more terminal and nonterminal arguments. <bigwig> also
allows the programmer to define new nonterminal symbols called
metamorphs. Metamorphs are named and can be mutually recursive, features not present in Maya’s parameterized grammar symbols.
A <bigwig> macro body is a template that contains concrete
syntax and references to nonterminal arguments. <bigwig> allows all nonterminals to appear in macro arguments and templates
(as Maya does), but Brabrand and Schwartzbach do not indicate
whether their template implementation uses extra grammar productions or direct parser support.
<bigwig>’s extended LL(1) parser defines two conflict resolution rules. First, longer rules are chosen over shorter rules. Second,
smaller FIRST sets are considered more specific than larger ones.
This second rule handles identifiers used as keywords elegantly, but
has farther-reaching implications. <bigwig>’s definition of wellformed grammars would allow:
Most multiple-dispatch systems allow dispatch on more than just
types. For instance, CLOS [28] dispatches on argument values using eql specializers, and Dylan [27] generalizes this feature with
singleton and union types. Maya’s dispatching rules bear a particular resemblance to the grand unified dispatcher (GUD) in Dubious [18]. Like GUD, Maya uses pattern matching. Additionally,
GUD allows dispatch on user-defined predicates whereas Maya allows dispatch on the static types of expressions. Languages such
as Dubious and Cecil [9] perform static checks to ensure that all
generic function invocations unambiguously dispatch to a method.
Maya defers these checks to expansion time.
A* [23] is a family of languages for writing preprocessors: it
extends Awk to operate on syntax trees for particular source languages. A* supports a limited form of pattern matching: a case
guard may be either a boolean expression or a YACC production.
A production case matches nodes generated by the corresponding
production in A*’s parser, and introduces new syntax for accessing
the node’s children. Unlike Maya’s patterns, A* patterns cannot
contain substructure or additional attributes such as identifier values or expression types.
Maya is certainly not the only syntax manipulation system to
support pattern matching. Scheme’s syntax-rules [21] and its
descendants [17, 27] allow macros to be defined using case analysis.
Maya borrows many ideas from high-level macro systems such
as Chez Scheme’s syntax-case [17], Dylan [27], and XL [24].
However, Maya makes a different set of tradeoffs than these systems. Specifically, Maya’s ability to statically check template syntax comes at the cost of hygienic implicit parameters, and local
Mayans can share state more easily than local macros.
Chez Scheme’s syntax-case provides programmable hygienic macros with lexical scoping. Unlike Chez Scheme, Maya separates the concepts of local macro definition and local macro import. This separation allows Mayans to share context directly. In
syntax-case, context can only be shared indirectly: data must
be encoded as literals and exposed to inner macros through letsyntax bindings. This technique can be cumbersome in any language, and is particularly unattractive in languages such as Java
that limit literals to numbers, strings, and arrays of literals.
McMicMac [22] is an extended macro facility for Scheme that
overcomes some of these limitations. McMacMic allows macros
to explicitly pass inherited attributes as arguments and synthetic
attributes as multiple return values. McMicMac also allows macros
to be defined on “special” grammar constructs such as procedure
application, but it is less powerful than Maya’s multiple-dispatch
model.
MS2 [30] is essentially Common Lisp’s defmacro for the C
language. Macro functions are evaluated by a C interpreter and
may expand declarations, statements, or expressions. As in Dylan,
macro keywords are recognized by the lexer, and macro parameters
may be recursively parsed according to their syntactic types.
MS2 defines template syntax for building ASTs and supports a
polymorphic unquote operator, ‘$’. Maya shares these features.
When parsing templates, MS2 type checks ‘$’ expressions to determine the grammar symbols they produce. In MS2 , templates can
only generate declarations, statements, and expressions, but three
additional node types can be unquoted: identifiers, numeric literals, and type specifiers. MS2 ’s recursive-descent parser is written
to accept macro calls and unquoted expressions at these few welldefined places.
MacroML [20] extends ML with a type-safe macro system. In
MacroML, macro expansion cannot produce type errors. Macro
S
T
→
→
aa|Tb
a|b
even though <bigwig> cannot parse “a b”. Maya’s LALR(1) parser can accept this language, but more importantly, Maya signals
an error when grammar conflicts cannot be resolved.
Maya’s parsing strategy offers several advantages over <bigwig>’s. First, it allows parsing and type checking to be interleaved.
Second, it uses standard LALR(1) parsing techniques, which are
more flexible than LL(1). Finally, Maya rejects grammars that it
cannot recognize.
Camlp4 [14] is a preprocessor for the OCaml programming language. Camlp4 parses OCaml source code using a variant of LL(1)
parsing. Macros may extend the OCaml grammar, but as in <bigwig>, conflicts are not detected. Camlp4 allows syntax trees to be
generated through quotations that are similar to Maya’s templates.
Quotations can also be used in pattern matching. Maya provides
similar functionality through the syntax case statement. Maya
differs from Camlp4 in that it exposes static type information to
macros and that it rejects grammar extensions that produce conflicts.
10
ELIDE [7] is a Java preprocessor that allows programmers to
define new declaration modifiers in terms of syntax transformer
classes. Like OpenJava and Maya, ELIDE allows syntax transformers to modify declarations through an extension of the Java
reflection API. OpenJava and ELIDE are also similar in that they
allow strings to be dynamically converted to syntax objects and do
not support general syntax extensions. However, where OpenJava
allows a single metaclass to control the compilation of a class declaration, ELIDE allows an arbitrary number of modifiers.
[4] J. Baker and W. C. Hsieh. Runtime aspect weaving through
metaprogramming. In Proceedings of the First International
Conference on Aspect-Oriented Software Development,
Enschede, The Netherlands, Apr. 2002.
[5] D. Batory, B. Lofaso, and Y. Smaragdakis. JTS: Tools for
implementing domain-specific languages. In Proc. of the 5th
International Conference on Software Reuse, pages 143–153,
Victoria, Canada, 1998.
[6] C. Brabrand and M. Schwartzbach. Growing languages with
metamorphic syntax macros. In Proceedings of the Workshop
on Partial Evaluation and Semantics-Based Program
Manipulation ’02, Portland, OR, Jan. 2002.
http://www.brics/dk/˜mis/macro.ps.
[7] A. Bryant, A. Catton, K. De Volder, and G. C. Murphy.
Explicit programming. In Proceedings of the First
International Conference on Aspect-Oriented Software
Development, Enschede, The Netherlands, Apr. 2002.
[8] L. Cardelli, F. Matthes, and M. Abadi. Extensible syntax with
lexical scoping. Technical Report 121, DEC SRC, Feb. 1994.
[9] C. Chambers. The Cecil Language Specification and
Rationale: Version 2.0, 1995.
[10] S. Chiba. A metaobject protocol for C++. In Proceedings of
the Conference on Object-Oriented Programming Systems,
Languages, and Applications’95, pages 285–299, Austin,
TX, 1995.
[11] C. Clifton. The MultiJava project.
http://www.cs.iastate.edu/˜cclifton/
multijava/index.shtml.
[12] C. Clifton, G. Leavens, C. Chambers, and T. Millstein.
MultiJava: Modular open classes and symmetric multiple
dispatch for Java. In Proceedings of the Conference on
Object-Oriented Programming Systems, Languages, and
Applications ’00, pages 130–146, Minneapolis, MN, Oct.
2000.
[13] W. Clinger and J. Reese. Macros that work. In Proceedings
of the 18th Annual ACM Symposium on Principles of
Programming Languages, pages 155–162, Toronto, Ontario,
Jan. 1991.
[14] D. de Rauglaudre. Camlp4 reference manual.
http://caml.inria.fr/camlp4/manual/, Jan.
2002.
[15] W. Divoky, C. Forgione, T. Graf, C. Laborde, A. Lemonnier,
and E. Wais. The Kopi project.
http://www.dms.at/kopi/index.html.
[16] R. Ducournau, M. Habib, M. Huchard, and M. Mugnier.
Monotonic conflict resolution mechanisms for inheritance. In
Proceedings of the Conference on Object-Oriented
Programming Systems, Languages, and Applications ’92,
pages 16–24, Vancouver, BC, Oct. 1992.
[17] R. K. Dybvig, R. Hieb, and C. Bruggeman. Syntactic
abstraction in Scheme. Lisp and Symbolic Computation,
5(4):pp. 295–326, 1993.
[18] M. Ernst, C. Kaplan, and C. Chambers. Predicate
dispatching: A unified theory of dispatch. In Proceedings of
the Conference on Object-Oriented Programming Systems,
Languages, and Applications ’98, pages 186–211,
Vancouver, BC, Oct. 1998.
[19] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design
Patterns: Elements of Reusable Object-Oriented Software.
Addison Wesley, Massachusetts, 1994.
[20] S. Ganz, A. Sabry, and W. Taha. Macros as multi-stage
7. CONCLUSIONS
We have described Maya, a version of Java that supports userdefined syntax extensions. These extensions are written as multimethods on generic functions, which makes them expressive and
powerful. Local Mayan declarations and lexically scoped imports
allow great flexibility in how language extensions are structured.
Several implementation mechanisms make Maya’s macros easy to
write:
• Pattern parsing is a novel technique for clean and general
quasiquoting in languages like Java.
• Lazy type checking allows Maya to dispatch arbitrary Mayans based on static type information, even though Mayans
may create variable bindings.
• Lazy parsing allows lexically scoped Mayans to change the
grammar.
• Maya’s static template parsing and hygienic renaming allows
large classes of macro errors to be detected: syntax errors are
detected in parsing, and references to unbound variables are
detected during hygiene analysis.
While Maya is well suited to implement language extensions
such as aspect weaving [4] and MultiJava, it is equally well suited
for smaller tasks such as the foreach macro described in Section 3. The Maya compiler and our MultiJava implementation are
available at http://www.cs.utah.edu/˜jbaker/maya.
Acknowledgments
We thank Eric Eide for his many comments on drafts of this paper.
We also thank Sean McDirmid and John Regehr for their feedback.
This research was supported in part by the National Science Foundation under CAREER award CCR–9876117 and the Defense Advanced Research Projects Agency and the Air Force Research Laboratory under agreement number F33615–00–C–1696. The U.S.
Government is authorized to reproduce and distribute reprints for
Governmental purposes notwithstanding any copyright annotation
hereon.
8. REFERENCES
[1] A. Aho, R. Sethi, and J. Ullman. Compilers: Principles,
Techniques, and Tools. Addison-Wesley, 1986.
[2] J. Bachrach and K. Playford. The Java syntactic extender
(JSE). In Proceedings of the Conference on Object-Oriented
Programming Systems, Languages, and Applications ’01,
pages 31–42, Tampa Bay, FL, Oct. 2001.
[3] J. Baker. Macros that play: Migrating from Java to Maya.
Master’s thesis, University of Utah, Dec. 2001. http://
www.cs.utah.edu/˜jbaker/maya/thesis.html.
11
[21]
[22]
[23]
[24]
[25]
computations: Type-safe, generative, binding macros in
MacroML. In Proceedings of the International Conference
on Functional Programming ’01, pages 74–85, Florence,
Italy, Sept. 2001.
R. Kelsey, W. Clinger, and J. Rees (Eds.). The revised5
report on the algorithmic language Scheme. ACM SIGPLAN
Notices, 33(9), Sept. 1998.
S. Krishnamurthi. Linguistic Reuse. PhD thesis, Rice
University, 2001.
D. A. Ladd and J. C. Ramming. A*: A language for
implementing language processors. IEEE Transactions on
Software Engineering, 21(11):894–901, Nov. 1995.
W. Maddox. Semantically-sensitive macroprocessing.
Master’s thesis, University of California, Berkeley, 1989.
Microsoft. C# language specification.
http://msdn.microsoft.com/library/
dotnet/csspec/vclrfcsharpspec_Start.htm.
[26] M. Poletto, W. Hsieh, D. Engler, and M. Kaashoek. “‘C and
tcc: A Language and Compiler for Dynamic Code
Generation”. ACM Transactions on Programming Languages
and Systems, 21(2):324–369, 1999.
[27] A. Shalit. Dylan Reference Manual. Addison-Wesley, 1996.
[28] G. Steele Jr. Common Lisp, the Language. Digital Press,
second edition, 1990.
[29] M. Tatsubori, S. Chiba, M. Killijian, and K. Itano. OpenJava:
A class-based macro system for Java. In Proceedings of the
OOPSLA ’00 Reflection and Software Engineering
Workshop, Minneapolis, MN, Oct. 2000.
[30] D. Weise and R. Crew. Programmable syntax macros. In
Proceedings of the SIGPLAN ’93 Conference on
Programming Language Design and Implementation, pages
156–165, Albuquerque, NM, June 1993.
[31] Xerox. The AspectJ programming guide. http:
//www.aspectj.org/doc/dist/progguide/.
12
Download