Typed Compilation of
Recursive Datatypes
Joseph C. Vanderwaart, Derek Dreyer, Leaf Petersen,
Karl Crary, Robert Harper, and Perry Cheng
Carnegie Mellon University
TLDI 2003
SML Datatypes
• Elegant mechanism for defining recursive variant
types, such as:
datatype intlist = Nil | Cons of int * intlist
• Important that constructor applications and pattern
matching should be implemented efficiently
• Subject of this talk:
– How to implement SML datatypes efficiently in a
type-preserving compiler
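As a small illustration of the two operations whose cost is at stake, the sketch below builds an intlist with constructor applications and consumes it by pattern matching. The names xs, sum, and total are made up for this example.

```sml
datatype intlist = Nil | Cons of int * intlist

(* Constructor applications build a list value. *)
val xs = Cons (1, Cons (2, Cons (3, Nil)))

(* Pattern matching takes the value apart again. *)
fun sum Nil = 0
  | sum (Cons (n, rest)) = n + sum rest

val total = sum xs
```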
Formal Framework
• Harper and Stone’s type-theoretic interpretation of
Standard ML:
– “Elaborates” SML programs into a type theory
• Reasons for using HS:
– Models first phase of type-preserving compiler, in
particular the TILT compiler (developed at CMU)
– Can explain datatype semantics in terms of type theory
Overview
• Three interpretations of datatypes:
– Harper-Stone interpretation
– Transparent interpretation
– Coercion interpretation
• Comparison on three axes:
– Efficiency
– Fidelity to the Definition of SML
– Meta-theoretic complexity
The Harper-Stone Interpretation
Datatype Semantics
• SML datatypes are generative:
– Identical datatype declarations in separate modules yield
distinct (abstract) types
• HS elaborates datatypes as modules providing:
– The datatype itself defined as a recursive sum type
– Functions to construct and destruct values of the datatype
• HS models generativity by “sealing” the datatype
module with an abstract signature
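Generativity can be observed directly in source SML. In the sketch below, the structures A and B and the type color are hypothetical names chosen for illustration; the two datatype declarations are textually identical, yet the type checker treats A.color and B.color as different types.

```sml
structure A = struct
  datatype color = Red | Green
  fun name Red = "red"
    | name Green = "green"
end

structure B = struct
  datatype color = Red | Green
  fun name Red = "red"
    | name Green = "green"
end

val a = A.name A.Red     (* well-typed: A.Red : A.color *)
val b = B.name B.Green   (* well-typed: B.Green : B.color *)

(* Rejected by the type checker, because A.color and B.color are distinct:
   val bad = A.name B.Red
*)
```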
ExpDec Example
datatype exp = VarExp of var
             | LetExp of dec * exp
and dec = ValDec of var * exp
        | SeqDec of dec * dec

VarExp(v) ≈ “v”
LetExp(d,e) ≈ “let d in e”
ValDec(v,e) ≈ “val v = e”
SeqDec(d1,d2) ≈ “d1; d2”
ExpDec Implementation
structure ExpDec :> EXPDEC = struct
  type exp = μ1(α,β).(var + β * α, var * α + β * β)
  type dec = μ2(α,β).(var + β * α, var * α + β * β)
  fun exp_in x = roll_{exp}(x)
  fun exp_out x = unroll_{exp}(x)
  fun dec_in x = roll_{dec}(x)
  fun dec_out x = unroll_{dec}(x)
end
ExpDec Interface
signature EXPDEC =
sig
type exp
type dec
val exp_in : var + (dec * exp) -> exp
val exp_out : exp -> var + (dec * exp)
val dec_in : (var * exp) + (dec * dec) -> dec
val dec_out : dec -> (var * exp) + (dec * dec)
end
Elaborating Constructor Calls
• Client of the datatype does the injection into the sum,
then calls the datatype’s “in” function:
VarExp(v) ⇝ ExpDec.exp_in(inj1(v))
LetExp(d,e) ⇝ ExpDec.exp_in(inj2(d,e))
ValDec(v,e) ⇝ ExpDec.dec_in(inj1(v,e))
SeqDec(d1,d2) ⇝ ExpDec.dec_in(inj2(d1,d2))
• But a function call to an in function at every constructor
application is too expensive.
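The Harper-Stone elaboration can be approximated in source SML. This is only a sketch under stated assumptions: SML has no anonymous sums or recursive types, so a hypothetical sum datatype stands in for t1 + t2, the single-constructor datatypes RollExp and RollDec stand in for roll, and var is assumed to be string. The client code at the end mirrors the elaborated constructor calls and shows that every construction and every match goes through a function call.

```sml
(* Binary sums, standing in for the internal language's t1 + t2. *)
datatype ('a, 'b) sum = Inj1 of 'a | Inj2 of 'b

type var = string  (* assumption: variables are plain strings *)

signature EXPDEC =
sig
  type exp
  type dec
  val exp_in  : (var, dec * exp) sum -> exp
  val exp_out : exp -> (var, dec * exp) sum
  val dec_in  : (var * exp, dec * dec) sum -> dec
  val dec_out : dec -> (var * exp, dec * dec) sum
end

(* Opaque sealing (:>) hides the representation, which models generativity. *)
structure ExpDec :> EXPDEC =
struct
  datatype exp = RollExp of (var, dec * exp) sum
       and dec = RollDec of (var * exp, dec * dec) sum
  fun exp_in x = RollExp x
  fun exp_out (RollExp x) = x
  fun dec_in x = RollDec x
  fun dec_out (RollDec x) = x
end

(* Elaborated client code: inject into the sum, then call the in function. *)
val d = ExpDec.dec_in (Inj1 ("x", ExpDec.exp_in (Inj1 "y")))  (* ValDec(x, VarExp y) *)
val e = ExpDec.exp_in (Inj2 (d, ExpDec.exp_in (Inj1 "x")))    (* LetExp(d, VarExp x) *)

(* Pattern matching goes through the out function. *)
fun describe t =
  case ExpDec.exp_out t of
      Inj1 v => "variable " ^ v
    | Inj2 _ => "let expression"
```

Outside ExpDec the constructors RollExp and RollDec are not visible, which is the source-level analogue of the inlining problem discussed next.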
Inlining the Constructor Calls
• We would like to inline the roll’s to avoid calling the
exp_in and dec_in functions:
VarExp(v) ⇝ roll_{ExpDec.exp}(inj1(v))
LetExp(d,e) ⇝ roll_{ExpDec.exp}(inj2(d,e))
ValDec(v,e) ⇝ roll_{ExpDec.dec}(inj1(v,e))
SeqDec(d1,d2) ⇝ roll_{ExpDec.dec}(inj2(d1,d2))
• But the definitions of exp and dec are not known
outside of ExpDec, so inlining the roll’s is ill-typed!
Separate Compilation
• Not a problem if client of datatype defined in
same compilation unit:
– Unseal the datatype ⇒ roll’s become well-typed
• Is a problem if client of datatype is defined in
separately compiled module:
– Datatype is an abstract import of client
– Can’t assume knowledge of implementation
– Similar problem for datatypes in functor arguments
A Transparent Interpretation
Making Datatypes Transparent
• Expose the implementation of a datatype as a recursive sum
type in its interface:
signature EXPDEC =
sig
  type exp = μ1(α,β).(var + β * α, var * α + β * β)
  type dec = μ2(α,β).(var + β * α, var * α + β * β)
(* in and out function specs as before *)
end
• Inlining calls to the in and out functions is now well-typed
outside of ExpDec
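A rough source-level analogue of the transparent interface, under the same caveat that SML cannot write the recursive sum types directly: the representation types exp_rep and dec_rep (hypothetical names) are declared where every client can see them, and the signature reveals that exp and dec equal them. Only the exp half of the interface is shown to keep the sketch short.

```sml
datatype ('a, 'b) sum = Inj1 of 'a | Inj2 of 'b
type var = string  (* assumption: variables are plain strings *)

(* The representation is visible to all clients. *)
datatype exp_rep = RollExp of (var, dec_rep * exp_rep) sum
     and dec_rep = RollDec of (var * exp_rep, dec_rep * dec_rep) sum

signature EXPDEC =
sig
  type exp = exp_rep   (* transparent: the implementation is exposed *)
  type dec = dec_rep
  val exp_in  : (var, dec * exp) sum -> exp
  val exp_out : exp -> (var, dec * exp) sum
end

structure ExpDec :> EXPDEC =
struct
  type exp = exp_rep
  type dec = dec_rep
  fun exp_in x = RollExp x
  fun exp_out (RollExp x) = x
end

(* Calling the in function and applying the roll directly now agree in type. *)
val viaCall = ExpDec.exp_in (Inj1 "x")
val inlined : ExpDec.exp = RollExp (Inj1 "x")
```

Any other structure sealed with this signature would share the same exp and dec, which is the loss of generativity described on the next slide.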
Implications of Transparency
• Datatypes are no longer generative
– Identically defined datatypes are “visibly” equal
– More types are equivalent, more programs may
typecheck
• Matching a datatype specification is harder
– To match a datatype spec, a datatype must now be
implemented as a particular recursive sum type
– Depending on how you define recursive type
equivalence, fewer programs may typecheck!
Transparent Matching Example
struct
  datatype exp = VarExp of var
               | LetExp of dec * exp
  and dec = ValDec of var * exp
          | SeqDec of dec * dec
end
?
:>
sig
type exp
datatype dec = ValDec of var * exp
| SeqDec of dec * dec
end
Transparent Matching Example
struct
  type exp = μ1(α,β).(var + β * α, var * α + β * β)
  type dec = μ2(α,β).(var + β * α, var * α + β * β)
end
?
:>
sig
type exp
datatype dec = ValDec of var * exp
| SeqDec of dec * dec
end
Transparent Matching Example
struct
  type exp = μ1(α,β).(var + β * α, var * α + β * β)
  type dec = μ2(α,β).(var + β * α, var * α + β * β)
end
?
:>
sig
  type exp
  type dec = μ1(β).(var * exp + β * β)
end
μ2(α,β).(var + β * α, var * α + β * β)  =?  μ1(β).(var * exp + β * β)
Notation
• Use δ to stand for a recursive type, i.e.:
δ ::= μk(α1,...,αn).(t1,...,tn) (k ∈ 1..n)
• Expansion of a recursive type: expand(δ)
For example, if
intlist = μα. 1 + int * α
then
expand(intlist) = 1 + int * intlist
Iso-Recursive Types
• Iso-recursive equivalence is purely structural:
– δ ≠ expand(δ), but the two are isomorphic
– roll : expand(δ) → δ
– unroll : δ → expand(δ)
• Works fine for H-S with abstract datatypes, but…
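The iso-recursive view can be mimicked in SML for the intlist example. In this sketch the datatype constructor Roll plays the part of roll, and sum, expanded, empty, one, and len are hypothetical names. The recursive type and its expansion are different types, and values move between them only through roll and unroll.

```sml
datatype ('a, 'b) sum = Inj1 of 'a | Inj2 of 'b

(* intlist as a recursive type whose body is 1 + int * intlist. *)
datatype intlist = Roll of (unit, int * intlist) sum

(* expand(intlist) = 1 + int * intlist *)
type expanded = (unit, int * intlist) sum

val roll : expanded -> intlist = Roll
fun unroll (Roll x) : expanded = x

val empty = roll (Inj1 ())
val one = roll (Inj2 (1, empty))

(* Each step of a traversal must unroll before it can inspect the sum. *)
fun len l =
  case unroll l of
      Inj1 () => 0
    | Inj2 (_, rest) => 1 + len rest
```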
Transparent Matching Example
struct
  type exp = μ1(α,β).(var + β * α, var * α + β * β)
  type dec = μ2(α,β).(var + β * α, var * α + β * β)
end
:>  ✗ (no match under iso-recursive equivalence)
sig
  type exp
  type dec = μ1(β).(var * exp + β * β)
end
Equi-Recursive Types
• Another form of recursive type equivalence:
– δ = expand(δ)
– μα.t(α) represents unique solution of α = t(α)
– σ = μα.t(α) iff σ = t(σ)
• Equi-recursive equivalence is sufficient:
– dec matches its specification
– Enables transparent interpretation to accept all valid
SML datatype matchings
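A short derivation, reconstructed from the rules on this slide rather than taken from the paper, of why dec matches its specification under equi-recursive equivalence. Here exp and dec denote the implementation's types, and t is a name introduced only for this derivation.

```latex
\begin{align*}
\mathit{exp} &= \mu_1(\alpha,\beta).(\mathit{var} + \beta * \alpha,\ \mathit{var} * \alpha + \beta * \beta) \\
\mathit{dec} &= \mu_2(\alpha,\beta).(\mathit{var} + \beta * \alpha,\ \mathit{var} * \alpha + \beta * \beta) \\
\mathit{dec} &= \mathrm{expand}(\mathit{dec})
              = \mathit{var} * \mathit{exp} + \mathit{dec} * \mathit{dec}
  && \text{(a recursive type equals its expansion)} \\
\mathit{dec} &= t(\mathit{dec})
  && \text{where } t(\beta) = \mathit{var} * \mathit{exp} + \beta * \beta \\
\mathit{dec} &= \mu\beta.\,t(\beta)
              = \mu_1(\beta).(\mathit{var} * \mathit{exp} + \beta * \beta)
  && \text{(unique solution of } \beta = t(\beta)\text{)}
\end{align*}
```

The last line is exactly the type given for dec in the signature, so the match succeeds.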
A Hybrid Equivalence
• Equi-recursive equivalence is overkill:
– Unnecessary to equate a recursive type with a
non-recursive type (its expansion)
• Hybrid of iso- and equi-recursive equivalence:
– Based on FLINT intermediate language [League and Shao]
– Restriction of Amadio-Cardelli algorithm
– Only equates δ’s with δ’s
• Paper gives details of the hybrid algorithm, along
with formal argument that it is sufficient
Complications
• Strong versions of type equivalence not well studied
outside simply typed λ-calculus.
(TILT ILs have higher-order constructors, singleton kinds…)
• Conflicts with SML semantics:
– Datatypes no longer generative.
– Problems involving datatypes in sharing and
where type constraints.
– To implement SML, must handle these issues another way.
The Coercion Interpretation
Those in and out Functions
• Recall the definitions given during elaboration:
fun in(x) = roll(x)
fun out(x) = unroll(x)
• Consider the roll and unroll operations.
– Commonly implemented as “no-ops”.
That is, the values v and roll(v) are represented the same.
So, roll and unroll are just “retyping” operators, or
coercions.
– Untyped machine code for in/out same as for the identity function.
ExpDec Revisited
signature EXPDEC =
sig
  type exp
  type dec
  val exp_in : var + (dec * exp) ⇒ exp
  val exp_out : exp ⇒ var + (dec * exp)
  val dec_in : (var * exp) + (dec * dec) ⇒ dec
  val dec_out : dec ⇒ (var * exp) + (dec * dec)
end
• At runtime, exp_in and exp_out act as the identity, but with the
function type -> this cannot be recognized from the type
• New type constructor: t1 ⇒ t2
– Inhabited only by coercive terms
– Coerciveness of exp_in, exp_out reflected in type
– Applications can be ignored at runtime
Coercions
• New constructs for the internal language:
– Coercion values fold/unfold replace roll/unroll
– Special type t1 ⇒ t2 distinguishes them from functions.
– Special application syntax: v @ e
• Define in/out using coercions
val in : expand(δ) ⇒ δ = fold
val out : δ ⇒ expand(δ) = unfold
• Define constructor app’s using coercion app’s
VarExp(x) ⇝ ExpDec.exp_in@(inj1(x))
Coercion Erasure
• Why are coercion applications better than function
applications? Because:
– A closed value of coercion type can only be fold or
unfold.
– No work is required at run time to apply either fold or
unfold.
– To compile v@e, generate the same code as for e.
• Safety argument (in the paper)
– Formalized via a translation into an untyped target
calculus.
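The erasure idea can be sketched as a tiny translation from typed terms to an untyped target. This is an illustrative toy, not the paper's translation: the datatypes term and target and the function erase are hypothetical, types are omitted, and erasing a standalone coercion value to a unit placeholder is an assumption of this sketch.

```sml
(* Internal-language terms (type annotations omitted for brevity). *)
datatype term =
    Var of string
  | Inj of int * term        (* inj_i e *)
  | App of term * term       (* ordinary function application *)
  | Fold                     (* coercion value *)
  | Unfold                   (* coercion value *)
  | CoApp of term * term     (* coercion application v @ e *)

(* Untyped target terms: no coercion forms at all. *)
datatype target =
    TVar of string
  | TInj of int * target
  | TApp of target * target
  | TUnit                    (* assumption: placeholder for a bare coercion value *)

fun erase (Var x) = TVar x
  | erase (Inj (i, e)) = TInj (i, erase e)
  | erase (App (f, e)) = TApp (erase f, erase e)
  | erase Fold = TUnit
  | erase Unfold = TUnit
  | erase (CoApp (_, e)) = erase e   (* v @ e compiles to the code for e *)

(* VarExp(x), elaborated as ExpDec.exp_in @ inj1(x) *)
val varExp = CoApp (Var "ExpDec.exp_in", Inj (1, Var "x"))
val compiled = erase varExp
```

In contrast, erasing App (Var "ExpDec.exp_in", Inj (1, Var "x")) keeps the TApp node, which is the runtime call that the coercion interpretation avoids.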
Performance
• Run times of benchmarks under 3 interpretations.
• Harper-Stone ≈ 37% slower than the others.
• Coercion interpretation about the same as transparent.
• Coercion interpretation is faithful to SML semantics and requires
only a simple extension to the type theory.
Conclusion
               Efficiency   Conformance to   Meta-theoretic
                            SML Semantics    Simplicity
Harper-Stone   ✗            ✓                ✓
Transparent    ✓            ✗                ?
Coercion       ✓            ✓                ✓