Typed Compilation of Recursive Datatypes
Joseph C. Vanderwaart, Derek Dreyer, Leaf Petersen, Karl Crary, Robert Harper, and Perry Cheng
Carnegie Mellon University
TLDI 2003

SML Datatypes
• Elegant mechanism for defining recursive variant types, such as:
    datatype intlist = Nil | Cons of int * intlist
• Important that constructor applications and pattern matching be implemented efficiently
• Subject of this talk:
  – How to implement SML datatypes efficiently in a type-preserving compiler

Formal Framework
• Harper and Stone’s type-theoretic interpretation of Standard ML:
  – “Elaborates” SML programs into a type theory
• Reasons for using HS:
  – Models first phase of type-preserving compiler, in particular the TILT compiler (developed at CMU)
  – Can explain datatype semantics in terms of type theory

Overview
• Three interpretations of datatypes:
  – Harper-Stone interpretation
  – Transparent interpretation
  – Coercion interpretation
• Comparison on three axes:
  – Efficiency
  – Fidelity to the Definition of SML
  – Meta-theoretic complexity

The Harper-Stone Interpretation

Datatype Semantics
• SML datatypes are generative:
  – Identical datatype declarations in separate modules yield distinct (abstract) types
• HS elaborates datatypes as modules providing:
  – The datatype itself defined as a recursive sum type
  – Functions to construct and destruct values of the datatype
• HS models generativity by “sealing” the datatype module with an abstract signature

ExpDec Example
    datatype exp = VarExp of var
                 | LetExp of dec * exp
         and dec = ValDec of var * exp
                 | SeqDec of dec * dec

    VarExp(v)      ≈  “v”
    LetExp(d,e)    ≈  “let d in e”
    ValDec(v,e)    ≈  “val v = e”
    SeqDec(d1,d2)  ≈  “d1; d2”

ExpDec Implementation
    structure ExpDec :> EXPDEC = struct
      type exp = μ1(α,β).(var + β * α, var * α + β * β)
      type dec = μ2(α,β).(var + β * α, var * α + β * β)
      fun exp_in x  = roll_exp(x)
      fun exp_out x = unroll_exp(x)
      fun dec_in x  = roll_dec(x)
      fun dec_out x = unroll_dec(x)
    end

ExpDec Interface
    signature EXPDEC = sig
      type exp
      type dec
      val exp_in  : var + (dec * exp) -> exp
      val exp_out : exp -> var + (dec * exp)
      val dec_in  : (var * exp) + (dec * dec) -> dec
      val dec_out : dec -> (var * exp) + (dec * dec)
    end

Elaborating Constructor Calls
• Client of the datatype does the injection into the sum, then calls the datatype’s “in” function:
    VarExp(v)      ↝  ExpDec.exp_in(inj1(v))
    LetExp(d,e)    ↝  ExpDec.exp_in(inj2(d,e))
    ValDec(v,e)    ↝  ExpDec.dec_in(inj1(v,e))
    SeqDec(d1,d2)  ↝  ExpDec.dec_in(inj2(d1,d2))
• But the function calls to the in functions are too expensive.

Inlining the Constructor Calls
• We would like to inline the rolls to avoid calling the exp_in and dec_in functions:
    VarExp(v)      ↝  roll_{ExpDec.exp}(inj1(v))
    LetExp(d,e)    ↝  roll_{ExpDec.exp}(inj2(d,e))
    ValDec(v,e)    ↝  roll_{ExpDec.dec}(inj1(v,e))
    SeqDec(d1,d2)  ↝  roll_{ExpDec.dec}(inj2(d1,d2))
• But the definitions of exp and dec are not known outside of ExpDec, so inlining the rolls is ill-typed!

Separate Compilation
• Not a problem if client of datatype is defined in the same compilation unit:
  – Unseal the datatype ⇒ rolls become well-typed
• Is a problem if client of datatype is defined in a separately compiled module:
  – Datatype is an abstract import of client
  – Can’t assume knowledge of implementation
  – Similar problem for datatypes in functor arguments

A Transparent Interpretation

Making Datatypes Transparent
• Expose the implementation of a datatype as a recursive sum type in its interface:
    signature EXPDEC = sig
      type exp = μ1(α,β).(var + β * α, var * α + β * β)
      type dec = μ2(α,β).(var + β * α, var * α + β * β)
      (* in and out function specs as before *)
    end
• Inlining calls to the in and out functions is now well-typed outside of ExpDec

Implications of Transparency
• Datatypes are no longer generative
  – Identically defined datatypes are “visibly” equal
  – More types are equivalent, more programs may typecheck
• Matching a datatype specification is harder
  – To match a datatype spec, a datatype must now be implemented as a particular recursive sum type
  – Depending on how you define recursive type equivalence, fewer programs may typecheck!

Transparent Matching Example
    struct
      datatype exp = VarExp of var
                   | LetExp of dec * exp
           and dec = ValDec of var * exp
                   | SeqDec of dec * dec
    end
      :> ?
    sig
      type exp
      datatype dec = ValDec of var * exp
                   | SeqDec of dec * dec
    end

Transparent Matching Example
    struct
      type exp = μ1(α,β).(var + β * α, var * α + β * β)
      type dec = μ2(α,β).(var + β * α, var * α + β * β)
    end
      :> ?
    sig
      type exp
      datatype dec = ValDec of var * exp
                   | SeqDec of dec * dec
    end

Transparent Matching Example
    struct
      type exp = μ1(α,β).(var + β * α, var * α + β * β)
      type dec = μ2(α,β).(var + β * α, var * α + β * β)
    end
      :> ?
    sig
      type exp
      type dec = μ1(β).(var * exp + β * β)
    end

Transparent Matching Example
    struct
      type exp = μ1(α,β).(var + β * α, var * α + β * β)
      type dec = μ2(α,β).(var + β * α, var * α + β * β)
    end
      :> ?
    sig
      type exp
      type dec = μ1(β).(var * exp + β * β)
    end
    dec (in the structure)  ?=  dec (in the signature)

Notation
• Use δ to stand for a recursive type, i.e.:
    δ ::= μk(α1,...,αn).(τ1,...,τn)    (k ∈ 1..n)
• Expansion of a recursive type: expand(δ)
  For example, if intlist = μα. 1 + int * α
  then expand(intlist) = 1 + int * intlist

Iso-Recursive Types
• Iso-recursive equivalence is purely structural:
  – δ ≠ expand(δ), but the two are isomorphic
  – roll : expand(δ) → δ
  – unroll : δ → expand(δ)
• Works fine for H-S with abstract datatypes, but…

Transparent Matching Example
    struct
      type exp = μ1(α,β).(var + β * α, var * α + β * β)
      type dec = μ2(α,β).(var + β * α, var * α + β * β)
    end
      :> ✗
    sig
      type exp
      type dec = μ1(β).(var * exp + β * β)
    end

Equi-Recursive Types
• Another form of recursive type equivalence:
  – δ = expand(δ)
  – μα.τ(α) represents the unique solution of α = τ(α)
  – σ = μα.τ(α) iff σ = τ(σ)
• Equi-recursive equivalence is sufficient:
  – dec matches its specification
  – Enables transparent interpretation to accept all valid SML datatype matchings

A Hybrid Equivalence
• Equi-recursive equivalence is overkill:
  – Unnecessary to equate a recursive type with a non-recursive type (its expansion)
• Hybrid of iso- and equi-recursive equivalence:
  – Based on FLINT intermediate language [League and Shao]
  – Restriction of Amadio-Cardelli algorithm
  – Only equates recursive types with other recursive types
• Paper gives details of the hybrid algorithm, along with a formal argument that it is sufficient

Complications
• Strong versions of type equivalence are not well studied outside the simply typed λ-calculus. (TILT ILs have higher-order constructors, singleton kinds…)
• Conflicts with SML semantics:
  – Datatypes no longer generative.
  – Problems involving datatypes in sharing and where type constraints.
  – To implement SML, must handle these issues another way.

The Coercion Interpretation

Those in and out Functions
• Recall the definitions given during elaboration:
    fun in(x) = roll(x)
    fun out(x) = unroll(x)
• Consider the roll and unroll operations.
  – Commonly implemented as “no-ops”. That is, the values v and roll(v) are represented the same. So, roll and unroll are just “retyping” operators, or coercions.
  – Untyped machine code for in/out is the same as for the identity function.
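
The in and out functions recalled above can be approximated in source-level Standard ML. This is only a sketch under stated assumptions, not the elaborator's actual output: SML has no explicit roll/unroll, so the single-constructor datatypes RollExp and RollDec stand in for them, and the type var = string, the sum type, and the names Inj1 and Inj2 are hypothetical choices made for illustration. Whether a given compiler represents such a single-constructor wrapper as a no-op is implementation-dependent.

```sml
(* Assumption: variables are strings. *)
type var = string

(* Hypothetical binary sum standing in for the slides' t1 + t2. *)
datatype ('a, 'b) sum = Inj1 of 'a | Inj2 of 'b

signature EXPDEC =
sig
  type exp
  type dec
  val exp_in  : (var, dec * exp) sum -> exp
  val exp_out : exp -> (var, dec * exp) sum
  val dec_in  : (var * exp, dec * dec) sum -> dec
  val dec_out : dec -> (var * exp, dec * dec) sum
end

(* Opaque ascription (:>) plays the role of "sealing": clients cannot
   see how exp and dec are implemented. *)
structure ExpDec :> EXPDEC =
struct
  (* RollExp / RollDec stand in for roll at exp / dec. *)
  datatype exp = RollExp of (var, dec * exp) sum
       and dec = RollDec of (var * exp, dec * dec) sum

  fun exp_in x = RollExp x
  fun exp_out (RollExp x) = x
  fun dec_in x = RollDec x
  fun dec_out (RollDec x) = x
end

(* Client side: inject into the sum, then call the "in" function. *)
val varExp = ExpDec.exp_in (Inj1 "v")              (* VarExp(v)   *)
val valDec = ExpDec.dec_in (Inj1 ("w", varExp))    (* ValDec(w,e) *)
val letExp = ExpDec.exp_in (Inj2 (valDec, varExp)) (* LetExp(d,e) *)

(* Pattern matching goes through the "out" function. *)
fun isVar e =
  case ExpDec.exp_out e of
      Inj1 _ => true
    | Inj2 _ => false
```

Because ExpDec is sealed, a client cannot apply RollExp directly; every constructor application has to go through exp_in or dec_in, which is the function-call overhead the talk sets out to remove.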
ExpDec Revisited
    signature EXPDEC = sig
      type exp
      type dec
      val exp_in  : var + (dec * exp) ⇒ exp
      val exp_out : exp ⇒ var + (dec * exp)
      val dec_in  : (var * exp) + (dec * dec) ⇒ dec
      val dec_out : dec ⇒ (var * exp) + (dec * dec)
    end
• With ordinary function types (->), exp_in and exp_out act as the identity at run time, but:
  – This cannot be recognized from the type
• New type constructor: τ1 ⇒ τ2 (used in place of -> in the signature above)
  – Inhabited only by coercive terms
  – Coerciveness of exp_in, exp_out reflected in type
  – Applications can be ignored at run time

Coercions
• New constructs for the internal language:
  – Coercion values fold/unfold replace roll/unroll
  – Special type τ1 ⇒ τ2 distinguishes them from functions.
  – Special application syntax: v @ e
• Define in/out using coercions
    val in  : expand(δ) ⇒ δ = fold
    val out : δ ⇒ expand(δ) = unfold
• Define constructor applications using coercion applications
    VarExp(x)  ↝  ExpDec.exp_in@(inj1(x))

Coercion Erasure
• Why are coercion applications better than function applications? Because:
  – A closed value of coercion type can only be fold or unfold.
  – No work is required at run time to apply either fold or unfold.
  – To compile v@e, generate the same code as for e.
• Safety argument (in the paper)
  – Formalized via a translation into an untyped target calculus.

Performance
• Run times of benchmarks under 3 interpretations.
• Harper-Stone ≈ 37% slower than the others
• Coercion interpretation about the same as transparent.
• Coercion interpretation is faithful to SML semantics, requires only a simple extension to the type theory.

Conclusion
[Comparison table; most cell marks were lost in extraction]
    Columns: Efficiency | Conformance to SML Semantics | Meta-theoretic Simplicity
    Rows:    Harper-Stone | Transparent (one cell marked “?”) | Coercion
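
The rule from the Coercion Erasure slide, "to compile v@e, generate the same code as for e", can be illustrated with a toy erasure function. This is a minimal sketch under assumptions, not the translation formalized in the paper: the term datatypes tm and utm, their constructor names, and the representation of the path ExpDec.exp_in as a plain variable are all hypothetical, types are omitted entirely, and bare coercion values (fold/unfold outside an application) are left out because the slides do not say how they are erased.

```sml
(* Hypothetical typed-side terms (types omitted for brevity). *)
datatype tm =
    Var of string
  | Lam of string * tm
  | App of tm * tm          (* ordinary function application *)
  | Inj of int * tm         (* inj_i(e) *)
  | Pair of tm * tm
  | CoApp of tm * tm        (* coercion application v @ e *)

(* Hypothetical untyped target terms: no coercion application at all. *)
datatype utm =
    UVar of string
  | ULam of string * utm
  | UApp of utm * utm
  | UInj of int * utm
  | UPair of utm * utm

(* Erasure: every construct is translated structurally, except that
   v @ e is compiled exactly like e, discarding the coercion v. *)
fun erase (Var x)        = UVar x
  | erase (Lam (x, e))   = ULam (x, erase e)
  | erase (App (f, a))   = UApp (erase f, erase a)
  | erase (Inj (i, e))   = UInj (i, erase e)
  | erase (Pair (a, b))  = UPair (erase a, erase b)
  | erase (CoApp (_, e)) = erase e

(* LetExp(d,e) elaborated as ExpDec.exp_in @ (inj2(d,e)). *)
val letExp = CoApp (Var "ExpDec.exp_in", Inj (2, Pair (Var "d", Var "e")))

(* After erasure only the injection remains:
   UInj (2, UPair (UVar "d", UVar "e")) *)
val erased = erase letExp
```

By contrast, an ordinary App (Var "ExpDec.exp_in", ...) would erase to a UApp node, that is, a real call at run time, which is the cost attributed to the Harper-Stone interpretation.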