Explaining Type Errors by Finding the Source of a Type Conflict

Jun Yang
ceejy1@cee.hw.ac.uk
Department of Computing and Electrical Engineering,
Heriot-Watt University
Abstract
Typically a type error is reported when unification fails, even though the programmer's
actual error may have occurred much earlier in the program. The
inference algorithms report the site where a type conflict is detected, but the
error message is isolated information: it is not clear what the relationship is between the site where the error is reported and the context in which the subexpression
was typed. As a result, the error message may give little help in locating the source
of the error.
This report investigates better methods of explaining type conflicts. We aim
to find a method that may be effective even when the user has little knowledge of
type checking. The philosophy of our approach is to find sources of type errors by
reporting which parts of the program conflict, rather than isolated error sites. We
implement two new inference algorithms with this philosophy: the Unification of
Assumption Environments (AE) and the Incremental Error Inference.
1 INTRODUCTION
It is known that the place where a type error is detected is often not the place
where the type error actually originated. The source of a type error may be far
from where it is detected. This is because conventional type inference algorithms
proceed from left to right, and the context is refined by the preceding type-checking
steps. This is called left-to-right bias [6]. For example, after type checking
fn x => map x [x+2], Standard ML of New Jersey, Version 110.0.6, reports
x+2 as the error site. The actual error is the conflict between the two monomorphic
uses of the λ-bound variable x: the first as a function, the second as an
integer. Even worse, when the W algorithm succeeds at every subexpression and
fails at the top application expression, for example with (fn x => x+1) ((fn y => if
y then true else false) false), it reports the whole expression as the error site. Such a
large error site is not informative.
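The left-to-right bias can be reproduced with a very small checker. The following Python sketch is an illustration only, not the SML/NJ implementation: the tuple encoding of expressions, the names `infer`, `unify` and `Conflict`, and the treatment of `+` as an int-only `add` node are all assumptions made for the example. It infers types in the style of the W algorithm, threading one substitution from left to right, and records the expression at which unification fails.

```python
import itertools

class Conflict(Exception):
    """Raised when unification fails; `site` is the expression being checked."""
    def __init__(self, site, left, right):
        super().__init__(f"cannot unify {left} with {right}")
        self.site = site

_ids = itertools.count()

def fresh():
    """Return a new type variable such as "'t0"."""
    return f"'t{next(_ids)}"

def fn(arg, res):
    return ("->", arg, res)

def is_var(t):
    return isinstance(t, str) and t.startswith("'")

def resolve(t, s):
    """Apply substitution s (dict: type variable -> type) to type t."""
    if isinstance(t, str):
        return resolve(s[t], s) if t in s else t
    return (t[0],) + tuple(resolve(a, s) for a in t[1:])

def occurs(v, t):
    return t == v if isinstance(t, str) else any(occurs(v, a) for a in t[1:])

def unify(a, b, s, site):
    """Extend s so that a and b become equal, or raise Conflict at `site`."""
    a, b = resolve(a, s), resolve(b, s)
    if a == b:
        return s
    if is_var(b) and not is_var(a):
        a, b = b, a
    if is_var(a) and not occurs(a, b):
        return {**s, a: b}
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s, site)
        return s
    raise Conflict(site, a, b)

def map_type():
    """A fresh instance of map : ('a -> 'b) -> 'a list -> 'b list."""
    a, b = fresh(), fresh()
    return fn(fn(a, b), fn(("list", a), ("list", b)))

def infer(env, e, s):
    """W-style inference: returns (type, substitution)."""
    tag = e[0]
    if tag == "int":
        return "int", s
    if tag == "var":              # env maps a name to a function returning its type
        return env[e[1]](), s
    if tag == "lam":              # ("lam", x, body): x is monomorphic inside body
        x = fresh()
        body, s = infer({**env, e[1]: lambda: x}, e[2], s)
        return fn(x, body), s
    if tag == "app":              # operator first, then operand: the source of the bias
        f, s = infer(env, e[1], s)
        a, s = infer(env, e[2], s)
        r = fresh()
        return r, unify(f, fn(a, r), s, e)
    if tag == "add":              # ("add", e1, e2): both operands must be int
        for operand in e[1:]:
            t, s = infer(env, operand, s)
            s = unify(t, "int", s, e)
        return "int", s
    if tag == "list":             # ("list", e1, ..., en)
        elem = fresh()
        for item in e[1:]:
            t, s = infer(env, item, s)
            s = unify(t, elem, s, e)
        return ("list", elem), s
    raise ValueError(f"unknown expression {e!r}")

def error_site(program):
    """Return the expression at which inference fails, or None if it succeeds."""
    try:
        infer({"map": map_type}, program, {})
    except Conflict as conflict:
        return conflict.site
    return None

# fn x => map x [x+2]
X = ("var", "x")
PROGRAM = ("lam", "x", ("app", ("app", ("var", "map"), X), ("list", ("add", X, ("int", 2)))))
print(error_site(PROGRAM))
```

Because `map x` is checked first, x is already bound to a function type when `x+2` is reached, so the sketch blames the `add` node even though neither use of x is more wrong than the other.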
There are several type inference explanation systems that explain how the inference process concludes a type error [2, 4, 9]. But there are some inherent
problems in textual explanations:
1. Internal type variables which act as the bridge between instances, are actually
the central point of the explanation, since they are the ones that get refined, but a
programmer would not be concerned with those type variables. To understand the
explanations, it is necessary to remember from which program variable the type
variable is inferred and where it is refined. If there are more than a few type variables, it becomes impossible to remember what entity a type variable represents.
2. Sometimes the textual explanation contains so much information that it rapidly becomes tedious. Experts usually find this explanation too detailed to be of real
help, though they find very valuable information about the different positions in
the programs that contribute to the type given by the tool [7].
3. The user needs sufficient knowledge about type checking to understand the
explanations.
To overcome some limitations of textual explanations, we have implemented a visualisation of polymorphic type checking [12]. An evaluation of the visualisation
shows that it has the same power as the textual representation [13]. The operations
of icons in the visualisation can be improved by better explanation algorithms.
There have also been several attempts to address directly the location of the source of
type errors:
1. Mitchell Wand’s approach [11] keeps a record of the pieces of code that contribute to each type deduction when type errors are detected, using the information
to explain why the errors happen. The algorithm records substitutions together
with function applications which are the causes of the substitution. A type error
consists of two types that cannot be unified, and both are derived by some substitutions: the algorithm reports each application that caused one of these substitutions
as a possible cause of the error. Where the inconsistency is found depends on the
arbitrary order of traversal of the syntax tree during type analysis. Consequently,
the number of candidate error sites is also decided by the programming style. The
system lists the possible error sites; some of them are not the direct source of the
type inconsistency. It is not clear if there is any relationship between the candidate
error sites: the user needs to check the types of the proposed error sites against
their own intentions.
2. Greg F. Johnson and Janet Walz [10] give a maximum-flow approach to decide
which usage is the most likely error source. The usage that is in the minority is
a candidate for the mistake. But for many errors there is one correct usage and
one incorrect usage and it is not clear how often the minority can be isolated, and
sometimes the minority usage may be the correct type.
3. Bernstein and Stark give a method of debugging type errors [3, 8]. It provides a way of diagnosing type errors in which the user can probe the internal
type information that the ordinary inference algorithm hides. This is done manually. Sometimes it is difficult to make out the connections between the uses of
subexpressions.
We aim to find a method that may be effective even when the user has little
knowledge about type checking. The philosophy of our approach is to find sources
of type errors that conflict, rather than isolated error sites. We have implemented
a previously proposed inference algorithm, Unification of Assumption
Environments (AE) [1], and have proposed and implemented another inference algorithm, incremental error inference.
2 UNIFICATION OF ASSUMPTION ENVIRONMENTS (AE)
2.1 Introduction
The primary advantage of the AE algorithm is that it eliminates the left-to-right bias
and can point out all type conflicts automatically. The key idea is to type each
subexpression in an application independently. In a unification-based type inference
system, a type conflict is detected when unification fails. It has been
observed that the place where a type conflict is detected is often far from the site of the real
error, because of the left-to-right bias of type checking. The algorithm removes the
left-to-right bias by unifying the assumption environments at the top level of the AST:
in this way every subexpression is treated equally. This idea is due to Johan Agat
and Jörgen Gustavsson, Chalmers University of Technology, June 1999 [1], but ours is
the first implementation of the idea. First, we show how the method works on a
simple example, shown in Figure 1. After type checking fn x => map x [x+2],
the W algorithm reports the error in x+2, whereas the AE algorithm explains that the
λ-bound argument x, which is monomorphic, is used inconsistently in different
subexpressions. At the first site, map x, x is required to have a function type; at the
second site, [x+2], x is required to have type int.
2.2 The AE Algorithm
Two principles are observed when the algorithm reports type errors:
1. Type error messages are reported in the context of use. The philosophy is
that a subexpression in an expression has connections with other subexpressions
in some manner; reporting the relationships of its use in context is better than
reporting just an isolated subexpression as an error site.
2. Error explanations should be complete and concise: the system should highlight
the sources of type conflicts which directly contribute to the conflict, and nothing
more.
fn x => map x [x+2]
-------------------- Error --------------------
Type conflicts in subexpressions:
map (x)
+(x, 2)
the common program variables have type conflicts in different sites
from the first expression
x: 'c -> 'd
but from the second expression
x: int
FIGURE 1. Explanation of type conflicts of λ-bound variable
In the algorithm, predefined functions such as map and operators such as +,
which are free variables in an expression, are excluded from the assumption environments. Their types are supplied by the type environment TE instead. Excluding
identifiers with known types is more efficient,
because the assumption environments are smaller.
The unification procedures Uenv and Usum are explained in Figure 3; they make
sure that the program variables are used consistently in each subexpression. There are two
environments: the context TE, which is passed downwards, and the assumption environment FENV, which is passed upwards during type checking.
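The idea of typing sibling subexpressions independently and then comparing their assumptions can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm of Figure 2: the tuple encoding of expressions, the names `infer`, `merge`, `Clash` and `PREDEFINED`, the int-only `add` node, and the choice to compare assumptions before unifying operator and operand types are all ours. Every occurrence of a program variable gets its own fresh type variable, so no occurrence is refined by an earlier one.

```python
import itertools

class Clash(Exception):
    """Two sibling subexpressions assume incompatible types for `var`."""
    def __init__(self, var, first, second):
        super().__init__(f"{var}: {first} conflicts with {var}: {second}")
        self.var, self.first, self.second = var, first, second

_ids = itertools.count()

def fresh():
    return f"'t{next(_ids)}"

def fn(arg, res):
    return ("->", arg, res)

def is_var(t):
    return isinstance(t, str) and t.startswith("'")

def resolve(t, s):
    """Apply substitution s (dict: type variable -> type) to type t."""
    if isinstance(t, str):
        return resolve(s[t], s) if t in s else t
    return (t[0],) + tuple(resolve(a, s) for a in t[1:])

def occurs(v, t):
    return t == v if isinstance(t, str) else any(occurs(v, a) for a in t[1:])

def unify(a, b, s):
    a, b = resolve(a, s), resolve(b, s)
    if a == b:
        return s
    if is_var(b) and not is_var(a):
        a, b = b, a
    if is_var(a) and not occurs(a, b):
        return {**s, a: b}
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
        return s
    raise ValueError(f"cannot unify {a} with {b}")

def map_type():
    a, b = fresh(), fresh()
    return fn(fn(a, b), fn(("list", a), ("list", b)))

PREDEFINED = {"map": map_type}           # plays the role of the type environment TE

def merge(a1, a2, s):
    """Unify what two subexpressions assume about the variables they share."""
    for var in sorted(a1.keys() & a2.keys()):
        try:
            s = unify(a1[var], a2[var], s)
        except ValueError:
            raise Clash(var, resolve(a1[var], s), resolve(a2[var], s)) from None
    return {**a1, **a2}, s

def infer(e, s):
    """Return (type, assumption environment, substitution) for expression e."""
    tag = e[0]
    if tag == "int":
        return "int", {}, s
    if tag == "var":
        if e[1] in PREDEFINED:           # known identifiers carry no assumption
            return PREDEFINED[e[1]](), {}, s
        g = fresh()                      # each occurrence gets its own variable
        return g, {e[1]: g}, s
    if tag == "lam":                     # ("lam", x, body): discharge the assumption on x
        body, a, s = infer(e[2], s)
        param = a.get(e[1]) or fresh()
        return fn(param, body), {k: v for k, v in a.items() if k != e[1]}, s
    if tag == "list":                    # one-element list literal ("list", e1)
        t, a, s = infer(e[1], s)
        return ("list", t), a, s
    t1, a1, s = infer(e[1], s)           # "app" and "add" both have two children
    t2, a2, s = infer(e[2], s)
    env, s = merge(a1, a2, s)            # compare the assumptions of the siblings
    if tag == "add":
        return "int", env, unify(t2, "int", unify(t1, "int", s))
    r = fresh()
    return r, env, unify(t1, fn(t2, r), s)

# fn x => map x [x+2]
X = ("var", "x")
PROGRAM = ("lam", "x", ("app", ("app", ("var", "map"), X), ("list", ("add", X, ("int", 2)))))
try:
    infer(PROGRAM, {})
except Clash as clash:
    print(clash)
```

On this input the sketch reports that x is assumed to be a function in the first subexpression and an int in the second, which mirrors the shape of the explanation in Figure 1 rather than a single blamed site.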
3 INCREMENTAL ERROR INFERENCE
3.1 Introduction
We propose an incremental error inference algorithm that finds the type conflicts
in application expressions when the AE algorithm cannot. The W algorithm
fails only at a function application, and an erroneous expression is often successfully type-checked long before its consequence collides at an application. When
the W algorithm succeeds at every subexpression and fails at the top application
expression, it reports the whole expression as the error site, implying that some of its
subexpressions are ill-typed. This error message is not very useful for locating
the sources of type errors.
The folklore algorithm, the M algorithm, always stops earlier than the W
algorithm [5] when there is a type error, and it can be used to cure the problem by
reporting a finer-grained error site. The M algorithm brings the context of an expression down to its sub-or-sibling expressions. It carries a type constraint
(or an expected type) that each expression must satisfy where the expression appears. For example, in the case of application, if the required type for the application
expression e1 e2 is real from the context, then the required type for e1 is β → real,
and the required type for e2 is β.
The M algorithm reports a finer-grained site: a constant, a variable, or a
lambda expression. For example, in Figure 4, the M algorithm stops at true. But
we still do not know where the type conflicts are. The reported error site is often
not the real error site.
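The way an expected type is carried down to subexpressions can be illustrated with a short Python sketch. It is a minimal sketch in the style of the M algorithm, not the formulation in [5]: the tuple encoding of expressions, the names `check`, `unify` and `Conflict`, and the int-only `add` node are assumptions made for the example, and let-polymorphism is omitted.

```python
import itertools

class Conflict(Exception):
    """Raised when unification fails; `site` is the expression being checked."""
    def __init__(self, site, left, right):
        super().__init__(f"cannot unify {left} with {right}")
        self.site = site

_ids = itertools.count()

def fresh():
    return f"'t{next(_ids)}"

def fn(arg, res):
    return ("->", arg, res)

def is_var(t):
    return isinstance(t, str) and t.startswith("'")

def resolve(t, s):
    """Apply substitution s (dict: type variable -> type) to type t."""
    if isinstance(t, str):
        return resolve(s[t], s) if t in s else t
    return (t[0],) + tuple(resolve(a, s) for a in t[1:])

def occurs(v, t):
    return t == v if isinstance(t, str) else any(occurs(v, a) for a in t[1:])

def unify(a, b, s, site):
    a, b = resolve(a, s), resolve(b, s)
    if a == b:
        return s
    if is_var(b) and not is_var(a):
        a, b = b, a
    if is_var(a) and not occurs(a, b):
        return {**s, a: b}
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s, site)
        return s
    raise Conflict(site, a, b)

def check(env, e, expected, s):
    """M-style checking: push the expected type down to e, return a substitution."""
    tag = e[0]
    if tag in ("int", "bool"):            # constants: ("int", 1), ("bool", True)
        return unify(expected, tag, s, e)
    if tag == "var":
        return unify(expected, env[e[1]], s, e)
    if tag == "lam":                      # ("lam", x, body)
        a, b = fresh(), fresh()
        s = unify(expected, fn(a, b), s, e)
        return check({**env, e[1]: a}, e[2], b, s)
    if tag == "app":                      # e1 must have type beta -> expected, e2 type beta
        b = fresh()
        s = check(env, e[1], fn(b, expected), s)
        return check(env, e[2], b, s)
    if tag == "add":                      # ("add", e1, e2) on ints
        s = unify(expected, "int", s, e)
        s = check(env, e[1], "int", s)
        return check(env, e[2], "int", s)
    if tag == "if":                       # ("if", cond, then, else)
        s = check(env, e[1], "bool", s)
        s = check(env, e[2], expected, s)
        return check(env, e[3], expected, s)
    raise ValueError(f"unknown expression {e!r}")

def error_site(program):
    """Return the expression at which checking stops, or None if it succeeds."""
    try:
        check({}, program, fresh(), {})
    except Conflict as conflict:
        return conflict.site
    return None

# (fn x => x+1) ((fn y => if y then true else false) false)
FUN = ("lam", "x", ("add", ("var", "x"), ("int", 1)))
ARG = ("app", ("lam", "y", ("if", ("var", "y"), ("bool", True), ("bool", False))), ("bool", False))
print(error_site(("app", FUN, ARG)))
```

Checking the function first fixes the expected type of the argument to int, so the sketch stops at the constant true: a small site, but one that says nothing about the use of + that it conflicts with.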
In the implementation, we combine the M algorithm and the AE algorithm on the fly. When the AE algorithm fails at the top expression, the system
switches to the M algorithm, which always stops at a site that is smaller than
the whole expression. If the M algorithm stops at an argument site, it means that there is
a type conflict between the function and its argument. For example,
in the example of Figure 4, when the M algorithm stops at true, it means that
the function (fn x => x+1) has a type conflict with the type of ((fn y => if y then
true else false) false). We find the conflict by assuming that the site where the
M algorithm found a type inconsistency is type correct, and then type checking
the function node of the AST under that assumption. Hence we can find another
conflicting site in the function application, and the reason for the conflict. In this
way, we erase the left-to-right bias of type checking.
3.2 Example
Consider (fn x => x+1) ((fn y => if y then true else false) false). In this example,
there are no common program variables in the subexpressions. The W algorithm
reports the whole expression as an error site, the M algorithm reports an isolated
error site, true, and the AE algorithm behaves in the same way as the W algorithm.
The incremental error inference algorithm gives error explanation messages
by finding a pair of directly conflicting sites and showing the reasons for
their conflict. It reports the + and the subsequent application expression as
the directly conflicting usages (Figure 4), and gives the reason for the conflict
as well. The required type for the operator + is bool * int -> 'k, which means that if the
subsequent application expression is correct, then the argument
of (fn x => x+1) is assumed to be of type bool. So the usage of + is not correct, which means
that either + should be replaced by another function or its two arguments x and 1 are
not used correctly. Alternatively, if (fn x => x+1) is correct, then the required
type for the subsequent application expression is int, but the application
expression actually has type bool.
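The two readings of this example (assume the argument is right and look inside the function, or assume the function is right and blame the argument) can be sketched as follows. This Python sketch is a simplification under stated assumptions and is not the algorithm of Figure 5: it types the function and the argument independently with a small W-style `infer`, and the tuple encoding, the names `explain`, `infer` and `Conflict`, and the int-only `add` node are all hypothetical.

```python
import itertools

class Conflict(Exception):
    """Raised when unification fails; `site` is the expression being checked."""
    def __init__(self, site, left, right):
        super().__init__(f"cannot unify {left} with {right}")
        self.site = site

_ids = itertools.count()

def fresh():
    return f"'t{next(_ids)}"

def fn(arg, res):
    return ("->", arg, res)

def is_var(t):
    return isinstance(t, str) and t.startswith("'")

def resolve(t, s):
    if isinstance(t, str):
        return resolve(s[t], s) if t in s else t
    return (t[0],) + tuple(resolve(a, s) for a in t[1:])

def occurs(v, t):
    return t == v if isinstance(t, str) else any(occurs(v, a) for a in t[1:])

def unify(a, b, s, site):
    a, b = resolve(a, s), resolve(b, s)
    if a == b:
        return s
    if is_var(b) and not is_var(a):
        a, b = b, a
    if is_var(a) and not occurs(a, b):
        return {**s, a: b}
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s, site)
        return s
    raise Conflict(site, a, b)

def infer(env, e, s):
    """W-style inference over a tiny language; env maps names to types."""
    tag = e[0]
    if tag in ("int", "bool"):
        return tag, s
    if tag == "var":
        return env[e[1]], s
    if tag == "lam":
        x = fresh()
        body, s = infer({**env, e[1]: x}, e[2], s)
        return fn(x, body), s
    if tag == "app":
        f, s = infer(env, e[1], s)
        a, s = infer(env, e[2], s)
        r = fresh()
        return r, unify(f, fn(a, r), s, e)
    if tag == "add":
        for operand in e[1:]:
            t, s = infer(env, operand, s)
            s = unify(t, "int", s, e)
        return "int", s
    if tag == "if":
        c, s = infer(env, e[1], s)
        s = unify(c, "bool", s, e)
        t, s = infer(env, e[2], s)
        f, s = infer(env, e[3], s)
        return t, unify(t, f, s, e)
    raise ValueError(f"unknown expression {e!r}")

def explain(fun, arg):
    """fun is ("lam", x, body); fun and arg are each well typed on their own."""
    fun_type, s1 = infer({}, fun, {})
    arg_type, s2 = infer({}, arg, {})
    required = resolve(fun_type, s1)[1]     # parameter type the function demands
    actual = resolve(arg_type, s2)
    # Reading 1: the function is correct, so the argument has the wrong type.
    reports = [("argument", arg, required, actual)]
    try:
        # Reading 2: the argument is correct, so re-check the body with x : actual.
        infer({fun[1]: actual}, fun[2], {})
    except Conflict as conflict:
        reports.append(("function", conflict.site, actual))
    return reports

# (fn x => x+1) ((fn y => if y then true else false) false)
FUN = ("lam", "x", ("add", ("var", "x"), ("int", 1)))
ARG = ("app", ("lam", "y", ("if", ("var", "y"), ("bool", True), ("bool", False))), ("bool", False))
for report in explain(FUN, ARG):
    print(report)
```

The result is a pair of sites, the argument expression and the use of +, each reported with the assumption under which it is wrong; this is the kind of paired explanation that Figure 4 presents.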
3.3 The algorithm
The AE algorithm may be improved in the case of applications (Figure 6).
When the algorithm succeeds at every subexpression and the subexpressions have no common
free variables, if the algorithm then fails at the top-level expression, it switches to the modified M algorithm.
The incremental type error inference algorithm for application subexpressions is shown in Figure 5.
4 CONCLUSIONS
We have implemented two new type error inference algorithms, both with the philosophy of reporting conflicting sites rather than an isolated error site. Compared to other type inference explanation methods, our methods automatically
locate the sources of type conflicts and give a pair of directly conflicting error
sites rather than individually isolated error sites. The reason for a type error can
be found in the relationship between the conflicting uses. Our algorithms also remove
the left-to-right type-checking bias of the W algorithm. The AE algorithm
analyses the uses of common variables in each subexpression, to see if they are
consistent. When there are no common variables between
subexpressions, incremental error inference finds the source of conflicts from the context. Both new
algorithms can be implemented in interactive programming environments.
We hope to prove the equivalence of the AE and W algorithms, and the soundness and
completeness of AE.
5 ACKNOWLEDGMENTS
Thanks to Greg Michaelson and Phil Trinder for their great help. I also wish to
thank Joe Wells for his helpful comments on the AE algorithm.
REFERENCES
[1] Johan Agat and Jörgen Gustavsson. Personal communication. Chalmers University
of Technology, June 1999.
[2] Mike Beaven and Ryan Stansifer. Explaining type errors in polymorphic languages.
ACM Letters on Programming Languages and Systems, 2:17–30, March 1993.
[3] Karen L. Bernstein and Eugene W. Stark. Debugging type errors (full version). Technical report, State University of New York at Stony Brook, 1995.
[4] Dominic Duggan and Frederick Bent. Explaining type inference. Science of Computer Programming, 27:37–83, 1996.
[5] Oukseh Lee and Kwangkeun Yi. Proofs about a Folklore Let-polymorphic Type Inference Algorithm. ACM Transactions on Programming Languages and Systems,
20(4):707–723, 1998.
[6] Bruce J. McAdam. On the Unification of Substitutions in Type Inference. In Kevin
Hammond, Anthony J.T. Davie, and Chris Clack, editors, Implementation of Functional Languages (IFL’98), London, UK, volume 1595 of LNCS, pages 139–154.
Springer-Verlag, September 1998.
[7] Laurence Rideau and Laurent Thery. Interactive programming environment for ML.
Technical Report 3193, Institut National de Recherche en Informatique et en Automatique, March 1997.
[8] Zhong Shao and Andrew W. Appel. Smartest recompilation. In Twentieth Annual
ACM Symposium on Principles of Programming Languages, pages 439–450. ACM
Press, January 1993.
[9] Helen Soosaipillai. An explanation based polymorphic type-checker for Standard ML.
Master’s thesis, Department of Computer Science, Heriot-Watt University, 1990.
[10] Janet A. Walz and Gregory F. Johnson. A maximum flow approach to anomaly isolation in unification-based incremental type inference. In Conference Record of the
Thirteenth Annual ACM Symposium on Principles of Programming Languages, pages
44–57. ACM Press, January 1986.
[11] Mitchell Wand. Finding the source of type errors. In Conference Record of the
Thirteenth Annual ACM Symposium on Principles of Programming Languages, pages
38–43. ACM Press, January 1986.
[12] Jun Yang and Greg Michaelson. A visualisation of polymorphic type checking. Journal of Functional Programming, to appear.
[13] Jun Yang, Greg Michaelson, and Phil Trinder. Explaining polymorphic types through
visualisation. In S. Alexander and U. O’Reilly, editors, Proceedings of 7th Annual
Conference on the Teaching of Computing, pages 73 – 77. University of Ulster, Jordanstown, August 1999.
AE : TypeEnv × TypeEnv × Expression → Substitution × Type × AssumEnv
Def AE(TE, DTNAME_TE, e) =
case e of
  x ⇒
    if x ∈ Dom(TE) then
      let ID be the identity substitution
          σ = TE(x)
          τ = Instance(σ)
      in (ID, τ, DTNAME_TE)
    else
      let γ be a new type variable, ID be the identity substitution
      in (ID, γ, {x : γ})
    endif
  λx.e ⇒
    let TE' = TE \ {x}
        (S1, τ1, A1) = AE(TE', DTNAME_TE, x)
        (S2, τ2, A2) = AE(TE', DTNAME_TE, e)
        S = UnifyEnv(S2 A1, A2)
    in (S S2 S1, S(τ1 → τ2), S A2 \ {x})
  e1 e2 ⇒
    let (S1, τ1, A1) = AE(TE, DTNAME_TE, e1)
        (S2, τ2, A2) = AE(TE, DTNAME_TE, e2)
        β be a new type variable
        S3 = Unify(S2 τ1, τ2 → β)
        Senv = UnifyEnv(S1 TE, S3 S2 TE)
        Senv' = UnifyEnv(Senv S3 S2 A1, Senv S3 A2)
    in (Senv' Senv S3 S2 S1, Senv' Senv S3 β, Senv' Senv S3 (A1 ∪ A2))
  let x = e1 in e2 ⇒
    let (S1, τ1, A1) = AE(TE, DTNAME_TE, e1)
        DTNAME_TE_new = DTNAME_TE ∪ A1
        TE_new = {x : close(TE ∪ DTNAME_TE_new, τ1)}
        (S2, τ2, A2) = AE(S1 TE_new, S1 DTNAME_TE_new, e2)
        Senv = UnifyEnv(A1, A2)
    in (Senv S2 S1, Senv τ2, Senv S2 (A1 ∪ A2))
endcase
FIGURE 2. The AE Algorithm
simplify {τ ≐ α} = {α ≐ τ}, where (α ≐ τ) means to unify α and τ
simplify {τ ≐ τ} = ∅
simplify {τ1 → τ1' ≐ τ2 → τ2'} = simplify {τ1 ≐ τ2} ∪ simplify {τ1' ≐ τ2'}
simplify ({τ ≐ τ'} ∪ E) = simplify {τ ≐ τ'} ∪ simplify(E)
unify(E, S) = let E' = simplify(E)
  in case E' of
    ∅ ⇒ S
    {α ≐ α'} ∪ E'' ⇒ let S' = {α ↦ α'}
                      in unify(S' E'', S' ; S)
    {α ≐ τ} ∪ E'' ⇒ if α ∈ τ then fail
                     else let S' = {α ↦ τ} in unify(S' E'', S' ; S)
Def S' ; S means apply substitution S first, then apply substitution S'.
Def S(E) means E with every α ∈ dom(S) replaced by S(α).
Def Unify(E) = unify(E, ID)
Def Instance(∀α1 ... αn. (t1, ..., tm)) =
  let β1 ... βn be fresh type variables
      S = {β1/α1} ∪ ... ∪ {βn/αn}
  in S(t1, ..., tm)
Def UnifyEnv(F1, F2) = Unify {instance(F1 x) ≐ instance(F2 x) | x ∈ dom(F1) ∩ dom(F2)}
FIGURE 3. The Unify and UnifyEnv functions in Figure 2
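To make the shape of these definitions concrete, here is a minimal Python sketch of unification over a set of equations with an occurs check, plus a `unify_env` that unifies the types two environments give to their common variables. It is an illustration under stated assumptions, not a transcription of Figure 3: types are encoded as strings and tuples, the names `unify`, `unify_env`, `apply_subst` and `UnifyError` are ours, and the instantiation step of UnifyEnv is omitted because the environments here hold monomorphic types only.

```python
class UnifyError(Exception):
    """Raised when a set of type equations has no solution."""

def is_var(t):
    # Type variables are strings starting with a quote, e.g. "'a".
    return isinstance(t, str) and t.startswith("'")

def apply_subst(s, t):
    """Apply an idempotent substitution s (dict: variable -> type) to type t."""
    if isinstance(t, str):
        return s.get(t, t)
    return (t[0],) + tuple(apply_subst(s, a) for a in t[1:])

def occurs(v, t):
    return t == v if isinstance(t, str) else any(occurs(v, a) for a in t[1:])

def unify(equations):
    """Solve a list of (type, type) equations; return a substitution or raise."""
    s = {}
    work = list(equations)
    while work:
        a, b = work.pop()
        a, b = apply_subst(s, a), apply_subst(s, b)
        if a == b:                       # trivial equation: drop it
            continue
        if is_var(b) and not is_var(a):  # orient as variable = type
            a, b = b, a
        if is_var(a):
            if occurs(a, b):             # 'a = 'a -> int has no finite solution
                raise UnifyError(f"{a} occurs in {b}")
            # Compose: apply the new binding to the old ones, then add it.
            s = {v: apply_subst({a: b}, t) for v, t in s.items()}
            s[a] = b
        elif (isinstance(a, tuple) and isinstance(b, tuple)
              and a[0] == b[0] and len(a) == len(b)):
            work.extend(zip(a[1:], b[1:]))   # decompose constructor by constructor
        else:
            raise UnifyError(f"cannot unify {a} with {b}")
    return s

def unify_env(f1, f2):
    """Unify the types that two environments assign to their common variables."""
    return unify([(f1[x], f2[x]) for x in sorted(f1.keys() & f2.keys())])

# 'a -> int  against  bool -> 'b
print(unify([(("->", "'a", "int"), ("->", "bool", "'b"))]))
```

Applied to the assumptions of Figure 1, `unify_env({"x": ("->", "'c", "'d")}, {"x": "int"})` raises `UnifyError`, which is the point at which a conflict between the two uses of x would be reported.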
(fn x => x+1) ((fn y => if y then true else false) false)
-------------------- Possible Error Site --------------------
Error in subexpression:
+
Error at use of the operator.
required type: bool * int -> 'k
operator type: 'l * 'l -> 'l
-------------------- Possible Error Site --------------------
Error in subexpression:
(fn y => if y then true else false) (false)
Type inconsistent with requirement at the expression.
required type: int
actual type : bool
FIGURE 4. Explanation of type conflicts of application by incremental error inference.
Def (TE, e) =
case e of
  e1 e2 ⇒
    handle fail(AE, e1) or e1 is not a function:
      app(TE, e1, te2 → β) to find more detailed type error reasons for e1;
    handle fail(AE, e): app(TE, e, β)
    handle fail(app, e2):
      reports the type conflicts in e1 by assuming the usage of e2 is correct;
      reports the type conflicts in e2 by assuming the usage of e1 is correct;
  e1 e2 e3 ⇒
    handle fail(AE, e): app(TE, e)
    handle fail(app, e2): app(TE, e1, te2);
    handle fail(app, e3): (app(TE, e2, te3); app(TE, e1, te3))
endcase
FIGURE 5. The incremental type error inference for application.
AE : TypeEnv × TypeEnv × Expression → Substitution × Type × AssumEnv
Def AE(TE, DTNAME_TE, e) =
case e of
  e1 e2 ⇒
    let (S1, τ1, A1) = AE(TE, DTNAME_TE, e1)
        β, β1, β2 are new type variables
        S1' = Unify(τ1, β1 → β2)
              handle Unify ⇒ let TE' = TE ∪ A1 in app(TE', e, β)
        (S2, τ2, A2) = AE(TE, DTNAME_TE, e2)
        S3 = Unify(S2 τ1, τ2 → β)
             handle Unify ⇒ let TE' = TE ∪ A1 ∪ A2 in app(TE', e, β)
        Senv = UnifyEnv(S1 TE, S3 S2 TE)
        Senv' = UnifyEnv(Senv S3 S2 A1, Senv S3 A2)
    in (Senv' Senv S3 S2 S1, Senv' Senv S3 β, Senv' Senv S3 (A1 ∪ A2))
endcase
FIGURE 6. Switch to app in application subexpression.