Explaining Type Errors by Finding the Source of a Type Conflict

Jun Yang
ceejy1@cee.hw.ac.uk
Department of Computing and Electrical Engineering, Heriot-Watt University

Abstract

Typically a type error is reported when unification fails, even though the programmer's actual error may have occurred much earlier in the program. Inference algorithms report the site where a type conflict is detected, but the error message is isolated information: it does not show the relationship between the site where the error is reported and the context in which the subexpression was typed. As a result, the error message may give little help in locating the source of the error. This report investigates better methods of explaining type conflicts. We aim to find a method that is effective even when the user has little knowledge of type checking. The philosophy of our approach is to find the sources of type errors by reporting which parts of the program conflict, rather than isolated error sites. We implement two new inference algorithms with this philosophy: the Unification of Assumption Environments (U_AE) and the Incremental Error Inference (IEI).

1 INTRODUCTION

It is known that the place where a type error is detected is often not the place where the type error actually originated; the source of a type error may be far from where it is detected. This is because conventional type inference algorithms proceed from left to right, and the context is refined by the previous type-checking steps. This is called left-to-right bias [6]. For example, after type checking

fn x => map x [x + 2]

Standard ML of New Jersey, Version 110.0.6, reports x + 2 as the error site. The actual error is a type conflict between the monomorphic uses of the λ-bound variable x at two sites: the first as a function, the second as an integer.
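The left-to-right bias described above can be reproduced with a small sketch. The Python fragment below is not the implementation discussed in this report: the tuple-based syntax tree, the names infer, unify and first_error_site, and the dedicated "add" node (standing in for the tupled + of ML) are all assumptions made for illustration. It type checks the application from left to right with a shared substitution and returns the first site at which unification fails.

```python
import itertools

# Hypothetical mini-language, used only for this sketch:
#   ("int", n) | ("var", name) | ("lam", x, body) | ("app", f, arg)
#   ("list", item)  one-element list literal | ("add", l, r)  for  l + r
INT = ("int",)
fresh = itertools.count()            # type variables are plain integers


def fn(a, b):
    return ("->", a, b)


def show(e):
    kind = e[0]
    if kind == "int":
        return str(e[1])
    if kind == "var":
        return e[1]
    if kind == "lam":
        return f"fn {e[1]} => {show(e[2])}"
    if kind == "list":
        return f"[{show(e[1])}]"
    if kind == "add":
        return f"{show(e[1])} + {show(e[2])}"
    return f"{show(e[1])} {show(e[2])}"


class Mismatch(Exception):
    """Two types cannot be unified."""


class Conflict(Exception):
    """Carries the expression at which unification first failed."""
    def __init__(self, site):
        super().__init__(show(site))
        self.site = site


def resolve(t, s):
    while isinstance(t, int) and t in s:
        t = s[t]
    return t


def occurs(v, t, s):
    t = resolve(t, s)
    return t == v if isinstance(t, int) else any(occurs(v, a, s) for a in t[1:])


def unify(a, b, s):
    a, b = resolve(a, s), resolve(b, s)
    if a == b:
        return
    if isinstance(a, int):
        if occurs(a, b, s):
            raise Mismatch
        s[a] = b
    elif isinstance(b, int):
        unify(b, a, s)
    elif a[0] != b[0] or len(a) != len(b):
        raise Mismatch
    else:
        for x, y in zip(a[1:], b[1:]):
            unify(x, y, s)


def expect(actual, wanted, s, site):
    try:
        unify(actual, wanted, s)
    except Mismatch:
        raise Conflict(site) from None


def map_type():
    a, b = next(fresh), next(fresh)          # fresh instance of map's type
    return fn(fn(a, b), fn(("list", a), ("list", b)))


def infer(e, env, s):
    """Left-to-right inference with one shared substitution s."""
    kind = e[0]
    if kind == "int":
        return INT
    if kind == "var":
        return map_type() if e[1] == "map" else env[e[1]]
    if kind == "lam":
        a = next(fresh)
        return fn(a, infer(e[2], {**env, e[1]: a}, s))
    if kind == "list":
        return ("list", infer(e[1], env, s))
    if kind == "add":
        for operand in e[1:]:
            expect(infer(operand, env, s), INT, s, e)
        return INT
    # application: the function part is typed before the argument
    f, arg, result = infer(e[1], env, s), infer(e[2], env, s), next(fresh)
    expect(f, fn(arg, result), s, e)
    return result


def first_error_site(e):
    try:
        infer(e, {}, {})
    except Conflict as c:
        return show(c.site)
    return None


x = ("var", "x")
PROGRAM = ("lam", "x", ("app", ("app", ("var", "map"), x), ("list", ("add", x, ("int", 2)))))
print(first_error_site(PROGRAM))     # x + 2
```

Because map x is checked first, x is already refined to a function type when x + 2 is reached, so the sketch blames x + 2 alone and says nothing about map x.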
Even worse, when the W algorithm succeeds at every subexpression and fails at the top-level application, for example with

(fn x => x + 1) ((fn y => if y then true else false) false)

it reports the whole expression as the error site; such a large error site is not informative.

Several type inference explanation systems explain how the inference process arrives at a type error [2, 4, 9]. But textual explanations have some inherent problems:

1. Internal type variables, which act as the bridge between instances, are the central point of the explanation, since they are the ones that get refined; but a programmer is not concerned with those type variables. To understand the explanations, it is necessary to remember from which program variable each type variable was inferred and where it was refined. If there are more than a few type variables, it becomes impossible to remember what entity each one represents.
2. The textual explanation sometimes contains so much information that it rapidly becomes tedious. Experts usually find this explanation too detailed to be of real help, though they find very valuable the information about the different positions in the program that contribute to the type given by the tool [7].
3. The user needs sufficient knowledge of type checking to understand the explanations.

To overcome some limitations of textual explanations, we have implemented a visualisation of polymorphic type checking [12]. An evaluation of the visualisation shows that it has the same power as the textual representation [13]. The operations of icons in the visualisation can be improved by better explanation algorithms.

There are also several attempts to address directly the location of the source of type errors:

1. Mitchell Wand's approach [11] keeps a record of the pieces of code that contribute to each type deduction and, when a type error is detected, uses this information to explain why the error happened.
The algorithm records substitutions together with the function applications that cause them. A type error consists of two types that cannot be unified, both derived by some substitutions; the algorithm reports each application that caused one of these substitutions as a possible cause of the error. Where the inconsistency is found depends on the arbitrary order in which the syntax tree is traversed during type analysis. Consequently, the number of candidate error sites also depends on the programming style. The system lists the possible error sites, but some of them are not the direct source of the type inconsistency. It is not clear whether there is any relationship between the candidate error sites: the user needs to check the types of the proposed error sites against their own intentions.
2. Greg F. Johnson and Janet Walz [10] give a maximum-flow approach to deciding which usage is the most likely error source. The usage that is in the minority is a candidate for the mistake. But many errors involve one correct usage and one incorrect usage, so it is not clear how often the minority can be isolated; moreover, the minority usage may sometimes be the correct one.
3. Bernstein and Stark give a method of debugging type errors [3, 8]. It lets the user probe the internal type information that the ordinary inference algorithm hides. This is done manually, and it is sometimes difficult to make out the connections between the uses of subexpressions.

We aim to find a method that is effective even when the user has little knowledge of type checking. The philosophy of our approach is to find the sources of type errors that conflict, rather than isolated error sites. We have implemented a previously proposed inference algorithm, Unification of Assumption Environments (U_AE) [1], and have proposed and implemented another inference algorithm, Incremental Error Inference (IEI).

2 UNIFICATION OF ASSUMPTION ENVIRONMENTS (U_AE)

2.1 Introduction

The primary advantage of U_AE is that it eliminates the left-to-right bias and can point out all type conflicts automatically. The key idea is to type each subexpression of an application independently. In a unification-based type inference system, a type conflict is detected when unification fails, and it has been observed that the place where a conflict is detected is often far from the site of the real error because of the left-to-right bias of type checking. The U_AE algorithm removes this bias by unifying the assumption environments at the top level of the AST; in this way every subexpression is treated equally. This idea is due to Johan Agat and Jörgen Gustavsson, Chalmers University of Technology, June 1999 [1], but ours is the first implementation of it.

First, we show how the method works on a simple example, shown in Figure 1. After type checking fn x => map x [x + 2], the W algorithm reports the error in x + 2, whereas the U_AE algorithm explains that the λ-bound argument x, which is monomorphic, is used inconsistently in different subexpressions. At the first site, map x, x is required to have a function type; at the second site, [x + 2], x is required to have type int.

fn x => map x [x + 2]
-------------------- Error --------------------
Type conflicts in subexpressions:
    map (x)
    +(x, 2)
the common program variables have type conflicts in different sites
from the first expression        x: 'c -> 'd
but from the second expression   x: int

FIGURE 1. Explanation of type conflicts of a λ-bound variable

2.2 The U_AE Algorithm

Two principles are observed when U_AE reports type errors:

1. Type error messages are reported in the context of use. The philosophy is that a subexpression in an expression has connections with other subexpressions in some manner; to report the relationships of its use in context is better than to report just an isolated subexpression as an error site.
2. Error explanations should be complete and concise: the system should highlight the sources of type conflict that directly contribute to the conflict, and nothing more.

In the algorithm, predefined functions such as map and operators such as +, which are free variables in an expression, are excluded from the assumption environments; their types are supplied by the type environment TE instead. Excluding predefined identifiers with known types is more efficient because the assumption environments are smaller. The unification procedures Uenv and Usum are explained in Figure 3; they make sure that the program variables are used consistently in each subexpression. There are two environments: the context TE, which is passed downwards, and the assumption environment FENV, which is passed upwards during type checking.

3 INCREMENTAL ERROR INFERENCE

3.1 Introduction

We propose an incremental error inference algorithm (IEI) that finds the type conflicts in application expressions when the U_AE algorithm cannot. The W algorithm fails only at a function application, and an erroneous expression is often successfully type-checked long before its consequence collides at an application. When the W algorithm succeeds at every subexpression and fails at the top-level application expression, it reports the whole expression as the error site, implying only that some of its subexpressions are ill-typed. This error message is not very useful for locating the sources of type errors.

The folklore M algorithm always stops earlier than the W algorithm [5] when there is a type error, and it can be used to cure the problem by reporting a finer-grained error site. The M algorithm brings the context of an expression down to its subexpressions and sibling expressions: it carries a type constraint (an expected type) that each expression must satisfy at the place where it appears.
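The way an expected type is carried downwards can be sketched in a few lines. The Python fragment below is an illustration under stated assumptions, not the algorithm of [5] nor the implementation of this report: the tuple-based syntax tree, the "add" node standing in for +, and the names check, expect and m_error_site are hypothetical. Each expression is checked against the type its context requires, so a failure is reported at a small site such as a constant or a variable.

```python
import itertools

# Hypothetical syntax tree for this sketch:
#   ("int", n) | ("bool", b) | ("var", x) | ("lam", x, body)
#   ("app", f, arg) | ("if", c, t, f) | ("add", l, r)
fresh = itertools.count()            # type variables are plain integers


def fn(a, b):
    return ("->", a, b)


def show(e):
    kind = e[0]
    if kind == "int":
        return str(e[1])
    if kind == "bool":
        return "true" if e[1] else "false"
    if kind == "var":
        return e[1]
    if kind == "lam":
        return f"fn {e[1]} => {show(e[2])}"
    if kind == "if":
        return f"if {show(e[1])} then {show(e[2])} else {show(e[3])}"
    if kind == "add":
        return f"{show(e[1])} + {show(e[2])}"
    return f"({show(e[1])}) ({show(e[2])})"


class Mismatch(Exception):
    """Two types cannot be unified."""


class Conflict(Exception):
    """Carries the expression whose type disagrees with its expected type."""
    def __init__(self, site):
        super().__init__(show(site))
        self.site = site


def resolve(t, s):
    while isinstance(t, int) and t in s:
        t = s[t]
    return t


def occurs(v, t, s):
    t = resolve(t, s)
    return t == v if isinstance(t, int) else any(occurs(v, a, s) for a in t[1:])


def unify(a, b, s):
    a, b = resolve(a, s), resolve(b, s)
    if a == b:
        return
    if isinstance(a, int):
        if occurs(a, b, s):
            raise Mismatch
        s[a] = b
    elif isinstance(b, int):
        unify(b, a, s)
    elif a[0] != b[0] or len(a) != len(b):
        raise Mismatch
    else:
        for x, y in zip(a[1:], b[1:]):
            unify(x, y, s)


def expect(actual, wanted, s, site):
    try:
        unify(actual, wanted, s)
    except Mismatch:
        raise Conflict(site) from None


def check(e, env, expected, s):
    """Push the expected type down to every subexpression."""
    kind = e[0]
    if kind in ("int", "bool"):
        expect((kind,), expected, s, e)
    elif kind == "var":
        expect(env[e[1]], expected, s, e)
    elif kind == "lam":
        a, b = next(fresh), next(fresh)
        expect(fn(a, b), expected, s, e)
        check(e[2], {**env, e[1]: a}, b, s)
    elif kind == "app":
        beta = next(fresh)
        check(e[1], env, fn(beta, expected), s)   # e1 : beta -> expected
        check(e[2], env, beta, s)                 # e2 : beta
    elif kind == "if":
        check(e[1], env, ("bool",), s)
        check(e[2], env, expected, s)
        check(e[3], env, expected, s)
    else:                                         # "add"
        expect(("int",), expected, s, e)
        check(e[1], env, ("int",), s)
        check(e[2], env, ("int",), s)


def m_error_site(e):
    try:
        check(e, {}, next(fresh), {})
    except Conflict as c:
        return show(c.site)
    return None


EXAMPLE = ("app",
           ("lam", "x", ("add", ("var", "x"), ("int", 1))),
           ("app",
            ("lam", "y", ("if", ("var", "y"), ("bool", True), ("bool", False))),
            ("bool", False)))
print(m_error_site(EXAMPLE))         # true
```

In this sketch the function part fixes the argument's expected type to int before the argument is visited, which is why the constant true is the site that is finally rejected.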
For example, in the case of application, if the required type for the application expression e1 e2 is real from the context, then the required type for e1 is β → real and the required type for e2 is β. The M algorithm reports a finer-grained site: a constant, a variable, or a lambda expression. For example, in Figure 4, the M algorithm stops at true. But we still do not know where the type conflicts are, and the reported error site is often not the real error site.

In the implementation, we combine the M algorithm and the U_AE algorithm on the fly. When the U_AE algorithm fails at the top expression, the system switches to the M algorithm, which always stops at a site smaller than the whole expression. If the M algorithm stops at an argument site, there is a type conflict between the function and its argument. For instance, in the example of Figure 4, when the M algorithm stops at true, it means that the function (fn x => x + 1) has a type conflict with the type of ((fn y => if y then true else false) false). We find the conflict by assuming that the site where the M algorithm found a type inconsistency is type correct, and then type checking the function node of the AST under that assumption. Hence we can find another conflicting site in the function application, and the reason for the conflict. In this way, we remove the left-to-right bias of type checking.

3.2 Example

Consider (fn x => x + 1) ((fn y => if y then true else false) false). In this example there are no common program variables in the subexpressions: the W algorithm reports the whole expression as the error site, the M algorithm reports the isolated error site true, and the U_AE algorithm behaves in the same way as the W algorithm. The incremental error inference algorithm gives error explanation messages by finding a pair of directly conflicting sites and showing the reasons for their conflict. IEI reports the + and the subsequent application expression as the directly conflicting usages (Figure 4).
IEI gives the reason for the conflict as well. The required type for the operator + is bool * int -> 'k, which means that if the subsequent application expression is correct, then the argument of (fn x => x + 1) is assumed to be of type bool. So the usage of + is not correct: either + should be replaced by another function, or its two arguments x and 1 are not used correctly. Alternatively, if (fn x => x + 1) is correct, then the required type for the subsequent application expression is int, but the application expression actually has type bool.

3.3 The IEI algorithm

The U_AE algorithm may be improved in the case of applications (Figure 6). When the U_AE algorithm succeeds at every subexpression, there are no common free variables among the subexpressions, and the algorithm fails at the top-level expression, it switches to the modified M algorithm. The incremental type error inference algorithm for application subexpressions is shown in Figure 5.

4 CONCLUSIONS

We have implemented two new type error inference algorithms, both with the philosophy of reporting conflicting sites rather than an isolated error site. Compared with other type inference explanation methods, our methods automatically locate the sources of type conflicts and give a pair of directly conflicting error sites rather than individually isolated error sites. The reason for a type error can be found in the relationship between the conflicting uses. Our algorithms also remove the left-to-right type-checking bias of the W algorithm. The U_AE algorithm analyses the uses of common variables in each subexpression to see whether they are consistent. When there are no common variables between subexpressions, IEI finds the source of the conflict from the context. Both new algorithms can be implemented in interactive programming environments. We hope to prove the equivalence of the U_AE and W algorithms, and the soundness and completeness of U_AE.

5 ACKNOWLEDGMENTS

Thanks to Greg Michaelson and Phil Trinder for their great help. I also wish to thank Joe Wells for his helpful comments on the U_AE algorithm.

REFERENCES

[1] Johan Agat and Jörgen Gustavsson. Personal communication. Chalmers University of Technology, June 1999.
[2] Mike Beaven and Ryan Stansifer. Explaining type errors in polymorphic languages. ACM Letters on Programming Languages and Systems, 2:17–30, March 1993.
[3] Karen L. Bernstein and Eugene W. Stark. Debugging type errors (full version). Technical report, State University of New York at Stony Brook, 1995.
[4] Dominic Duggan and Frederick Bent. Explaining type inference. Science of Computer Programming, 27:37–83, 1996.
[5] Oukseh Lee and Kwangkeun Yi. Proofs about a folklore let-polymorphic type inference algorithm. ACM Transactions on Programming Languages and Systems, 20(4):707–723, 1998.
[6] Bruce J. McAdam. On the unification of substitutions in type inference. In Kevin Hammond, Anthony J.T. Davie, and Chris Clack, editors, Implementation of Functional Languages (IFL'98), London, UK, volume 1595 of LNCS, pages 139–154. Springer-Verlag, September 1998.
[7] Laurence Rideau and Laurent Thery. Interactive programming environment for ML. Technical Report 3193, Institut National de Recherche en Informatique et en Automatique, March 1997.
[8] Zhong Shao and Andrew W. Appel. Smartest recompilation. In Twentieth Annual ACM Symposium on Principles of Programming Languages, pages 439–450. ACM Press, January 1993.
[9] Helen Soosaipillai. An explanation based polymorphic type-checker for Standard ML. Master's thesis, Department of Computer Science, Heriot-Watt University, 1990.
[10] Janet A. Walz and Gregory F. Johnson. A maximum flow approach to anomaly isolation in unification-based incremental type inference. In Conference Record of the Thirteenth Annual ACM Symposium on Principles of Programming Languages, pages 44–57. ACM Press, January 1986.
[11] Mitchell Wand. Finding the source of type errors.
In Conference Record of the Thirteenth Annual ACM Symposium on Principles of Programming Languages, pages 38–43. ACM Press, January 1986.
[12] Jun Yang and Greg Michaelson. A visualisation of polymorphic type checking. Journal of Functional Programming, to appear.
[13] Jun Yang, Greg Michaelson, and Phil Trinder. Explaining polymorphic types through visualisation. In S. Alexander and U. O'Reilly, editors, Proceedings of 7th Annual Conference on the Teaching of Computing, pages 73–77. University of Ulster, Jordanstown, August 1999.

U_AE : TypeEnv × TypeEnv × Expression → Substitution × Type × AssumEnv

Def U_AE(TE, DTNAME_TE, e) = case e of

  x:
    if x ∈ Dom(TE) then
      let ID be the identity substitution
          σ = TE(x)
          τ = Instance(σ)
      in (ID, τ, DTNAME_TE)
    else
      let γ be a new type variable, ID be the identity substitution
      in (ID, γ, {x ↦ γ})
    endif

  λx.e:
    let TE' = TE \ x
        (S1, τ1, A1) = U_AE(TE', DTNAME_TE, x)
        (S2, τ2, A2) = U_AE(TE', DTNAME_TE, e)
        S = UnifyEnv(S2 A1, A2)
    in (S S2 S1, S(τ1 → τ2), S A2 \ x)

  e1 e2:
    let (S1, τ1, A1) = U_AE(TE, DTNAME_TE, e1)
        (S2, τ2, A2) = U_AE(TE, DTNAME_TE, e2)
        β be a new type variable
        S3 = Unify(S2 τ1, τ2 → β)
        Senv = UnifyEnv(S1 TE, S3 S2 TE)
        Senv' = UnifyEnv(Senv S3 S2 A1, Senv S3 A2)
    in (Senv' Senv S3 S2 S1, Senv' Senv S3 β, Senv' Senv S3 (A1 ∪ A2))

  let x = e1 in e2:
    let (S1, τ1, A1) = U_AE(TE, DTNAME_TE, e1)
        DTNAME_TE_new = DTNAME_TE ∪ A1
        TE_new = {x : close(TE ∪ DTNAME_TE, τ1)} ∪ TE
        (S2, τ2, A2) = U_AE(S1 TE_new, S1 DTNAME_TE_new, e2)
        Senv = UnifyEnv(A1, A2)
    in (Senv S2 S1, Senv τ2, Senv S2 (A1 ∪ A2))

endcase

FIGURE 2. The U_AE Algorithm

simplify {(τ, α)} = {(α, τ)}        where (α, τ) means to unify α and τ
simplify {(τ, τ)} = ∅
simplify {(τ1 → τ1', τ2 → τ2')} = simplify {(τ1, τ2)} ∪ simplify {(τ1', τ2')}
simplify ({(τ, τ')} ∪ E) = simplify {(τ, τ')} ∪ simplify(E)

unify(E, S) =
  let E' = simplify(E) in
  case E' of
    ∅:                 S
    {(α, α')} ∪ E'':   let S' = [α ↦ α'] in unify(S' E'', S'; S)
    {(α, τ)} ∪ E'':    if α ∈ τ then fail
                       else let S' = [α ↦ τ] in unify(S' E'', S'; S)

Def S'; S means apply substitution S first, then apply substitution S'.
Def S(E) = {… | α ∈ dom(S)}
Def Unify(E) = unify(E, ID)
Def Instance(∀α1 … αn. t1 … tm) =
  let β1 … βn be fresh type variables
      S = [α1 ↦ β1, …, αn ↦ βn]
  in S(t1 … tm)
Def UnifyEnv(F1, F2) = Unify {(instance(F1 x), instance(F2 x)) | x ∈ dom(F1) ∩ dom(F2)}

FIGURE 3. The Unify and UnifyEnv functions in Figure 2

(fn x => x + 1) ((fn y => if y then true else false) false)
-------------------- Possible Error Site --------------------
Error in subexpression: +
Error at use of the operator.
    required type: bool * int -> 'k
    operator type: 'l * 'l -> 'l
-------------------- Possible Error Site --------------------
Error in subexpression: (fn y => if y then true else false) (false)
Type inconsistent with requirement at the expression.
    required type: int
    actual type  : bool

FIGURE 4. Explanation of type conflicts of application by IEI.

Def IEI(TE, e) = case e of

  e1 e2:
    handle fail(U_AE, e1), or e1 is not a function:
      M_app(TE, e1, t_e2 → β) to find more detailed type error reasons for e1;
    handle fail(U_AE, e):
      M_app(TE, e, β)
      handle fail(M_app, e2):
        reports the type conflicts in e1 by assuming the usage of e2 is correct;
        reports the type conflicts in e2 by assuming the usage of e1 is correct;

  e1 e2 e3 …:
    handle fail(U_AE, e):
      M_app(TE, e)
      handle fail(M_app, e2): M_app(TE, e1, t_e2);
      handle fail(M_app, e3): (M_app(TE, e2, t_e3); M_app(TE, e1, t_e3))

endcase

FIGURE 5. The incremental type error inference IEI for application.

U_AE : TypeEnv × TypeEnv × Expression → Substitution × Type × AssumEnv

Def U_AE(TE, DTNAME_TE, e) = case e of

  e1 e2:
    let (S1, τ1, A1) = U_AE(TE, DTNAME_TE, e1)
        β, β1, β2 are new type variables
        S1' = Unify(τ1, β1 → β2)
              handle Unify: let TE' = TE ∪ A1 in M_app(TE', e, β)
        (S2, τ2, A2) = U_AE(TE, DTNAME_TE, e2)
        S3 = Unify(S2 τ1, τ2 → β)
             handle Unify: let TE' = TE ∪ A1 ∪ A2 in M_app(TE', e, β)
        Senv = UnifyEnv(S1 TE, S3 S2 TE)
        Senv' = UnifyEnv(Senv S3 S2 A1, Senv S3 A2)
    in (Senv' Senv S3 S2 S1, Senv' Senv S3 β, Senv' Senv S3 (A1 ∪ A2))

endcase

FIGURE 6. Switch to M_app in application subexpression.
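The idea behind Figures 2 and 3, typing each subexpression on its own and then unifying the assumptions made about shared variables, can be sketched as follows. This Python fragment is a simplified illustration and not the U_AE implementation: it uses one shared substitution instead of composing S1, S2 and S3, it omits let and the DTNAME_TE environment, and the tuple-based syntax tree, the "add" node and the names infer, merge and explain are assumptions. Every occurrence of a λ-bound variable receives its own fresh type variable, so neither operand of an application influences the other until their assumption environments are merged.

```python
import itertools

# Hypothetical syntax tree for this sketch:
#   ("int", n) | ("var", x) | ("lam", x, body) | ("app", f, arg)
#   ("list", item)  one-element list literal | ("add", l, r)  for  l + r
INT = ("int",)
fresh = itertools.count()            # type variables are plain integers


def fn(a, b):
    return ("->", a, b)


def show(e):
    kind = e[0]
    if kind == "int":
        return str(e[1])
    if kind == "var":
        return e[1]
    if kind == "lam":
        return f"fn {e[1]} => {show(e[2])}"
    if kind == "list":
        return f"[{show(e[1])}]"
    if kind == "add":
        return f"{show(e[1])} + {show(e[2])}"
    return f"{show(e[1])} {show(e[2])}"


class Mismatch(Exception):
    """Two types cannot be unified."""


class Conflict(Exception):
    """A shared variable is used at incompatible types in two sites."""
    def __init__(self, var, left, right):
        super().__init__(var)
        self.var, self.left, self.right = var, left, right   # (site, type) pairs


def resolve(t, s):
    while isinstance(t, int) and t in s:
        t = s[t]
    return t


def zonk(t, s):
    """Apply the substitution throughout a type."""
    t = resolve(t, s)
    return t if isinstance(t, int) else (t[0], *(zonk(a, s) for a in t[1:]))


def occurs(v, t, s):
    t = resolve(t, s)
    return t == v if isinstance(t, int) else any(occurs(v, a, s) for a in t[1:])


def unify(a, b, s):
    a, b = resolve(a, s), resolve(b, s)
    if a == b:
        return
    if isinstance(a, int):
        if occurs(a, b, s):
            raise Mismatch
        s[a] = b
    elif isinstance(b, int):
        unify(b, a, s)
    elif a[0] != b[0] or len(a) != len(b):
        raise Mismatch
    else:
        for x, y in zip(a[1:], b[1:]):
            unify(x, y, s)


def map_type():
    a, b = next(fresh), next(fresh)          # fresh instance of map's type
    return fn(fn(a, b), fn(("list", a), ("list", b)))


TE = {"map": map_type}                       # predefined names: not assumptions


def merge(e1, a1, e2, a2, s):
    """Unify the assumptions on shared variables; blame both sites on failure."""
    for x in a1.keys() & a2.keys():
        try:
            unify(a1[x], a2[x], s)
        except Mismatch:
            raise Conflict(x, (show(e1), zonk(a1[x], s)),
                           (show(e2), zonk(a2[x], s))) from None
    return {**a2, **a1}


def infer(e, s):
    """Return (type, assumption environment) for e, bottom-up."""
    kind = e[0]
    if kind == "int":
        return INT, {}
    if kind == "var":
        if e[1] in TE:
            return TE[e[1]](), {}
        g = next(fresh)                      # each occurrence gets its own variable
        return g, {e[1]: g}
    if kind == "lam":
        t, a = infer(e[2], s)
        a = dict(a)
        param = a.pop(e[1], None)
        return fn(next(fresh) if param is None else param, t), a
    if kind == "list":
        t, a = infer(e[1], s)
        return ("list", t), a
    (t1, a1), (t2, a2) = infer(e[1], s), infer(e[2], s)
    a = merge(e[1], a1, e[2], a2, s)
    if kind == "add":
        unify(t1, INT, s)
        unify(t2, INT, s)
        return INT, a
    result = next(fresh)                     # application
    unify(t1, fn(t2, result), s)
    return result, a


def explain(e):
    try:
        infer(e, {})
    except Conflict as c:
        return c
    return None


x = ("var", "x")
PROGRAM = ("lam", "x", ("app", ("app", ("var", "map"), x), ("list", ("add", x, ("int", 2)))))
c = explain(PROGRAM)
print(f"{c.var}: {c.left[0]} vs {c.right[0]}")   # x: map x vs [x + 2]
```

Unlike a left-to-right checker, the sketch names both sites and the type each one requires of x, in the spirit of the explanation in Figure 1. Type mismatches that do not involve a shared variable are not handled here and would surface as an uncaught Mismatch.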