[laughter]
>> Shaz Qadeer: Okay, alright, it's my great pleasure to welcome Thomas Wies and Zvonimir Pavlinovic, who is sitting over there. A lot of you probably already know Thomas; he has been a visitor at MSR many times. After he graduated from Andreas Podelski's group, he joined New York University a few years ago. He works on decision procedures, automated theorem-proving problems, and program analysis, and Zvonimir is his PhD student—they have been working together for about two years—and today, you're gonna get two talks for the price of one. Thomas will go first, and that will be followed by a presentation from Zvonimir. Okay?
>> Thomas Wies: Okay, thanks a lot for the introduction and the invitation. Yeah, today, I'm going to talk about an application of formal methods to a software engineering problem—namely, automated debugging—and so you will get two talks related to that, but in slightly different contexts. I think this is a quite nice application domain for formal methods: if you look at the time that developers actually spend on debugging, there have been some studies on this, and roughly fifty percent of the total development time is actually spent on finding and fixing bugs and on program understanding. So I think that this is a problem where automated tools can have a high practical impact. And so just for the overview: there will be two parts of this talk, and I'll start with the first part, which will be about a technique I call fault abstraction—so that's a technique for slicing error traces for imperative programs using certain static techniques—and then in the second part, Zvonimir will talk about finding type error sources in functional programs or, actually, any programming language that supports automated type inference. So just to get you started, here's a little example program; you don't really have to understand what's going on here. This is a faulty implementation of a shell sort procedure: the program just takes a sequence of integers as input and is supposed to return the sorted sequence as output. And well, if you run this program—say—with the input sequence fourteen, eleven, then what it will return is zero, eleven, so it's a sorted sequence, but it's not a permutation of the input. And so what's going on here? Well, it turns out there is a call to the actual sorting routine here that takes the array that is supposed to be sorted, and it's supposed to take the length of the array, but the programmer actually provided the number of command line arguments, and if you're familiar with C programs, that actually includes the name of the program. So essentially, the length here is off by one, and that's why there will be some array out-of-bounds access, sort of pulling in some bad value. Now ideally, what you want to do is some sort of root cause analysis, where we just want to point the programmer to this line in the program, saying, "Okay, so here's what you have to do to fix the program." Now, when you look at what people have done in this context, then actually, most of the work has been on the dynamic side of techniques; so people have used testing to essentially compare failing and successful runs of the program, trying to narrow down the points where things go wrong, and essentially then just provide some kind of ranking of suspicious statements in the program, where the most likely source of the error is ranked first, and then you just get this list.
And well, the problem with this is that, for this to actually work well, you really need a good test suite. And if you don't have good test suites, then the quality of these techniques degrades quickly. The alternative is, of course, to not use testing-based techniques, but actually use a static analysis of the error traces. And so here, essentially, we are just going to look at a single failing execution of the program and then try to see what we can learn from that single execution to help the programmer figure out what's going on. The advantage of this is that we won't actually need any test suites to get good results out of it. There's another reason why static techniques are interesting: with these rankings of suspicious statements, it's unclear whether they really help the programmer figure out what's going on. I mean, given just a single statement out of context, the programmer still has to somehow see why that is related to the bug—so what's really going on here? And in practice, you see that when programmers fix bugs, often they don't really understand the true cause of the bug, and they actually introduce more bugs in the process when they try to fix the program. So perhaps what we can really do is try to provide some kind of explanation of what's going on in the failing trace, so the programmer has a better idea of how to actually fix the bug. So it's not just about finding this fault localization, but more about providing an explanation of the bug. And that's what fault abstraction is going to give you. So it's a technique that essentially takes an error trace as input—the error trace is the actual failing program execution, so a sequence of statements in your program—then an input state and an expected output state. Okay? And you can generalize this to, say, having a pre-condition and—say—a post-condition you want to satisfy; so the technique generalizes to symbolic representations of inputs and outputs. What the technique produces is what I call an abstract slice, and that's essentially a subtrace of the input trace—so a subsequence of the original statements that are relevant to understanding the bug—and in addition, we get assertions explaining what's going on in between these relevant statements, giving information about the state of the program that is useful for understanding what is going on. So if we go back to our shell sort program, this is what the technique produces: it produces this sequence of statements in the program and these assertions that hold in between. And so why is this useful? Well, there's a bunch of information you can actually extract from this; so for instance, if you go here—so this is the actual main loop of the sorting routine—so there's this outer do-while loop here, and if you look at these assertions, we can see, for instance, that here it says that h must be one. And if we look at the loop condition, we can now extract that the error happens in the last iteration of the outer loop. So this is already provided by the assertions here.
And similarly, we have this outer for loop here, inside of the do-while loop, and we loop while i is smaller than size, and here we have i is two and size is three, so we know that this is the last iteration of the outer for loop, and so forth. And then we can see: here we have the expected output—sorry, this is the final output we actually get—and so we can see how this array cell here received the value zero—namely, it was because of these two updates here. So we set a of j to v, and well, v is zero and j is zero, so this is where this value is written, and you can see: v is the value that we assign, and it's read from the ith entry in the array, and here we can see that i is two and a of two is zero, and that's how we pulled in this wrong value. So this is the point where the actual read happens. And then we can see, well, this is because we have that i is two and our array—we are sorting an array of size two—so we are actually reading outside of the bounds of the array. So this is where the out-of-bounds read happens, and the reason for this is that size is three when it's supposed to be just two, so we know where we assign size … and this is the true cause of the error. So you can see we get a lot of information out of this abstract slice here, and in particular, we can use this information, for instance, to steer a debugger, right? So we don't necessarily present this to the user, but we can—for instance—extract information about where to set breakpoints inside the program and which variables to watch. So just by looking at these formulas, we can use that information to help the programmer debug the program. Yes?
>>: I understand, but it seems that … I mean, if we're relying on interpreting formulas, formulas are so nondirectional, right? So the error could have been that, well, the size was just fine, but you didn't make the array big enough.
>> Thomas Wies: Sure, yeah.
>>: So how do you determine which …?
>> Thomas Wies: Well, in general, of course, you cannot, just by looking at a single trace. So the point here is really just to help the programmer understand what's really happening in that trace and to try to eliminate all the information that is irrelevant for understanding how—say—a specific output was generated. Right? So it's less about localizing faults in the program and more about explaining behavior in a faulty trace.
>>: Are you able to generalize beyond single traces? Because it's very hard to understand which conditions are relevant to an error, right? Presumably.
>> Thomas Wies: Yeah, so in principle, you can also look at … if I have time, I'll talk about an application where we actually do something like that: we look more at a program fragment, trying to explain certain behavior in that fragment. And also, I'm going to ignore certain issues—for instance, how to deal with wrong branching conditions—and all of this can be handled using slightly different encodings and so on. Yeah?
>>: And you assume that an execution is already given to you.
>> Thomas Wies: That's right.
>>: Okay.
>> Thomas Wies: Yeah, so it can just be a failing test case or—say—a trace generated by a static analysis. Okay, so let me explain the basic idea behind this approach, and I'm going to use a very simple model, where I explain this in terms of finite automata.
Okay, so—say—look at this automaton here, and this is a word that is accepted by this automaton: you read a, then b, c, d, then a, then three times b, and then a—and this is the accepting state. And well, one thing we can do is slice away these sub-words here that correspond to traversing the loops in the automaton, right? Then we get this simpler word here, and that's, of course, still a word that's accepted by the automaton. So we can think of this simpler word as being an explanation of why the longer word is accepted—right—so we can say, "Okay, the longer one is accepted essentially because it has three a's in it." And that's essentially the idea we're going to use for the fault abstraction. So how does this relate to programs? Well, now what we're doing is: essentially, we think of programs as automata—right—so we have the error traces essentially as finite words of program statements. So a program is essentially an automaton that accepts error traces, and the fault localization, then, is this very simple idea of taking the trace and eliminating loops in the error trace. But of course, you might ask, "Okay, what does it actually mean to eliminate loops in an error trace?" Right? When I say a program is an automaton, what I have in mind is really not the control flow automaton, but rather the transition system that is the semantics of the program—so this is typically an infinite-state system, an infinite graph, where we have really long traces. And in particular, if you think of error traces in this graph, then typically there will not be any loops, right? So we will not visit exactly the same state twice in a single trace—that would be very uncommon in an actual program, that you really visit the same state twice. There's always some state change. Okay, so here is essentially where abstraction comes in. So now, what we're going to do is use this very simple idea of—essentially—predicate abstraction, where we partition the state space into finitely many partitions and from that build a finite abstraction of the program, right? So just to remind you: predicate abstraction means we're given a finite number of predicates over the program states—so some assertions, say, of this form: x smaller or equal to zero, y equal to zero, and so on—and then we partition the states into equivalence classes, depending on which of these predicates they satisfy. So if you have two states satisfying the same conjunction of predicates, then they fall into the same box here, and then we get—essentially—a transition between these boxes whenever there's a concrete transition between actual states inside of these boxes. Okay, so normally, this is used for computing—say—invariants of programs and doing program verification, but how does this help us in our setting? Well, suppose we now have such an error trace here in our program—so we somehow visit some error state—and what we're going to do is: well, we define an appropriate abstraction of this trace—so certain states inside of this trace will essentially end up in the same box—and then we do this existential abstraction, and we get these transitions on these boxes, and then you can see that, well, at certain points, we'll introduce loops by doing this abstraction, right?
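Just to make that existential abstraction step concrete, here is a tiny sketch in Python (the predicates, states, and transitions are made up for illustration): two concrete states land in the same box if they satisfy the same predicates, and an abstract transition is added whenever some concrete transition crosses from one box to another.

    # Minimal illustration of predicate (existential) abstraction.
    # A concrete state is a dict of program variables; the predicates are arbitrary.
    preds = [lambda s: s['x'] <= 0, lambda s: s['y'] == 0]

    def box(state):
        # The "box" of a state: which predicates it satisfies.
        return tuple(p(state) for p in preds)

    def abstract(transitions):
        # An abstract edge exists whenever some concrete transition
        # connects states lying in the two boxes.
        return {(box(s), box(t)) for (s, t) in transitions}

    trace = [({'x': 0, 'y': 5}, {'x': 0, 'y': 0}),   # y := 0
             ({'x': 0, 'y': 0}, {'x': 1, 'y': 0})]   # x := x + 1
    print(abstract(trace))

If two adjacent states of the trace end up in the same box, the abstract graph gets a self-loop there, which is exactly what the slicing later removes.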
Essentially, you could think of such a self-loop as a transition inside of the trace that does not really make progress towards the error—right—and that sort of tells you: okay, as long as we are staying within that box, we don't really make any progress, and only when we step outside of the box, that's where something interesting is happening. So that's why we now do the slicing, where we remove all the loops in the trace, and what we end up with is essentially just this sequence of statements where we move from one box to the other, and then the boxes themselves will correspond to these assertions that explain what's going on in the error. Okay, so of course, the question is now: how do we actually come up with this abstraction? Right? We cannot just introduce some arbitrary predicates that partition the state space; it somehow has to have something to do with the actual error. So what you want is for two states to end up being equivalent—to end up being in the same box—if, from these two states, you can reach the error state for the same reason, right? And so next, what I'm going to do is formalize this notion of equivalence. Okay, so this is where what I call error invariants comes in. So let's look at this simple example here. We have a very simple trace; this is the input state; this is the expected output state; and you can see when we execute this thing, then x should be minus one—I think—or zero, actually. So we are not satisfying this property here. So here is the actual execution; you can see—yeah—here, x is zero, and this violates the expected output. Now, what is an error invariant? An error invariant is a formula—an assertion—for a specific point in this error trace, and it has to satisfy two properties. The first one is: when you execute the trace up to this position here, then every state that you can reach at this point has to satisfy the error invariant—so essentially, it's an over-approximation of the reachable states at this point. And the second is: if you continue execution from this point with any state that satisfies the error invariant, you still have to hit the error—so you still have to violate the property at the end. So you can think of the error invariant as being an explanation of why the trace fails, from the perspective of that position in the error trace. So here are some other error invariants in the trace, and there's an interesting thing here: if you look at these two positions, they actually satisfy the same error invariant. So essentially, it means that what you do in between doesn't really matter for reaching the error in the end, right? Well, in this case, it's sort of obvious, right? Because you're assigning a variable y that is irrelevant for the value of x. And this is what the slicing criterion gives us; the error invariants are what define this abstraction of the trace. So once we have the error invariants, we go back to this finite abstraction. Now we have these boxes corresponding to the actual error invariants, and then we have transitions between these boxes, and we can see here the loop that we can now slice away, and that's the output of the analysis.
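To state that definition a bit more compactly: for an error trace st_1; …; st_n with initial condition Pre and expected output Post, a formula I is an error invariant for position p exactly if the two Hoare triples

    { Pre }  st_1; …; st_p  { I }        and        { I }  st_{p+1}; …; st_n  { not Post }

hold; that is, I over-approximates the states reachable after the first p statements, and every continuation of the trace from a state satisfying I still violates the expected output.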
Now of course, error invariants are not unique; for instance, if I go back here, you can see that just taking the actual reachable states here as formulas would also give us error invariants, and that would, of course, not give us any abstraction at all, so we would essentially end up with the same trace that we started with. And here's another set of error invariants that are slightly more abstract—where now I just say x plus a is smaller or equal to zero—and now we can see that this error invariant actually holds throughout all three of these positions. So now, when I do the abstraction, I end up with a slightly more compact explanation, where I say, "Okay, the only relevant statement is this assignment to x, and before I do that assignment, this formula will hold throughout the entire trace." Okay, so this leaves us with the questions of whether, first of all, we can actually automatically compute error invariants from a given error trace, and second, whether we can obtain useful error invariants that really give us useful explanations of what's going on in the trace. And this is where constraint solving comes in; now we're going to take the error traces, convert them into logical formulas, and then reason about these formulas to extract error invariants from them. Okay, so here's how this works—this is probably something you have all seen—we start off with a trace; we do some kind of logical encoding, using static single assignment; you can see that whenever we have an assignment to one of the variables, we introduce a fresh logical variable for the resulting value. And this formula now essentially encodes exactly the executions of that trace that, in the end, satisfy the expected outcome. So this essentially means that, since this is an error trace that violates the property, this formula here will be unsatisfiable—there's no execution actually satisfying x2 greater than zero at the end. So now, what does it mean to check that a given assertion is an error invariant? Well, we can do that by looking at these trace formulas—we just look at this conjunction—and say we now have an error invariant for position p inside of the trace; what we do is split the trace into these two parts, A and B, and then I is an error invariant for p if it's implied by A—so it over-approximates the reachable states at that point—and the conjunction of I and B is still unsatisfiable—right, so this means we still hit the error when we execute the suffix of the trace. Well, you might already have guessed that this connects the question of computing error invariants to Craig interpolation. So if you look at this condition that we had—I is an error invariant for position p if it's implied by A and it's still inconsistent in conjunction with B—this is exactly the condition that a Craig interpolant for the conjunction of A and B satisfies. So essentially, this means we can use an interpolating SMT solver, given A and B, to automatically generate these error invariants I. So essentially, we just take the trace, encode it as a formula, and use the SMT solver to compute interpolants along the way as candidates for seeding this slicing, and then we use them to compute the abstract slice.
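As a small, concrete sketch of this check, using Z3's Python API: the toy trace below (a = 0; x = 0; y := 1; x := x + a, with expected output x > 0) is made up to match the example from the slides, and the candidate invariant is the x + a <= 0 formula mentioned above.

    from z3 import Ints, And, Not, Solver, unsat

    # SSA-renamed trace formula; A is the prefix up to position p, B the suffix
    # together with the expected output.  A /\ B is unsatisfiable, as for any error trace.
    a0, x0, x1, y1 = Ints('a0 x0 x1 y1')
    A = And(a0 == 0, x0 == 0, y1 == 1)       # a = 0; x = 0; y := 1
    B = And(x1 == x0 + a0, x1 > 0)           # x := x + a; expected output x > 0

    I = x0 + a0 <= 0                         # candidate error invariant at position p

    def is_error_invariant(A, I, B):
        # (1) A must imply I:  A /\ not I is unsatisfiable.
        # (2) I must still be inconsistent with the suffix:  I /\ B is unsatisfiable.
        s1 = Solver(); s1.add(A, Not(I))
        s2 = Solver(); s2.add(I, B)
        return s1.check() == unsat and s2.check() == unsat

    print(is_error_invariant(A, I, B))       # True

Note that pushing an invariant to a later position only requires re-running the second check with a shorter suffix, since the first condition is preserved as the prefix grows; that is essentially the propagation step described next.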
So if you go back to our example—say this is our trace, and these are the interpolants we get back from our SMT solver—all we do is propagate these error invariants along the trace, trying to see whether they hold at any other points along the trace, and this is how we define the abstraction of the trace. So in this case, we get our formula x plus a smaller or equal to zero, and this is the one that holds all the way to here. So essentially, we compute some kind of smallest coverage of the trace with the given interpolants from the SMT solver, and that's what defines the actual abstraction—the actual abstract slice—we give back to the user. Okay, so we actually implemented this in a tool, and we did some preliminary studies where we looked at some real-world examples. This is taken from the UNIX tool SED, where we had some seeded bugs, and this is one example where the whole thing is roughly twelve thousand lines of code—the entire program—and the error trace is around one thousand lines of code, and it's really spread out over most of the actual source code. So actually understanding what's going on there just by looking at the trace is a rather difficult problem.
>>: Where did you get the bug from?
>> Thomas Wies: So these were some examples that have been used in some previous work—essentially for comparison of these fault localization techniques. But the actual bug was a missing initialization of a global variable, and this was pulled in at some point, and then there was some invalid pointer access somewhere in the program, leading to a segmentation fault. And this is the abstract slice you get back using our techniques, and it's really quite nice. So you can see this is essentially the initialization of this global variable—last_regex—so it's set to null, essentially, and then you can see how that value propagates through the program to the point where some pointer is dereferenced, leading to the segmentation fault. Okay, so—yes—before I finish, let me just talk about another application of these techniques in a slightly different setting. So this is taken from an open-source Java program, and this program is not wrong per se, but there's something funny happening. If you look at this conditional here, saying that e is null—actually, can you see whether this value is ever null in this program? So does this test ever succeed when that point is reached?
>>: [indiscernible] It can't be; it would have crashed earlier.
>> Thomas Wies: Yeah, exactly. So you can see up there, in the first conditional, there is this call to getID on e, right? So then, of course, if e was null, there would already be a null pointer exception at this point here. So whenever you get to that conditional, then e will be non-null, so there's something strange going on here, and we call this kind of situation inconsistent code. So essentially, the programmer has some assumptions about the program states, and these assumptions can essentially never be true—so for instance, a conditional branch can never be taken, because the branch condition will always be false. And so we previously worked on a tool that can detect these kinds of situations, and the question is: can we also explain them to the programmer? Right? And this is also something that we can do with error invariants.
Perhaps just a word on why these kinds of inconsistencies are interesting: well, often they are actually related to true bugs in the program; so here, essentially, the problem is that these two conditionals—these two if conditions—should really be swapped in the program; this is the test that should happen first, and then the other one should happen. Okay, so what we did is we applied the error-invariant-based techniques to this problem. Essentially, whenever we get such a program, we compute an abstract slice of the entire program, and this gives us some relevant statements and some assertions over these statements. And we did some human studies where we used these abstract slices to see whether programmers understand these inconsistencies. And it turns out that using these error invariants, it's much easier for them to actually understand what's going on in the program. I mean, it's not surprising if you see that you really end up with the relevant statements that explain the inconsistency, together with these assertions telling you what's going on.
>>: So do you have your tool running both for C and for Java?
>> Thomas Wies: Yeah, so it's two different tools; there's one that is running on Java, and we are working on another one for C programs. So this tool is really just—for now—looking at this inconsistent code in Java programs and then explaining it using error invariants.
>>: So just to get an idea of the running time: the SED example that you showed, with about a thousand lines—how long does it take to find the explanation?
>> Thomas Wies: So the actual interpolation is a matter of a few seconds, so even for these sizes of traces, it's still pretty fast.
>>: Okay. And that's all there is to it. Now, after that, you have to propa … do something [indiscernible]
>> Thomas Wies: Yeah, exactly, yeah.
>>: That's all very cheap?
>> Thomas Wies: Yeah, so you can do some kind of binary search on the trace, so it's—I mean—logarithmic in the size of the trace. Okay, so let me stop here, and then Zvonimir can take over. So just to summarize: what I talked about is this new technique called fault abstraction, which is essentially a static technique for doing fault localization—or more accurately, actually, fault explanation. And what's the key difference to the existing work? Well, essentially, it doesn't really require any kind of comparison between failing and successful executions of your program, so you don't actually rely on testing to get good results out of these techniques. And we did some case studies, and we were actually able to compute concise explanations of errors in real programs, and the key concept you should remember is this idea of error invariants, which is closely tied to interpolation. And that's how all this ties back to formal methods and verification techniques that we can apply in this software engineering context. Okay, so if you have more questions, I'm happy to take them now and …
>>: Well, I just want to … so one more question.
>>: Going back to the earlier question: by looking at just one trace, you can miss out on faults where a condition evaluated to true but the execution was supposed to go some other way.
>> Thomas Wies: Yeah.
>>: So are you going to rule out relevant statements, or are you going to report spurious statements?
>> Thomas Wies: So what we actually do is we use a slightly different encoding of the trace into a formula. Essentially, whenever you take a branch, you take the conjunction of the statements in the branch and make it conditional on the branching condition. So here, you get some kind of implication encoding of the trace; essentially, whenever your error depends on the information inside of that branch, you also have to explain why you were taking that branch, and if it doesn't depend on it, then it doesn't matter in the end. So essentially, in this case, you will also get relevant statements that explain why a certain branch condition was true at a specific point. So I'll be happy to take more questions after the talk, and otherwise, Zvonimir can take over for the second part.
>> Zvonimir Pavlinovic: Okay, so thanks for the invitation. Again, I'm Zvonimir Pavlinovic, and I'm a student at NYU; Thomas is my advisor; and today, I'll show you what minimum type error sources are, why they are important, and how to find them. This is also joint work with Tim King, who was also an NYU student at the time. So I apologize if my voice breaks; it seems I was not prepared for this amount of rain in Washington. So in this work, we actually focused on a slightly different problem: we were interested in localizing type errors in programs written in languages that support type inference. To introduce you to this problem—or remind you, if you're already familiar with it—let's consider the following OCaml program; if you're not familiar with OCaml, I'll just walk you through it. So on the first line—and actually, this was written by a student who was completely new to OCaml—the student is defining a polymorphic list data type; basically, it says that lists can be constructed with a Null constructor, which basically is the empty list, or with a Cons that takes as its first parameter the head of the list and then the remainder of the list. As you can see here, the element type is polymorphic, which means we can have a list of integers, a list of strings, or a list of lists of integers, but every element in the list has to have the same type. More interestingly, on the next line, the student is defining x to be a list having a single element, the number three, which means that this x has the type list of integers. And then on the last line, the student is trying to print this x with a function print_string that is defined in the standard library, and it has the following type signature: it accepts strings, and it returns a unit—unit is similar to void in, for example, Java and C. So if you focus on this program, you can see that the student didn't provide any type annotations; for example, for x, in C or Java you would usually have to say, "x has to be a list of integers," and then define the value. In OCaml, you don't have to do that; the way that OCaml deals with this is that it infers the types of program expressions just based on how they're used in the program—this is usually called type inference. So the way that the compiler actually deals with this program is pretty much the same as I just explained to you.
So when it comes to the x expression on the last line, it sees that x has the type list of integers, but print_string accepts strings. In OCaml, this is not allowed; so this program has a type error, and the OCaml compiler here stops immediately, reports x as the error source—basically—and says to the programmer, "You should fix x." However, maybe print_string is the actual source of the error; in other words, maybe the student should have used some other function that just takes lists and then prints every element. Or maybe the student defined x inappropriately on the previous line; maybe she should have written some string. But if you focus on these error sources, it's unlikely that the definition of x is wrong, because she explicitly wrote a list having a single integer, the number three; it's also kind of unlikely that the use of x is the actual source of the error, because it was just defined on the previous line, and it's the only variable in the program. So it seems that print_string is the actual error source, and it really is; the student later wrote her own function that takes lists and then prints every element. So the main problem is the following. The way that these common type inference algorithms one can find in ML, OCaml, and Haskell work is that they take the input program, and in a top-down fashion, they infer types of program expressions based on how they're used, and the moment they see a conflict in the way a program expression has been used, they immediately stop and report the location of the conflict as the error source. Other error sources, like the definition of x and print_string here, are completely disregarded—they're never considered. Now, let's consider another example … again, this was not my program; this is a really weird program; it was, again, written by a student who was really new to the language. And again, if you focus on the definition of the loop function, we can see there are no type annotations. We know that print_string takes strings and returns a unit, which means that the else branch now evaluates to a unit value. Now, in OCaml, both branches of an if-then-else expression have to evaluate to values of the same type, which means that this accumulator variable—acc—also has to have unit type, and then the whole loop function, at the very end, returns a unit. But on the last line, the student is calling this loop function, passing a list of pairs of floats instead of a unit. So the compiler sees this and generates the following error report: it blames this location to be the source of the error and says, "You should fix this; you should pass a unit here." However, maybe—again—print_string is the source of the error; in other words, maybe there should have been a function that takes strings and returns a list of pairs of floats. So believe it or not, this was actually the error source, but the reason why this example is particularly interesting is the location that the compiler suggested. Suppose we actually pass a unit here; what happens? Well, the call to the loop function is well-typed, but since loop returns a unit, this result is then passed to this traversal that accepts lists, so even fixing the location that the compiler suggested doesn't make the error go away—we still have a type error. So what is the problem? The problem is that these type error reports can often get really tricky and cryptic, and they just basically don't help the programmer fix the error.
So this increases debugging time, and it makes it really difficult for novice programmers to learn the language. So how can we do better? Well, we could consider all error sources, have compilers rank them by some criterion they find useful, and then show the top-ranked error source to the programmer. So is this a new problem? Well, it's not; it's actually quite an old problem, and in fact, it's quite a recurring problem—every couple of years, you can see a couple of papers on the subject. Basically, the solutions range from showing the slice of the type inference that is involved in the error, to showing the program slice that's involved in the error, to specially designed type systems that trace the error. However, among other things, the drawback of this research—of these proposed solutions—is that what they actually do is propose a criterion, then design and implement a whole system centered around this criterion, and then evaluate it on benchmarks and say, "We believe this would work well in practice." But if you want to change this criterion, you have to change the whole system. Also, these solutions would usually focus on a specific type system, and they would also require large compiler modifications. So in this work, we were not actually interested in providing yet another criterion for ranking error sources that we believe might be useful in practice; we actually asked ourselves something more general. Can we enable compilers to localize type errors, but in such a way that, first, we can abstract from a specific ranking criterion? What this means is we want to enable compilers to easily plug in and plug out criteria, so they can compare them, disseminate them, choose the one they think is gonna work best in practice—basically, change them at any point they want. Also, can we design it so that we can support different type systems? Because we would like this approach to be available to ML, OCaml, and Haskell as well. And also, can we design it in such a way that the implementation overhead is kept low? Because we have to keep in mind that if we propose this to compiler developers, they'll probably reject any solution that changes the whole compiler infrastructure. So in this work, we actually propose a general framework for type error localization that meets these requirements; it's based on constraint solving. Now, before showing you the actual framework, I think we first need to define this notion of an error source more precisely. So again, we have a really simple kind of program, and we can see it's not well-typed, because Boolean negation is applied to a value that has string type. So in our framework, when we say that x is an error source, what we actually mean is that there is a fix to this program location that makes the error go away—it makes the program well-typed. For example, instead of x, we could have written some Boolean constant, and that would make the program well-typed. So in our framework, we can represent a generic fix for a program location by replacing the program expression with a hole—this hole is merely a placeholder; it doesn't constrain the typing relation, and it cannot itself be the source of a type error. So now, when we replace x with a hole, this program is well-typed, which means that x is a potential error source. So we're gonna define an error source to be a set of program expressions that, once corrected—or replaced by holes—yield a well-typed program.
Now, as already mentioned, we can have multiple error sources; for example, not x is also an error source: we can just write x instead of not x, and this would be a well-typed program. But if you're a compiler developer, you might—sorry, yeah?
>>: I have a question.
>> Zvonimir Pavlinovic: Sure.
>>: When you use the phrase "this is a well-typed program," well-typed according to which type system?
>> Zvonimir Pavlinovic: The type system of the programming language; okay, so to give you …
>>: The type system does not understand that question mark, right?
>> Zvonimir Pavlinovic: Okay, sure, so—yeah—I wanted to stay away from that, but the way we actually formalize this is that the hole is a programming construct, and each time it is used, it gets a fresh type variable. So basically, it will always unify with any type you give it.
>>: Right.
>>: So to OCaml, it's actually like raising an exception, for instance …
>> Zvonimir Pavlinovic: Raising an exception in OCaml.
>>: … and stuff like …
>>: Oh, I see what …
>> Zvonimir Pavlinovic: So basically, if you go to OCaml and you write a program expression that raises an exception and look at its type, it is gonna be a type variable.
>>: Mmhmm.
>> Zvonimir Pavlinovic: So this is how we formalize it, but as we will see shortly, we are actually not going to implement this in the language; we're gonna do something relatively simpler, but it has the same effect. Okay, so like I said, we can have multiple error sources, but if you are a compiler developer, you might just want to show x instead of not x as an error source. In general, there should be a way for compilers to prefer some error sources over others, or in other words, there should be a way for compilers to rank error sources by some criterion that they find useful. And in our framework, this criterion is incorporated by compilers providing a function from program expressions to weights. Now, the smaller the weight, the bigger the chances are that the program expression contributes to the error, which means that the top-ranked error source is the error source that has minimum cumulative weight, and we call this a minimum error source. Now, note that we didn't put any constraints on the weights; they could be positive or negative, but for simplicity, we're just gonna assume we have positive weights, which means that minimum error sources are also minimal, so we don't have to consider error sources that are not minimal—this is just for ease of exposition. Now, to make this concept of ranking criteria more concrete, let's consider an example. Suppose that we are compiler developers, and we want to prefer those error sources that require fewer corrections. So what would be our criterion? Well, it could be a function that, to each program expression, assigns a weight that is equal to the size of the program expression in AST form. So if we go back to our example: x itself is an error source, and it gets weight one, because that is its size in AST form. Not is also an error source, because we could have just used some other function that takes strings, for example; it also has weight one, because that is its size in AST form. Not x, we know, is an error source, but now it has weight three. "Hi" is also an error source, because we could have defined x to be a Boolean constant, for example—that would be a well-typed program—and it also has weight one.
And in the most extreme case, the whole program is an error source—I could just write one—and it has weight five. So the minimum error sources are these three expressions, separately, because they are error sources and they have the minimum cumulative weight, one. And indeed, they correspond to the error sources that require fewer corrections. So note that this criterion that I just showed you is just an example of a possibly useful criterion; the actual criterion is the responsibility of the compiler. Okay, so the problem that we are actually trying to solve in this work is the one of computing minimum type error sources. Given an input program and a compiler-provided ranking criterion, we want to find an error source that is minimum subject to this criterion. So how can we do this? First, note that it's actually an optimization problem: we want to take the input program, plug in holes in all possible ways, then consider those sets of holes that represent error sources, and then find the error source that has minimum cumulative weight. So we're not going to write our own algorithm to do this; in fact, we're going to reduce this to a well-known problem for which we already have available tools: we're going to reduce it to weighted maximum satisfiability modulo theories. So this is our framework: the compiler provides an input program and a ranking criterion to this typing constraint generation procedure. What this procedure does is take the input program and, using the typing rules of the programming language, generate, for each program expression, typing assertions that basically denote the typing information of the expression, and the set of these assertions is part of something called the typing constraint. Also, this typing constraint contains encoded program locations, and to each such program location, we propagate the weight given to it by the ranking criterion. Now, this typing constraint is then passed to the weighted MaxSMT solver, which, in turn, produces minimum error sources. Now, before showing you the actual details of how this works, let's just see what this reduction to weighted MaxSMT gives us. Since we are using a weighted MaxSMT solver to compute minimum error sources, we can support different type systems by relying on the rich theories that SMT solvers support. Also, we are using the weighted MaxSMT solver merely as a black box, so we are not imposing substantial requirements for compiler modifications. And also note: the ranking criterion is now just passed as a parameter—it's not fixed once and for all—so each time the framework is called, you can pass a different criterion. Okay, so let's define the weighted MaxSMT problem.
>>: Can I?
>> Zvonimir Pavlinovic: Sure.
>>: Please go back. I mean, it seems to me that you are doing a lot of changes to the compiler. I mean, look at this … you'll have to modify the compiler so you can encode the left-hand arrow and this arrow …
>> Zvonimir Pavlinovic: Okay, so the compilers, in a sense, already produce these kinds of typing assertions.
>>: Okay.
>> Zvonimir Pavlinovic: Basically, what they do is produce typing assertions and do unification to solve them. So what we indeed have to do is just put them on the side and encode them into an SMT formula. We actually did it, and it is relatively simple. So they already produce it; we just need to go to SMT and put it on the side. Sure.
>>: The other question I have is: how is the ranking criterion specified?
>> Zvonimir Pavlinovic: So it's a function from program expressions to weights.
>>: Over the expressions of the input program.
>> Zvonimir Pavlinovic: Exactly. So basically, each time it's called, you can pass a function that accepts a program expression and just returns a weight.
>>: Okay.
>> Zvonimir Pavlinovic: So we abstract from a concrete criterion—it's just some function from program expressions to weights. What the actual function is, is the responsibility of the compiler. Okay? Any other questions? Okay, so let's define the weighted MaxSMT problem. As input, we have a set of hard clauses, which must hold, and a set of soft clauses, where each clause is assigned a weight, and each clause belongs to some fixed first-order theory—algebraic data types, for now. As output, we get a subset of the soft clauses that is, together with the hard clauses, satisfiable and at the same time has maximum cumulative weight, and this is going to be computed for us by the solver. So we might now wonder, "Okay, the solution is a set of soft clauses that has maximum cumulative weight, but we want to compute minimum type error sources. So how do we do that?" To see that, we need to actually look at our encoding. Remember that I mentioned that what we're actually trying to do is plug in holes in all possible ways and then find the set of holes that represents a minimum error source. So we actually want the solver to do this for us. So somehow, we need to encode the structure of the input program as well as the typing information. The way we're gonna do that is, basically, we're going to simulate the abstract syntax tree of the program, where the contents of a node are going to be typing assertions that encode the typing information of the program expression, as well as pointers to the children. So now, for the whole expression—the let, the whole program—we have a root node, which we denote with a propositional variable, T_let, and basically, this propositional variable implies the content of the node. The reason why we use implication I'm going to explain in just a minute. So this typing assertion says that the type of the let expression, denoted by the type variable alpha_let, is equal to the type of the not x function application result; now, we do the same for the children. For x equals "hi," again, we have a propositional variable standing for the node, and the content is that the type of x is string, and we do the same for the function application, where we say that the type of the function being applied is the type of not and that the type of the parameter being passed is the type of x. And note that the not function is defined in the standard library, so somehow I need to encode this as well. So we have another clause that basically says the type of the not function is a function from Booleans to Booleans, and this propositional variable stands for the location in the standard library—and this one is going to be set as a hard clause. So what are the soft clauses? These are the propositional variables; to each propositional variable—which basically represents a program location—the weight assigned to that program location by the criterion is propagated, and these become the soft clauses. So for simplicity, let's just assume we have a super-dumb criterion that assigns weight one to each program expression.
So I give this to the solver; the first thing the solver is gonna do is set all the propositional variables to true. What this means is that these hard clauses at the top reduce to the following set of constraints—and they are unsatisfiable, and this corresponds to the fact that the input program, as it is right now, is not well-typed. To see this, note that not is a function from Booleans to Booleans, and that in this function application we are using this not function, but the type of the parameter being passed is string, so this is not satisfiable. Okay, sure?
>>: I don't understand your formula at all. What's … the long arrow is implication [indiscernible]
>> Zvonimir Pavlinovic: Yeah, it's just implication.
>>: So … right, and then, what's the relative binding power between implication and conjunction?
>> Zvonimir Pavlinovic: So basically, this and then, basically, this—hold on … okay, maybe I should have made this more clear. Okay, so for example, this part should be together.
>>: That line … that should be … so line breaks are significant in [indiscernible]
>> Zvonimir Pavlinovic: Yeah.
>>: Yeah, so this … really just corresponds to this sub-expression over here.
>> Zvonimir Pavlinovic: It just corresponds to the structure.
>>: The sub-expression then, so that the typing constraints are higher than the corresponding …
>>: But what if you set Tx to false, for example?
>> Zvonimir Pavlinovic: Yeah, so I'll come to that, actually. Okay, so—right—so this is unsatisfiable—again, it corresponds to the fact that the whole program is not well-typed—and actually, these typing assertions we have here are something that compilers, in their type-checking process, consider. Basically, like I said, in a top-down fashion, they generate these assertions; then they solve them using unification, and if they fail, they just report that location as the source of the error. But here, we have an optimization problem. So what the solver is gonna do is set some propositional variables to false, and let's say it sets Ti to false. Now, since we set Ti to false, because of the implication, this typing assertion can basically be disregarded, and as a consequence, this is now satisfiable. And this corresponds to the fact that this program expression is replaced by a hole. So the effect of setting some propositional variables to false is that the corresponding program expressions are replaced by holes. And now, these propositional variables in green are set to true, and basically, we have a maximum cumulative weight solution. So basically, minimum error sources are exactly the complements of weighted MaxSMT solutions. Okay. So we have implemented this framework, and we were targeting OCaml—actually, the Caml part of it—which, at its core, has a Hindley-Milner type system, and constraint generation was done using the EasyOCaml tool—this is a tool built in previous research on the subject—and these typing assertions were basically encoded in the theory of inductive data types. The weighted MaxSMT solver we had to build on our own—basically, a simple wrapper around the CVC4 SMT solver and the Sat4j MaxSAT solver. The reason for that is that there's Yices—a solver that supports weighted MaxSMT—but it didn't really perform well; but actually, it seems that Microsoft Research is putting some effort into providing this facility for [indiscernible], which is very nice.
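Here is a rough sketch of the guarded encoding just described for the let x = "hi" in not x example, written against Z3's Python optimization API rather than the CVC4/Sat4j combination used in the actual tool; the type constructors, type-variable names, and guards are invented for illustration.

    from z3 import Datatype, Consts, Bools, Implies, Optimize, is_false

    # Types as an algebraic datatype (just the constructors this example needs).
    Type = Datatype('Type')
    Type.declare('TBool')
    Type.declare('TString')
    Type.declare('TFun', ('dom', Type), ('rng', Type))
    Type = Type.create()

    # One type variable per expression, one propositional guard per program location.
    a_let, a_x, a_app, a_not = Consts('a_let a_x a_app a_not', Type)
    T_let, T_x, T_app, T_not = Bools('T_let T_x T_app T_not')

    opt = Optimize()
    # Guarded typing assertions: setting a guard to false "replaces" that location by a hole.
    opt.add(Implies(T_let, a_let == a_app))                    # result of the let
    opt.add(Implies(T_x,   a_x == Type.TString))               # x = "hi"
    opt.add(Implies(T_app, a_not == Type.TFun(a_x, a_app)))    # application  not x
    opt.add(Implies(T_not, a_not == Type.TFun(Type.TBool, Type.TBool)))
    # The standard-library location is asserted hard: it can never be an error source.
    opt.add(T_not)
    # Soft clauses: one guard per program location, weighted by the ranking criterion
    # (here the super-dumb criterion that assigns weight one everywhere).
    for guard in (T_let, T_x, T_app):
        opt.add_soft(guard, 1)

    print(opt.check())                                # sat
    m = opt.model()
    print([g for g in (T_let, T_x, T_app)
           if is_false(m.evaluate(g, model_completion=True))])

The guards left false in the optimal model are exactly a minimum error source; here that is either the use of x or the application, each of weight one.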
And so we also did an evaluation on benchmarks from previous research on the subject—we had three hundred and fifty programs—in fact, these benchmarks are OCaml programs written by students, and they were developed at U-Dub, the University of Washington. Actually, those two examples at the very beginning were from those benchmarks. So note that our contribution here is a framework that abstracts from a specific criterion—it basically formalizes this problem of error localization as an optimization problem—but we also wanted to see how good it actually is at pinpointing the actual error sources compared to the standard OCaml compiler, and we saw a fifteen percent increase in accuracy for our random sample. And the actual criterion we used was the one where, to each program expression, we assign a weight equal to the size of the expression in AST form. More interesting, actually, were the execution times—how much time does it take to compute a single minimum error source? So on the x axis, basically, we took the benchmark and broke it into groups based on the size of the code. This item basically says that we are here considering those programs from zero to fifty lines of code, and we have forty-seven of them. And we can see the medians are pretty good, but the maximums can be pretty high. So what is the reason for this? Okay, so our weighted MaxSMT solver can definitely be improved—this is a relatively naïve implementation. However, the problem is more fundamental, and that is that the size of the typing constraint, measured in the number of assertions, can actually get exponential, and the reason is as follows. So we have here a relatively simple program—basically, we are defining an id function, and then we are applying it to an integer and a Boolean. This is a well-typed program. Here, I'm not gonna use our encoding with propositional variables, just for simplicity. So these are the typing assertions associated with this function; it basically says the type of id is a function from some input type to some output type, where the input type and the output type are the same. So for this id 1, the typing assertions generated for that expression are the following. The most interesting part is this: what happened is that we took these typing assertions, copied them, and instantiated them with fresh type variables, and we did the same for this part. Why do we have to do that? If we hadn't done that—if we had just used alpha_id in both cases—then we would restrict the input type the first time to integer and the second time to Boolean. So basically, each time a polymorphic function is used, its set of typing assertions has to be copied and instantiated with fresh type variables. This is how you support polymorphism in constraint-based type inference. And in fact, I'm not using our encoding here, just the encoding that compilers use generally, which shows that this is not a problem that is tied to our approach. In fact, type checking—believe it or not—is complete for exponential time. So you can write a small OCaml program of six lines of code, and it basically takes forever to type check. But the way that compilers deal with this is that when they compute the set of typing assertions associated with a polymorphic function, they actually compute a principal type for this polymorphic function, which is basically like a type summary, and then they instantiate this principal type each time it is used.
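Schematically, for the example just discussed—say a definition let id = fun x -> x followed by the uses id 1 and id true—the duplication looks like this (the type-variable names are made up for illustration):

    definition of id:     alpha_id  = alpha_x  -> alpha_x
    first use, id 1:      alpha_id1 = alpha_x1 -> alpha_x1,   alpha_x1 = int
    second use, id true:  alpha_id2 = alpha_x2 -> alpha_x2,   alpha_x2 = bool

Each use gets its own fresh copy of the definition's assertions; reusing alpha_id and alpha_x directly for both uses would force int = bool and make even this well-typed program look inconsistent.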
However, we can't exactly use the same trick for our problem, because type checking is a decision problem and we have an optimization problem. If we computed a type summary and just used it every time a polymorphic function is used, then when the solver sets a hole in the definition of the polymorphic function, this somehow has to be propagated to the summary, and this is, in particular, what we're currently working on. In the paper, we actually propose two solutions: lazy quantifier-based instantiation and lazy unification-based instantiation; unfortunately, due to time constraints, I won't be showing these in more detail. And also, we're working on something else—a third approach. However, even if you solve that, it's possible that you just have so many typing assertions—say you had a huge program, like one hundred thousand lines of code. What we actually observed is the following: suppose we had a program that is like ten thousand lines of code—it's perfectly well-typed—and now you write an additional function—completely new, unrelated to the previous code—and it has a type error. Now, our framework has to pull in all the typing assertions, and some of them—the great majority—are completely irrelevant for this problem. So basically, we designed in this paper, and actually experimented with, two further optimizations: constraint slicing and preemptive cutting. What constraint slicing does is run the standard OCaml type checker, which reports a location of the error—which we know is probably not correct—but basically, what it allows us to do is take that location and compute a slice of the constraints that is relevant to the error and then just consider that slice. And preemptive cutting is based on the fact that programmers are used to the fact that when they have a type error, they're always looking for the solution in the upper part of the program. So this is what we did: we have a type error, and the part below it we completely disregard; we just consider the upper part. And in fact, with these we actually see a four to five times improvement in execution times. So to conclude, about the contributions: we believe this is a clean formulation of type error localization, where basically, we reduce this problem to an optimization problem and we can abstract from ranking criteria, which actually opens new research directions. So basically, we are separating the search for the minimum error source from the actual definition of the minimum, which we find very useful. Also, we propose an algorithm for this problem by reducing it to the weighted MaxSMT problem, and because we can use these solvers, we can actually support various type systems, and we don't require substantial compiler modifications. There are also other topics you can find in the paper, and I'm ready to take any questions. Sure.
>>: So are there situations where you can have … you need to make multiple holes in different parts of the AST?
>> Zvonimir Pavlinovic: Yes, yes.
>>: Would you just make the single hole that … like posing …?
>> Zvonimir Pavlinovic: No; if your ranking criterion is such that putting two different holes is the actual minimum error source, then we compute it. So the guarantee we provide you is that we're going to find the error source that has minimum cumulative weight—that is a formal guarantee that we provide you.
So we will not do any approximations; we actually compute the true minimum error source, whatever it is. >>: The problem is how do you evaluate? When you did that experimentation, did you have users look at the results, or did you …? >> Zvonimir Pavlinovic: So, for these benchmarks that I mentioned, we took a random sample and evaluated on it. Basically, the way this benchmark is organized is that the students were incrementally writing a program, so you can just look at the revisions of the program, and that's how you see—you know—what they actually fixed. And this is how you observe what the actual source of the error was, but it was kind of difficult, because students who are very new sometimes tend to just—you know—give up and rewrite the program from scratch; but, yeah, we were able to take a random sample and validate against it. Sure. >>: So—yeah—related to the same question: how often does it happen that the minimum error source is not actually the real source? >> Zvonimir Pavlinovic: So … >>: Where the real fix is that you are supposed to make more changes to the program. >> Zvonimir Pavlinovic: Right, sure. So consider the following OCaml program: not 2. It is not well-typed, right? So what is the actual source of the error? Is it 2? Is it not? You just don't have enough information in the program text. So basically, all we can do is heuristically try to guess, and that's what previous research would do; they would just propose different heuristics. In our framework, these heuristics are exactly the ranking criteria. So in our work, we weren't focusing on providing yet another criterion; we are actually providing a framework. And you can just experiment with different criteria, change them, choose the ones that work best, and you can prototype this easily. So sure. >>: So the [indiscernible] class of ranking functions I can give here—is it any arbitrary function, or …? >> Zvonimir Pavlinovic: Any function from program expressions to weights. >>: But it has to be translated to weights for the encoding, right? >> Zvonimir Pavlinovic: Sorry? >>: It has to be translated to weights for the encoding process. >> Zvonimir Pavlinovic: Yes, exactly. That's true. >>: So can I give anything complicated? What can I give? >> Zvonimir Pavlinovic: Well, in our paper, the AST-size criterion is just one example; we also show some other criteria. So let me just go back. >> Thomas Wies: Let me just step in. So I saw that Z3, I think, now supports these kinds of optimization problems where you have lexicographic weights, essentially. That might be one thing that's interesting to look at. So you have, say, size as your first criterion, and then you add more criteria to narrow it down to something useful. >>: Yeah, so one thing would be applying machine learning techniques … >> Zvonimir Pavlinovic: Exactly. >>: [indiscernible] radial basis functions—can I use them here? Such complicated functions …? >> Zvonimir Pavlinovic: Okay, let me actually answer that. So see here: when we set one of our propositional variables to false, the effect is that the corresponding program expression is replaced by a hole, which means that program expression should be fixed. So consider what it means to set this particular variable to false: it means you should fix the implementation of not in the standard library, but that is never the case, right? That is not what you want to report.
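As a concrete illustration of both points—the ambiguity of not 2 and the unwanted "blame the library" candidate—here is a small OCaml sketch; the identifiers are made up for illustration and are not from the paper.

(* Both single-expression fixes below type-check, so the program text of
   not 2 alone cannot tell whether 2 or not is the error source; only a
   ranking criterion can decide which candidate to report. *)
let fix_if_the_argument_is_wrong = not true   (* blame 2:   replace the argument *)
let fix_if_the_function_is_wrong = succ 2     (* blame not: replace the function *)

(* With only soft assertions there is also a third, unwanted candidate:
   dropping the assertion alpha_not = bool -> bool that comes from the
   standard library, i.e. "fixing" the library's implementation of not. *)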
So the way you can completely disregard such error sources is to make the corresponding assertions hard. This way, for example, you can completely disregard error sources that come from the standard library. In the paper, we also show how to make the constraints themselves part of a criterion, and basically, that is an easier way to support even richer criteria. Now, the better question: machine learning. What we actually observe is that, in our own programming, the type errors we get tend to form a pattern. So when I program and I get type errors, I can actually see a pattern—which maybe means that I'm a bad programmer—but nevertheless, if we could apply machine learning techniques that learn, each time I fix my errors, how to update this criterion, then we would be able to actually customize type error reports. But there is a problem—we actually wanted to do this—the problem is that these benchmarks are just not representative, and we're currently trying to get in contact with people from Jane Street on Wall Street—basically, they developed their whole infrastructure in OCaml—and they are actually really interested. So we hope that we're going to get some good benchmarks, so we can actually experiment with this. >>: I also work with a bunch of programmers; you might get data from them. >> Zvonimir Pavlinovic: Yeah, yeah, sure—definitely, definitely. So actually, we find that this is very important for—you know—adoption of the language. The people from Jane Street actually told us that—you know—once you get experience, you get better at resolving these errors, but still, from time to time, it can get really annoying. But the biggest problem is for students, and they can get really frustrated. You can actually see it in the benchmarks: a student tries once, twice, three times, and then just gives up—doesn't solve the task. And we feel that, at least for these novice programmers, it can really help. Any other questions? >>: Thinking of error localization more generally: I understand that for the people who build cloud infrastructure and services, sometimes, when a failure happens, localizing it can take on the order of days, from what we know. >> Zvonimir Pavlinovic: Yeah, so … >>: And it needs the expert programmers. [laughter] >> Zvonimir Pavlinovic: So what we find interesting is that we believe we can actually generalize this approach to various error localization problems other than this one. And then maybe you have a really nice framework to think about such problems, which can sometimes be really, really useful. >> Shaz Qadeer: More questions? Okay, let's thank the speakers. [applause]