>> Nikolaj Bjorner: It gives me great pleasure to introduce Arie Gurfinkel, who is visiting from the Software Engineering Institute at CMU, and he's here all week, until at least Thursday afternoon, maybe even Friday morning. So after the talk, or during the talk, ask him questions, and after the talk, he sits, what was it, 3123? >> Arie Gurfinkel: Forty-three. >> Nikolaj Bjorner: Forty-three, and you can grab Arie and ask him even more questions. Okay, go ahead. >> Arie Gurfinkel: Okay, thank you, Nikolaj. So I'll talk about VINTA, which is joint work with Aws Albarghouthi and Marsha Chechik, and some of the work I'll talk about is also joint work with Sagar Chaki, who works with me at the SEI, and Yi Li, who is also in Toronto with Marsha. I have to show this. So anything I say, there's no warranty on it. So my work is in automated software analysis, and I'm not going to spend any time motivating why this is important; I'm assuming you all are already motivated enough, so I'm just going to get directly to the point. So what I work on is this box, called automated analysis, and from my perspective, the view of the world is very simple. You have a program and maybe some notion of correctness, maybe inside the program, maybe as a separate property, and you have to come up with one of two answers: either correct, and maybe a proof of correctness, or incorrect, and maybe a counterexample. From my perspective, there are two big approaches to solving this problem. One is called Software Model Checking, which was pioneered by Ed Clarke, among others, and the other is Abstract Interpretation, which was pioneered by Patrick Cousot. Last year, I was here, and I was talking about UFO, which was sort of a model-checking-driven view of program verification, and today I'll talk about VINTA, which is more or less the same story, but now from the abstract interpretation perspective. So the motivation that we had: on one hand, abstract interpretation is the most scalable approach to program analysis. That's the one that scales to programs with millions of lines of code, but in practice, there is a price for that. It's very scalable because it's imprecise, and because it's imprecise, it has lots of false positives. And there are two other problems with it -- it gives you no counterexamples and no refinement. So if it claims that there is a bug, there is no actual execution you can look at and say, "Oh, yeah, I see what's going on." And there is no refinement. If you have a false positive, you have a false positive. That's it. That's sort of the motivation for what we wanted to do. We wanted to reach into the model-checking view of the world, where you have abstraction refinement, and try to add it to abstract interpretation. So what I'm going to do in the rest of the talk is I'll talk about numeric abstract interpretation. Does anybody here not know what numeric abstract interpretation is? Great. Okay, I'll go slowly over that part and just give you my view of what it looks like, and I'll talk about VINTA -- what we did that's different, how it changes this picture. I'll talk about how it's implemented, and then sort of the biggest claim to fame of VINTA is that it did really well at the recent software verification competition, so instead of showing you how we did on some benchmark that we've tried, I'll talk about what it did in the competition and how much it was responsible for the performance of the tool. 
If I have time, I'll talk a little bit about the secret sauce, so all of the things that went on top of the basic algorithm in order to make it competitive, and then finish with some of the current work that I'm doing. I'll try and make it informal. Please feel free to ask questions or interrupt in the middle, as I go. So numeric abstract interpretation, what is it? The basic idea is that you restrict the set of facts that you're going to use when analyzing a program to something called an abstract domain, and you do all your reasoning inside this set of facts. So the thing to note here is that the abstract domain is a possibly infinite set of predicates, so you have to deal somehow with that, but the predicates come from a fixed theory, so you know a priori all the shapes of the predicates, and you have some efficient way to do the abstract operations. So here are some examples of numeric abstract domains. The first one, sort of the simplest one, is the sign domain, where for each variable, you keep track of whether it's positive, negative or zero, so the only facts you can express are of that form. The next one is the box, or interval, domain, where the only facts you're allowed to state about each variable are its upper and lower bounds. And you can go further and have something like octagons, where, for any two variables, you can express whether they're equal and how far apart they are. And then the most expressive numeric domain is polyhedra, which allows you to use arbitrary linear inequalities. So that's the first part. That's the domain. So let's see how it works. So this is going to be a small program, and we're going to run an abstract interpreter on it. We're going to use an interval, a box domain, here. So, first, we'll go into a loop and we'll ask, "Well, what do we know at this point in our abstract domain?" And we know that y1 is between three and four. We then go through these statements, and we'll say, "Well, what do we know after executing those statements?" And we know that x1 is between one and two, x2 is between five and six, and y1 is still between three and four. We then can go into the other branch of this If statement, and again ask, "What do we know here?" We know that y2 is between three and four. It just comes from this If condition. Apply those operations, and we know that x1 is between one and two, x2 is between five and six. Now, we have to do a join at this point, so we have to say, when we come to this point, what do we know? What is the set of facts that we know? And we can only express them as a conjunction of upper and lower bounds. So what we know is that x1 is between one and two, because that's the same thing here, and x2 is between five and six, and we don't know anything about y1, and we don't know anything about y2, because in one branch of the conditional, we know that y1 is bounded, but in the other branch, we know nothing about it, and the opposite is true about y2. So when we come to a join point, we just forget y1 and y2. And so at this point, we check whether this implies our assertion. It does, and we're done, so pretty straightforward. Now, let's do another example. So what happens -- the first example was really simple. There were no loops. We just propagated things and we're done. So now we'll have a loop. So we're going to do the same thing. Initially, we know that x is zero. As we go into the loop, well, x is zero and also less than 1,000. As we increment, now x is one. We come back to the loop. 
We have to join, so we knew that x was zero. The second time we come, x is one. We can express it as this interval constraint, x is between zero and one, and we can continue and do this again and again and again. And so what's the problem here? The problem is that it might take quite a long time, or even forever, to just go and sort of execute the program like this. And so at this point, what abstract interpretation typically does is pull a rabbit out of a hat -- it's called widening -- and it says, somehow, let's guess where this is going. So here, for example, we saw that x was increasing, and we decided to guess that it's going to increase forever, and then we know that we only enter the loop when x is bounded by 1,000, so let's use that as the upper bound. And we go through another execution of abstract interpretation, now carrying this constraint, so we know if we enter the loop, x has to be less than 1,000. When we increment by one, it's at most 1,000, and now we know that we've converged. There are no new states being generated. So we found an inductive invariant. We come to our assertion, and it's satisfied. So if we want to look at abstract interpretation from an operational perspective, it really gives us this interface. We have a notion of an abstract domain. It knows about a set of variables that we keep track of. There's a set of abstract elements, which is how we represent our abstract values. There are expressions, which we use to represent program states, and statements. And the functions that we have from an abstract domain are these: we have an abstraction function to go from expressions to elements of the abstract domain. We have a concretization function, so that we can say what the result is, how to understand an abstract value. We have an order, so that we know what it means that we've converged -- no more new states -- so when one abstract set is a subset of another abstract set. And we have an abstract transformer. This is a function that takes a concrete statement, like x gets x plus one, and changes it into a statement that operates on abstract values, that takes an upper and lower bound and tells you how plus one transforms an upper and lower bound. And then we have the meet, which is intersection, join, which is union, and widen, which is the operator we use to force convergence. So here is an example of what an abstract domain looks like, in the case of the box abstract domain. Our expressions are all statements of the form: x, with an upper and lower bound. Our abstraction gives us an abstract element, which is just a tuple representing the lower and upper bound of a value. Our concretization takes us back. The operations are very straightforward. If we want to take an intersection of two intervals, then we just take the max of their lower bounds and the min of their upper bounds. If we want to take a union of two intervals, then we take the min of their mins and the max of their maxes. If we want to apply an abstract transformer, in most cases, you apply the operation to the lower and upper bound, and that's it. Note that, here, we have some imprecision. So when we do a join, if, for example, we have an interval one to three and an interval seven to 12, their join is the interval one to 12. That's an over-approximation -- we added more things which were not actually part of these intervals -- and this is where the scalability of abstract interpretation comes from. 
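As a concrete illustration of these operations, here is a minimal Python sketch of the box domain just described; the class and method names are illustrative only, not taken from UFO or VINTA.

```python
# Minimal sketch of the box (interval) domain just described.
# Names and representation are illustrative, not taken from UFO or VINTA.

class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi          # bounds; float('inf') models a missing bound

    def leq(self, other):
        # abstract order: [lo, hi] is contained in [other.lo, other.hi]
        return other.lo <= self.lo and self.hi <= other.hi

    def join(self, other):
        # union, over-approximated: min of the mins, max of the maxes
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def meet(self, other):
        # intersection: max of the lower bounds, min of the upper bounds
        return Interval(max(self.lo, other.lo), min(self.hi, other.hi))

    def widen(self, other):
        # classic interval widening: drop any bound that keeps growing
        lo = self.lo if self.lo <= other.lo else float('-inf')
        hi = self.hi if self.hi >= other.hi else float('inf')
        return Interval(lo, hi)

    def add(self, c):
        # abstract transformer for x := x + c
        return Interval(self.lo + c, self.hi + c)

# Imprecision of join: [1,3] joined with [7,12] is [1,12].
print(vars(Interval(1, 3).join(Interval(7, 12))))            # {'lo': 1, 'hi': 12}

# Widening on "x = 0; while (x < 1000) x = x + 1":
x0 = Interval(0, 0)
x1 = x0.join(x0.meet(Interval(float('-inf'), 999)).add(1))   # after one iteration: [0, 1]
w = x0.widen(x1)                                             # guess: [0, +inf)
nxt = x0.join(w.meet(Interval(float('-inf'), 999)).add(1))   # next round: [0, 1000]
print(nxt.leq(w))                                            # True: converged
```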
Because we restrict ourselves to very simple elements in the abstract domain, we can manipulate them very efficiently, but we pay a price. We can't express everything we want to express. So let's see how this influences the results. We're going to run, again, an abstract interpreter on this program, and it's going to fail. So at first, we assume that i is either one or two. We can represent that as i is between one and two. We go into the first branch of the If, where we know i is one at this point. X gets i, so x1 is one at this point. In the other branch, i is two. X2 is minus four. Got that. And now we have to do a join, and as we do the join, we lose some of our precision, because on this branch of the If statement, we had something about x1, and on this branch of the If, we had something about x2. When we join, we know nothing about x1 or x2, because we don't know which way we got to the join point. And so we'll only learn that i is between one and two, because i was one here and two here. And now, as we go into our second If, we get i is one and i is two, so both of those things are reachable, but we don't know anything about x1 or x2, so we're going to create a false positive. We're going to say that these assertion violations, as far as we know abstractly, cannot be proven unreachable. But if you go through the program concretely, they are unreachable, so that's a false positive, and that's the situation that we want to fix. The solution that we're proposing is something which we call VINTA, which wraps abstract interpretation into an abstraction refinement cycle, so at a high level, it's really a very simple idea. You start with a program, and you run an abstract interpreter. What an abstract interpreter will do is compute some inductive invariant. You check whether this inductive invariant happens to be safe. If it is, you're done. The abstract interpreter has proven that your program is correct, and you're done. On the other hand, if the invariant is not safe, then you know that the abstract interpreter looked at part of your program and found some counterexample. So you switch to the refinement phase, and you take all of the executions that the abstract interpreter has looked at and check whether they contain a concrete counterexample. If they do, then you terminate, and you say the program is unsafe, and you have a counterexample. Otherwise, you use interpolation to figure out which facts the abstract interpreter has forgotten about the bounded part of the program that it had looked at, facts that make that part safe. And then you use those facts to strengthen the state of the abstract interpreter. So you have an inductive invariant here that's not safe. You come here, you check it, and you find out that there is no counterexample, and you come back, and now you have something which is safe -- an error state is unreachable -- but no longer inductive. And then you restart the abstract interpreter from that point to compute the next fixed point, to add more reachable states until the set becomes inductive. And then, when it's inductive, you check if it's safe. If it is, you're done. If it's not safe, you go back and you possibly strengthen it again, and you keep iterating back and forth. There are several novel things we had to do in order to get this to actually work, and so I'm listing here what's novel from the abstract interpretation perspective and what's novel from the refinement perspective. 
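Before going through those novel elements, here is a minimal pseudocode sketch of the refinement loop just described; every helper name is a hypothetical placeholder passed in by the caller, not UFO's actual API.

```python
# Pseudocode sketch of the VINTA loop described above. All helper functions are
# hypothetical placeholders supplied by the caller; this is not UFO's API.

def vinta(program, abstract_interpret, is_safe, concrete_cex, dag_interpolate, strengthen):
    state = None  # start the abstract interpreter from scratch
    while True:
        # Run the abstract interpreter to an inductive invariant, and keep the
        # unfolding (the bounded program it explored) as a side effect.
        invariant, unfolding = abstract_interpret(program, state)

        if is_safe(invariant):
            return ("SAFE", invariant)        # inductive and safe: a proof

        # The alarm may be spurious: check the explored bounded program with SMT.
        cex = concrete_cex(unfolding)
        if cex is not None:
            return ("UNSAFE", cex)            # a real counterexample

        # Spurious alarm: ask interpolation which forgotten facts make the
        # bounded program safe, and strengthen the abstract labels with them.
        labels = dag_interpolate(unfolding, invariant)
        state = strengthen(invariant, labels) # safe but not inductive; restart AI
```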
So from the abstract interpretation perspective, one novel element is that we use cutpoint graphs instead of control flow graphs. What that means is that we compute an abstract value for every loop head, as opposed to every basic block or every statement. It also means that when we say that a program makes a step, it's a very large step. It's basically some kind of loop-free program segment, as opposed to a single thing, like x gets x plus one, which makes our abstract interpretation step a little bit more complicated, but it can also be a lot more precise, because it can be exact on these intermediate steps. Now, the second thing that we do is that, instead of just computing what's reachable at any program location, we compute an unfolding of the control flow graph. If we come to a loop multiple times, we keep multiple copies of it, so it's sort of an unrolling of the control flow graph that the abstract interpreter has explored. And it has two side effects. One is that you know everything that the abstract interpreter has explored. This unfolding at the end is sort of the bounded program that the abstract interpreter is basing its result on, and that's what will be used in refinement. And second, it means that we always compute disjunctive invariants. So what I showed you before, the box domain, for example, will compute an upper and lower bound for every variable at every location, but because we unfold the control flow graph, and because a location that is part of a loop will appear multiple times in the graph, we automatically compute a disjunctive invariant, because the invariant for that location will be the union of the labels at all nodes corresponding to it. So this -- >>: Do you map? Do you also keep track of the number of iterations that was used for that disjunct? >> Arie Gurfinkel: I'll show you in the next slide what it looks like. So this is sort of a side effect of keeping the unfolding: because we keep the whole unfolding, you can get all of this information from it. We don't need to keep a separate number saying in what iteration this label was computed. It's simply an annotation of the unfolding graph, and in this graph, for each node, you know, say, how far it is from the initial state. But one of the side effects of this is sort of what we do: given any abstract domain, we lift it into disjunctions. We lift it into a power set domain, and so we had to fiddle to get the widening to work correctly, and we ended up creating a different widening than any of the existing ones, one that's tightly coupled with the exploration strategy. >>: When you say disjunctive invariants, is there a fixed bound on the disjuncts? If there is no bound, then is that related to widening? What makes it converge? >> Arie Gurfinkel: What makes it converge? >>: You trade disjuncts for [indiscernible] is the typical. >> Arie Gurfinkel: Yes, so I don't have a slide for this. It's a little bit trickier, but I don't think we put a bound on the number of disjuncts; at the same time, we can prove that there is some notion of progress. What's going to happen is you have multiple disjuncts, but you're going to widen. Whenever you want to add a new element to this set of disjuncts, you're going to widen it with one of the existing elements, and then what you have to show is that you may get this set of disjuncts growing, but at the same time, there's always a path where at least one of the disjuncts is being constantly widened. 
And, therefore, that whole chain cannot grow forever. >>: So widening gives you the convergence? >> Arie Gurfinkel: Widening gives us the convergence, but whenever you do power set widening, you always have to be careful, because you may end up with the number of disjuncts growing even though each chain for each disjunct is finite, and then you would still not converge. >>: So in some sense, you rely on the refinement to make it somehow permanent at some point, right? >> Arie Gurfinkel: Well, the abstract interpretation terminates on its own, and then the refinement will push back on it. >>: But you could... >> Arie Gurfinkel: Okay, so let me wave my hand and say it does converge. I actually don't have slides on widening -- I didn't think that would be that interesting -- but if you want to ask me later, I can explain how it works. And then, from the refinement perspective, what we do that's interesting is we use an SMT solver to check whether the counterexample is feasible, which is fairly novel in the context of abstract interpretation. And then we use this concept of DAG interpolation for refinement, which I'll say more about, and what's interesting in this particular work is that the interpolation procedure is guided by the abstract interpretation; they play together in order to figure out how to refine. And the overall effect is quite interesting. On one hand, abstract interpretation is all about throwing away constraints in order to get convergence, and then the refinement part ends up adding back constraints that were thrown away but are actually necessary to prove that certain counterexamples are infeasible. So let me just illustrate how this works on an example. So I'm going to take this program and I'm going to take it through the different steps and show how they work. So a few things to note -- when we explore the program, we use a weak topological order, which means we always go into inner loops first and wait until they converge before we go out. We're going to use the abstract domain of intervals here, and note that the side effect of our abstract analysis will be this labeled unfolding. So we start abstract interpretation the usual way. One here means control node one. Initially, it's labeled with true. Everything is reachable initially. We make one step, and we figure out that x is 10. We make another step. We figure out that x is less than or equal to 10, so we did a little bit of widening here, and we went from eight to 10. Then, we go once more through the loop. The value was at most 10; we subtract two, and it remains at most 10, so we got it to converge. We don't get any more reachable states. We're done with this loop. We now look at what's reachable outside the loop, so we'll get the three edges to location three, and right now we can prove that when you get here, x is less than or equal to 10. Therefore, this If condition is possible, you go in, x is nine, and you hit an error. Okay, so now we're going to raise the alarm. So this is the end of the abstract interpretation phase. We ran the abstract interpretation phase as usual, but look at what we get as a side effect: this unrolling. So instead of, for example, maintaining only that x is at most 10 for location two, we have this full unfolding, and we know that our invariant at location two is the disjunction of those labels. Does that answer your question? >>: The first question, yes. >> Arie Gurfinkel: So that's how it goes. 
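As a rough illustration of the bookkeeping just described, here is a small Python sketch of a labeled unfolding; the field and function names are made up for this example and are not UFO's data structures.

```python
# Sketch of a labeled unfolding over a cutpoint graph (illustrative names only).
from collections import defaultdict

class UnfoldingNode:
    def __init__(self, cutpoint, label, depth):
        self.cutpoint = cutpoint  # which cutpoint (loop head) this node is a copy of
        self.label = label        # abstract value the interpreter computed here
        self.depth = depth        # number of unrollings from the entry node
        self.succs = []           # list of (loop_free_segment, UnfoldingNode)

def disjunctive_invariant(nodes):
    # The invariant of a cutpoint is the disjunction of the labels of all its copies.
    inv = defaultdict(list)
    for n in nodes:
        inv[n.cutpoint].append(n.label)
    return inv
```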
So, now, when we have an alarm, we want to check whether this alarm was spurious or not, and basically what we're checking is whether, in this bounded program, this error location is reachable. And to do that, we switch to an SMT solver, just generating verification conditions for this bounded program and checking them using SMT. So we start with the unfolding, we forget about our abstract interpretation labels, and we start generating a verification condition. For each edge here, we just assert what the action of that edge is, and then we build a control flow encoding where, for each node in the control flow graph, we add a Boolean variable and basically specify that if you happen to be in one location, then in order to get to another location, the edge condition has to be true. Then the same for each individual node. So this encoding is really straightforward in the sense that it's linear in the size of this unfolding, so it's very easy to generate, and then we give it to an SMT solver to check if there is a counterexample. There are two answers. Either there is a counterexample, and then the abstract interpretation phase gave us a true alarm, and we can produce this counterexample. The other result is, no, there is no counterexample, and we need to do some kind of refinement. So to do refinement, we're using Craig interpolation, and I know -- are people here familiar with Craig interpolation? I see somebody smiling. So Craig interpolation is an old theorem that says something pretty simple. It says, roughly, if you have A that implies B, then there exists an i in the middle such that A implies i and i implies B, and i is in the language which is common to A and B. And here, I'm stating it in a slightly different way, where I'm saying, well, A implies not B, just because that's a more convenient form for software verification applications. So Craig's theorem has been around for a long time, and we also know that it's quite operational. It's quite easy to construct a Craig interpolant given, say, a SAT proof or an SMT proof, given a resolution proof. In model checking, it's used to over-approximate the set of reachable states, but let's see how we want to use it here for refinement. So this statement is not quite good enough for us, because it talks only about two parts. We have some prefix and some suffix, and we can compute what's reachable by a prefix before a suffix, but in our case, we don't have two simple parts, A and B, and we don't even have a path. We have a directed acyclic graph that represents our unfolding. And so the problem that we want to solve is something that we call a DAG interpolation problem, and roughly speaking, it says something like this. Given a DAG, where each edge of the graph is annotated by a formula -- so formula pi-i here -- such that on any path, the conjunction of the pi formulas is unsatisfiable, what we're looking for is a set of these orange labels, i, such that each label is an interpolant between every prefix and every suffix, and such that every label, together with any edge, implies the next label. So here is an example. So, for example, i2 has to be an interpolant between pi-1 and pi-8, because of this, but it also has to be an interpolant between pi-1 and pi-2, pi-3, pi-6 and pi-7. And the second condition is that, for example, i2 and pi-2 have to imply i3, and i2 and pi-8 have to imply i7, and so on. 
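To make the feasibility check on the bounded program concrete, here is a small sketch using Z3's Python API; the unfolding and the edge formulas are invented for this example and are not taken from UFO or the competition benchmarks.

```python
# Feasibility check on a small made-up unfolding, using Z3's Python API.
from z3 import Bool, Int, And, Or, Implies, Solver, sat

x0, x1, x2 = Int('x0'), Int('x1'), Int('x2')

# nodes of the unfolding and the edge formulas (SSA-style guards and actions)
nodes = ['entry', 'loop', 'loop1', 'exit', 'err']
edges = {
    ('entry', 'loop'):  x0 == 10,
    ('loop',  'loop1'): And(x0 < 10, x1 == x0 - 2),
    ('loop',  'exit'):  And(x0 >= 10, x2 == x0),
    ('loop1', 'exit'):  And(x1 >= 10, x2 == x1),
    ('exit',  'err'):   x2 == 9,                     # the assertion violation
}

B = {v: Bool('at_' + v) for v in nodes}              # one Boolean per location
s = Solver()
s.add(B['entry'])                                    # start at the entry ...
s.add(B['err'])                                      # ... and reach the error
for v in nodes:
    incoming = [And(B[u], edges[(u, w)]) for (u, w) in edges if w == v]
    if incoming:                                     # being at v needs some incoming edge
        s.add(Implies(B[v], Or(incoming)))

print('real counterexample' if s.check() == sat else 'spurious alarm, refine')
```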
So, if you think about these conditions carefully, you'll see that really what we're looking for is a sort of Horn-style proof of correctness of this program. We're looking for intermediate states i1 to in, such that the initial state is true, the final state is false, and any state together with any statement implies the next state. Now, the question is, how do we compute such a thing? And there are multiple ways to do so. One way is to just cast this as a Horn problem, where you treat each individual i as an unknown predicate, and you pose the restrictions that you want as a Horn satisfiability problem. But another way is to turn it into an interpolation problem, into a linear interpolation problem: you encode this whole program just as a normal verification condition, give it to a solver, get a proof, and then mine the proof for all the labels. And this is... >>: Isn't it fair to say that the linear interpolation method predates this view, this other view? >> Arie Gurfinkel: Right. The linear interpolation method predates it. The linear interpolation method is from 1957. >>: The tools that you're using also predate it. >> Arie Gurfinkel: Right, and the tools that we are using predate the Horn formulation, at least in the SMT world. So the way we actually solve this problem is in this very simple way. We translate this whole problem into a sequential interpolation problem that's already supported by existing tools. And the idea is quite simple. First, we take this graph and we build a verification condition, and we observe that the verification condition has this form, where you have assertions for different locations in the control flow graph, and you can order those assertions in a topological order, so you have a linear sequence, where each element further in the sequence means you're deeper inside the graph. We then can compute a sequence interpolant for the cut between each location and the following locations, so we're sort of cutting the graph in different places. And this will give us almost the result that we want, except, if you follow the definition of interpolation, you may have variables which are out of scope at a given location. If you have two nodes of the graph which are siblings of one another, then variables that are only available for one sibling may appear in the interpolant for the other, and that's a problem for us, because at the end of the day, we want to get something that can be made inductive, and so we don't want to have any free variables, any out-of-scope variables, in the expression. Otherwise, they will have to be quantified out, and you have to deal with quantified formulas. And so we go through this cleaning process to somehow get rid of them. So that's the price we pay for using a sequential interpolation method: we get an easy encoding into it, but the output is not quite what we want, and we have to spend some time on this cleaning phase. Actually, we've spent quite a while playing with various versions of this. The original one used quantifier elimination during cleaning, when all heuristics failed. We have a new encoding that now can do the cleaning completely using projection and no quantifier elimination at all. And I'm looking at various other ways -- using, for example, the Horn procedure that Z3 has, which allows us to solve this whole problem at once, and also to solve just the cleaning part of the problem. So that's that; a rough sketch of this reduction is below, and then let's go back to our running example and see where we are. 
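Here is a rough Python sketch of that reduction; vc_of_node, sequence_interpolant, vars_in_scope and project are hypothetical placeholders supplied by the caller (in UFO the sequence interpolants come from an interpolating solver such as MathSAT).

```python
# Rough sketch of DAG interpolation via a sequence interpolant. The arguments
# vc_of_node, sequence_interpolant, vars_in_scope and project are hypothetical
# placeholders; they are not UFO's or MathSAT's actual API.

def dag_interpolants(nodes_in_topo_order, vc_of_node, sequence_interpolant,
                     vars_in_scope, project):
    # 1. One chunk of the verification condition per node, in topological
    #    order, so a cut after position k separates the shallow part of the
    #    DAG from the deep part.
    chunks = [vc_of_node(n) for n in nodes_in_topo_order]

    # 2. A sequence interpolant gives one formula per cut: itps[k] is implied
    #    by chunks[0..k] and is inconsistent with chunks[k+1..].
    itps = sequence_interpolant(chunks)

    # 3. Cleaning: an interpolant may still mention variables of sibling
    #    branches; project them out so only variables live at the node remain.
    labels = {}
    for node, itp in zip(nodes_in_topo_order[1:], itps):
        labels[node] = project(itp, vars_in_scope(node))
    return labels
```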
So what DAG interpolation will give us: we have this graph, we know that the result is unsatisfiable, so when we crank the DAG interpolation handle, we're going to get these labels i1 to i3, in this case, where i1 will be true, because everything is reachable here, and i3 will be false, because in this particular example, the error is unreachable. And the implication condition holds for every edge. So now, we could have taken these i labels and used them to strengthen the result of abstract interpretation. We can go to every abstract label and add those labels to it and say that's also true, and that prevents an error from happening. There's one catch here. By the time we got to this point, we've already forgotten everything that the abstract interpreter has computed, and so we end up redoing a lot of the work that it has done. And we found this approach to be really inefficient when you get the larger and larger unfoldings generated by the abstract interpreter. So what we really want is this: the abstract interpreter already found some bound on the reachable states. What we really want to know is what else we have to add to that bound in order for the program to be safe. So we only want to know how to sort of help the abstract interpreter, as opposed to redoing its job. And the solution is quite simple. We simply say, well, what that means is that what we want is a sort of restricted DAG interpolant. We want to take our program, take the result of the abstract interpretation, and just let the program assume it. So at every program step, we can just add an assumption and say, well, if the abstract interpreter said x has this bound, just assume that x always has this bound at this particular point. That's not going to change our program, because we know that the abstract interpreter computed an over-approximation, but it will change what the solver has to discover, because those facts are already there. They're already available, and they can be used. And then the solution to this problem, the solution to the interpolation problem, will be: what other facts do I need to add in order for this problem to be unsatisfiable? >>: Is there a step that takes these i's and conditions them to the AI domain that is being used, because these i's may not be discovered in the same form that the domain of the AI is? So is there a step that conditions them, and does that lead to -- maybe lead to loss of... >> Arie Gurfinkel: This is actually one of the things that we're looking at right now. I have a couple of slides at the end to address this question. So, yes, you're absolutely right -- the issue is that these labels may be in arbitrary form, arbitrary SMT formulas, whereas for an abstract domain, we need something like a conjunction of linear inequalities, a convex hull of some sort. So how do you go from one to the other? And the answer, in practice, for how we go from one to the other: we just have a very simple heuristic to get something in the abstract domain that over-approximates each label, but really solving this problem for real is quite difficult, and we're working a little bit on it. >>: A simple question on the side: when you conjoin the AI labels to the edges on the right, doesn't it correspond to just conjoining them to the precondition applications? I'm wondering why you did not just add the AI label as a conjunction to the implication, next to TIJ? 
>> Arie Gurfinkel: Except you're thinking about it in terms of a Horn formulation, so you're thinking about i as the uninterpreted predicate and this as given to you, whereas I am saying this is what's given to me and I produce the i's, which are the solution to my Horn problem, and they satisfy this property. >>: But it satisfies a... >> Arie Gurfinkel: Yes, it satisfies. >>: Thank you. >> Arie Gurfinkel: But this is where the problem statement is. This is the statement about the result, whereas in a Horn formulation, you state that you want a result that satisfies this, and the solver gives it to you. But the point is that the output that you get is not a DAG interpolant. You don't get something that's good enough on its own, but something that's good enough together with the abstract interpretation. So if we go back to our running example and run this whole process here, this is what we get. We have the blue labels, which are what the abstract interpreter has computed, and we have these orange labels, or the non-blue labels, which are what we've computed by the interpolation. And what you have to note here is that, for example, this label happens to be true, which basically says the abstract interpreter already knows enough about this particular location, because it knows that x is 10. I don't need to add any more. And here, I want to add that x is not just at most 10, but it's also eight, and here I want to add that x is not just at most 10, but it's also six. And so now, once we've strengthened the results, we still have a reasonable abstract interpretation state -- in this case, all of the constraints happen to be inside the abstract domain -- but one thing is broken. And the one thing that's broken is that this result is now safe, but it's no longer inductive. We know that because this edge is no longer true. This relationship, that the set of reachable states here is a subset of the reachable states here, is no longer true. X equals six is not a subset of x equals eight, and so we can restart the abstract interpretation right from this point. So this is where we need the ability to take this label, convert it back to an abstract domain element, and then we can run the abstract interpreter again. Here it will find out that, well, if x was equal to six and we run one step, it's less than six. We converge. We come here, it's less than six, and x is never nine. That's it. So just to give you a high-level overview of what's happening here: we run abstract interpretation, but at the same time we keep our unfolding. We keep the bounded program that the abstract interpreter has given us, and we get the blue labels. If an error is reachable, we go to the refinement phase, which basically asks: give me a Horn-style proof of why you think this bounded program is safe. That's our orange labels. We then mix the orange labels with the blue labels, which now prove that this bounded part is safe, so we have something that's safe but is no longer inductive, and if it's not inductive, it means that there are some loops which are not covered, some loops where we haven't explored all the reachable states. We find those loops, and we restart the abstract interpretation from this point. And that's it. So this is implemented in a tool which we call UFO. It's available here, with source, so feel free to look through it, run it, try it, and contact me if you have any sort of questions or need help. This is the architecture of the tool. 
We have this big front end, which I'll talk more about, that basically goes from C into an intermediate representation via LLVM, and then the core of the tool is this ARG constructor that builds this graph, which we call an abstract reachability graph, and you can control it by giving it an expansion strategy -- how you want to expand the graph, what abstract domain you want to use and how exactly you want to refine. There are lots of choices in how to employ DAG interpolation. It uses two SMT solvers: it uses Z3 for almost everything, and it uses MathSAT for interpolation. Are there any questions so far? >>: All of your examples were type safe and very simple, and I was expecting to hear that, no, this only works on a type-safe language, but you said C. How do you deal with type-unsafe code? >> Arie Gurfinkel: I pretend to not know about it. >>: Okay. >> Arie Gurfinkel: So this brings us to the software verification competition. So all the examples I've shown you were small, just to fit on the slide, but I'm claiming this works because of the software verification competition, not because of the small examples. And now one of the -- so, okay, let me tell you a little bit about the competition. That's a very serious question, and I don't have any good answers. So the software verification competition... >>: So I'm trying to figure out how applicable it is. >> Arie Gurfinkel: So the software verification competition started in 2012, and so this year was the second year. It's held as part of ETAPS and co-located with the TACAS conference. Its goal is to provide a snapshot of state-of-the-art software verification tools. It had quite a few participants and quite a few benchmarks. So one of the issues with any competition like this: the decision was made to use C as the language, so that the tools actually support some realistic language, not a toy, not an intermediate language. But the problem is that we don't have formal semantics, even for type safety; even just settling on the formal semantics of the language is impossible. And here a decision was made that we'll just use the semantics which are reasonable for all the benchmarks and to which all of the participants agree. And the way it works is that there's a large collection of programs; benchmarks are marked as to whether they should have an error or don't have an error; everybody runs their tool and reports their results, and if people happen to disagree, there is a discussion and we decide whether the benchmark is kept, removed, or what to do with that particular case. So in that sense, the answer is very pragmatic and driven by the benchmarks, as opposed to trying to answer a more general question. >>: [Indiscernible]. >> Arie Gurfinkel: You could, so the way we actually do it, we live inside LLVM. We let LLVM get to an intermediate representation, at which point we only work at the level of the numeric registers that LLVM provides, so LLVM compiles the program down to infinitely many numeric registers and memory. We treat memory as completely nondeterministic, and the numeric registers are what we analyze. So I can give you a formal semantics of what we verify with respect to that, but tying it back to C is quite difficult. >>: Do you not handle pointers at all? >> Arie Gurfinkel: We don't handle pointers at all, but that's not to say that these programs don't have pointers, or that we don't handle programs with pointers. 
That is to say that a lot of the pointers are removed at the preprocessing step, when the program is compiled down to registers. >>: By what? >> Arie Gurfinkel: By LLVM and by our process. If you can prove that a certain location is not aliased, then it will be compiled down to a register, and then all the pointer references will be gone. So the benchmark consists of a large collection of C programs, ranging from the sort of things that people traditionally use for their tools to a big set of Linux device drivers. So they're not toy programs. They don't use toy parts of the language. They do whatever they want, but most of the properties that we want to prove are the kinds that don't really depend on deep pointer reasoning or type safety, things like that. >>: But you still have some kind of heap domain for the abstract interpretation, right? So is that sound, in a sense? >> Arie Gurfinkel: Yes. It is sound, because I assume that the heap is completely nondeterministic. So if you write something to the heap and then read something from the heap, you get a nondeterministic value, so that's sound. It's just maybe not precise for anything interesting where you want to say, I put something in the heap, I build a linked list, and then I want to navigate through it. The tool will just tell you you'll end up in an arbitrary place. >>: Okay, so you get aliases which might not be aliases, for instance. >> Arie Gurfinkel: Right. But the reason why this works is that the front end takes care of lowering a lot of the pointer reasoning. Any sort of shallow pointer reasoning gets lowered into registers by the front end. >>: So there are model checkers that do the same thing, which define the predicates based on counterexamples, so at a high level, what is the advantage that adding abstract interpretation gives? Is it more precise somehow? It converges faster because of widening, it wins competitions? >> Arie Gurfinkel: It wins competitions. It wins competitions. That's Nikolaj's answer. So the tools that were participating, the relevant ones, are the ones that somehow solve the same problems as our tool solves. The tool is called UFO. There are three tools which are sort of in a different class -- CSeq and Threader do concurrency and only concurrency, and we don't, so we're never compared with them. And Predator does sort of deep memory things, which, again, we don't compete on. But for all other benchmarks, UFO seems to perform a lot better. >>: Why is my question. >> Arie Gurfinkel: Well, it's hard to tease out why. If you pick a particular point and you ask, "Is this particular thing important?", I can say, "Yes. If I turn it off, it doesn't perform as well." But if you just take this particular thing and remove everything else, you don't know. So why is abstract interpretation important? Well, because it can discover facts out of an infinite domain of predicates. If you use predicate abstraction, you may get stuck trying to figure out what is the right predicate, whereas here, you start with an infinite set of restricted predicates, and you use widening to find out what is a good set. But widening can overshoot, and so you use sort of the same refinement techniques to bring you back into a safe region. So, any more questions? Okay, so this is how the competition was run. The way the scoring was done, you get points for solving things correctly. 
You get negative points for solving things incorrectly, and it was deemed that finding a counterexample is easier than proving something correct, so you get more points if you prove something correct when it is correct, and you get more negative points if you prove something correct which isn't. And the distinction in these numbers was partly influenced by the previous year, where tools that would do bounded model checking, that would look at only a few executions, would really win everything, because they would look at a few executions and say the program is safe, and the distribution of benchmarks was such that if you have an error, it's a very shallow error. So those tools would constantly win. Now, this year, there are many more benchmarks, and that's no longer true, but also the scoring was changed so that simply guessing things would penalize you more. So, well, that's our outcome of this. We participated in four categories: Control Flow Integers, which has all sorts of the traditional benchmarks used by software model checkers; Product Lines, which comes from a set of examples where there is the same program but different configurations of it; Device Drivers, which comes from the Linux device driver verification project; and SystemC, which is SystemC programs converted to C with a scheduler. So SystemC is a hardware description language that describes concurrent systems. In all of those cases, the tool performed much better than just predicate abstraction by itself, and the abstract domains were crucial to this. So if we look at the benchmarks, you'll see that if you want to find a bug, then it seems to be really good to use VINTA with the box abstract domain. And the intuition here is that the box domain is not very precise, but it is easy for it to tell that a loop is safe for a small bound, and so this domain seems to be really good at figuring out how much to unroll the system so that a bounded model checker will find a counterexample. And then we have VINTA with the boxes domain, which I'll talk about in a second, which is a domain that allows us to have disjunctions in addition to intervals, and that domain very closely mimics the typical predicates that a software model checker finds, except it has all of them at once, and it seems to be really good at proving safety. So let's see if I can bring up the results. So this is -- if you want to go to that link, you'll see lots of details about who did what. This is our tool, the fourth column. I don't want to go down into the numbers, but if you're interested, you could go and click on all of the numbers, and it will give you a comparison of tools and draw you graphs, and you can try and tease out what's the difference between different techniques. >>: [Indiscernible]. >> Arie Gurfinkel: Yes, actually, we got a negative score overall, and we didn't quite realize what this category was. Our idea of the Overall category was that it was sort of the addition of all the other categories, but in fact, the Overall category was all the tools running on all the benchmarks, even the ones that they didn't know anything about, and if the tool gives you a wrong answer as opposed to unknown, it's penalized. But you are allowed, if you don't know, to skip the benchmarks. For example, if the question was about memory, you could just say unknown, whereas we didn't, and so that's what explains this column. Okay. So let me tell you more about what else was important. 
So VINTA I think was the main reason why we did so well in the competition, but there were a number of other things that really influenced the result, things which we don't typically publish. They were small things that end up making a big difference. So the secret sauce, the important things, were: the front end, which I'll tell you more about; combining with abstract interpretation; the boxes abstract domain, which I'll talk about in a second; the ability to compute DAG interpolation, so computing a solution to this whole problem at once, as opposed to doing it a path at a time, lazily, or any other way. And then, at the end of the day, we do run a lot of things in parallel, so we don't have to worry about which particular setting is the best. So the front end is something that in principle is really, really simple, but in practice, it's extremely messy. So what ended up happening is, it's really a huge mess. So there is a CIL pass that goes through the code and normalizes it, and this is basically where I try to fight against the software competition semantics, because their semantics have nothing to do with what a compiler thinks legal C should do. For example, in the software competition, it was assumed that if a variable is uninitialized, it can have an arbitrary value. When a compiler sees an uninitialized variable, it says that means it's undefined, and therefore it's legal for it to do anything. So if you have a branch on an undefined variable, the compiler could pick whichever path it wants to take and optimize away the other possibilities. So there is a CIL pass that tries to fix all of those things at a syntactic level. For example, we just add calls to nondeterministic functions to initialize every variable. Then there is an LLVM-GCC pass, which uses GCC to take the benchmarks and convert them into LLVM land. And then there is an LLVM optimizer, based on an old version of LLVM that has a good optimizer, but not too good, that we use in order to simplify memory and do all of those things. And the problem here is, if you switch to a new version of LLVM, the optimizations are too smart, and so again, you get into this clash. So, for example, in the competition, it was assumed that if you get a nondeterministic integer, using a NonDetInt function, and you cast it into a long, you get a nondeterministic long. Well, what a compiler will tell you is, no, you still get a nondeterministic int, and the top bits of this long will be zero. The new LLVM knows about this and simplifies based on that. Older LLVMs didn't. So after you get through all of this, you can look at what happens to the benchmarks, and this is quite interesting: out of 1,592 SAFE benchmarks, the front end happens to prove 1,321 of them, so many of them are just simple things that you could do by constant propagation and simple simplifications. For the UNSAFE examples, there are fewer of them that the front end can discharge. So a big part of this is just having a really good front end. Now, the second thing is this boxes abstract domain, which is unique to our tool, and the idea here is to say that, in most cases, when we think about programs where a software model checker did well, we want to find an invariant that's not a convex hull. It's not an upper and lower bound for every variable, but rather a disjunction of such things. And so you'd want an abstract domain that's not a single box, not a single min-max, but rather a bunch of such boxes. 
If you start thinking about it, it's easy to see what this domain is, but the difficulty is, how do you maintain it? How do you represent this as a formula so that it doesn't blow up? And the solution that we are using is decision diagrams. So a while back, we extended decision diagrams to linear arithmetic, so that's decision diagrams where each node, in addition to having a true and a false child, is labeled by a linear arithmetic formula. And the diagram knows about it and does simplifications based on that. The boxes domain basically uses this data structure, where each node represents a single min-max constraint. The interesting thing about this representation is that it's canonical for this domain, so it behaves very much like BDDs, can be implemented really efficiently, and lets you represent those things compactly. And then, finally, we run everything in parallel. So we looked at the various settings that the algorithm has -- using abstract interpretation, using predicate abstraction, using different forms of refinement -- and we couldn't find any sort of best strategy. Since we were running out of time and the competition machines had eight cores, we figured we could run eight different things in parallel and see what happened. And so this is what we ended up with. There's a whole bunch of different combinations. So, for example, this one uses a lot of LLVM optimizations, uses the boxes abstract domain and uses very nonaggressive widening, so it lets you go deep into the loops, but it's run for a very short time. So that's really good for cases where there's a couple of iterations of a loop, maybe 10 or so, that you have to get through, and the boxes domain can keep track of the state really precisely, and then you converge. But if you don't, then that's probably a bad strategy. And so if you look at all of them, they use either a strong abstract domain or a weak abstract domain, and they decide whether to run for a short time or for a longer time. So to conclude, more or less, this is where VINTA came from. This is VINTA's family. We started a while back with an algorithm called Whale, which was using interpolation to try to reason about recursive programs. We've sort of abandoned that work because we didn't have any good benchmarks to tune this algorithm on, and then we switched to the competition benchmarks. We first developed this UFO algorithm, which eventually became the framework on which we do our development; it introduces the concept of DAG interpolation and combines predicate abstraction with interpolation-based verification. And then VINTA pushed it a little bit further by adding the abstract interpretation components, and the difference between these two -- conceptually, it's quite similar work, but in the details, there's quite a lot of difference. So we had to really work out how the widening would work, how to interface with the abstract interpreter, and the abstract interpreter really put a lot of stress on our DAG interpolation component. We really had to tune it a lot more, just because here the abstract interpreter would go much deeper into any loop than the predicate abstraction would. Well, are there any more questions? If not, I'm just going to talk about one piece of current work, and just maybe get your opinion, maybe offline. So one of the problems -- you actually asked this question -- is what we call symbolic abstraction: how do you build an abstract domain on top of an SMT solver, or how do you go from an SMT formula into an abstract domain representation? 
And this is something I'm working on with Yi Li, who is the main student working on this, together with Aws and Zach. So here is the statement of the problem. So VINTA wants an abstraction function, which basically takes an SMT formula, say over linear arithmetic to make it simple, and gives you a result in an abstract domain, maybe a conjunction of linear inequalities, maybe a conjunction of min-max constraints. And if you think about it for a bit, implementing it is really an optimization problem. So you could look at it in this way. Formally, we have a quantifier-free arithmetic formula phi, and we have a bunch of terms, t1 to tk, and what we want is to find values min-i and max-i such that phi implies the conjunction of the bounds min-i <= ti <= max-i, and we want the min and max to be the tightest. This would be a value in the box abstract domain, but really it's more general, because t can be an arbitrary term, and of course, without loss of generality, we can say, well, since phi is an SMT formula, it's enough for t to be a variable. If you want to make it a term, we just add a new variable to phi. There is a naive solution, which is to convert phi into DNF, take a disjunct, optimize using either Simplex or Fourier-Motzkin, and then go on to the next disjunct. But that doesn't -- at least the very naive version doesn't scale, and the problem here is that typically what we want phi to be is a bunch of loop-free executions of a program, so its DNF is quite large. So this is the solution that we've been playing with. It seems to be fairly effective. So, in a nutshell, we want to do something really simple. We want to successively under-approximate phi by enumerating its models. But what happens is, if phi happens to be unbounded, if it's not a polytope, then if we just enumerate the models, we may go forever into some unbounded region. And so what we try to do is opportunistically check whether a direction happens to be unbounded, by basically shooting a ray in that direction. So this is how it looks on an example. So say we have these two shapes. This is our phi, this particular shape, and we're going to start under-approximating. So we're going to give this phi to an SMT solver and ask for a model, and we may get a point in here. And then we can ask for another model, and we're optimizing x and y, so we'll ask for a model where x increases, and we can always ask for a model that's on the faces of this polytope, by just taking a term corresponding to any of the faces and saying, "Find me a solution on it." So the next solution could be here. And you ask for a bigger solution. Okay, so before asking for a bigger solution, this is what we're going to use in order to decide whether something is unbounded. So we have this fact: if you happen to have two points, p1 and p2, where x is increasing -- so there's a ray from p1 to p2 -- and they happen to lie on the same hyperplanes of phi, and there isn't a p3 on which x increases further but that lies on more hyperplanes than p2, then this direction is unbounded. So here, for example, if I look in this direction, I have this point. I have more points where x increases, but I have this point that lies on more hyperplanes. It lies on the second hyperplane. So right now I can't conclude whether the x direction is unbounded. I ask for another point where x is bigger. I'll get a point here. 
Well, now I know that the only way for x to be bounded is if there is another point on this hyperplane that intersects some other hyperplane, and I can check for that. I have the equations of all of the hyperplanes, so I can just check: does there exist a value where x is bigger and that intersects one more hyperplane? And if the answer is no, then I know this whole direction is unbounded. And once I have got this, I now go and look for other values of x, so maybe I ask for a lower value of x, and I'll get here. That also happens to be a lower value for y. Is there an even lower value for y? There isn't one. I ask for a higher value for y. Maybe I'm lucky and I get here, and now I have my convex hull. So that's our current solution, and I'm looking for opinions on how to make it better. So one sort of limitation here is that, when we get to this point, we don't have much control. It seems like if we were using an external Simplex, we could always get a point which maximizes x, for example, in a given polytope. But when we use an SMT solver with the interface we have now, we just ask for a point, and then we ask for another point in a completely separate query, as opposed to being able to say, "Well, give me a point, and then try to find another point in the polytope you're already in that maximizes that direction." That's it. >> Nikolaj Bjorner: Cool, you're done with the talk? >> Arie Gurfinkel: Yes. >> Nikolaj Bjorner: Well, thank you.
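For reference, here is a minimal sketch of the box-abstraction query stated above, posed directly as optimization over Z3's Python API; this is the straightforward tightest-bounds formulation, not the model-enumeration and ray-shooting procedure described in the talk, and the formula is made up for the example.

```python
# Box abstraction of an SMT formula, posed directly as optimization with Z3.
# The formula is made up for the example; this is not the model-enumeration
# and ray-shooting procedure described in the talk.
from z3 import Ints, Optimize, And, Or, sat

x, y = Ints('x y')
phi = Or(And(1 <= x, x <= 3, 0 <= y, y <= 2),
         And(5 <= x, x <= 8, 1 <= y, y <= 4))        # a non-convex formula

def box_abstraction(phi, terms):
    box = {}
    for t in terms:
        for direction in ('min', 'max'):
            opt = Optimize()
            opt.add(phi)
            h = opt.minimize(t) if direction == 'min' else opt.maximize(t)
            if opt.check() == sat:
                # the bound can be -oo/+oo if phi is unbounded in this direction
                val = opt.lower(h) if direction == 'min' else opt.upper(h)
                box.setdefault(str(t), {})[direction] = val
    return box

print(box_abstraction(phi, [x, y]))   # expect x in [1, 8], y in [0, 4]
```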