
>> Nikolaj Bjorner: It gives me great pleasure to introduce Arie Gurfinkel, who is visiting from
the Software Engineering Institute at CMU, and he's here all week, until at least Thursday
afternoon, maybe even Friday morning. So after the talk, or during the talk, ask him questions,
and after the talk, he sits, what was it, 3123?
>> Arie Gurfinkel: Forty-three.
>> Nikolaj Bjorner: Forty-three, and you can grab Arie and ask him even more questions.
Okay, go ahead.
>> Arie Gurfinkel: Okay, thank you, Nikolaj. So I'll talk about VINTA, which is a joint work
with Aws Albarghouthi and Marsha Chechik, and then some of the work I'll talk about is also
joint work with Sagar Chaki, who works with me at the SEI, and Yi Li, who is also in Toronto with
Marsha. I have to show this. So anything I say, there's no warranty on it.
So my work is in automated software analysis, and I'm not going to spend any time motivating
why this is important, so I'm assuming you all are already motivated enough, so I'm just going to
get directly to the point. So what I work on is this box, called automated analysis, and from my
perspective, the view of the world is very simple. You have a program and maybe some notion
of correctness, maybe inside the program, maybe as a separate property, and you have to come
up with one of two answers, either correct, and maybe proof of correctness, or incorrect, and
maybe a counterexample.
From my perspective, there are sort of two ways to solve this problem, two big approaches, and
one is called Software Model Checking, and it was pioneered by Ed Clarke, among
others, and Abstract Interpretation, which was pioneered by Patrick and Radhia Cousot. Last year, I was here,
and I was talking about UFO, which was sort of a model-checking-driven view of the program
verification, and today I'll talk about VINTA, which is more or less the same story, but now from
the abstract interpretation perspective.
So the motivation that we had, on one hand, abstract interpretation is the most scalable approach
to program analysis. That's the one that scales to programs with millions of lines of code, but in
practice, there is a price for that. It's very scalable because it's imprecise, and because it's
imprecise, it has lots of false positives. And two other problems with it -- it gives you no
counterexamples and no refinement, so if it claims that there is a bug, there is no actual
execution you can look at and say, "Oh, yeah, I see what's going on." And there is no
refinement. If you have a false positive, you have a false positive. That's it. That's sort of the
motivation we wanted to do. We wanted to go and reach into a model-checking view of the
world, where you have abstraction refinement, and try to add it to abstract interpretation.
So what I'm going to do in the rest of the talk is I'll talk about numeric abstract interpretation.
Does anybody here not know what numeric abstract interpretation is? Great. Okay, and I'll go
slowly over that part, so I'll just give you my view about what it looks like, and I'll talk about
VINTA, so what did we do that's different, how it changes this picture. I'll talk about how it's
implemented, and then sort of the biggest claim to fame of VINTA is that it did really well at the
recent software verification competition, so instead of showing you how we did on some
benchmarks that we've tried it on, I'll talk about what it did in the competition and how much it was
responsible for the performance of the tool.
If I have time, I'll talk a little bit about the secret sauce, so all of the things that went on top of the
basic algorithm in order to make it competitive, and then finish with some of the current work
that I'm doing. I'll try and make it informal. Please feel free to ask questions or interrupt in the
middle, as I go.
So numeric abstract interpretation, what is it? The basic idea is that you restrict the set of facts
that you're going to use when analyzing a program to something called an abstract domain, and
you do all your reasoning inside this set of facts. So the thing to note here is that the abstract
domain is a possibly infinite set of predicates, so you have to deal somehow with that, but the
predicates come from a fixed theory, so you know a priori all the shapes of the predicates, and
you have some efficient way to do the abstract operations. So here are some examples of
numeric abstract domains. The first one, sort of the simplest, is the sign domain, where
for each variable you're going to keep track of whether it's positive, negative or zero, so the only
facts you can express are of that form. The next one is the box, or interval, domain,
where the only fact that you're allowed to state about each variable is its upper and lower
bound. And you can go further and have something like octagons, where for any two
variables you can express whether they're equal and how far apart they are. And then the most
expressive numeric domain is polyhedra, which allows you to use arbitrary linear inequalities. So
that's the first part. That's the domain.
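To make the shapes concrete, here is a rough summary of the kinds of constraints each of these domains can express (my notation; c, l, u, and the a's are constants):

```latex
\begin{align*}
\text{Signs:}            &\quad x > 0 \;\mid\; x < 0 \;\mid\; x = 0\\
\text{Box (intervals):}  &\quad l \le x \le u\\
\text{Octagons:}         &\quad \pm x \pm y \le c\\
\text{Polyhedra:}        &\quad a_1 x_1 + \dots + a_n x_n \le c
\end{align*}
```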
So let's see how it works. So this is going to be a small program, and we're going to run an
abstract interpreter on it. So we're going to do an interval, a box domain, here. So, first, we'll go
into a loop and we'll ask, "Well, what do we know at this point in our abstract domain?" And we
know that y1 is between three and four. We then go through these statements, and we'll say,
"Well, what do we know after executing those statements?" And we know that x1 is between
one and two, x2 is between five and six, and y1 is still between three and four. We then can go
into another branch of this If statement, and again ask, "What do we know here?" We know that
y2 is between three and four. It just comes from this If condition. Apply those operations, and
we know that x1 is between one and two, x2 is between five and six. Now, we have to do a join
at this point, so we have to say, when we come to this point, what do we know? What is the set
of facts that we know? And we can only express them as conjunction of upper and lower
bounds. So what we know is that x1 is between one and two, because that's the same thing here,
and x2 is between five and six, and we don't know anything about y1, and we don't know
anything about y2, because in one branch of the conditional, we know that y1 is bounded, but in
the other branch, we know nothing about it, and the opposite is true about y2. So when we come
to a join point, we just forget y1 and y2.
And so at this point, we check whether this implies our assertion. It does, and we're done, so
pretty straightforward. Now, let's do another example, so what happens -- the first example was
really simple. There were no loops. We just propagated things and we're done. So now we'll
have a loop. So we're going to do the same thing. Initially, we know that x is zero. As we go
into loop, well, x is zero and also less than 1,000. As we increment, now x is one. We come
back to the loop. We have to join, so we knew that x was zero. The second time we come, x is
one. We can express it as this interval constraint, x is between zero and one, and we can
continue and do this again and again and again. And so what's the problem here? The problem,
it might take quite a long time, or even forever, to just go and sort of execute the program like
this. And so at this point, what the abstract interpretation typically does is pulls a rabbit out of a
hat -- it's called widening -- it says, somehow, let's guess where this is going.
So here, for example, we saw that x was increasing, and we decided to guess that it's going to
increase forever, and then we know that we only enter a loop when x is bounded by 1,000, so
let's use that as the upper bound. And we go through another execution of abstract interpretation,
now carrying this constraint, so we know if we enter a loop, x has to be less than 1,000. When
we increment by one, it's at most 1,000, and now we know that we've converged. There's no new
states being generated. So we found an inductive invariant. We come to our assertion, and it's
satisfied.
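As a rough illustration of the loop example just described, here is a minimal sketch in Python of interval analysis with widening. The guard x < 1,000 and the increment come from the example; the helper functions are just one simple way to write the standard interval operations, not the tool's code:

```python
def widen(old, new):
    """Interval widening: any bound that grew compared to `old` is pushed to infinity."""
    lo = old[0] if new[0] >= old[0] else float("-inf")
    hi = old[1] if new[1] <= old[1] else float("inf")
    return (lo, hi)

def join(a, b):
    """Interval join: min of lower bounds, max of upper bounds."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def body(iv):
    """One loop iteration of the example: assume x < 1000, then x = x + 1."""
    lo, hi = iv
    lo, hi = lo, min(hi, 999)      # meet with the guard x <= 999
    return (lo + 1, hi + 1)        # the increment shifts both bounds

inv = (0, 0)                       # x == 0 on entry to the loop
while True:
    new_inv = widen(inv, join(inv, body(inv)))
    if new_inv == inv:             # no new abstract states: an inductive invariant
        break
    inv = new_inv

print(inv)   # (0, inf): at the loop head we only know 0 <= x; the guard then bounds x inside
```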
So if we want to look at an abstract interpretation sort of from an operational perspective, and it
really gives us this interface, we have a notion of an abstract domain. It knows about a set of
variables that we keep track of. There's a set of abstract elements, which is how we represent our
abstract values. There's expressions, which we use to represent program states, and statements.
And the functions that we have from an abstract domain is we have an abstraction function to go
from expressions to elements of abstract domain. We have a concretization function, so that we
can say what the result is, how to understand an abstract value. We have an order, so that we
know what does it mean that we've converged -- no more new states -- so what abstract set is a
subset of another abstract set. And we have an abstract transformer. This is a function that takes a
concrete statement, like x gets x plus one, and changes it into a statement that operates on
abstract values, that takes an upper and lower bound and tells you how plus one transforms an
upper and lower bound.
And then we have the meet, which is intersection, join, which is union, and widen, which is what
we use to force convergence. So here is an example of what an abstract domain looks like, in the
case of a box abstract domain. So our expressions are all statements of the form x, with an upper
and lower bound. Our abstraction gives us an abstract element, which will be just a tuple
representing the lower and upper bound of a value. Our concretization takes us back. The
operations are very straightforward. If we want to take an intersection of two intervals, then we
just take the max of their lower bounds and the min of their upper bounds. If we want to take a
union of two intervals, then we take the min of their lower bounds and the max of their upper bounds. If we want
to do an abstract transformer, in most cases, you apply the operation to the lower and upper
bound, and that's it.
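Here is a minimal sketch in Python of the box domain interface just listed (abstraction, order, meet, join, and a transformer); it is only meant to illustrate the operations, not to reflect how UFO or VINTA actually implement them:

```python
# Abstract values are dicts mapping a variable name to a (lower, upper) pair.
INF = float("inf")

def alpha(var, lo, hi):
    """Abstraction: the fact lo <= var <= hi becomes the element (lo, hi)."""
    return {var: (lo, hi)}

def leq(a, b):
    """Order: a is contained in b, variable by variable (used to detect convergence)."""
    for v in set(a) | set(b):
        alo, ahi = a.get(v, (-INF, INF))
        blo, bhi = b.get(v, (-INF, INF))
        if not (blo <= alo and ahi <= bhi):
            return False
    return True

def meet(a, b):
    """Intersection: max of the lower bounds, min of the upper bounds."""
    out = {}
    for v in set(a) | set(b):
        lo1, hi1 = a.get(v, (-INF, INF))
        lo2, hi2 = b.get(v, (-INF, INF))
        out[v] = (max(lo1, lo2), min(hi1, hi2))
    return out

def join(a, b):
    """Union, over-approximated: min of the lower bounds, max of the upper bounds."""
    out = {}
    for v in set(a) | set(b):
        lo1, hi1 = a.get(v, (-INF, INF))
        lo2, hi2 = b.get(v, (-INF, INF))
        out[v] = (min(lo1, lo2), max(hi1, hi2))
    return out

def transform_add_const(a, var, c):
    """Abstract transformer for var := var + c: shift both bounds by c."""
    lo, hi = a.get(var, (-INF, INF))
    out = dict(a)
    out[var] = (lo + c, hi + c)
    return out

# The join of [1,3] and [7,12] is [1,12], the over-approximation discussed just below.
print(join(alpha("x", 1, 3), alpha("x", 7, 12)))   # {'x': (1, 12)}
```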
Note that, here, we have some imprecision. So when we do a join, if, for example, we have an
interval one, three, and an interval seven, 12, their join is the interval one to 12. That's an
over-approximation: we added things which were not actually part of either interval, and this
is where the scalability of the abstract interpretation comes from. Because we restrict ourselves to very
simple elements of the abstract domain, we can manipulate them very efficiently, but we pay the
price. We can't express everything we want to express. So let's see how this influences the
results, so we're going to run, again, an abstract interpreter in this program, and it's going to fail.
So at first, we assume that i is either one or two. We can represent that as i is between one and
two. We go into first branch of an If, we know i is one at this point. X gets i, so x1 is one at this
point. The other branch, i is two. X2 is minus four. Got that. And now we have to do a join,
and as we do a join, we now lose some of our precision, because we know on one branch, on this
branch of the If statement, we had something about x1. On this branch of an If, we had
something about x2. When we join, we know nothing about x1 or x2, because we don't know
which way we got to the join point. And so we'll only learn that i is between one and two,
because i was one here and two here. And now, as we go into our second If, we get i is one and i
is two, so both of those things are reachable, but we don't know anything about x1 or x2, so we're
going to create a false positive. We're going to say that, as far as we know
abstractly, we cannot prove that these assertion failures are unreachable. But if you go through the program concretely, they
are unreachable, so it's a false positive, and that's the situation that we want to fix.
The solution that we're proposing is something which we called VINTA, which wraps an abstract
interpretation into an abstraction refinement cycle, so at a high level, it's really a very simple
idea. You start with a program, and you run an abstract interpreter. Well, what an abstract
interpreter will do is compute some inductive invariant. You check whether this
inductive invariant happens to be safe. If it is, you're done. The abstract interpreter has proven
that your program is correct, and you're done. On the other hand, if the invariant is not safe, then
you know that the abstract interpreter looked at part of your program and found some
counterexample. So you switch to the refinement phase, take all of the executions that
the abstract interpreter has looked at and check whether they contain a concrete counterexample. If
they do, then you terminate, and you say the program is unsafe, and you have a counterexample.
Otherwise, you use interpolation to figure out which facts the abstract interpreter has forgotten
about the bounded program that it had looked at that would make it safe. And then you use
those facts to strengthen the state of the abstract interpreter. So you have an inductive invariant
here that's not safe. You come here, you check it, and you find out that there is no
counterexample, and you come back, and now you have something which is safe -- an error state
is unreachable -- but no longer inductive. And then you restart an abstract interpreter from that
point to compute the next fixed point, to add more reachable states until the set becomes
inductive. And then, when it's inductive, you check if it's safe. You're done. If it's not safe, you
go back and you possibly strengthen it again, and you keep iterating back and forth.
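Put as pseudocode, the loop just described looks roughly like this; every function name below is a placeholder for the step it names, not an actual API of the tool:

```python
# A high-level sketch of the VINTA abstraction-refinement loop (pseudocode).
def vinta(program):
    state = initial_abstract_state(program)
    while True:
        invariant, unfolding = abstract_interpret(program, state)   # the "blue" labels
        if is_safe(invariant):
            return "SAFE", invariant                  # inductive and safe: a proof
        result = smt_check(unfolding)                 # is the alarm a real counterexample?
        if result.has_counterexample:
            return "UNSAFE", result.counterexample
        labels = dag_interpolate(unfolding, invariant)   # the "orange" labels
        state = strengthen(invariant, labels)         # now safe, but no longer inductive
        # restart abstract interpretation from the strengthened state
```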
There are several novel things we had to do in order to get this to actually work, and so I'm
listing here sort of what's novel from the abstract interpretation perspective and what's novel
from the refinement perspective. So from the abstract interpretation perspective, one novel
element is that we used cutpoint graphs instead of control flow graphs. What that means is that we
compute an abstract value for every loop head, as opposed to for every basic block or
every statement. It also means that when we say that a program makes a step, it's a very large step.
It's basically some kind of loop-free program segment, as opposed to a single thing, like x gets x
plus one, which makes our abstract interpretation step a little bit more complicated, but it can
also be a lot more precise, because it can be exact on these intermediate steps.
Now, the second thing that we do is that, instead of just computing what's reachable at any
program location, we compute an unfolding of a control flow graph. If we come to a loop
multiple times, we keep multiple copies of it, as sort of an unrolling of the control flow graph that
the abstract interpreter has explored. And it has two side effects. One is that you know everything
that abstract interpreter has explored. This unfolding at the end is sort of the bounded program
that the abstract interpreter is basing its result on, and that's what will be used in refinement.
And second, it means that we always compute disjunctive invariants. So what I showed you
before, the box domain, for example, will compute an upper and lower bound for every
variable at every location, but because we unfold the control flow graph and because a
location that is part of a loop will appear multiple times in the graph, we automatically compute a
disjunctive invariant, because the invariant for that location will be the union of the labels at all
locations corresponding to it.
So this --
>>: Do you -- do you also keep track of the number of iterations that was used for that
disjunct?
>> Arie Gurfinkel: I'll show you in the next slide what it looks like, so this is sort of a side effect
of keeping the unfolding: because we keep the whole unfolding, you can get all of this
information from it. We don't need to keep a separate number saying in what iteration this
label was computed. It's simply an annotation of the unfolding graph, and in this graph you
can tell, for each loop, how far it is from the initial state.
But one of the side effects of this is that, given any abstract domain, we
lift it into disjunctions. We lift it into a power set domain, and so we had to fiddle to get the
widening to work correctly, and we ended up creating a different widening than any of the
existing ones, one that's tightly coupled with the exploration strategy.
>>: When you say disjunctive invariants, is there a fixed bound on the number of disjuncts? If there is no
bound, then is that related to widening? What makes it converge?
>> Arie Gurfinkel: What makes it converge?
>>: You trade disjuncts for [indiscernible] is the typical.
>> Arie Gurfinkel: Yes, so I don't have a slide for this. It's a little bit trickier. We don't
put a bound on the number of disjuncts, but at the same time, we can prove that there is some
progress: what's going to happen is you have multiple disjuncts, and you're going
to widen. Whenever you want to add a new element to this set of disjuncts, you're going to
widen it with one of the existing elements, and then what you have to show is that you may get this
set of disjuncts growing, but at the same time, there's always a path where at least one of the
disjuncts is being constantly widened. And, therefore, that whole chain cannot grow forever.
>>: So widening gives you the convergence?
>> Arie Gurfinkel: Widening gives us the convergence, but whenever you do power set
widening, you always have to be careful, because you may end up in a situation where the number of disjuncts
grows even though each chain for each individual disjunct is finite, and then you would still not converge.
>>: So in some sense, you rely on the refinement to make it somehow terminate at some point,
right?
>> Arie Gurfinkel: Well, the abstract interpretation is what terminates it, and then the refinement
will push back on it.
>>: But you could...
>> Arie Gurfinkel: Okay, so let me wave my hand and say it does converge, but actually, I don't
have slides on widening. I didn't think that would be that interesting, but if you want to ask me
later, I can explain how it works. And then, from the refinement perspective, what we do that's
interesting is we use an SMT solver to check for whether the counterexample is feasible, which
is fairly novel in the case of abstract interpretation. And then we use this concept of DAG
interpolation for refinement, which I'll say more about, and what's interesting in this particular
work is that this interpolation procedure, sort of guided by abstract interpretation, they play
together in order to figure out how to refine. And the overall effect is quite interesting. On one
hand, the abstract interpretation is all about throwing away constraints in order to get
convergence, and then the refinement part ends up adding back constraints that were thrown away but
are actually necessary to prove that certain counterexamples are infeasible. So let me just
illustrate how this works in an example.
So I'm going to take this program and I'm going to take it through different steps and show how
they work. So a few things to note -- when we explore the program, we use a weak topological
order, which means we always go into inner loops first and wait until they converge before we go out.
We're going to use an abstract domain of intervals here, and note that the side effect of
our abstract analysis will be this labeled unfolding. So we start abstract interpretation the usual
way. One means this control node one. Initially, it's labeled with true. Everything is reachable
initially. We make one step, we figure out that x is 10. We make another step. We figure out
that x is less than or equal than 10, so we did a little bit of widening here, and we went from
eight to 10. Then, we go once more through the loop. If x was less than or equal to 10 and we
subtract two, it remains less than or equal to 10, so it converges. We don't get any more reachable
states. We're done with this loop. We now look at what's reachable after the loop, so we'll
get the three edges to location three, and right now we can prove that when you get here, x is less
than or equal to 10. Therefore, this If condition is possible, you go in, x is nine, and you hit an
error. Okay, so now we're going to raise the alarm.
So this is the end of abstract interpretation phase. So we run abstract interpretation phase, as
usual, but look at what we get as a side effect: this unrolling. So instead of, for example,
maintaining that x is less than 10 for location two, we have this full unfolding, and we know that
our invariant now at location two is a disjunction of those labels. Does that answer your
question?
>>: The first question, yes.
>> Arie Gurfinkel: So that's how it goes. So, now, when we have an alarm, we want to check
whether this alarm was spurious or not, and basically what we're checking is, whether in this
bounded program, this error location is reachable. And to do that, we switch to an SMT solver, generating
verification conditions for this bounded problem, and checking it using SMT. So we start with
unfolding, we forget about our abstract interpretation labels, and we start generating a
verification condition. For each edge here, we'll just assign what the action of that edge is, and
then we build a control flow encoding where, for each node in the control flow graph, we'll add a
Boolean variable and basically specify that if you happen to be in one location, then in order to
get to another location, the edge condition has to be true. Then the same for each individual
node.
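As an illustration of this control-flow encoding, here is a minimal sketch using Z3's Python API for a tiny three-node unfolding; the node names, the edge actions, and the use of Z3 here are all assumptions made for the example, not the tool's actual encoder:

```python
from z3 import Bool, Int, Implies, And, Solver, sat

x0, x1 = Int("x0"), Int("x1")                    # SSA copies of x along the unfolding
b1, b2, b3 = Bool("b1"), Bool("b2"), Bool("b3")  # one Boolean variable per node

s = Solver()
s.add(b1)                                            # the entry node is reachable
s.add(Implies(b2, And(b1, x0 == 10)))                # to be at node 2, the edge 1->2 must fire
s.add(Implies(b3, And(b2, x1 == x0 - 2, x1 == 9)))   # to be at the error node, edge 2->3 fires
s.add(b3)                                            # ask whether the error node is reachable

res = s.check()
print(res)                 # sat: real counterexample; unsat: the alarm was spurious, refine
if res == sat:
    print(s.model())
```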
So this encoding is really straightforward in the sense that it's linear in the size of this unfolding,
so it's very easy to generate, and then we give it to an SMT solver to check if there is a
counterexample. There are two possible answers. Either there is a counterexample, and then the abstract
interpretation phase gave us a good alarm, and we can produce this counterexample. The other
result is, no, there is no counterexample, and we need to do some kind of refinement. So to do
refinement, we're using Craig interpolation, and I know -- are people here familiar with Craig
interpolation? I see somebody smiling. So Craig interpolation is an old theorem that says
something pretty simple. It says, roughly, if you have A that implies B, then there exists an i in
the middle such that A implies i and i implies B, and i is in the language which is common to A
and B. And here, I'm stating it sort of in a slightly different way, where I'm saying, well, A
implies not B, just because that's a more convenient form for the software verification application.
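Spelled out, the form being used here is the following (a standard statement of the theorem, not a quote from the slides):

```latex
\text{If } A \land B \text{ is unsatisfiable (i.e., } A \implies \lnot B\text{), then there exists } I \text{ with}\quad
A \implies I, \qquad I \land B \ \text{unsatisfiable}, \qquad \mathrm{vars}(I) \subseteq \mathrm{vars}(A) \cap \mathrm{vars}(B).
```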
So Craig's theorem has been around for a long time, and we also know that it's quite operational.
It's quite easy to construct a Craig interpolant given, say, a SAT proof or an SMT proof -- given
a resolution proof. In model checking, it's used to over-approximate the set of reachable
states, but let's see how we want to use it here for refinement. So this statement is not quite good
enough for us, because it talks only about two parts. We have some prefix and some suffix, and
we can compute what's reachable by a prefix before a suffix, but in other cases, we don't have
simple two parts, A and B, and we don't even have a path. We have a directed acyclic graph that
represents our unfolding. And so the problem that we want to solve is something that we call
a DAG interpolation problem, and roughly speaking, it says something like this. Given a
graph, a DAG, where each edge of the graph is annotated by a formula -- a formula pi, as shown
here -- such that on any path, the conjunction of the pi formulas is unsatisfiable, what we're
looking for is a set of these orange labels, i, such that each label is an interpolant between
every prefix and every suffix, and every label, together with any edge, implies the next label.
So here is an example. So, for example, i2 has to be an interpolant between pi-1 and pi-8,
because of this, but also it has to be an interpolant between pi-1 and pi-2, pi-3, pi-6 and pi-7.
And the second condition is that, for example, i2 and pi-2 have to imply i3, and i2 and pi-8 have to
imply i7, and so on. So, if you think about these conditions carefully, you'll see that really what
we're looking for is a sort of Horn-style proof of correctness of this program. We're looking for
intermediate states i1 to i-n, such that the initial state is true, the final state is false, and any state
together with any statement implies the next state.
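Written out, the conditions being asked for are roughly the following (my paraphrase of the DAG interpolation problem, using pi for the edge formulas and I for the node labels):

```latex
\begin{align*}
& I_{\mathit{entry}} = \mathit{true}, \qquad I_{\mathit{error}} = \mathit{false},\\
& I_u \land \pi_{uv} \implies I_v \quad \text{for every edge } (u,v),\\
& I_v \ \text{is over the vocabulary shared between the paths into $v$ and the paths out of $v$.}
\end{align*}
```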
Now, the question is, how do we compute such a thing? And there are multiple ways to do so.
So one way is to just cast this as sort of a Horn problem, where you treat each individual i as an
unknown predicate, and you just pose these restrictions that you want as a Horn satisfiability
problem. But another way is to turn it into an interpolation problem, into a linear interpolation
problem, where you encode this whole program just as a normal verification condition, give it to a
SAT solver, get a proof, and then mine the proof for all the labels. And this is...
>>: Isn't it fair to say that the linear interpolation method predates this view, this other view?
>> Arie Gurfinkel: Right. The linear interpolation method predates it. The linear interpolation
method is from 1957.
>>: The tools that you're using also predate it.
>> Arie Gurfinkel: Right, and the tools that we are using predate the Horn formulation, at least
in SMT world. So the way we actually solve this problem is sort of in this very simple way. We
translate this whole problem into a sequential interpolation problem that's already supported by
existing tools. And the idea is quite simple. At first, we take this graph and we build a
verification condition, and we realize that the verification condition has this form, where you
have assertions for different locations in the control flow graph, and you can order those
assertions in a topological order, so you have a linear sequence where each element further in the
sequence means you're deeper inside the graph.
We then can compute a sequence interpolant for each -- for the cut between each location and the
following locations, so sort of cutting the graph in different places. And then this will give us
almost the result that we want, except, if you follow the definition of interpolation, you may have
variables which are out of scope at a given location. If you have two nodes of a graph which are
siblings of one another, then variables that are only available for one sibling may appear in an
interpolant for another, and that's a problem for us, because at the end of the day, you want to get
something that can be made inductive, and so you don't want to have any free variables, any out-of-scope
variables in the expression. Otherwise, they will have to be quantified out, and you have
to deal with quantified formulas.
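For reference, the property that a sequence (path) interpolant from the existing tools provides is roughly this (again my paraphrase): it constrains each cut only by the symbols shared across it, which is exactly why out-of-scope variables from sibling branches can still appear and the cleaning described next is needed.

```latex
\begin{align*}
& I_0 = \mathit{true}, \qquad I_n = \mathit{false},\\
& I_{k-1} \land A_k \implies I_k \quad (1 \le k \le n),\\
& I_k \ \text{only over symbols common to } A_1 \land \dots \land A_k \ \text{and} \ A_{k+1} \land \dots \land A_n.
\end{align*}
```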
And so we go through this cleaning process to somehow get rid of them. So that's the price we
pay for using a sequential interpolation method, so we get an easy encoding into it, but the output
is not quite what we want, and we have to spend some time on this cleaning phase. Actually,
we've spent quite a while playing with various versions of this. The original one used quantifier
elimination during cleaning, when all heuristics failed. We have a new encoding that now can do
the cleaning completely using projection and no quantifier elimination at all. And I'm looking at
various other ways: using, for example, the Horn procedure that Z3 has allows us to solve this
whole problem at once, and also to solve just the cleaning part of that problem. So that's that.
Let's go back to our running example and see where we are. So what DAG interpolation will
give us is we have this graph. We know that the result is unsatisfiable, so when we crank the
DAG interpolation handle, we're going to get this i1 to i3, in this case, where i1 will be true,
because everything is reachable here, and i3 will be false, because in this particular example,
error is unreachable. And this is true for every edge. So now, we could have taken these i labels
and used them to strengthen the result of abstract interpretation. We can go to every abstract
label and add those labels to it and say that's also true, and that prevents an error from happening.
There's one catch here. By the time we got to this point, we already forgot everything that the abstract
interpreter has computed, and so we end up redoing a lot of the work that it has done. And we
found this approach to be really inefficient when you get larger and larger unfoldings generated
by the abstract interpreter.
So what we really want to know is we want to see, well, the abstract interpreter already found
some bound on the reachable state. What we really want to know is what else we have to add to
that bound in order for the program to be safe. So we only want to know how to sort of help the
abstract interpreter, as opposed to redoing its job. And the solution is quite simple. We simply say,
well, what it means is that what we want is sort of a restricted DAG interpolant. We want to
take our program, take the result of the abstract interpretation, and just say, "Well, let the
program assume them." So at every program step, we can just add an assertion and say, well, if
an abstract interpreter said x has this bound, just assume that x always has this bound at this
particular point.
That's not going to change our program, because we know that the abstract interpreter computed
an over-approximation of the reachable states, but it will change what a solver has to discover, because those facts
are already there. They're already available, and they can be used. And then the solution to this
problem, the solution to the interpolation problem will be, well, what other facts do I need to add
in order for this problem to be unsatisfiable.
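A minimal sketch of that restriction step, in the same Python/Z3 style as before: the abstract-interpretation label is simply conjoined onto the edge relation before the interpolation query is built (the label x <= 10 below is a hypothetical value standing in for whatever the abstract interpreter proved at that point):

```python
from z3 import Int, And

x = Int("x")
x_next = Int("x_next")

def restricted_edge(edge_formula, ai_label):
    """Conjoin the edge relation with the fact the abstract interpreter already proved there."""
    return And(edge_formula, ai_label)

# e.g. the loop edge x' = x - 2 from the running example, restricted by the
# (hypothetical) abstract-interpretation label x <= 10:
print(restricted_edge(x_next == x - 2, x <= 10))
```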
>>: Is there a step that takes these i's and conditions them to the AI domain that is being used,
because these i's may not be discovered in the same form that the domain of AI is? So is there a
step that conditions them, and does that lead to -- maybe lead to loss of...
>> Arie Gurfinkel: This is actually some of the things that we're looking at right now. I have a
couple of slides at the end to address this question. So, yes, you absolutely are -- the question is
that these labels may be in arbitrary form, arbitrary SMT formulas, whereas for an abstract
domain, we need something like a conjunction of linear inequalities, a convex hull of some sort.
So how do you go from one to another? And the answer, in practice, how do we go between one
and the other? We just have a very simple heuristic to get something in an abstract domain that
over-approximates each label, but really solving this problem for real is quite difficult, and we're
working a little bit on it.
>>: Simple question on the side is that, when you conjoin AIs to the edges on the right, doesn't it
correspond to just conjoining AI to the precondition applications? I'm wondering why you did
not just add AI as a conjunction to the implication, next to TIJ?
>> Arie Gurfinkel: Except you're thinking about it in terms of a Horn formulation, so you're
thinking about i as the uninterpreted predicate and this is given to you, whereas I am saying this
is what's given to me and I produce the i's, which are the solution to my Horn problem, and they satisfy
this predicate.
>>: But it satisfies a...
>> Arie Gurfinkel: Yes, it satisfies.
>>: Thank you.
>> Arie Gurfinkel: But this is where the problem statement is. This is the statement about the
result, whereas in a Horn formulation, you state that you want a result that satisfies this and the
solver gives it to you. But the point is that the output that you get, you don't get a DAG
interpolant. You don't get something that's good enough on its own, but something that's good
enough together with abstract interpretation. So if we go back to our running example, and we
would have run this whole process here, this is what we get. We have the blue labels, which is
what the abstract interpreter has computed, and we have these orange labels, or the non-blue
labels, which is what we've computed by the interpolation. And so what you have to note here is
that, for example, this label happens to be true, which basically says the abstract interpreter
already knows enough about this particular location because it knows that x is 10. I don't need to
add any more. And here, I want to add that x is not just less than 10, but it's also eight, and here
I want to add that x is not just less than 10, but it's also six.
And so now, once we've strengthened the results, so we still have sort of a reasonable abstract
interpretation state, and in this case, all of the constraints happen to be inside the abstract
domain, but one thing is broken. And the one thing that's broken is that this result is now safe,
but it's no longer inductive. So we know that because we know that this edge is no longer true.
This relationship, that the set of reachable states here is a subset of the reachable states here is no
longer true. X is six is not a subset of x is eight, and so we can restart the abstract interpretation
right from this point. So this is where we need this ability to take this label, convert it back to an
abstract domain, and then we can run the abstract interpreter again. Here it will find out that,
well, if x was equal to six and we run one step, it's less than six. We converge. We come here,
it's less than six, and x is never nine. That's it.
So just to give you a high-level overview of what's happening here is we run an abstract
interpretation, but at the same time we keep our unfolding. We keep the bounded program that
the abstract interpreter has given us, and we get blue labels. If an error is reachable, we go to
refinement phase, which basically asks, give me a Horn-style proof why you think this bounded
program is safe. That's our orange labels. We then mix the orange labels with the blue labels,
which now proves that this bounded part is safe, so we have something that's safe but is no
longer inductive, and if it's not inductive, it means that there are some loops which are not
covered, some loops where we haven't explored all the reachable states. We find out those loops,
and we restart the abstract interpretation from this point. And that's it.
So this is implemented in a tool which we call UFO. It's available here, with source, so feel free
to look through it, run it, try it, and contact me if you have any sort of questions or need help.
This is the architecture of the tool. We have this big front end, which I'll talk more about, that
basically goes from C into an intermediate representation via LLVM, and then the core of the
tool is this ARG constructor that builds this graph, which we call an abstract reachability graph, and
you can control it by giving an expansion strategy, how you want to expand the graph, what
abstract domain you want to use and how exactly you want to refine. There are lots of choices on
how to employ DAG interpolation. It uses two SMT solvers. It uses Z3 almost for everything,
and it uses MathSAT for interpolation. Are there any questions so far?
>>: All of your examples were type safe and very simple, and I was expecting to hear that, no,
this only works on a type-safe language, but you said C. How do you deal with type-unsafe code?
>> Arie Gurfinkel: I pretend to not know about it.
>>: Okay.
>> Arie Gurfinkel: So this has to do with the software verification competition. So all the
examples I've shown you were small, just to fit on the slide, but I'm claiming this works because
of the software verification competition, not because of the small examples. And now one of the
-- so, okay, let me tell you a little bit about the competition. That's a very serious question, and I
don't have any good answers. So the software verification competition...
>>: So I'm trying to figure out how applicable it is.
>> Arie Gurfinkel: So the software verification competition started in 2012, and so this year was
the second year. It's held as part of ETAPS, co-located with the TACAS conference.
Its goal is to provide a snapshot of state-of-the-art software verification tools. It had quite a few
participants and quite a few benchmarks. So one of the issues with any competition like this: a
decision was made to use C as the language, so that the tools actually support some realistic
language, not a toy, not an intermediate language. But the problem is that we don't have a formal
semantics -- even for type-safe code, just settling on the formal semantics of the language is
impossible. And here a decision was made that we'll just use the semantics which are reasonable for
all the benchmarks and to which all of the participants agree. And the way it works is that there's a
large collection of programs, everybody runs their tool, they get different -- benchmarks are
marked whether they should have an error or don't have an error, everybody runs their tool,
decides on the results, and if people happen to disagree, there is a discussion and we decide
whether the benchmark is kept, removed, or what to do with this particular case. So in that
sense, the answer is sort of very pragmatic and driven by the benchmarks, as opposed to trying to
answer a more general question.
>>: [Indiscernible].
>> Arie Gurfinkel: You could, so the way we actually do it, we live inside LLVM. We let LLVM
get to an intermediate representation, at which point we only work at the level of numeric
registers that LLVM provides, so LLVM compiles the program down to infinitely many numeric
registers and memory. We treat memory as completely nondeterministic, and numeric registers are
what we analyze. So I can give you a formal semantics of what we verify with respect to
that, but tying it back to C is quite difficult.
>>: Do you not handle pointers at all?
>> Arie Gurfinkel: We don't handle pointers at all, but that's not to say that these programs don't
have pointers, or that we don't handle programs with pointers. It is to say that a lot of
the pointers are removed in the preprocessing step, when the program is compiled down to
registers.
>>: By what?
>> Arie Gurfinkel: By LLVM and by our process. If you can prove that a certain location is not
aliased, then it will be compiled down to a register, and then all the pointer references will be gone.
So the benchmark suite consists of a large collection of C programs, ranging from
sort of things that people traditionally use for their tools to a big set of Linux device drivers. So
they're not toy programs. They don't use toy parts of the language. They do whatever they want,
but most of the properties that we want to prove are the kinds that don't really depend on deep
pointer reasoning or type safety, things like that.
>>: But you still have some kind of heap domain for the abstract interpretation, right? So is that
sound, in a sense?
>> Arie Gurfinkel: Yes. It is sound, because I assume that heap is completely nondeterministic.
So if you write something to heap and then read something from heap, you get a
nondeterministic value, so that's sound. It's just maybe not precise enough for anything interesting, where
you want to say I put something in the heap, I build a linked list, and then I want to navigate
through it. The tool will just tell you you'll end up in an arbitrary place.
>>: Okay, so you get aliases which might not be aliases, for instance.
>> Arie Gurfinkel: Right. But the reason why this works is that the front end takes care of
lowering a lot of the pointer reasoning. Any sort of shallow pointer reasoning gets lowered into
registers by the front end.
>>: So there are model checkers that do the same thing, which discover the predicates based on
counterexamples, so at a high level, what is the advantage that adding abstract interpretation
gives? Is it more precise somehow? Does it converge faster because of widening? It wins
competitions?
>> Arie Gurfinkel: It wins competitions. It wins competitions. That's Nikolaj's answer. So the
tools that were participating, the ones that are relevant, are the ones that solve the
same problems as our tool solves. Our tool is called UFO. There are three tools which are sort
of in a different category -- CSeq and Threader do concurrency and only concurrency, and we don't, so
we're never compared with them. And Predator does sort of deep memory things, which, again,
we don't compete on. But for all other benchmarks, UFO seems to perform a lot
better.
>>: "Why" is my question.
>> Arie Gurfinkel: Well, it's hard to tease out why. If you pick a particular point and you say, "Is
this particular thing important?" I can say, "Yes. If I turn it off, it doesn't perform as well." But
if you just take this particular thing and remove everything else, you don't know. So why is
abstract interpretation important? Well, because it can discover facts out of an infinite domain of
predicates. If you use predicate abstraction, you may get stuck trying to figure out what is the
right predicate, whereas here, you start with an infinite set of restricted predicates, and you use
widening to find out what is a good set. But widening can overshoot, and so you use sort of the
same refinement techniques to bring you back into a safe region.
So any more questions? Okay, so this is how the competition was run, so the way the scoring
was done, you get points for solving things correctly. You get negative points for solving things
incorrectly, and it was deemed that finding a counterexample is easier than proving something
correct, so you get more points if you prove something correct when it is correct, and you get
more negative points if you claim something is correct which isn't. And the distinction in these
numbers was partly influenced by the previous year, where tools that would do bounded model
checking, that would look at only a few executions, would really win everything, because they would
look at a few executions and say the program is safe, and the distribution of benchmarks was
such that if a benchmark has an error, it's a very shallow error. So those tools would constantly win.
Now, this year, there's many more benchmarks, and that's no longer true, but also the scoring
was changed so that simply guessing things would penalize you more. So, well, that's our
outcome of this, so we've participated in four categories, which is Control Flow Integers, which
has all sorts of the traditional benchmarks used by software model checkers, and then Product
Lines, which comes from some set of examples where there is the same program but different
configurations of it, Device Drivers, which comes from Linux device driver verification projects,
and SystemC, which is SystemC programs converted to C with a scheduler. So SystemC is a
hardware description language that describes concurrent systems.
In all of those cases, the tool performed much better than just predicate abstraction by itself, and
the abstract domains were sort of crucial to this, so if we look at the benchmarks, you'll see that
if you want to find a bug, then it seems to be really good to use VINTA with a box abstract
domain. And the intuition here is that the box domain is not very precise, but it can tell you
how much you need to unroll a loop -- it is easy for it to know that a loop is safe for a small bound -- and so
this domain seems to be really good at figuring out how much to unroll the system so that a
bounded model checker will find a counterexample.
And then we have VINTA with a boxes domain, which I'll talk about in a second, which is a
domain that allows us to have disjunction in addition to intervals, and that domain very closely
mimics the typical predicates that a software model checker finds, except it has all of them at
once, and it seems to be really good at proving safety. So let's see if I can bring up the results.
So if you go to that link, you'll see lots of details about who did what. This is our
tool's column. I don't want to go down into the numbers, but if you're interested, you could
go and click on all of the numbers, and it will give you a comparison of tools and draw
you graphs, and you can try and tease out what's the difference between different techniques.
>>: [Indiscernible].
>> Arie Gurfinkel: Yes, actually, we got a negative score overall, and we didn't quite realize
what this category was. So for the overall category, we thought that it was sort of the
sum of all the other things, but in fact, the overall category was all the tools running on all
the benchmarks, even the ones that they didn't know anything about, and if the tool gives you a
wrong answer as opposed to unknown, it's penalized. But you are allowed, if you don't know, to
pass on the benchmarks. For example, if the question was about memory, you could just say
unknown, whereas we didn't, and so that's what separates this column.
Okay. So let me tell you more about what else went into it. So VINTA I think was the main reason why we
did so well in the competition, but there were a number of other things that really influenced it, and
those are things which we don't typically publish. They were small things that end up making a big
difference. So the secret sauce, the important things, were the front end -- I'll tell you more
about it -- combining with abstract interpretation, the boxes abstract domain, which I'll talk about in a
second, and the ability to compute a DAG interpolant, so computing a solution to this whole
problem at once, as opposed to doing it a path at a time, lazily, or any other way. And then, at the
end of the day, we do run a lot of things in parallel, so we don't have to worry about which
particular setting is the best.
So the front end is something that in principle is really, really simple, but in practice, it's
extremely messy. So what ended up happening is, it's really a huge mess. So there is a CIL pass
that goes through the code and normalizes it, and this is basically where I try to fight against the
software competition semantics, because their semantics have nothing to do with what a
compiler thinks legal C should do. For example, in a software competition, it was assumed, if a
variable is uninitialized, it can have an arbitrary value. When a compiler sees an uninitialized
variable, it says it means it's undefined, and therefore it's legal for it to do anything. So if you
have a branch on an undefined variable, you could pick the easiest path you want to take and
optimize away the other possibilities. So there is a CIL pass that sort of tries to do this at a syntactic
level, to try and fix all of those things. For example, we just add calls to nondeterministic
functions to initialize every variable.
Then there is an LLVM-GCC pass, which uses GCC to take the benchmarks and convert them
into LLVM land. And then there is LLVM optimizer, based on an old version of LLVM that has
a good optimizer, but not too good, that we use in order to simplify memory and do all of those
things. And the problem here, if you switch to a new version of LLVM, the optimizations are
too smart, and so again, you get into this clash. So, for example, in the competition, it was
assumed that if you get a nondeterministic integer, using a NonDetInt function, and you cast it
into a long, you get a nondeterministic long. Well, what a compiler will tell you is, no, you still get a
nondeterministic int, and the top bits of this long will be zero. The new LLVM knows about this
and simplifies based on that. Older LLVMs didn't.
So after you get through all of this, you could look at what happens to the benchmarks, and this
is quite interesting, that out of 1,592 SAFE benchmarks, the front end happens to prove 1,321 of
them, so many of them are just simple things that you could do by constant propagation and
simple simplifications. For the UNSAFE examples, there's fewer of them that the front end can
remove.
So a big part of this is just having a really good front end. Now, the second thing is this boxes
abstract domain, which is unique to our tool, and the idea here is to say that, in most cases, when
we think about programs where software model checker did well, we want to find an invariant
that's not a convex hull. It's not an upper and lower bound to every variable, but it's more a
disjunction of such things. And so you'd want an abstract domain that's not a single box, not a
single min-max, but rather a bunch of such boxes. If you start thinking about it, it's easy to see
what this domain is, but the difficulty is, how do you maintain it? How do you represent this as a
formula so that it doesn't blow up? And the solution that we are using is decision diagrams.
So a while back, we extended decision diagrams to linear arithmetic, so that's decision diagrams
where each node, in addition to having a true and a false child, is labeled by a linear arithmetic
formula. And the diagram knows about it and does simplifications based on that.
The boxes domain basically uses this data structure, where each node represents a single min-max
constraint. The interesting thing about this representation is that it's canonical for this
domain, so it behaves very much like BDDs, can be implemented really efficiently, and lets you
represent those things compactly.
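To illustrate just the semantics of a boxes value, here is a minimal Python sketch that represents a value as a plain set of (one-variable) boxes; the real implementation uses the linear decision diagrams described above for canonicity and sharing, so this only illustrates what join and meet mean on such values:

```python
from itertools import product

def join(a, b):
    """Union of two boxes values: keep both sets of boxes."""
    return a | b

def meet(a, b):
    """Intersection: intersect every pair of boxes, drop the empty ones."""
    out = set()
    for (lo1, hi1), (lo2, hi2) in product(a, b):
        lo, hi = max(lo1, lo2), min(hi1, hi2)
        if lo <= hi:
            out.add((lo, hi))
    return out

# {x in [1,3]} joined with {x in [7,12]} stays exact, unlike the single-box join:
print(join({(1, 3)}, {(7, 12)}))    # keeps both boxes, e.g. {(1, 3), (7, 12)}
```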
And then, finally, we run everything in parallel, so we looked at various settings that the
algorithm has using an abstract interpretation, using predicate abstraction, using different forms
of refinement, and we couldn't find any single best strategy. Since we were running out of
time and the competition machine had eight cores, we figured we could run eight different things in
parallel and see what happened. And so this is what we ended up with. There's a whole bunch
of different combinations. So, for example, this one does -- uses a lot of LLVM optimizations,
uses the boxes abstract domain and uses very nonaggressive widening, so it lets you go deep into
the loops, but it's run for a very short time. So that's really good for cases where there's a couple
of iterations of a loop, maybe 10 or so, that you have to get through, and the boxes domain can
keep track of the state really precisely, and then you converge. But if you don't, then that's
probably a bad strategy.
And so if you see all of them, they use either a strong abstract domain or a weak abstract domain
and try and decide whether to run for a short time or for a longer time. So to conclude, more or
less, this is where VINTA came from. This is VINTA's family. We started a while back with an
algorithm called Whale, which was using interpolation to try and reason about recursive
programs. We've sort of abandoned that work because we didn't have any good benchmarks to
tune this algorithm on, and then we switched to the competition benchmarks. We first developed
this UFO algorithm, which eventually became the framework in which we do our development,
which introduced the concept of DAG interpolation and combined predicate abstraction with
interpolation-based verification. And so VINTA pushed it a little bit further by adding the abstract
interpretation components. Conceptually, these two are quite similar
work, but in the details there's quite a lot of difference. So we had to really work out how the
widening would work, how to interface with the abstract interpreter, and the abstract interpreter
really put a lot of stress on our DAG interpolation component. We really had to tune it a lot
more, just because here the abstract interpreter would go much deeper into any loop than the
predicate abstraction.
Well, are there any more questions? If not, I'm just going to talk about one piece of current
work, and just maybe get your opinion, maybe offline. So one of the problems -- you actually
asked this question -- is what we call symbolic abstraction: how do you build an abstract domain
on top of an SMT solver, or how do you go from an SMT formula into an abstract domain
representation? And this is something I'm working on; Yi Li is the main student working on
this, working together with Aws and Zach.
So here is the statement of the problem. So VINTA wants an abstraction function, which
basically takes an SMT formula, say over linear arithmetic to make it simple, and gives you a
result in an abstract domain, maybe a conjunction of linear inequalities, maybe a conjunction of
min-max constraints. And if you think about it for a bit, implementing it is really an
optimization problem. So you could look at it in this way. Formally, we have a quantifier-free
arithmetic formula phi, and we have a bunch of terms, t1 to
tk, and we want to find values for min and max such that phi implies the
conjunction of the bounds min_i <= t_i <= max_i, and we want min and max to be the tightest.
This would be a value in the box abstract domain, but really it's more
general, because t can be an arbitrary term, and of course, without loss of generality, we can say,
well, since phi is an SMT formula, it's enough for t to be a variable. If you want it to be a term,
we'll just add a new variable to phi.
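One direct way to phrase this as code is to hand the bound queries to an optimizing solver; the sketch below uses Z3's Optimize engine on a made-up formula, just to illustrate the problem statement (this is not the approach described next, which works with an ordinary SMT solver):

```python
from z3 import Int, Optimize, Or, And, sat

x, y = Int("x"), Int("y")
# an arbitrary illustrative formula phi, not one of the benchmarks
phi = Or(And(x >= 1, x <= 3, y == x + 5), And(x >= 7, x <= 12, y == 0))

def box_of(phi, v):
    """Tightest lower/upper bound of v under phi: the box abstraction of v."""
    lo_opt = Optimize(); lo_opt.add(phi); h_lo = lo_opt.minimize(v)
    hi_opt = Optimize(); hi_opt.add(phi); h_hi = hi_opt.maximize(v)
    assert lo_opt.check() == sat and hi_opt.check() == sat
    return lo_opt.lower(h_lo), hi_opt.upper(h_hi)

print(box_of(phi, x))   # (1, 12)
print(box_of(phi, y))   # (0, 8)
```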
There is a naive solution, which is to convert phi into DNF, take a disjunct, and then optimize
using either Simplex or Fourier-Motzkin, and then go on to the next disjunct. But that doesn't -- at
least the very naive version doesn't -- scale, and the problem here is that typically what we want phi
to be is a bunch of loop-free executions of a program, so its DNF is quite large. So this is
the solution that we've been playing with. It seems to be fairly effective.
So, in a nutshell, we want to do something really simple. We want to successively
under-approximate phi by enumerating its models. But what happens is, if phi happens to be
unbounded, if it's not a polytope, then if we just enumerate the models, we may go forever into
some unbounded region. And so what we try to do is opportunistically check whether a direction
happens to be unbounded, by basically shooting a ray in that direction. So this is how it looks
on an example. So say we have these two. This is our phi, this particular shape, and we're going
to start under-approximating. So we're going to give this phi to an SMT solver and ask for a
model, and we may get a point in here. And then we can ask for another model, and we're
optimizing x and y, so we'll ask for a model where x increases, and we can always ask for a
model that's more on the faces of this polytope, by just taking a term corresponding to any of the
faces and saying, "Find me a solution on it." So the next solution could be here. And you ask
for a bigger solution. Okay, so before asking for a bigger solution, this is what we're going to
use in order to decide whether something is unbounded. So we have this fact: if you happen
to have two points, p1 and p2, where x is increasing, so there's a ray from p1 to p2, and they
happen to lie on the same hyperplanes of phi, and there is no p3 on which x increases that
lies on more hyperplanes than p2, then this direction is unbounded. So here, for
example, if I look in this direction, I have this point. I have more points where x increases, but
I have this point that lies on more hyperplanes. It lies on the second hyperplane. So right now I
can't conclude whether the x direction is unbounded.
I ask for another point where x is bigger. I'll get a point here. Well, now I know that the only
way for x to be bounded is if there is another point on this hyperplane that intersects some other
hyperplane, and I can check for that. I have the equations of all of the hyperplanes, so I can just
check -- does there exist a value where x is bigger and that intersects one more hyperplane?
And if the answer is no, then I know this whole direction is unbounded. And once I have got this, I now
go and look for other values of x, so maybe I can ask for a lower value of x, and I'll get here.
That also happens to be the lowest value for y -- there isn't a lower one. I ask for a higher value for y.
Maybe I'm lucky and I get here, and now I have my convex hull. So that's our current solution.
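A very rough sketch of the enumeration part of this idea, in Python with Z3: keep asking for a model with a larger x, growing the bound from below. The formula is made up, and the ray-based unboundedness check from the talk is replaced here by a simple iteration cap, so this only illustrates the shape of the loop:

```python
from z3 import Int, Solver, And, sat

x, y = Int("x"), Int("y")
phi = And(x >= 0, x + y <= 10, y >= 0)        # an illustrative bounded polytope

s = Solver()
s.add(phi)
best = None
for _ in range(50):                            # crude cap instead of the ray check
    if s.check() != sat:
        break                                  # no model with a larger x: bound found
    best = s.model().eval(x, model_completion=True)
    s.add(x > best)                            # ask for a model with strictly larger x
print(best)                                    # converges to 10 here
```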
Okay, I'm looking for opinions on how to make it better. So one sort of limitation here is that, when
we get to this point, we don't have any control. It seems like if we're using an external Simplex,
we could always get a point which maximizes x, for example, in a given polytope. But when we
use an SMT solver as the interface we have now, we just ask for a point, and then we ask for
another point in a completely separate query, as opposed to trying to say, "Well, give me a point
and then try to find another point in the polytope you're already in that maximizes that direction."
That's it.
>> Nikolaj Bjorner: Cool, you're done with the talk?
>> Arie Gurfinkel: Yes.
>> Nikolaj Bjorner: Well, thank you.