>> Tom Ball: So we'll get started. I'm Tom Ball. And about a year ago Ken McMillan came
and visited us. Actually, it was March 3rd. We checked this morning. I was -- if you go back to
Resnet, check me out, 50 pounds heavier introducing Ken. But Ken hasn't changed at all in the
last year. No, I think he has.
>> Kenneth McMillan: A little gray hair.
>> Tom Ball: Ken has changed a little bit I think. And so he's going to tell us a little bit about I
guess what's changed in the last year since he came and talked to us.
And we welcome Ken McMillan very much back to Microsoft Research. Please, take it away.
>> Kenneth McMillan: Thank you. Well, I'm not sure, actually, how much will be different.
We'll see by the time we get to the end if there's anything new that I have to say.
I mean, what I'm going to talk about is how we cope with complexity in dealing with and
verifying hardware systems and software systems sort of at a fairly high philosophical level with
a little bit of technical detail.
And mainly I'll be discussing what we know about the problem of relevance in verification and
where that comes from. And since this is a job talk, I'll include a few advertisements along the
way of things that I've worked on.
So okay. I think everybody in the room probably has a good sense of the complexity of the kinds
of systems that we develop in hardware and software. I think hardware and software systems are
probably the most complex objects that human beings have ever created.
And you might ask, well, how do we manage to create such complex things and have them work
most of the time. And you probably also are familiar with the answer: we do it by debugging.
That is to say, we design something that's approximately correct and we test it and we fix it
where it breaks until it's moderately reliable.
So the consequence of that is that the primary task of design of a complex system turns out to
actually be verification; that is, finding the bugs so that we can fix them.
And in fact if you look at a hardware design project, if you look at a chip design, people are
spending, some say, as much as three-quarters of the engineering effort overall on just verifying,
just doing simulations of the chip and trying to pull out the bugs.
And of course as we know, when we fail at that, the costs of even very small errors can be very
large. So in a chip design it could be half a billion dollars conceivably, as happened in the
1990s.
And we've also seen that in software projects. A very small error in the code could result in a
security vulnerability, which could have a very large economic cost.
So because of this process of design that we use, of design by debugging, it turns out that when
you buy a piece of software, you get no warranty, you get no guarantee that the software does
anything at all. And that's because fundamentally we do not know how to design correct
systems.
So I would say, and I would hope that some people in this room would agree, that correct design
is one of the remaining unsolved grand challenges in computing. It's something really worth
putting our effort into.
Now, naturally, you might imagine that the way to attack that problem would be to apply logic in
some way, to apply logical proofs in order to help us to design correct systems, not just by
debugging. But in practice, of course, constructing proofs can be an overwhelming task. The
proof can be even substantially more complex than the system itself. So it's been argued that we
need to have automation to help us do that.
Well, that brings me to the topic of model checking, which is an automated method to help us do
proofs about systems.
Now, I'm afraid that those familiar with model checking will have to bear with me for a few
slides here. But a model checker is a tool that takes some model of your system, typically a
finite-state model, and it takes a logical specification.
So if my system were a server and P was the condition of receiving a request and Q was the
condition of sending a reply, then this specification would say always when a request arrives
eventually in the future we receive a reply.
So that's the kind of temporal specification that we can typically make in model checking, and
we put these things into a model checker. And we can get one of two answers, either yes, every
behavior of that system satisfies the logical specification, or, no, and here is a behavioral
counterexample, which is an execution trace of the system that would show, for example, a
request being received without a reply being sent.
So it's been argued that this ability of model checking to produce behavioral counterexamples is
a major advantage because it helps us to debug our systems, it helps us to find out what's going
on.
So model checking is not that complicated. The simple case -- simplest case of it is, of course,
just a simple safety property which says there's some bad condition that I want never to happen.
So if I have a simple safety property, I could prove it by a reachability analysis. So I could take
all of the states of my system here, which I've drawn as circles, my state is an assignment of
values to variables in the system, and I can draw a transition graph where I have arrows between
two states if you can get from one state to the other by a single transition of the system.
And then reachability analysis is essentially a search in this graph. So I designate an initial state
and I designate a bad state that I don't want to reach, and then I can just do a breadth-first search.
So I'll find all the states that I can reach in one step, two steps, three steps, four steps, and so on,
and perhaps I'll eventually reach the bad state, and I can trace this back to an error path that tells
me why the system fails. So that's a counterexample.
Now, if I were to fix the system by, for example, removing a transition arc, then if I did my
reachability analysis, I might reach a fixed point, which is a condition where I cannot reach any
additional states, and then I've computed the reachable state set of the system. And since that
doesn't contain my error state, I would say that my simple safety property is verified.
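A minimal sketch of that reachability search in Python follows; the state encoding and helper names are illustrative assumptions, not from the talk:

```python
# Explicit-state reachability: breadth-first search over the transition graph.
from collections import deque

def reachability(init, successors, is_bad):
    """Return an error path to a bad state, or None if a fixed point is
    reached without touching one (the simple safety property is verified)."""
    parent = {init: None}              # doubles as the visited set
    frontier = deque([init])
    while frontier:
        state = frontier.popleft()
        if is_bad(state):
            path = []                  # trace parents back: the counterexample
            while state is not None:
                path.append(state)
                state = parent[state]
            return list(reversed(path))
        for nxt in successors(state):
            if nxt not in parent:      # a new state: one more step of the search
                parent[nxt] = state
                frontier.append(nxt)
    return None                        # fixed point: the bad state is unreachable
```

For instance, on a 3-bit counter, reachability(0, lambda s: [(s + 1) % 8], lambda s: s == 5) returns the error path [0, 1, 2, 3, 4, 5].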
So model checking is only a little bit more complicated than that. And, you know, if we want to
think about model checking, you know, reachability analysis essentially captures the idea: it's computing a fixed point or searching in a state space.
And, in fact, this very simple technique can find subtle bugs in circuits and protocols, but of
course it suffers from the state explosion problem, which is that the number of states of this thing
can be exponential in the number of state-holding components in the system.
So that means I can only do this, I can only draw this graph for small systems.
So that leads me to my first advertisement, which is about symbolic model checking. And this
was a technique that I developed actually for my thesis that avoids building that state graph by
essentially computing a succinct representation for a large set of states. So somehow capturing
the regularity in that set of reachable states so I don't have to write it out explicitly.
And the classic representation for doing that was invented -- or techniques for it were invented
by Randy Bryant. It's called a binary decision diagram. And that's essentially a compact
representation of a decision tree.
So if I have a decision tree here and it tells me, for example, whether a given state is reachable.
So 1 might mean reachable and 0 might mean not reachable.
Well, I can essentially just squeeze out the redundancy in that representation to get a binary
decision diagram, which is a compressed representation. And if I can do my model checking
only working on the compressed representation, then I can potentially avoid that problem of state
explosion.
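As a rough illustration of that compression step, here is a sketch of the two classic BDD reduction rules in Python; the nested-tuple encoding (var, low, high) with 0/1 leaves is a hypothetical choice, not from the talk:

```python
# Reduce a decision tree to a shared DAG (the essence of a BDD).
def reduce_tree(node, unique=None):
    """Squeeze the redundancy out of a decision tree."""
    if unique is None:
        unique = {}                    # table of already-built nodes
    if node in (0, 1):                 # terminal: 1 = reachable, 0 = not reachable
        return node
    var, low, high = node
    low = reduce_tree(low, unique)
    high = reduce_tree(high, unique)
    if low == high:                    # rule 1: drop a test whose outcome doesn't matter
        return low
    # rule 2: share structurally identical subgraphs (hash-consing)
    return unique.setdefault((var, low, high), (var, low, high))

# A tree that tests x2 but ignores its value collapses to a single x1 node:
print(reduce_tree(('x1', ('x2', 0, 0), ('x2', 1, 1))))   # ('x1', 0, 1)
```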
And one of the things that I applied that to in my thesis was multiprocessor cache coherence
protocols, where we viewed the protocols by which a multiprocessor keeps its various cache
memories consistent essentially as a set of finite state machines that are communicating to each
other over some kind of store-and-forward network.
And if we can model the protocol at that level, then we can model check it. And I showed that
symbolic model checking could, in fact, detect very subtle bugs in these protocols that would be
very hard to find on the actual hardware and that it allowed that verification to be done in a
somewhat scalable way, so I could talk about a large number of caches interacting with each
other and therefore be able to find the bugs in the system.
And I think this was actually the first commercial application, I'm not certain, but the first
application of model checking to a real commercial machine.
So that's nice in a very abstract way. But what about the real world? I mean, how do we deal
with the complexity of the systems. Please.
>> [inaudible] how many of the bugs were [inaudible]?
>> Kenneth McMillan: Right. So the bugs were typically safety or deadlock. Yeah. So I don't
know how you want to characterize deadlock, but the most interesting ones were deadlocks.
Or maybe livelock is the way to put it. You get a situation where the system wouldn't
completely halt, but some particular event could never occur.
Okay. So how would we, for example, cope with the complexity of this object. I mean, if we
want to talk about a microprocessor and say something interesting about it, we might have to
think about something on the order of 100,000 registers, which is much larger than what we can
do with the symbolic model checking technique.
And since this state space is essentially two to the number of registers, you know, this is going to
be a number that's beyond astronomical. And of course if you wanted to think about the
software that you're running on that object, the situation could be even more complex.
So in order to make model checking into a useful tool for engineers, we had to find ways to cut
this problem down to size. And this talk is mainly about one key aspect of that problem, which
is how do you decide what parts of this system or what facts about this system are relevant to
proving a particular property.
And I'll talk about some of the things that we've learned over the last decade or so on that subject
of relevance.
So if I want to talk about verifying a property and facts that are relevant to a property, I have to
know what I mean by a property. And I'm going to distinguish between what I'll call deep and
shallow properties.
So a property is shallow if in some sense you don't have to know very much information about
the system to prove that property. So for my chip, a deep property might be that it implements
the x86 architecture, which would be very hard to prove. While a shallow property would be say
that we have a bus bridge in the system that's giving us access to memory perhaps. And we just
want to show that the bus bridge never drops transactions.
And in order to prove a shallow property like that, we might not have to know very much
information about the system. In fact, we might not even have to know very much information
about the bus bridge to do that.
So there are a variety of methodologies that we can use to reduce a deep property of a system
like this to a large number of shallow properties like this that we can hope to actually verify.
And here's the second advertisement. In the late '90s I spent a lot of time working on techniques
that I called functional decomposition. And that meant that we're going to start with some simple
abstract model, like our model of the cache coherence protocol, where we are thinking in terms
of communicating finite state machines.
And we are going to refine it down to some actual implementation. So this is a schematic of an
implementation of such a protocol that was done at Silicon Graphics, and it has a lot of things
going on. You know, we have messages coming in in queues and they're being broken apart,
control information is going this way and data is going this way.
The control is being operated on in some kind of a pipeline, and the status of transactions in
flight is being stored in a content addressable memory. At some point control and data are being
reunited and queued for output. And that's sort of a high-level view of about 30,000 lines of
Verilog that was implementing this protocol in a chip.
So we want to show that a collection of these things together are going to implement this
high-level protocol. And that's a deep property that we wouldn't be able to attack directly.
However, I developed a methodology that let you essentially break down that deep property into a
collection of shallow properties mainly that are tracking individual transactions through the RTL.
So the properties might be on the order of this. I would say I want all of my enqueued
transactions to correctly match the abstract model. I want them to be matched up with the status
and the CAM correctly. I want the tables to process things in the right way compared to the
abstract protocol. And I want messages to be enqueued with correct data on the way out.
So I'm greatly simplifying here. But the idea is that we're breaking the problem down into pieces
by looking at individual transactions, showing that they move through correctly, and showing
that other transactions don't interfere with them. So we're breaking down deep properties into
shallow properties.
And this was something that an engineer at Silicon Graphics was actually able to do using this
methodology. And he was able to find hundreds of bugs in this low-level design by this
technique and to formally verify its correctness.
So, okay, end of advertisement. If what we're interested in is shallow properties of very large
and complex systems and our problem is to prove them, then the solution is abstraction. Yeah.
>> That particular case study that you referred to was the one that [inaudible] --
>> Kenneth McMillan: It was Oscar's thing, yeah.
>> Right. So that indeed was very impressive. What I was wondering -- I mean, [inaudible]
must be a very smart way to be able to push a proof like that [inaudible] afterwards did you find
this kind of replicated in an industrial setting?
>> Kenneth McMillan: I don't think you would -- right. So I don't think you would want to
replicate this. And the reason is it's too difficult. So part of the problem with that is that the
tools that I was able to give Oscar were too primitive. And that has a little bit to do with the stuff
I'm going to talk about, because essentially Oscar was having to do a lot of things manually that
we can now do in a more automated way. Okay. That's the first thing.
And the second thing is that we are now able to do some verification of systems that are infinite
state and so on that would have helped him do less work doing the decomposition.
So an interesting question is how much easier would it be now if you went -- ten years later if
you went back and tried to redo that exercise with the current technology, how much -- you
know, how difficult would that be to do, because that was a -- you know, that was a tour de force
application.
So what's happening in practice with most people is they are essentially writing down shallow
properties of pieces of their design that they're hoping will imply what they want, but they're
doing it in an informal way, because they're not able to put together the proof formally because
the methodology is just too complex to apply.
So I think it would be interesting -- it would be interesting at some point, you know, in the future
to go back and say with all of the technology that we have now, all right, with, you know, SMT
solvers and interpolation and all kinds of infinite-state model checking techniques and so on,
how difficult would it be to do that. Could we simplify it down to the point where someone
who's not Oscar could do that exercise.
Okay. So right. So now assuming that we've reached the point where we have shallow
properties to prove, which obviously might be difficult to do, the question, then, is how do we
abstract the system in such a way that we can prove those properties. How do we extract just the
facts about system state that are relevant to a given shallow property, a very complex [inaudible]
system.
>> Can you define what shallow means or deep means. Or is this the definition: A property that
is shallow is one that can be always there [inaudible].
>> Kenneth McMillan: Right. So I'm saying that a property is shallow in the case where there's
a simple abstraction that proves it. And so that -- you know, that's sort of -- it's not particularly
well defined. And if you're doing it in practice, of course, you don't necessarily know. You
know, you have a sense of if I break the thing up in this way, you know, that each of the
problems I'm going to get is relatively shallow. But in fact you don't find that out until you're
actually able to do the proof.
So this is a -- sort of a vague heuristic definition. And you'll see a number of vague heuristic
definitions to follow.
Okay. So how do we know what information is actually relevant. And how do we decide this
mechanically. And you might say, you know, that's not a very well-defined problem. You might
say that's even AI complete. But, in fact -- in fact, I would have said that a decade ago or so.
But it turns out that we can actually give some fairly concrete answers based on concepts that
have been developed over the last decade or so, especially in studying the Boolean satisfiability
problem. And I'll talk more about that later.
So this is the way I would describe or I would put the basic principles that we've learned in this
area. And there are two basic principles. The first I might call the parsimony principle about
relevance. It says that a relevant fact or predicate about a system is one that is used in a parsimonious proof of the desired property, that is, in a simple proof.
And second is what I would call the generalization principle. It says that facts that are used in
proofs of special cases tend to be relevant to the overall proof. In other words, we start with
simple special cases and we generalize from those special cases.
So simple proofs define relevance. And to find relevant facts, we generalize from special cases.
And those -- this is mainly what we're doing almost everywhere we're talking about relevance.
For example, in CEGAR (counterexample-guided abstraction refinement), we're looking at special cases and generalizing from them.
So what do I mean by a proof? How can I make this notion concrete? Well, a proof is just a
series of deductions from premises to conclusions. So each deduction is going to be an instance
of an inference rule, and usually we'll represent a proof as a tree where we start with some
premises and we arrive at a conclusion at the root of the tree. And each branching point in the
tree is the application of a sound inference rule.
So if the conclusion of this proof is false, that means we have what's called a refutation.
And that means that the premises must be inconsistent since they imply false.
Now, exactly which inference rules we use are going to depend on the theory that we're
reasoning in. So if we're just doing Boolean logic, then we can get away with the resolution rule.
For example, if I know that P or gamma holds and not P or delta holds, then the fact that P has to be either true or false tells me that gamma or delta holds. And this rule by itself is complete for
propositional logic, but if I'm reasoning about arithmetic, I might need to add some more
interesting rules to my system. For example, the sum rule for inequalities, which tells me that I
can sum up the left-hand sides and the right-hand sides of two inequalities to get an inequality that is implied.
So this is what proofs are going to look like, in very simple calculi like this.
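To make the resolution rule concrete, here is a minimal sketch, with clauses encoded as frozensets of integer literals; the encoding convention (+v means variable v is true, -v means it is false) is an illustrative choice, not from the talk:

```python
def resolve(c1, c2, pivot):
    """From (pivot or rest1) and (not pivot or rest2), conclude (rest1 or rest2)."""
    assert pivot in c1 and -pivot in c2
    return (c1 - {pivot}) | (c2 - {-pivot})

# From (P or gamma) and (not P or delta) we derive (gamma or delta).
P, gamma, delta = 1, 2, 3
print(resolve(frozenset({P, gamma}), frozenset({-P, delta}), P))  # frozenset({2, 3})
```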
Now, the last thing that we need to do proofs about sequential systems is of course inductive
invariants. And an inductive invariant is just some condition that I'm going to write about the
system, some Boolean-valued formula, and the condition is going to split the state space into two
parts. On the left here I have the states that satisfy the invariant, on the right the ones that don't,
and an inductive invariant is just one that forms a barrier between the initial states and the bad
states so that no transitions go across the barrier and therefore I've proved that this is not
reachable.
Now, of course, the reachable states are an inductive invariant, but they might be a very complex
inductive invariant. This might be a set that's very hard to describe and contains a lot of information, even if I try to represent it with BDDs, whereas for any given property an inductive
invariant that proves it could be very simple.
And so mainly what we're doing in verification is trying to come up with these inductive
invariants. So since proofs are made of inductive invariants, I can now say a little bit more about
what I mean by relevance. So I'm going to say a fact is relevant if you use it in a simple
inductive invariant. At least that's a start on what we mean by a simple proof.
So here's a simple example program that comes up from time to time. Here we are setting two
variables, X and Y, to zero. And then in a loop we increment both variables and then while X is
not equal to zero we'll decrement both variables, and at the end of the program we know when
we fall out of this loop that X is equal to zero, so Y should also be equal to zero and we want to
prove that.
So our state variables are the program counter and the two data variables. And our property,
which is the negation of the bad states, is that if the program counter is at L6, then Y is equal to zero. And the simplest inductive invariant that I could come up with for that says, in addition to the property, that either the program counter is at the beginning or X is equal to Y.
And you can see that this is inductive in the sense that once it becomes true it has to always be
true since we increment and decrement the variables simultaneously.
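That claim can be checked mechanically. Here is a rough sketch that brute-forces the inductiveness of this invariant over a bounded range of values; the numbered-location encoding of the program's transitions is an assumption made for illustration:

```python
import itertools

def transitions(pc, x, y):
    """One-step successors of state (pc, x, y) for the example program."""
    if pc == 1: yield (2, 0, 0)                    # x := 0; y := 0
    if pc == 2: yield (3, x, y); yield (4, x, y)   # stay in or leave the first loop
    if pc == 3: yield (2, x + 1, y + 1)            # x++; y++
    if pc == 4:
        if x != 0: yield (5, x, y)
        else:      yield (6, x, y)                 # fall out of the second loop
    if pc == 5: yield (4, x - 1, y - 1)            # x--; y--

def inv(pc, x, y):
    """The property (at L6, y == 0) plus (pc is at the start, or x == y)."""
    return (pc != 6 or y == 0) and (pc == 1 or x == y)

# Consecution: every transition from an invariant state lands in an invariant state.
ok = all(inv(*t)
         for pc, x, y in itertools.product(range(1, 7), range(-3, 4), range(-3, 4))
         if inv(pc, x, y)
         for t in transitions(pc, x, y))
print("inductive on the bounded domain:", ok)      # True
```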
So my only point about this proof is that it contains two atomic facts here that I am now going to
say are in some sense relevant because they occur in the simplest proof that I know how to
construct; that is, PC equals L1 and X equals Y. And there are of course lots of facts I could
deduce, like X is greater than or equal to zero, that are not relevant to this property.
So if we know these relevant facts, for example, these two atomic facts, there are lots of
techniques that will allow us to construct or synthesize the inductive invariant, that is, to discover that the disjunction of those two facts is inductive.
So the -- for example, we could use predicate abstraction. The interesting thing for our point of
view is how we come up with these facts in the first place.
So we can learn something about that from the Boolean satisfiability problem and from
techniques that have been developed for that. So SAT is of course the classic NP-complete
problem. It takes as input a Boolean formula in conjunctive normal form and as output it either
gives us a satisfying assignment or a statement that the problem is not satisfiable.
So we can put this pair of clauses into our SAT solver, and the SAT solver will give us out a
model like this, or if there isn't any model, it will say unsatisfiable.
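For concreteness, here is a toy rendition of the problem statement; the clause pair is hypothetical, and the exhaustive loop stands in for the intelligent search that a real solver does:

```python
import itertools

# Clauses as tuples of integer literals: (a or not b) and (b or c).
clauses = [(1, -2), (2, 3)]
nvars = 3

for bits in itertools.product([False, True], repeat=nvars):
    assign = {v + 1: bits[v] for v in range(nvars)}
    if all(any(assign[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
        print("satisfying assignment:", assign)
        break
else:
    print("unsatisfiable")   # no assignment survived: report unsat
```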
So modern SAT solvers I would say are using the generalization principle to help them focus on
relevant facts. And in particular they will produce refutation proofs in the case where these
clauses are unsatisfiable.
So here's how that works more or less. SAT solvers are doing backtracking search, so we can
just start deciding values for variables, like A equals zero, B equals zero, C equals 1, so on, Q
equals zero. And at some point in this search we're going to become stuck. We reach a point
where we can't give a value for R because either value that we give would make some clause
false. So at that point we say that we are in conflict.
Now, in a classical backtracking search, what you would do, of course, is you would go back to
the last decision point and go in a different direction. But that's not what a modern SAT solver
does. Instead the SAT solver does deduction at this point. That is to say we take the two clauses
that caused us to be stuck. This one says that R can't be false, and this one says that it can't be
true. And we apply the resolution rule to those two clauses to get a new clause that's implied by
what we had before.
And what's interesting about that new clause is it tells us that we were really stuck way back
here. When we decided C equals 1, we were already infeasible because this clause was false. So
that tells us that we need to go way back to the top and branch in a different direction giving
value zero to C.
So what's going on here is conflicts in the search are telling us where to do deduction and
deduction is in turn guiding the search. And we can think of this deduction as an instance of the
generalization principle.
It's taking this failure to find a solution and is generalizing the failure to say not only does this
particular combination not work but any combination that makes this clause false doesn't work.
So we've carved out a piece of the space here and we've generalized our failure, and that has
pushed the search in a different direction, and that's called DPLL.
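Here is that learning step in miniature; the two clauses are hypothetical stand-ins for the ones in the example, chosen so that deciding C = 1 forces R both ways:

```python
def resolve(c1, c2, pivot):
    """Resolution on clauses as frozensets of literals (+v true, -v false)."""
    assert pivot in c1 and -pivot in c2
    return (c1 - {pivot}) | (c2 - {-pivot})

C, R = 3, 18                             # illustrative variable indices
forces_r_true  = frozenset({-C, R})      # "not C or R":     with C = 1, R must be 1
forces_r_false = frozenset({-C, -R})     # "not C or not R": with C = 1, R must be 0
learned = resolve(forces_r_true, forces_r_false, R)
print(learned)                           # frozenset({-3}), i.e. "not C"
```

The learned clause generalizes the failure: any assignment with C = 1 is doomed, so the search backtracks to the decision on C and flips it.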
Now, in the solver, then, we get sort of a feedback between search and deduction. So we're
making case splits. We're searching. We are then propagating some implications of those case
splits to determine when we're in a conflicting state where we can't continue, and when that
happens we are doing deduction or generalization to learn a new fact about the system, which in
turn tells us which variables are interesting to do case splits on.
So we have this sort of a feedback loop that allows the SAT solver to focus on relevant case
splits and relevant deductions, and that sometimes allows the SAT solver to handle problems
with millions of clauses and it helps us to generate simple proofs in the case of unsatisfiability,
because we can focus on relevant deductions.
So what I want to look at next is what lessons can we learn from this architecture and how can
we use this to, for example, generate inductive invariants. How can we apply this same principle to invariant generation?
Well, here's a very simple way you could approach that. As a special case, we are going to
consider one execution path of a program. We're going to look at one way that the execution
could proceed, and we will just construct that as sort of a straight-line program.
So we're looking at a fragment of the program's behavior. And we'll try to construct a proof by
whatever means necessary, that that straight-line program is not able to violate the property at
hand, and then we'll see if the generalization principle works; that is, we'll see if that proof will
contain facts that we need to build an inductive invariant.
So back to our example program. You recall that the fact that we needed to know at each loop
was that X equals Y. That was the key fact in the invariant. So would that fact fall out of
analyzing a particular execution.
Well, let's look at it. Let's unroll each of those loops twice. And we would get an inline program
that looks like this, where here I've unrolled the increment loop twice and here the decrement
loop twice, and each time I passed through a conditional I have put that condition in brackets. So
to execute this path, we would have to make all of the conditions in brackets true.
And at the end I've stated the negation of the property that I want to prove. So I said Y is not
equal to zero. So the program can fail along this path exactly when we can make all of these
guards true, and I'm going to prove that that's not the case. And I can do that in the Floyd-Hoare
style where I just deduce a sequence of facts, each of which implies the next.
So I start with true and then after initializing the variables I could say X equals zero and Y equals
zero. After I increment them I could say X equals Y, for example, among other things I could
prove and so on until I get to the bottom. At the end I could prove if X is equal to zero then Y is
equal to zero, which means that when we fall out of the loop with X equals zero, we have a
contradiction, and that gives us false.
So this is a refutation of a special case, and you'll notice that it contains the proof or it contains
the ingredients of the proof of the general case, which is that inductive invariant X equals Y.
So of course it didn't have to work out that way. I could have written a proof for this special case that would not have been useful, one that only talks about the particular values of Y as I execute the program. And, in fact, if I looked at longer and longer traces, these predicates would just go to infinity. I'd have Y equals 3, Y equals 4, et cetera, and I'd diverge.
So clearly applying this principle for program verification or sequential system verification is
more difficult than SAT. Somehow I have to avoid that kind of divergence, and I'll talk a little
bit more about that later.
But the general principle still applies. And we're looking for a practical method now of getting
these inline proofs that are relevant to the overall proof.
>> So the reason it's more difficult is that in the case of SAT the vocabulary of proofs is limited.
>> Kenneth McMillan: Yeah.
>> It's basically clauses over some fixed set of literals, right?
>> Kenneth McMillan: Right.
>> [inaudible]
>> Kenneth McMillan: Everything is finite. So you have no possibility of divergence. And so,
you know -- right. And also you have to think, well, if I'm looking at this infinite sequence of
paths, all right, is it ever going to actually converge, which just is not an issue in SAT because
it's essentially a finite search.
So okay. So in order to do that, I'm going to apply Craig's interpolation lemma which is a very
basic result from proof theory from the 1950s. And it talks about pairs of first-order formulas.
So let's say that A and B are two formulas over some vocabulary of nonlogical symbols, like
predicate symbols and function symbols and constants. And I have the usual logical symbols I
can play with.
So if I have two such formulas and those formulas imply false -- that is, they're inconsistent -- then the lemma says there exists a fact I can put in between, an interpolant A prime, such that A
implies the interpolant, the interpolant implies that B is false, and the interpolant uses only the
common vocabulary of those two formulas.
So if A were the formula P and Q, and B were the formula not Q and R, then an interpolant
would be Q because it's implied by A. It's inconsistent with B, because Q can't be both true and
false, and it's written just using the common symbol Q between those two formulas.
So that's an interpolant. And the question is why is that interesting for us.
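Before getting to that, the definition can be checked mechanically on this example by enumerating all assignments to the three symbols (a sketch, not from the talk):

```python
import itertools

A = lambda p, q, r: p and q            # A = P and Q
B = lambda p, q, r: (not q) and r      # B = (not Q) and R
I = lambda p, q, r: q                  # candidate interpolant: only the shared symbol Q

vals = list(itertools.product([False, True], repeat=3))
print(all(I(*v) for v in vals if A(*v)))       # A implies I: True
print(not any(B(*v) for v in vals if I(*v)))   # I is inconsistent with B: True
```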
Well, we can think of interpolants as being explanations in some sense. So suppose, for
example, I have some very large, complex, unknown formula A in a black box and I want to
ask some questions about A to try to understand it. Well, I could propose a formula B and I
could ask is that consistent with A.
So if you want to think concretely, imagine that A is some very complex set of rules for
configuring rackmount servers, say. And B is a query about, say, the performance and the power
consumption. So is it possible, for example, that the performance is 3 and the power
consumption is 7, whatever that means.
So we could pose that query to A, and A might say sorry, unsat, that's inconsistent with me, I
cannot build that server.
So now how could A explain to B the cause of the failure. Well, one way would be that A could
provide a proof. We could take the premises A and B and deduce false in our proof system, and
that would prove the inconsistency. But the trouble with that is that proof is going to contain all
kinds of variables from the black box, all kinds of variables in A that B doesn't know about. And
so B is not going to understand that proof.
So instead what we can do is we can use the concept of feasible interpolation to find an
explanation. That means we can take that proof and we can run it through an algorithm and it
can derive an interpolant. That's a fact that's implied by A and inconsistent with B. It says basically that, say, the performance has to be less than 2 times the power consumption, whatever that means.
So that would be a general fact that would rule out this query, and also a larger space of possible
queries, and it might tell us how we need to fix our query to get something that's feasible.
So if we think about all the possible explanations or the possible interpolants, well, there's
actually a space of them. There's a space where we have the most specific. And the most
specific would be your exact query doesn't work. It would be the negation of your query. And
the most general would be some very complicated formula that tells you exactly all the
combinations of X and Y that are feasible. But that's not a good explanation either; it's too
complex, I can't understand it.
So somewhere between those two is a relevant generalization. It gives me an understandable
reason in my language as to why my query is not feasible, like Y is less than 2X.
So the idea here and the way that we're applying the relevance principle is that we're going to say
relevant generalizations are derived from parsimonious proofs. So if I look at the simplest
possible proof that A and B are inconsistent, that ought to give me a relevant generalization, a
relevant explanation of why my query didn't work.
So we're thinking of interpolants as a way of getting explanations out of proofs.
And let's see how we can apply that idea to programs now. We'll see that that notion of
explanation can be used to generate program proofs.
So here's a very simple inline program. It says X gets Y, increment Y, and then pass through a guard when X is equal to Y. Now, this guard condition obviously can't be true because at this point in the program Y is equal to X plus 1.
So we're going to prove that logically to generate an explanation. We'll take this program and
we'll turn it into static single assignment form as a set of constraints. So that means every time
we assign a variable, we will give it a new subscript or create a new version of it.
So increment Y becomes Y1 equals Y0 plus 1, and so on. So we've turned that program into a
logical statement, a mathematical statement that we can put into a SAT solver, in particular into a
SAT solver that knows a little bit about arithmetic, that is, a satisfiability modulo theories (SMT) solver. And it will
say this is inconsistent and here is a proof in an appropriate proof system that this is actually
inconsistent.
From that proof we can then generate interpolants. And that will be a sequence of facts about
our variables that has the following properties.
Each formula implies the next. So each formula is implied by what came before. Each formula
is only written over the common symbols between what comes before and what comes after.
And that means that each formula is written in terms of the variables that represent the state of
the program at just one location. So it's a formula about program state.
And finally it's a refutation. It begins with true and ends with false. So if our SAT solver
generates a refutation for these facts, we could algorithmically generate these facts in the
vocabulary that we want. And that will give us a proof. That is, by just dropping the subscripts
we now get a Floyd-Hoare proof that you cannot execute this program fragment.
So to recap, we start by translating into static single assignment form. We then ask our solver,
our SAT solver or SMT solver, to give us a proof that the resulting constraints are inconsistent,
and then from that we can derive this Floyd-Hoare proof about this program fragment.
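As a concrete rendition of that recap, here is a sketch assuming the z3-solver Python package; the snippet only checks unsatisfiability, so the interpolants in the comments are worked out by hand rather than extracted from the solver's proof:

```python
from z3 import Ints, Solver, unsat

# The fragment x := y; y++; [x == y] in static single assignment form.
x0, y0, y1 = Ints('x0 y0 y1')
s = Solver()
s.add(x0 == y0)            # x := y         -- interpolant here: x0 == y0
s.add(y1 == y0 + 1)        # y++            -- interpolant here: y1 == x0 + 1
s.add(x0 == y1)            # guard [x == y], contradicts y1 == x0 + 1
print(s.check())           # unsat: this path cannot execute
```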
So then the hope is, of course, by the generalization principle that these facts that we used in the
deduction are actually going to be relevant, for example, to constructing an inductive invariant.
So that's the general scheme of things. And now I just want to sort of summarize. This is the
third advertisement. I want to summarize the result -- the research that I've been doing in this
area over the last six or seven years.
So I started out in 2003 looking at Boolean or propositional logic and thinking about hardware
designs at the bit level. And so I would take a SAT solver and I would have it examine all the
possible executions of that piece of hardware for a fixed length of time, say K clock cycles.
Now, having the proof that there was no failure of the property in that amount of time, I could
then use interpolants to generate a sequence of facts that I could prove at time 1 and time 2 and
time 3 and so on. And I show that there were techniques to get those facts to converge to
inductive invariants. So having that I could then prove that there are no failures in that -- of that
particular property in any amount of time just using a SAT solver and this interpolation
technique.
So, in fact, this is currently the most effective technique that we have at Cadence. This is the
first-line technique for verifying temporal properties of hardware designs.
So then I said, well, can we move to a richer logic. And I built a little SAT modulo theory
solver, which is a SAT solver that knows something about arithmetic and --
>> Can you just expand on your comment about this is the front-line technique at Cadence?
Does it mean that --
>> Kenneth McMillan: Yeah. In the products.
>> In the products. So the -- Cadence --
>> Kenneth McMillan: So, in other words --
>> [inaudible] so at Cadence, like your engineers go and verify customers' designs using --
>> Kenneth McMillan: No, no, they sell the tools to the designers, and the designers apply the
tools. That's why I'm saying -- and they use a very informal methodology overall to do it. So
they might, for example, attack the bus bridge and start writing properties of the bus bridge like,
you know, every transaction that goes in essentially it comes out, it doesn't get duplicated and so
on.
And so they would do the verification at that level and they would never put it together into the
overall proof.
So the assumption is we already had the shallow properties, because we have no technology for
getting the shallow properties. That's -- that's future research.
So -- and so interpolation is just the first technique that we apply to that, because it's on average
the most effective.
Okay. So then I looked at teaching the SAT solver, you know, using some well-known
techniques, about arithmetic and equality and other theories, and getting out proofs in a slightly
richer calculus to see if we could then talk about more interesting properties, say, that involve
integers in arithmetic and so on.
And this solver that I developed was used by the BLAST software model checking group at
Berkeley, in particular by Ranjit Jhala, to look at program traces and try to pull out relevant
predicates in just the way that I was describing, and then they could use predicate abstraction as a
way of synthesizing the inductive invariant as a combination of those basic predicates or basic
ingredients. And we'll look a little bit more at that later.
And later still I said, well, in software verification we should be able to generate the invariants
directly from the interpolants. That is, the interpolants should be able to tell us more than just
the basic ingredients of the invariants. And I looked at techniques for actually getting the
interpolants to converge to inductive invariants directly. And I'll talk more about that.
I looked at this problem of convergence that we talked about before. And I said how can you
prevent us from looking at particular cases and just getting an infinite sequence of particular
facts instead of general facts that will converge.
And what I found was that if you can get the appropriate control on the prover and on the
language that the prover is able to use, you can actually prevent that divergence and you can get
a result that says that if an inductive invariant exists in your logic, then you will eventually
converge to one in this technique. So you can solve that divergence problem at some cost.
I also looked at richer logic still, for example, looked at provers for full first-order logic that
handle quantifiers and built a system that can generate quantified inductive invariants with
universal quantifiers based on a first-order prover, and that was able, for example, to find
invariants for simple programs that manipulate linked lists on a heap or manipulate arrays. So
it's moving up to richer logics and data structures.
And I've also recently been looking at generating summaries for procedures. So that's a simple
logical fact that will tell you everything you need to know about a particular procedure in a
particular calling context, so that you learn these sort of reusable facts about procedures so you
can handle programs modularly.
So as you can see, sort of a theme of this work is moving on to richer logical languages and more
interesting classes of systems using the same basic generalization principle as a way of
constructing invariants.
So I just want to give some data now to give a sense that this stuff is more or less real. This is
for the basic interpolation method on bit-level hardware designs, where we are looking at
behaviors of length K and asking a SAT solver to prove that there's no error for behaviors of that
length, and then using interpolation to generate invariants.
And this is comparing against an earlier technique for generating inductive invariants, again,
checking them using a SAT solver. And this technique called K-induction is essentially
generating the invariant directly from the property rather than from proofs.
And so here I've plotted runtimes. On the X axis on a log scale I have interpolation; on the Y
axis I have K-induction. And each point is a verification problem for some property that was
written by an engineer about a real commercial design. So a shallow property about a complex
system.
And you can see that the interpolation technique completely dominates here and that, you know,
there are points over here where it is four orders of magnitude, or almost four orders of
magnitude faster, which is a big gap.
And the reason is that interpolation is able to more quickly focus in on just relevant facts and
ignore irrelevant facts. And that allows us to do the proof looking at shorter, simpler executions
for smaller values of K.
So now let's look at software model checking. And this is using the BLAST tool on some
benchmarks that came from -- that are example drivers from Microsoft that were provided by --
Tom, did you provide these benchmarks? Or do you know where they got them from?
>> Tom Ball: I think they came from the --
>> Kenneth McMillan: It came from SLAM in some way, but --
>> Tom Ball: Well, they came from -- they're all publicly available, I believe.
>> Kenneth McMillan: So, but, in any event, the properties probably came from SLAM, I'm
guessing.
>> Tom Ball: Yes, yes. There was -- I think this is almost nine --
>> Kenneth McMillan: Many years ago.
>> Tom Ball: Eight -- yeah. We transferred quite a few of the properties.
>> Kenneth McMillan: So the interesting thing about this is the way that BLAST was behaving.
It was working all right on the smaller ones, and then on the larger ones, it was just not finishing.
And the reason is that it didn't have a technique for extracting just the relevant facts where the
relevant facts were needed. It had no way of getting explanations from proofs of specific cases.
And so when they started doing that using my little interpolating prover, they were able to handle
the larger problems, and the reason was basically this: That the ability to get explanations from
proofs allowed them to find just a small set of relevant predicates for each location so that as you
went up to the larger benchmarks, the number of predicates needed in each location was staying
roughly the same. So the system was scaling better.
Now, so this is based, again, on this idea of looking at special cases and getting explanations
from proofs. And later I was able to show that in fact you could get the invariants for these
problems much more directly by just getting the interpolants to converge. That was CAV 2006.
And so, you know, in this case you can see using that technique versus predicate abstraction. It
was about two orders of magnitude faster. Again -- yeah.
>> Sorry to interrupt. Do you just expect one interpolant or many interpolants?
>> Kenneth McMillan: Yeah. So what would happen is you would start labeling your Boolean
reachability tree, and you're labeling it with facts that you derive from interpolants along
particular paths.
>> But for -- I know that along a given path you may have many interpolants. It may be the case
that there are many different proofs of the same --
>> Kenneth McMillan: Right. So you can have -- right. So you can have many different proofs
even for the same path. And so what could happen is in some cases you might derive some
irrelevant facts. And in that case, for example, you would perhaps not converge in a loop until
you are able to unwind it enough to actually get the prover to give you relevant facts.
So if you look at the final facts that are labeling each program location, it actually turns out to be
a combination of facts, okay, actually a -- you know, a disjunction of conjunctions of
interpolants, and as you go along, those facts are not actually monotonically strengthening.
Sometimes they get weaker and sometimes they get stronger as you explore more of the tree and
as you label the -- as you label paths with more facts.
>> But to be complete to your -- the taking of the previous slide, in those experiments do you
generate -- do you get the predicates from one proof or do you look for alternate proofs?
>> Kenneth McMillan: Yeah. Okay. So we are never -- in these experiments we are never
running multiple proofs on the same program path. So we'll wind up with multiple proofs on
different program paths.
>> [inaudible]
>> Kenneth McMillan: Right. Um-hmm. So, I mean, there is a question of, you know, you
could look at multiple proofs and try to decide which one is relevant, or the alternative is to say I
am going to accept the fact that I might get some irrelevant facts and have a way to recover from
that in the future to learn that this fact wasn't relevant and that some other fact was really needed
by exploring a different path.
Okay. And the last application I wanted to talk about is a fairly recent one, and it's looking at
test generation, where we are trying -- we are looking at a piece of code and we want to generate
inputs for that code that somehow explore the space of program locations we could reach or
some other defined coverage space.
And so here what we're doing is as we generate tests for that code, we're of course trying to
generate tests that will take us along different paths using an SMT solver, but we are also
learning -- as we backtrack and look at different paths, we are learning facts that we can annotate
onto that program. And a given fact will tell us when you reach here, if this condition is true,
then there are no further coverage goals that you can find, so backtrack in a different direction.
So we're learning things about the program as we go along.
And what you can see is if we don't use learning in this way, then we tend to get stuck at fairly
long plateaus if we look at the number of tests generated versus the amount of coverage we're getting.
So as we keep generating tests of course we're exploring new program paths, which is good, but
we're not seeing any new program locations.
And what happens is if we're able to learn, then this curve sort of goes more straight up. It
doesn't get stuck in these plateaus because you quickly learn that you can't by looking at this
same set of what you [inaudible] looking at this particular set of paths or going in this particular
direction get to anything new, so you have to go -- you push the search in a different direction.
So it's very much like applying the principle of DPLL, applying the SAT solver principle, to
exploring programs. So our search space is program paths. And we are learning program
annotations much in the way that a SAT solver is learning conflict clauses.
>> In the case of SAT solvers, right, there's always a rule [inaudible].
>> Kenneth McMillan: In this case we have many goals.
>> What is the goal in this case?
>> Kenneth McMillan: Well, right, the coverage goal in this case was to cover all the basic
blocks. So to find -- for every basic block to find some path that's feasible that goes through that
basic block.
And so when things flatten out here, it means we are exploring the same piece of the code and
not seeing any new basic blocks. For example, we might have some diamonds in the control
flow and we're exploring different paths through those diamonds.
And also, this technique is in fact learning procedure summaries, for example, so as it goes along, you can learn reusable facts about procedures and do things modularly.
Okay. So just to sort of overview the main points that I'm trying to get at. I tried to establish
two essential principles for talking about relevance in verification. One was the parsimony
principle that said relevant facts tend to come out of simple proofs.
The second was the generalization principle which says that if you want to find relevant facts,
you look at special cases like a particular Boolean assignment or a particular program path and
you try to look at the proofs of those to pull out facts that will be relevant to the general case. So
we're always moving from special cases to the general case.
And I would claim that this principle is essentially all that we really know about relevance and
how to automatically find relevant facts.
Now, SAT solvers and other similar provers essentially apply these principles to help them
focus on relevant deductions so they're able to give us parsimonious proofs.
But the final piece of the puzzle is to be able to get explanations from those proofs in a language
that you understand. For example, as a simple Floyd-Hoare proof. And we can do that by taking
simple proofs and generating explanations or generalizations using interpolation.
So this is like a little bit -- this is a different take on generalization, whereas in machine learning
you are generalizing inductively from examples. Here we are generalizing deductively using
proof.
So we can exploit these ideas to extract relevant facts that will then -- we can then use to build
inductive invariants for programs or hardware designs and so on, as long as we have relatively
shallow properties of very complex systems that we need to prove.
And the last thing I'd like to do is just suggest some applications outside of the area of
verification that might be interesting.
So as I mentioned, you know, in AI or in machine learning, you are often generalizing from
examples. But here we can generalize from deductions. So how could you use that principle.
You know, could you use it, for example, if you have large knowledge bases, like Web
ontologies or expert systems or other world-based systems and you want to get some explanation
out in a language that you understand about some very complex set of rules, why some simple
query is not working. So it might allow you to get explanations out of very complex knowledge bases.
Or if you're doing program verification and you have an error trace or a large number of error
traces, can you use this to get explanations of failures that are, again, deductive rather than
inductive and therefore perhaps more likely to be focused on relevant information.
Or another application is if you're doing random constraint solving, you can learn things about
subsets of the variables that you're solving for from interpolants, so that you can learn things
about the space in terms of the vocabulary that you're interested in. Or, anytime you have some large logical system that you don't understand, perhaps even executions of malicious code, you might be able to get explanations of failures out of those systems.
And of course you might, you know, imagine various other areas, for example, robotic planning
or any other kinds of systems where you have complex sets of rules from which you want to
derive explanations.
So the last thing I want to leave you with is just the idea that these notions of relevance that are coming out of formal verification, and the things we've learned about SAT and program proving
and so on could potentially have very broad applications in computer science, if we know how to
apply them.
So that's all I have to say. Thank you.
[applause]
>> Tom Ball: Wow. We have time for questions. Thank you, Ken, for such a nice talk.
>> We've got time for questions?
>> Tom Ball: Yes.
>> Kenneth McMillan: At least five minutes.
>> Can you go back a few slides.
>> Kenneth McMillan: Sure.
>> Oh, here. Parsimony principle. I have a comment about the parsimony principle. So I
[inaudible] but I also think that the parsimony principle will not work unless you provide into the
vocabulary over which the parsimonious proof is going to be constructed the right abstractions.
So, for example, I think often -- I think what you are trying to really do here is discover the proof
that is in the programmer's head. I mean, we're imagining that the programmer is really trying to
make a program work, right, and that's the reason why the program is working.
So oftentimes programmers use some fairly sophisticated abstractions that are not formalized.
Now I think if you provide that in some kind of a formal system, then I can imagine this
working. But if that is not -- if the system is not even aware of the abstraction, then I think this is
hopeless.
>> Kenneth McMillan: Well, okay, so I would say several things about that. So you're -- when
you are trying to do generalization, you need some kind of principle by which to do it. And one
possible principle which is common in machine learning is to have an inductive bias which says I
like some facts more than others.
So in some sense what you're saying is the programmer has some idea in mind of what facts are
better than other facts, therefore I want to guide the program -- guide the prover to use those
facts, more or less.
>> Well, no, I'm saying something even worse than that. So imagine -- imagine trying to prove
that the derivative of X squared with respect to X is 2X [inaudible] principles.
>> Kenneth McMillan: Yeah.
>> Right? That's a very hairy thing. But because somebody has figured out that, you know, you
don't have to do everything from first principles, there is this template that the derivative of X to
the power of N is N times X to the power of N minus 1, and now you can use that.
>> Kenneth McMillan: Right.
>> Now that template is a very powerful abstraction. Without that, you know, just trying to do
the limit proof from first principles, you'll just get lost if you're trying to do that automatically.
>> Kenneth McMillan: Right. Okay. So what I'm saying is that there are different kinds of
biases. So another bias -- I mean, I talked about what's a simple proof. Well, you should ask in
what proof system.
>> Yes.
>> Kenneth McMillan: All right? Okay. So in general if I have a richer proof system, I will be
able to find simpler proofs, which is good, right? Now -- of course I am talking about relatively
shallow properties, right, so I am sort of assuming that things have been broken down into
relatively small steps. But even given that, of course you have different possible proof systems
in which you can solve the problem.
So if you give me a richer proof system, then I will be able to find a simpler proof. And that's
what I'm talking about the parsimony principle. We want to get down to the simplest possible
proofs. So if I am using, say, Boolean logic, I may still have a proof, but it will be much more
complex than the proofs that I could get if I know a little bit of arithmetic. Or if I know a little
bit of analysis, it might help me to solve your problem.
So there is a bias here. The bias is the proof system. Rather than saying I think this is the fact that
you should be proving, I'm saying this is the proof system in which you should be doing the
proof.
And so if you look at the examples that were done on -- that I did on, for example, linked lists,
what I did was I set up a proof system that was appropriate for those problems, where I had a
collection of axioms, I defined some quantities and I gave some axioms over those quantities to
help the system do the kinds of proofs that I wanted it to do. And that was a bias.
>> [inaudible]
>> Kenneth McMillan: Right. And that was a bias. And of course what I'm saying, if you have
no bias, you cannot generalize. If you have no preference of one fact over another fact, you
cannot generalize. And I'm just suggesting that there are different kinds of biases. There are
language biases, which are typical in machine learning. But a deductive system can
also be a bias.
In other words, giving some -- giving some axioms, for example, can be a bias in terms of what
kinds of proofs you can get, and can allow the system to then get simpler proofs. By making proofs of certain things simpler, you are effectively biasing the system towards certain kinds of
generalizations.
>> If you look at interpolants, so you're using interpolants as possible explanations between two
facts, right?
>> Kenneth McMillan: Yeah, um-hmm.
>> [inaudible] interpolant also has a restriction [inaudible].
>> Kenneth McMillan: Right.
>> So that might actually sometimes work against you, right, because there might be some
crucial [inaudible] entities that might result in concise formulas [inaudible].
>> Kenneth McMillan: Right. So if you have a linear time interpolation lemma, then that won't
happen in the sense that you know if you have a simple proof using these other quantities that
you want to eliminate, then you can get an interpolant that is linear in the size of your proof.
If you don't have a linear time interpolation result, then you're right, you might say, oh, by
introducing some additional variables, I could get a much simpler representation.
Now, of course anytime you're introducing additional variables, you are introducing another kind
of difficulty, though, which is those variables are essentially quantified. They're essentially
hidden, if you will. And then you have to be able to cope with the quantifiers which makes life
then much more difficult for any subsequent provers that want to make use of that fact.
So you have to think about -- you know, one of the nice things about interpolants in arithmetic is
that they don't introduce quantifiers, which is just very nice. In other theories, sometimes you
wind up necessarily introducing quantifiers to be able to get -- to get interpolants.
>> [inaudible] the last interpolant examples?
>> Kenneth McMillan: I would say certainly use the equality theory but could it have just been
done with linear arithmetic for the BLAST examples? I mean, probably.
>> So did anyone try a combination of [inaudible] and unsatisfiable [inaudible] as opposed to
[inaudible]?
>> Kenneth McMillan: Yeah, so -- well, of course that is one way of generating a proof. I
mean, quantifier elimination is a way of organizing the proof in a particular direction.
>> But it's not necessarily minimum, which --
>> Kenneth McMillan: Yeah, okay, so the difficult -- right. So one of the nice things about -- it
could be done. It could be done that way. Once you say I want to just localize and eliminate
some facts and then do quantifier elimination, which would be essentially computing an extreme
interpolant, the strongest or the weakest, depending on how you did it, you know, my sense is
that you probably don't want to do that because you wind up generating potentially a lot of
irrelevant information by doing things that way.
>> [inaudible] satisfy the course [inaudible].
>> Kenneth McMillan: It's possible. It's possible. So I certainly -- there are many different
proof systems. And so if you look at the way that I dealt with superposition, for example, it's
much more in the flavor of what you're talking about, where you are doing a superposition
formula which is eliminating variables in a particular order, much like you would do in quantifier
elimination.
Although, in the end, if you did a quantifier elimination proof, you could possibly even extract a
core from that because you could say, oh, my quantifier elimination technique generated all these
constraints, but if you trace it back not all of them were actually used. Or you could say the
same thing for a Groebner basis proof or something like that.
So there are different proof systems, and each one, you know -- different proof systems might
have different ways of getting explanations out of them. But, yes, it's possible that it would
work.
>> I focus off and on on dynamic data structures [inaudible] things like that, and then a theory
that's quite useful to [inaudible] the theory of [inaudible] quantifications. Could you say again
what the [inaudible] invariants that make use of quantifications, but there are other [inaudible]
you could just have theories lying around in other places [inaudible] verification conditions
[inaudible] that you get. Can you say more about [inaudible] the state of quantifiers with respect
to interpolation or resources exist?
>> Kenneth McMillan: Okay. So for any given -- right. So if you have, for example, a theory
that you can axiomatize, then you can --
>> Using quantifiers.
>> Kenneth McMillan: Well, with quantifiers, yeah. Then you can use some of the techniques
that I was talking about. You can just axiomatize your favorite theory. You can introduce your
favorite predicates, you can axiomatize them. And then that essentially is biasing the prover. It's
saying use these facts. Use these axioms to see if you can get a simple proof.
So that will tend to focus the prover on using certain facts.
On the other hand, you might wind up with a theory that's not finitely axiomatizable, so the
theory -- so say you want to introduce a reachability predicate. So, you know, one of the things
that I did was I said, well, I will do a partial axiomatization of that. And so I will wind up with
an incomplete prover for that.
So apart from that, other than just the ability to axiomatize your theory, if you can't do that, then
you wind up with a -- if you can't either axiomatize it or encode it in some existing theory, then
you have to solve the problem of -- for this proof system, is this proof system interpolating, is
your logic -- does your logic have the correct property.
So it's not -- so there is a pretty wide range of things you can do, but you might wind up having
to come up with some original result in order to handle a particular theory that you want to
introduce.
But, I mean, one of the things that I've found is just the ability to give axioms is very nice in
terms of guiding the system towards doing certain kinds of -- doing certain kinds of deductions.
>> Tom Ball: Okay. Let's thank Ken again.
[applause]