>> Tom Ball: So we'll get started. I'm Tom Ball. And about a year ago Ken McMillan came and visited us. Actually, it was March 3rd. We checked this morning. I was -- if you go back to Resnet, check me out, 50 pounds heavier introducing Ken. But Ken hasn't changed at all in the last year. No, I think he has. >> Kenneth McMillan: A little gray hair. >> Tom Ball: Ken has changed a little bit, I think. And so he's going to tell us a little bit about what's changed in the last year since he came and talked to us. And we welcome Ken McMillan very much back to Microsoft Research. Please, take it away. >> Kenneth McMillan: Thank you. Well, I'm not sure, actually, how much will be different. We'll see by the time we get to the end if there's anything new that I have to say. What I'm going to talk about is how we cope with complexity in dealing with and verifying hardware systems and software systems, at a fairly high philosophical level with a little bit of technical detail. And mainly I'll be discussing what we know about the problem of relevance in verification and where that comes from. And since this is a job talk, I'll include a few advertisements along the way of things that I've worked on. So okay. I think everybody in the room probably has a good sense of the complexity of the kinds of systems that we develop in hardware and software. I think hardware and software systems are probably the most complex objects that human beings have ever created. And you might ask, well, how do we manage to create such complex things and have them work most of the time? And you probably also are familiar with the answer: we do it by debugging. That is to say, we design something that's approximately correct, and we test it and we fix it where it breaks until it's moderately reliable. The consequence of that is that the primary task in the design of a complex system turns out to actually be verification; that is, finding the bugs so that we can fix them. And in fact if you look at a hardware design project, if you look at a chip design, people are spending, some say, as much as three-quarters of the overall engineering effort on just verifying, just doing simulations of the chip and trying to pull out the bugs. And of course, as we know, when we fail at that, the cost of even very small errors can be very large. In a chip design it could conceivably be half a billion dollars, as happened in the 1990s. And we've also seen that in software projects: a very small error in the code could result in a security vulnerability that has a very large economic cost. So because of this process of design that we use, design by debugging, it turns out that when you buy a piece of software, you get no warranty, you get no guarantee that the software does anything at all. And that's because, fundamentally, we do not know how to design correct systems. So I would say, and I would hope that some people in this room would agree, that correct design is one of the remaining unsolved grand challenges in computing. It's something really worth putting our effort into. Now, naturally, you might imagine that the way to attack that problem would be to apply logic in some way, to apply logical proofs in order to help us design correct systems, not just debug them. But in practice, of course, constructing proofs can be an overwhelming task. The proof can be even substantially more complex than the system itself. So it's been argued that we need automation to help us do that.
Well, that brings me to the topic of model checking, which is an automated method to help us do proofs about systems. Now, I'm afraid that those familiar with model checking will have to bear with me for a few slides here. A model checker is a tool that takes some model of your system, typically a finite-state model, and a logical specification. So if my system were a server and P was the condition of receiving a request and Q was the condition of sending a reply, then this specification would say: always, when a request arrives, eventually in the future we send a reply. That's the kind of temporal specification that we can typically make in model checking, and we put these things into a model checker. And we can get one of two answers: either yes, every behavior of the system satisfies our logical specification, or no, and here is a behavioral counterexample, which is an execution trace of the system that would show, for example, a request being received without a reply being sent. So it's been argued that this ability of model checking to produce behavioral counterexamples is a major advantage, because it helps us to debug our systems, it helps us find out what's going on. So model checking is not that complicated. The simplest case of it is, of course, just a simple safety property, which says there's some bad condition that I want never to happen. If I have a simple safety property, I can prove it by a reachability analysis. So I take all of the states of my system, which I've drawn here as circles -- a state is an assignment of values to the variables of the system -- and I draw a transition graph where I have an arrow between two states if you can get from one state to the other by a single transition of the system. And then reachability analysis is essentially a search in this graph. I designate an initial state and I designate a bad state that I don't want to reach, and then I can just do a breadth-first search. I'll find all the states that I can reach in one step, two steps, three steps, four steps, and so on, and perhaps I'll eventually reach the bad state, and I can trace this back to an error path that tells me why the system fails. So that's a counterexample. Now, if I were to fix the system by, for example, removing a transition arc, then when I did my reachability analysis, I might reach a fixed point, which is a condition where I cannot reach any additional states, and then I've computed the reachable state set of the system. And since that doesn't contain my error state, I would say that my simple safety property is verified. So model checking is only a little bit more complicated than that. Reachability analysis essentially captures the idea: it's computing a fixed point, or searching in a state space. And, in fact, this very simple technique can find subtle bugs in circuits and protocols, but of course it suffers from the state explosion problem, which is that the number of states can be exponential in the number of state-holding components in the system. So that means I can only draw this graph for small systems. That leads me to my first advertisement, which is about symbolic model checking. And this was a technique that I developed for my thesis that avoids building that state graph by essentially computing a succinct representation for a large set of states.
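Before going on, here is the explicit-state reachability analysis from a moment ago as a minimal sketch in Python; the toy transition system at the bottom is made up purely for illustration.

    from collections import deque

    def reachability(initial_states, successors, is_bad):
        """Explicit-state reachability: breadth-first search to a fixed point.
        successors(s) yields the states reachable from s in one transition;
        is_bad(s) is the bad condition of the simple safety property.
        Returns an error path to a bad state, or None if the property holds."""
        parent = {s: None for s in initial_states}   # doubles as the visited set
        frontier = deque(initial_states)
        while frontier:                              # empty frontier = fixed point
            s = frontier.popleft()
            if is_bad(s):
                path = []                            # trace the counterexample back
                while s is not None:
                    path.append(s)
                    s = parent[s]
                return list(reversed(path))
            for t in successors(s):
                if t not in parent:
                    parent[t] = s
                    frontier.append(t)
        return None                                  # reachable set computed; verified

    # Toy example: a mod-8 counter where the arc from 4 to 5 has been removed,
    # so the bad state 5 is unreachable and the search reaches a fixed point.
    succ = lambda s: set() if s == 4 else {(s + 1) % 8}
    print(reachability({0}, succ, lambda s: s == 5))  # -> None, property verified

An empty frontier is exactly the fixed point described above, and the parent map is what lets us trace an error path back from a bad state when the search does reach one.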
The idea is to capture the regularity in that set of reachable states so that I don't have to write it out explicitly. And the classic representation for doing that -- or techniques for it -- were invented by Randy Bryant. It's called a binary decision diagram, and it's essentially a compact representation of a decision tree. So I have a decision tree here that tells me, for example, whether a given state is reachable: 1 might mean reachable and 0 might mean not reachable. Well, I can essentially just squeeze out the redundancy in that representation to get a binary decision diagram, which is a compressed representation -- I'll show a small sketch of this reduction in a moment. And if I can do my model checking working only on the compressed representation, then I can potentially avoid that problem of state explosion. And one of the things that I applied that to in my thesis was multiprocessor cache coherence protocols, where we viewed the protocols by which a multiprocessor keeps its various cache memories consistent essentially as a set of finite state machines that are communicating with each other over some kind of store-and-forward network. And if we can model the protocol at that level, then we can model check it. And I showed that symbolic model checking could, in fact, detect very subtle bugs in these protocols that would be very hard to find on the actual hardware, and that it allowed that verification to be done in a somewhat scalable way, so I could talk about a large number of caches interacting with each other and therefore be able to find the bugs in the system. And I think this was actually the first commercial application -- I'm not certain, but the first application of model checking to a real commercial machine. So that's nice in a very abstract way. But what about the real world? I mean, how do we deal with the complexity of the systems? Please. >> [inaudible] how many of the bugs were [inaudible]? >> Kenneth McMillan: Right. So the bugs were typically safety or deadlock. Yeah. So I don't know how you want to characterize deadlock, but the most interesting ones were deadlocks. Or maybe livelock is the way to put it: you get a situation where the system wouldn't completely halt, but some particular event could never occur. Okay. So how would we, for example, cope with the complexity of this object? If we want to talk about a microprocessor and say something interesting about it, we might have to think about something on the order of 100,000 registers, which is much larger than what we can handle with the symbolic model checking technique. And since the state space is essentially two to the number of registers, this is going to be a number that's beyond astronomical. And of course if you wanted to think about the software that you're running on that object, the situation would be even more complex. So in order to make model checking into a useful tool for engineers, we had to find ways to cut this problem down to size. And this talk is mainly about one key aspect of that problem, which is: how do you decide what parts of this system, or what facts about this system, are relevant to proving a particular property? And I'll talk about some of the things that we've learned over the last decade or so on that subject of relevance. So if I want to talk about verifying a property and facts that are relevant to a property, I have to know what I mean by a property. And I'm going to distinguish between what I'll call deep and shallow properties.
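As the promised aside before moving on: the step of squeezing the redundancy out of a decision tree can itself be sketched in a few lines. This is a minimal hash-consing construction, not Bryant's full algorithm, and the two-variable example at the end is made up for illustration.

    unique = {}                  # (var, lo, hi) -> node id; ids 0 and 1 are the leaves

    def mk(var, lo, hi):
        """Return a node testing var, with reduction: skip redundant tests and
        share structurally identical nodes via the unique table."""
        if lo == hi:             # both outcomes identical: the test is redundant
            return lo
        key = (var, lo, hi)
        if key not in unique:
            unique[key] = len(unique) + 2   # fresh id; 0 and 1 are reserved
        return unique[key]

    def from_tree(tree, var=0):
        """Build a reduced BDD from a decision tree given as nested (lo, hi)
        pairs with 0/1 leaves, e.g. meaning not-reachable/reachable."""
        if tree in (0, 1):
            return tree
        lo, hi = tree
        return mk(var, from_tree(lo, var + 1), from_tree(hi, var + 1))

    # The tree for "x0 and x1" has four leaves; the reduced diagram has two nodes.
    root = from_tree(((0, 0), (0, 1)))
    print(root, unique)          # 3 {(1, 0, 1): 2, (0, 0, 2): 3}

Sharing identical subgraphs and skipping redundant tests is the whole reduction; the compression comes from the regularity of large state sets, which produces many identical subtrees.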
So: a property is shallow if, in some sense, you don't have to know very much information about the system to prove it. For my chip, a deep property might be that it implements the x86 architecture, which would be very hard to prove. A shallow property would be, say, that a bus bridge in the system that's giving us access to memory never drops transactions. And in order to prove a shallow property like that, we might not have to know very much information about the system. In fact, we might not even have to know very much information about the bus bridge to do it. So there are a variety of methodologies that we can use to reduce a deep property of a system like this to a large number of shallow properties like this that we can hope to actually verify. And here's the second advertisement. In the late '90s I spent a lot of time working on a technique I called functional decomposition. That meant that we're going to start with some simple abstract model, like our model of the cache coherence protocol, where we are thinking in terms of communicating finite state machines, and we are going to refine it down to some actual implementation. So this is a schematic of an implementation of such a protocol that was done at Silicon Graphics, and it has a lot of things going on. We have messages coming in in queues and being broken apart; control information is going this way and data is going this way; the control is being operated on in some kind of a pipeline, and the status of transactions in flight is being stored in a content addressable memory. At some point control and data are reunited and queued for output. And that's a high-level view of about 30,000 lines of Verilog that implemented this protocol in a chip. So we want to show that a collection of these things together implements the high-level protocol. And that's a deep property that we wouldn't be able to attack directly. However, I developed a methodology that let you essentially break that deep property down into a collection of shallow properties, mainly ones that track individual transactions through the RTL. So the properties might be on the order of this: I want all of my enqueued transactions to correctly match the abstract model. I want them to be matched up with the status in the CAM correctly. I want the tables to process things in the right way compared to the abstract protocol. And I want messages to be enqueued with correct data on the way out. I'm greatly simplifying here. But the idea is that we're breaking the problem down into pieces by looking at individual transactions, showing that they move through correctly, and showing that other transactions don't interfere with them. So we're breaking down deep properties into shallow properties. And this was something that an engineer at Silicon Graphics was actually able to do using this methodology. He was able to find hundreds of bugs in this low-level design by this technique and to formally verify its correctness. So, okay, end of advertisement. If what we're interested in is shallow properties of very large and complex systems and our problem is to prove them, then the solution is abstraction. Yeah. >> That particular case study that you referred to was the one that [inaudible] -- >> Kenneth McMillan: It was Oscar's thing, yeah. >> Right. So that indeed was very impressive.
What I was wondering -- I mean, [inaudible] must be a very smart way to be able to push a proof like that [inaudible] afterwards did you find this kind of replicated in an industrial setting? >> Kenneth McMillan: I don't think you would -- right. So I don't think you would want to replicate this. And the reason is it's too difficult. Part of the problem is that the tools that I was able to give Oscar were too primitive. And that has a little bit to do with the stuff I'm going to talk about, because essentially Oscar was having to do a lot of things manually that we can now do in a more automated way. That's the first thing. And the second thing is that we are now able to do some verification of systems that are infinite state and so on, which would have helped him do less work doing the decomposition. So an interesting question is how much easier it would be now, ten years later, if you went back and tried to redo that exercise with the current technology -- how difficult would that be to do, because that was, you know, a tour de force application. So what's happening in practice with most people is they are essentially writing down shallow properties of pieces of their design that they hope will imply what they want, but they're doing it in an informal way, because they're not able to put together the proof formally -- the methodology is just too complex to apply. So I think it would be interesting at some point in the future to go back and say, with all of the technology that we have now -- with SMT solvers and interpolation and all kinds of infinite-state model checking techniques and so on -- how difficult would it be to do that? Could we simplify it down to the point where someone who's not Oscar could do that exercise? Okay. So right. So now, assuming that we've reached the point where we have shallow properties to prove, which obviously might be difficult to do, the question is how we abstract the system in such a way that we can prove those properties. How do we extract just the facts about system state that are relevant to a given shallow property of a very complex [inaudible] system? >> Can you define what shallow means or deep means? Or is this the definition: a property that is shallow is one that can be always there [inaudible]. >> Kenneth McMillan: Right. So I'm saying that a property is shallow in the case where there's a simple abstraction that proves it. And that's sort of -- it's not particularly well defined. And if you're doing it in practice, of course, you don't necessarily know. You have a sense that if I break the thing up in this way, each of the problems I'm going to get is relatively shallow. But in fact you don't find that out until you're actually able to do the proof. So this is a somewhat vague heuristic definition, and you'll see a number of vague heuristic definitions to follow. Okay. So how do we know what information is actually relevant? And how do we decide this mechanically? You might say that's not a very well-defined problem. You might say it's even AI-complete. And, in fact, I would have said that myself a decade or so ago. But it turns out that we can actually give some fairly concrete answers, based on concepts that have been developed over the last decade or so, especially in studying the Boolean satisfiability problem.
And I'll talk more about that later. So this is the way I would put the basic principles that we've learned in this area. And there are two basic principles. The first I might call the parsimony principle about relevance. It says a relevant fact about a system -- a relevant predicate -- is one that is used in a parsimonious proof of the desired property, in a simple proof. And the second is what I would call the generalization principle. It says that facts that are used in proofs of special cases tend to be relevant to the overall proof. In other words, we start with simple special cases and we generalize from those special cases. So simple proofs define relevance, and to find relevant facts, we generalize from special cases. And this is mainly what we're doing almost everywhere we're talking about relevance. For example, in CEGAR, we're looking at special cases and generalizing from them. So what do I mean by a proof? How can I make this notion concrete? Well, a proof is just a series of deductions from premises to conclusions. Each deduction is an instance of an inference rule, and usually we'll represent a proof as a tree, where we start with some premises and we arrive at a conclusion at the root of the tree. And each branching point in the tree is the application of a sound inference rule. So if the conclusion of the proof is false, that means we have what's called a refutation. And that means that the premises must be inconsistent, since they imply false. Now, exactly which inference rules we use is going to depend on the theory that we're reasoning in. If we're just doing Boolean logic, then we can get away with the resolution rule. For example, if I know that P or gamma holds and not-P or delta holds, then, since P has to be either true or false, I know that gamma or delta holds. And this rule by itself is complete for propositional logic. But if I'm reasoning about arithmetic, I might need to add some more interesting rules to my system -- for example, the sum rule for inequalities, which tells me that I can sum up the left-hand sides and the right-hand sides of two inequalities to get a fact that is implied. So this is what proofs are going to look like, in very simple calculi like this. Now, the last thing that we need to do proofs about sequential systems is, of course, inductive invariants. An inductive invariant is just some condition that I'm going to write about the system, some Boolean-valued formula, and the condition is going to split the state space into two parts. On the left here I have the states that satisfy the invariant, on the right the ones that don't, and an inductive invariant is just one that forms a barrier between the initial states and the bad states, so that no transitions cross the barrier, and therefore I've proved that the bad states are not reachable. Now, of course, the reachable states are an inductive invariant, but they might be a very complex inductive invariant. They might be a set that's very hard to describe and contains a lot of information, even if I try to represent it with BDDs, whereas for any given property an inductive invariant that proves it could be very simple. And so mainly what we're doing in verification is trying to come up with these inductive invariants. So, since proofs are made of inductive invariants, I can now say a little bit more about what I mean by relevance. I'm going to say a fact is relevant if you use it in a simple inductive invariant. At least that's a start on what we mean by a simple proof.
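As a concrete aside, here is a small sketch of what checking that barrier condition looks like mechanically. It assumes the Z3 SMT solver's Python bindings, and the toy system (a counter that steps by two) is made up for illustration.

    from z3 import Int, And, Not, Implies, Solver, substitute, unsat

    x, x1 = Int("x"), Int("x1")     # current-state and next-state copies

    init  = (x == 0)                # initial states
    trans = (x1 == x + 2)           # the system counts up by two
    bad   = (x == 1)                # the state we must never reach
    inv   = (x % 2 == 0)            # candidate inductive invariant: x stays even

    def valid(claim):               # a formula is valid iff its negation is unsat
        s = Solver()
        s.add(Not(claim))
        return s.check() == unsat

    # The three conditions that make inv a barrier:
    print(valid(Implies(init, inv)))                 # it contains the initial states
    print(valid(Implies(And(inv, trans),
                        substitute(inv, (x, x1))))deriv := None)  # placeholder
    print(valid(Implies(inv, Not(bad))))             # it excludes the bad states

All three checks print True. Note the contrast with the reachability search earlier: we never enumerate states; we just ask the solver three entailment questions about the candidate barrier.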
So here's a simple example program that comes up from time to time. Here we are setting two variables, X and Y, to zero. And then in a loop we increment both variables, and then while X is not equal to zero we decrement both variables. At the end of the program we know, when we fall out of this loop, that X is equal to zero, so Y should also be equal to zero, and we want to prove that. So our state variables are the program counter and the two data variables. And our property, which is the negation of the bad states, is that if the program counter is at L6, then Y is equal to zero. And the simplest inductive invariant that I could come up with for that says, in addition to the property, that either the program counter is at the beginning or X is equal to Y. And you can see that this is inductive in the sense that once it becomes true it has to always be true, since we increment and decrement the variables simultaneously. So my only point about this proof is that it contains two atomic facts that I am now going to say are in some sense relevant, because they occur in the simplest proof that I know how to construct; that is, PC equals L1 and X equals Y. And there are of course lots of facts I could deduce, like X is greater than or equal to zero, that are not relevant to this property. So if we know these relevant facts, these two atomic facts, there are lots of techniques that will allow us to construct or synthesize the inductive invariant, given that the logical or of those two facts is inductive -- for example, we could use predicate abstraction. The interesting thing from our point of view is how we come up with these facts in the first place. So we can learn something about that from the Boolean satisfiability problem and from techniques that have been developed for it. SAT is of course the classic NP-complete problem. It takes as input a Boolean formula in conjunctive normal form, and as output it either gives us a satisfying assignment or a statement that the problem is not satisfiable. So we can put this pair of clauses into our SAT solver, and the SAT solver will give us a model like this, or, if there isn't any model, it will say unsatisfiable. Modern SAT solvers, I would say, are using the generalization principle to help them focus on relevant facts. And in particular they will produce refutation proofs in the case where the clauses are unsatisfiable. So here's how that works, more or less. SAT solvers are doing backtracking search, so we can just start deciding values for variables: A equals zero, B equals zero, C equals 1, and so on, Q equals zero. And at some point in this search we're going to become stuck. We reach a point where we can't give a value to R, because either value that we give would make some clause false. So at that point we say that we are in conflict. Now, in a classical backtracking search, what you would do, of course, is go back to the last decision point and go in a different direction. But that's not what a modern SAT solver does. Instead, the SAT solver does deduction at this point. That is to say, we take the two clauses that caused us to be stuck -- this one says that R can't be false, and this one says that it can't be true -- and we apply the resolution rule to those two clauses to get a new clause that's implied by what we had before. And what's interesting about that new clause is it tells us that we were really stuck way back here: when we decided C equals 1, we were already infeasible, because this clause was false.
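That resolution step fits in a few lines. In this sketch a clause is a set of literals, a literal is a (variable, polarity) pair, and the particular clauses are made up to mirror the story above.

    def resolve(ci, cj, pivot):
        """The resolution rule: from (pivot or gamma) and (not-pivot or delta),
        conclude (gamma or delta)."""
        assert (pivot, True) in ci and (pivot, False) in cj
        return (ci - {(pivot, True)}) | (cj - {(pivot, False)})

    # The two clauses that forced R both ways (hypothetical, for illustration):
    c1 = frozenset({("R", True), ("C", False)})                 # not-C or R
    c2 = frozenset({("A", True), ("C", False), ("R", False)})   # A or not-C or not-R
    print(resolve(c1, c2, "R"))   # learned clause: A or not-C

With A already decided false, the learned clause A-or-not-C became false the moment we chose C equals 1, which is exactly what sends the search back to flip that decision.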
So that tells us that we need to go way back to the top and branch in a different direction, giving the value zero to C. So what's going on here is that conflicts in the search are telling us where to do deduction, and deduction is in turn guiding the search. And we can think of this deduction as an instance of the generalization principle. It's taking this failure to find a solution and generalizing the failure, to say that not only does this particular combination not work, but any combination that makes this clause false doesn't work. So we've carved out a piece of the space here and we've generalized our failure, and that has pushed the search in a different direction. That's called DPLL. Now, in the solver, then, we get a sort of feedback between search and deduction. We're making case splits -- we're searching. We are then propagating some implications of those case splits to determine when we're in a conflicting state where we can't continue, and when that happens we are doing deduction, or generalization, to learn a new fact about the system, which in turn tells us which variables are interesting to do case splits on. So we have a feedback loop that allows the SAT solver to focus on relevant case splits and relevant deductions. That sometimes allows the SAT solver to handle problems with millions of clauses, and it helps us generate simple proofs in the case of unsatisfiability, because we can focus on relevant deductions. So what I want to look at next is what lessons we can learn from this architecture and how we can use them to, for example, generate inductive invariants. How can you apply this same principle to invariant generation? Well, here's a very simple way you could approach that. As a special case, we are going to consider one execution path of a program. We're going to look at one way that the execution could proceed, and we will just construct that as a straight-line program. So we're looking at a fragment of the program's behavior. And we'll try to construct a proof, by whatever means necessary, that that straight-line program is not able to violate the property at hand, and then we'll see if the generalization principle works; that is, we'll see if that proof contains facts that we need to build an inductive invariant. So back to our example program. You recall that the fact that we needed to know at each loop was that X equals Y. That was the key fact in the invariant. So would that fact fall out of analyzing a particular execution? Well, let's look at it. Let's unroll each of those loops twice. We get an inline program that looks like this, where here I've unrolled the increment loop twice and here the decrement loop twice, and each time I passed through a conditional I have put that condition in brackets. So to execute this path, we would have to make all of the conditions in brackets true. And at the end I've stated the negation of the property that I want to prove: I've said Y is not equal to zero. So the program can fail along this path exactly when we can make all of these guards true, and I'm going to prove that that's not the case. And I can do that in the Floyd-Hoare style, where I just deduce a sequence of facts, each of which implies the next. So I start with true, and then after initializing the variables I could say X equals zero and Y equals zero.
After I increment them I could say X equals Y, for example, among other things I could prove, and so on until I get to the bottom. At the end I could prove that if X is equal to zero then Y is equal to zero, which means that when we fall out of the loop with X equals zero, we have a contradiction, and that gives us false. So this is a refutation of a special case, and you'll notice that it contains the ingredients of the proof of the general case, which is that inductive invariant X equals Y. Of course, it didn't have to work out that way. I could have written this other proof for the special case, which would not have been useful: it only talks about the particular values of Y as I execute the program. And, in fact, if I looked at longer and longer traces, these predicates would just go off to infinity. I'd have Y equals 3, Y equals 4, et cetera, and I diverge. So clearly applying this principle to program verification, or sequential system verification, is more difficult than SAT. Somehow I have to avoid that kind of divergence, and I'll talk a little bit more about that later. But the general principle still applies. And we're looking for a practical method now of getting these inline proofs that are relevant to the overall proof. >> So the reason it's more difficult is that in the case of SAT the vocabulary of proofs is limited. >> Kenneth McMillan: Yeah. >> It's basically clauses over some fixed set of literals, right? >> Kenneth McMillan: Right. >> [inaudible] >> Kenneth McMillan: Everything is finite. So you have no possibility of divergence. And also you have to think, well, if I'm looking at this infinite sequence of paths, is it ever going to actually converge -- which is just not an issue in SAT, because it's essentially a finite search. Okay. So in order to do that, I'm going to apply Craig's interpolation lemma, which is a very basic result from proof theory from the 1950s. It talks about pairs of first-order formulas. So let's say that A and B are two formulas over some vocabulary of nonlogical symbols, like predicate symbols and function symbols and constants, and I have the usual logical symbols to play with. If I have two such formulas and those formulas together imply false -- that is, they're inconsistent -- then the lemma says there exists a fact I can put in between, an interpolant A-prime, such that A implies the interpolant, the interpolant implies that B is false, and the interpolant uses only the common vocabulary of the two formulas. So if A were the formula P and Q, and B were the formula not-Q and R, then an interpolant would be Q, because it's implied by A; it's inconsistent with B, because Q can't be both true and false; and it's written using just the common symbol Q between those two formulas. So that's an interpolant. And the question is why that is interesting for us. Well, we can think of interpolants as being explanations in some sense. So suppose, for example, I have some very large, complex, unknown formula A in a black box, and I want to ask some questions about A to try to understand it. Well, I could propose a formula B and ask: is that consistent with A? If you want to think concretely, imagine that A is some very complex set of rules for configuring rackmount servers, say, and B is a query about, say, the performance and the power consumption. So: is it possible, for example, that the performance is 3 and the power consumption is 7, whatever that means.
So we could post that query to A, and A might say: sorry, unsat, that's inconsistent with me, I cannot build that server. So now how could A explain to B the cause of the failure? Well, one way would be for A to provide a proof. We could take the premises A and B and deduce false in our proof system, and that would prove the inconsistency. But the trouble with that is that the proof is going to contain all kinds of variables from the black box, all kinds of variables in A that B doesn't know about. And so B is not going to understand that proof. So instead what we can do is use the concept of feasible interpolation to find an explanation. That means we can take that proof and run it through an algorithm that derives an interpolant: a fact that's implied by A and inconsistent with B. And it says, basically, that, say, the performance has to be less than 2 times the power consumption, whatever that means. So that would be a general fact that rules out this query, and also a larger space of possible queries, and it might tell us how we need to fix our query to get something that's feasible. So if we think about all the possible explanations, the possible interpolants, there's actually a space of them. At one end we have the most specific. The most specific would be: your exact query doesn't work. It would be the negation of your query. And the most general would be some very complicated formula that tells you exactly all the combinations of X and Y that are feasible. But that's not a good explanation either; it's too complex, I can't understand it. So somewhere between those two is a relevant generalization. It gives me an understandable reason, in my language, as to why my query is not feasible -- like Y is less than 2X. So the idea here, and the way that we're applying the relevance principle, is that we're going to say relevant generalizations are derived from parsimonious proofs. If I look at the simplest possible proof that A and B are inconsistent, that ought to give me a relevant generalization, a relevant explanation of why my query didn't work. So we're thinking of interpolants as a way of getting explanations out of proofs. And let's see how we can apply that idea to programs now. We'll see that that notion of explanation can be used to generate program proofs. So here's a very simple inline program. It says X gets Y, increment Y, and then pass through a guard where X is equal to Y. Now, this guard condition obviously can't be true, because at this point in the program Y is equal to X plus 1. So we're going to prove that logically, to generate an explanation. We'll take this program and turn it into static single assignment form, as a set of constraints. That means every time we assign a variable, we give it a new subscript, we create a new version of it. So increment Y becomes Y1 equals Y0 plus 1, and so on. So we've turned that program into a logical statement, a mathematical statement that we can put into a SAT solver -- in particular into a SAT solver that knows a little bit about arithmetic, a SAT-modulo-theories solver. And it will say this is inconsistent, and here is a proof, in an appropriate proof system, that it is actually inconsistent. From that proof we can then generate interpolants. And that will be a sequence of facts about our variables that has the following properties. Each formula is implied by what came before. Each formula is written only over the common symbols between what comes before and what comes after -- and that means that each formula is written in terms of the variables that represent the state of the program at just one location, so it's a formula about program state. And finally, it's a refutation: it begins with true and ends with false. So if our SAT solver generates a refutation for these constraints, we can algorithmically generate these facts in the vocabulary that we want. And that will give us a proof. That is, by just dropping the subscripts we now get a Floyd-Hoare proof that you cannot execute this program fragment. So to recap: we start by translating into static single assignment form; we then ask our solver, our SAT solver or SMT solver, to give us a proof that the resulting constraints are inconsistent; and then from that we can derive this Floyd-Hoare proof about the program fragment.
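Here is that recap as a small sketch. It assumes the Z3 Python bindings; and since Z3's released API does not expose interpolant generation, the interpolant sequence below is written out by hand and merely checked, step by step, to behave as described.

    from z3 import Ints, And, Not, Implies, Solver, BoolVal, unsat

    y0, x1, y1 = Ints("y0 x1 y1")

    # SSA constraints for the fragment:  x := y;  y := y + 1;  [x == y]
    ssa = [x1 == y0,          # x gets y
           y1 == y0 + 1,      # increment y
           x1 == y1]          # the guard on this path

    s = Solver()
    s.add(ssa)
    print(s.check())          # unsat: the path is infeasible

    # A hand-written interpolant sequence for the cut points; an interpolating
    # prover would derive this from the refutation proof.
    itp = [BoolVal(True), x1 == y0, y1 == x1 + 1, BoolVal(False)]

    def valid(f):
        c = Solver()
        c.add(Not(f))
        return c.check() == unsat

    # Each interpolant, together with the next constraint, implies the next
    # interpolant; each mentions only the variables live at its location.
    for pre, step, post in zip(itp, ssa, itp[1:]):
        print(valid(Implies(And(pre, step), post)))   # True, True, True

Dropping the subscripts turns the three checks into exactly the Floyd-Hoare steps: from true, after x := y we get x equals y; after y := y + 1 we get y equals x plus 1; and the guard x equals y then yields false.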
So then the hope is, of course, by the generalization principle, that the facts we used in that deduction are actually going to be relevant, for example, to constructing an inductive invariant. So that's the general scheme of things. And now I just want to summarize -- this is the third advertisement -- the research that I've been doing in this area over the last six or seven years. I started out in 2003 looking at Boolean or propositional logic and thinking about hardware designs at the bit level. So I would take a SAT solver and have it examine all the possible executions of that piece of hardware for a fixed length of time, say K clock cycles. Then, having the proof that there was no failure of the property in that amount of time, I could use interpolants to generate a sequence of facts that I could prove at time 1 and time 2 and time 3 and so on. And I showed that there were techniques to get those facts to converge to inductive invariants. So having that, I could then prove that there are no failures of that particular property in any amount of time, just using a SAT solver and this interpolation technique. And, in fact, this is currently the most effective technique that we have at Cadence. This is the first-line technique for verifying temporal properties of hardware designs. So then I said, well, can we move to a richer logic? And I built a little SAT-modulo-theories solver, which is a SAT solver that knows something about arithmetic and -- >> Can you just expand on your comment about this is the front-line technique at Cadence? Does it mean that -- >> Kenneth McMillan: Yeah. In the products. >> In the products. So the -- Cadence -- >> Kenneth McMillan: So, in other words -- >> [inaudible] so at Cadence, like, your engineers go and verify customers' designs using -- >> Kenneth McMillan: No, no, they sell the tools to the designers, and the designers apply the tools. That's why I'm saying -- and they use a very informal methodology overall to do it. So they might, for example, attack the bus bridge and start writing properties of the bus bridge like, you know, every transaction that goes in essentially comes out, it doesn't get duplicated, and so on. And so they would do the verification at that level and they would never put it together into an overall proof. So the assumption is that we already have the shallow properties, because we have no technology for getting the shallow properties. That's future research. And interpolation is just the first technique that we apply to those properties, because it's on average the most effective.
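Pinning down the shape of that 2003 bit-level method, here is a rough sketch of the loop. None of these callbacks name a real library API; unroll, interpolant, entails, and disjoin are assumed to be supplied by an interpolating SAT solver.

    def interpolation_mc(I, k, unroll, interpolant, entails, disjoin):
        """Sketch of the interpolation-based model checking loop.
        unroll(R, k) builds A = (R plus one transition step) and B = (k - 1
        further steps with the property failing somewhere), runs the SAT
        solver on A and B, and returns (model, A, B), model being None
        when the pair is unsatisfiable."""
        while True:                          # outer loop: deepen the unrolling
            R = I                            # over-approximation of reachable states
            while True:
                model, A, B = unroll(R, k)
                if model is not None:        # the bounded check found a failing trace
                    if entails(R, I):        # R is still just the initial states,
                        return "counterexample"   # so the trace is real
                    k += 1                   # otherwise it may be spurious:
                    break                    # restart with a deeper unrolling
                P = interpolant(A, B)        # over-approximate image of R, shifted
                                             # back to the initial frame; it cannot
                                             # reach a failure within k - 1 steps
                if entails(P, R):            # nothing new: R is an inductive
                    return "property holds"  # invariant excluding all failures
                R = disjoin(R, P)            # widen R and take the image again

The interpolant plays the role of the image step in the fixed-point computation from the start of the talk, except that it is an over-approximation extracted from the proof, which is why the loop converges on relevant facts rather than on the exact reachable set.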
Okay. So then I looked at teaching the SAT solver, using some well-known techniques, about arithmetic and equality and other theories, and getting out proofs in a slightly richer calculus, to see if we could then talk about more interesting properties, say ones that involve integers and arithmetic and so on. And this solver that I developed was used by the BLAST software model checking group at Berkeley, in particular by Ranjit Jhala, to look at program traces and try to pull out relevant predicates in just the way that I was describing; and then they could use predicate abstraction as a way of synthesizing the inductive invariant as a combination of those basic predicates, those basic ingredients. And we'll look a little bit more at that later. And later still I said, well, in software verification we should be able to generate the invariants directly from the interpolants. That is, the interpolants should be able to tell us more than just the basic ingredients of the invariants. And I looked at techniques for actually getting the interpolants to converge to inductive invariants directly. And I'll talk more about that. I looked at this problem of convergence that we talked about before, and I asked how you can prevent us from looking at particular cases and just getting an infinite sequence of particular facts instead of general facts that will converge. And what I found was that if you get the appropriate control on the prover and on the language that the prover is able to use, you can actually prevent that divergence, and you can get a result that says that if an inductive invariant exists in your logic, then you will eventually converge to one with this technique. So you can solve that divergence problem, at some cost. I also looked at richer logics still: for example, I looked at provers for full first-order logic that handle quantifiers, and built a system that can generate quantified inductive invariants with universal quantifiers based on a first-order prover. That was able, for example, to find invariants for simple programs that manipulate linked lists on a heap or manipulate arrays. So we're moving up to richer logics and data structures. And I've also recently been looking at generating summaries for procedures. A summary is a simple logical fact that tells you everything you need to know about a particular procedure in a particular calling context, so that you learn these reusable facts about procedures and can handle programs modularly. So as you can see, a theme of this work is moving on to richer logical languages and more interesting classes of systems, using the same basic generalization principle as a way of constructing invariants. So I just want to give some data now, to give a sense that this stuff is more or less real. This is for the basic interpolation method on bit-level hardware designs, where we are looking at behaviors of length K and asking a SAT solver to prove that there's no error for behaviors of that length, and then using interpolation to generate invariants. And I'm comparing against an earlier technique for generating inductive invariants, again checking them using a SAT solver. That technique, called K-induction, essentially generates the invariant directly from the property rather than from proofs. And so here I've plotted runtimes.
On the X axis, on a log scale, I have interpolation; on the Y axis I have K-induction. And each point is a verification problem for some property that was written by an engineer about a real commercial design -- so a shallow property about a complex system. And you can see that the interpolation technique completely dominates here, and that there are points over here where it is almost four orders of magnitude faster, which is a big gap. And the reason is that interpolation is able to more quickly focus in on just the relevant facts and ignore irrelevant facts. And that allows us to do the proof looking at shorter, simpler executions, for smaller values of K. So now let's look at software model checking. And this is using the BLAST tool on some benchmarks that are example drivers from Microsoft that were provided by -- Tom, did you provide these benchmarks? Or do you know where they got them from? >> Tom Ball: I think they came from the -- >> Kenneth McMillan: It came from SLAM in some way, but -- >> Tom Ball: Well, they came from -- they're all publicly available, I believe. >> Kenneth McMillan: So, but, in any event, the properties probably came from SLAM, I'm guessing. >> Tom Ball: Yes, yes. There was -- I think this is almost nine -- >> Kenneth McMillan: Many years ago. >> Tom Ball: Eight -- yeah. We transferred quite a few of the properties. >> Kenneth McMillan: So the interesting thing about this is the way that BLAST was behaving. It was working all right on the smaller ones, and then on the larger ones it was just not finishing. And the reason is that it didn't have a technique for extracting just the relevant facts where the relevant facts were needed. It had no way of getting explanations from proofs of specific cases. And when they started doing that, using my little interpolating prover, they were able to handle the larger problems. And the reason was basically this: the ability to get explanations from proofs allowed them to find just a small set of relevant predicates for each location, so that as you went up to the larger benchmarks, the number of predicates needed at each location was staying roughly the same. So the system was scaling better. Now, this is based, again, on this idea of looking at special cases and getting explanations from proofs. And later I was able to show that in fact you could get the invariants for these problems much more directly, by just getting the interpolants to converge. That was CAV 2006. And in this case you can see that using that technique versus predicate abstraction was about two orders of magnitude faster. Again -- yeah. >> Sorry to interrupt. Do you just expect one interpolant or many interpolants? >> Kenneth McMillan: Yeah. So what would happen is you would start labeling your Boolean reachability tree, and you're labeling it with facts that you derive from interpolants along particular paths. >> But for -- I know that along a given path you may have many interpolants. It may be the case that there are many different proofs of the same -- >> Kenneth McMillan: Right. So you can have many different proofs, even for the same path. And so what could happen is in some cases you might derive some irrelevant facts. And in that case, for example, you would perhaps not converge in a loop until you were able to unwind it enough to actually get the prover to give you relevant facts.
So if you look at the final facts that are labeling each program location, it actually turns out to be a combination of facts -- actually a disjunction of conjunctions of interpolants -- and as you go along, those facts are not monotonically strengthening. Sometimes they get weaker and sometimes they get stronger as you explore more of the tree and as you label paths with more facts. >> But to come back to the previous slide: in those experiments do you get the predicates from one proof, or do you look for alternate proofs? >> Kenneth McMillan: Yeah. Okay. So in these experiments we are never running multiple proofs on the same program path. We will wind up with multiple proofs on different program paths. >> [inaudible] >> Kenneth McMillan: Right. Um-hmm. So, I mean, there is a question there: you could look at multiple proofs and try to decide which one is relevant, or the alternative is to say, I am going to accept the fact that I might get some irrelevant facts, and have a way to recover from that in the future -- to learn that this fact wasn't relevant and that some other fact was really needed, by exploring a different path. Okay. And the last application I wanted to talk about is a fairly recent one, and it's looking at test generation, where we are looking at a piece of code and we want to generate inputs for that code that somehow explore the space of program locations we could reach, or some other defined coverage space. And so here what we're doing is, as we generate tests for that code, we're of course trying to generate tests that will take us along different paths, using an SMT solver; but as we backtrack and look at different paths, we are also learning facts that we can annotate onto the program. And a given fact will tell us: when you reach here, if this condition is true, then there are no further coverage goals that you can find, so backtrack in a different direction. So we're learning things about the program as we go along. And what you can see is that if we don't use learning in this way, then we tend to get stuck at fairly long plateaus if we look at the number of tests generated versus the amount of coverage we're getting. So as we keep generating tests, we're exploring new program paths, which is good, but we're not seeing any new program locations. And what happens if we're able to learn is that this curve goes up more steeply. It doesn't get stuck in these plateaus, because you quickly learn that you can't get to anything new by looking at this particular set of paths or going in this particular direction, so you push the search in a different direction. So it's very much like applying the principle of DPLL, applying the SAT solver principle, to exploring programs. Our search space is program paths, and we are learning program annotations much in the way that a SAT solver is learning conflict clauses. >> In the case of SAT solvers, right, there's always a rule [inaudible]. >> Kenneth McMillan: In this case we have many goals. >> What is the goal in this case? >> Kenneth McMillan: Well, right, the coverage goal in this case was to cover all the basic blocks -- for every basic block, to find some feasible path that goes through that basic block.
And so when things flatten out here, it means we are exploring the same piece of the code and not seeing any new basic blocks. For example, we might have some diamonds in the control flow, and we're exploring different paths through those diamonds. Also, with this technique you can learn -- in fact, this technique is learning -- procedure summaries, so as it goes along you can learn reusable facts about procedures and do things modularly. Okay. So just to overview the main points that I'm trying to get at: I tried to establish two essential principles for talking about relevance in verification. One was the parsimony principle, which said relevant facts tend to come out of simple proofs. The second was the generalization principle, which says that if you want to find relevant facts, you look at special cases, like a particular Boolean assignment or a particular program path, and you try to look at the proofs of those to pull out facts that will be relevant to the general case. So we're always moving from special cases to the general case. And I would claim that these principles are essentially all that we really know about relevance and how to automatically find relevant facts. Now, SAT solvers and other similar provers essentially apply these principles to help them focus on relevant deductions, so they're able to give us parsimonious proofs. But the final piece of the puzzle is to be able to get explanations from those proofs in a language that you understand -- for example, as a simple Floyd-Hoare proof. And we can do that by taking simple proofs and generating explanations, or generalizations, using interpolation. So this is a different take on generalization: whereas in machine learning you are generalizing inductively from examples, here we are generalizing deductively, using proof. So we can exploit these ideas to extract relevant facts that we can then use to build inductive invariants for programs or hardware designs and so on, as long as we have relatively shallow properties of very complex systems that we need to prove. And the last thing I'd like to do is just suggest some applications outside the area of verification that might be interesting. So as I mentioned, in AI or in machine learning you are often generalizing from examples. But here we can generalize from deductions. So how could you use that principle? Could you use it, for example, if you have large knowledge bases, like Web ontologies or expert systems or other rule-based systems, and you want to get out an explanation, in a language that you understand, of why some simple query against a very complex set of rules is not working? So it would allow you to get explanations from very complex knowledge bases. Or if you're doing program verification and you have an error trace, or a large number of error traces, can you use this to get explanations of failures that are, again, deductive rather than inductive, and therefore perhaps more likely to be focused on relevant information?
Or another application: if you're doing random constraint solving, you can learn things, from interpolants, about subsets of the variables that you're solving for, so that you can learn things about the space in terms of the vocabulary that you're interested in. Or anytime you have some large logical system that you don't understand, perhaps even executions of malicious code, you might be able to get explanations of failures out of those systems. And of course you might imagine various other areas, for example robotic planning, or any other kind of system where you have complex sets of rules from which you want to derive explanations. So the last thing I want to leave you with is just the idea that these notions of relevance that are coming out of formal verification -- things we've learned about SAT and program proving and so on -- could potentially have very broad applications in computer science, if we know how to apply them. So that's all I have to say. Thank you. [applause] >> Tom Ball: Wow. We have time for questions. Thank you, Ken, for such a nice talk. >> We've got time for questions? >> Tom Ball: Yes. >> Kenneth McMillan: At least five minutes. >> Can you go back a few slides. >> Kenneth McMillan: Sure. >> Oh, here. Parsimony principle. I have a comment about the parsimony principle. So I [inaudible] but I also think that the parsimony principle will not work unless you provide, into the vocabulary over which the parsimonious proof is going to be constructed, the right abstractions. So, for example, I think what you are really trying to do here is discover the proof that is in the programmer's head. I mean, we're imagining that the programmer is really trying to make the program work, right, and that's the reason why the program is working. So oftentimes programmers use some fairly sophisticated abstractions that are not formalized. Now, I think if you provide that in some kind of a formal system, then I can imagine this working. But if the system is not even aware of the abstraction, then I think this is hopeless. >> Kenneth McMillan: Well, okay, so I would say several things about that. When you are trying to do generalization, you need some kind of principle by which to do it. And one possible principle, which is common in machine learning, is to have an inductive bias, which says I like some facts more than others. So in some sense what you're saying is the programmer has some idea in mind of what facts are better than other facts, and therefore I want to guide the prover to use those facts, more or less. >> Well, no, I'm saying something even worse than that. So imagine trying to prove that the derivative of X squared with respect to X is 2X from first principles. >> Kenneth McMillan: Yeah. >> Right? That's a very hairy thing. But because somebody has figured out that you don't have to do everything from first principles, there is this template that the derivative of X to the power of N is N times X to the power of N minus 1, and now you can use that. >> Kenneth McMillan: Right. >> Now, that template is a very powerful abstraction. Without that, just trying to do the limit proof from first principles, you'll get lost if you're trying to do that automatically. >> Kenneth McMillan: Right. Okay. So what I'm saying is that there are different kinds of biases. So another bias -- I mean, I talked about what's a simple proof.
Well, you should ask: in what proof system? >> Yes. >> Kenneth McMillan: All right? Okay. So in general, if I have a richer proof system, I will be able to find simpler proofs, which is good, right? Now, of course, I am talking about relatively shallow properties, so I am sort of assuming that things have been broken down into relatively small steps. But even given that, you have different possible proof systems in which you can solve the problem. So if you give me a richer proof system, then I will be able to find a simpler proof. And that's what I'm talking about with the parsimony principle: we want to get down to the simplest possible proofs. So if I am using, say, Boolean logic, I may still have a proof, but it will be much more complex than the proofs that I could get if I know a little bit of arithmetic. Or if I know a little bit of analysis, it might help me to solve your problem. So there is a bias here. The bias is the proof system. Rather than saying I think this is the fact that you should be proving, I'm saying this is the proof system in which you should be doing the proof. And if you look at the examples that I did on, for example, linked lists, what I did was set up a proof system that was appropriate for those problems, where I had a collection of axioms: I defined some quantities and I gave some axioms over those quantities to help the system do the kinds of proofs that I wanted it to do. And that was a bias. >> [inaudible] >> Kenneth McMillan: Right. And that was a bias. And of course what I'm saying is, if you have no bias, you cannot generalize. If you have no preference for one fact over another fact, you cannot generalize. And I'm just suggesting that there are different kinds of biases. There are language biases, which are typical in machine learning. But a deductive system can also be a bias. In other words, giving some axioms, for example, can be a bias in terms of what kinds of proofs you can get; by making proofs of certain things simpler, you are effectively biasing the system towards certain kinds of generalizations. >> If you look at interpolants -- so you're using interpolants as possible explanations between two facts, right? >> Kenneth McMillan: Yeah, um-hmm. >> [inaudible] interpolant also has a restriction [inaudible]. >> Kenneth McMillan: Right. >> So that might actually sometimes work against you, right? Because there might be some crucial [inaudible] entities that might result in concise formulas [inaudible]. >> Kenneth McMillan: Right. So if you have a linear-time interpolation lemma, then that won't happen, in the sense that if you have a simple proof using these other quantities that you want to eliminate, then you can get an interpolant that is linear in the size of your proof. If you don't have a linear-time interpolation result, then you're right, you might say, oh, by introducing some additional variables, I could get a much simpler representation. Now, of course, anytime you're introducing additional variables, you are introducing another kind of difficulty, which is that those variables are essentially quantified. They're essentially hidden, if you will. And then you have to be able to cope with the quantifiers, which makes life much more difficult for any subsequent provers that want to make use of that fact.
So you have to think about -- you know, one of the nice things about interpolants in arithmetic is that they don't introduce quantifiers, which is just very nice. In other theories, you sometimes wind up necessarily introducing quantifiers to be able to get interpolants. >> [inaudible] the last interpolant examples? >> Kenneth McMillan: I would say they certainly use the equality theory, but could it have just been done with linear arithmetic for the BLAST examples? I mean, probably. >> So did anyone try a combination of [inaudible] and unsatisfiable [inaudible] as opposed to [inaudible]? >> Kenneth McMillan: Yeah, so -- well, of course, that is one way of generating a proof. I mean, quantifier elimination is a way of organizing the proof in a particular direction. >> But it's not necessarily minimum, which -- >> Kenneth McMillan: Yeah, okay, so -- right. It could be done. It could be done that way. Once you say I want to just localize and eliminate some facts and then do quantifier elimination -- which would be essentially computing an extreme interpolant, the strongest or the weakest, depending on how you did it -- my sense is that you probably don't want to do that, because you wind up generating potentially a lot of irrelevant information by doing things that way. >> [inaudible] satisfy the course [inaudible]. >> Kenneth McMillan: It's possible. It's possible. So there are many different proof systems. And if you look at the way that I dealt with superposition, for example, it's much more in the flavor of what you're talking about, where you are doing superposition, which is eliminating variables in a particular order, much like you would do in quantifier elimination. Although, in the end, if you did a quantifier elimination proof, you could possibly even extract a core from that, because you could say, oh, my quantifier elimination technique generated all these constraints, but if you trace it back, not all of them were actually used. Or you could say the same thing for a Groebner basis proof or something like that. So there are different proof systems, and different proof systems might have different ways of getting explanations out of them. But, yes, it's possible that it would work. >> I focus off and on on dynamic data structures [inaudible] things like that, and then a theory that's quite useful to [inaudible] the theory of [inaudible] quantifications. Could you say again -- the [inaudible] invariants that make use of quantifications, but there are other [inaudible] you could just have theories lying around in other places [inaudible] verification conditions [inaudible] that you get. Can you say more about [inaudible] the state of quantifiers with respect to interpolation, or what results exist? >> Kenneth McMillan: Okay. So for any given -- right. So if you have, for example, a theory that you can axiomatize, then you can -- >> Using quantifiers. >> Kenneth McMillan: Well, with quantifiers, yeah. Then you can use some of the techniques that I was talking about. You can just axiomatize your favorite theory. You can introduce your favorite predicates, you can axiomatize them. And then that essentially is biasing the prover. It's saying: use these facts, use these axioms, to see if you can get a simple proof. So that will tend to focus the prover on using certain facts.
On the other hand, you might wind up with a theory that's not finitely axiomatizable -- say you want to introduce a reachability predicate. So one of the things that I did was to say, well, I will do a partial axiomatization of that, and I will wind up with an incomplete prover for it. So apart from that, other than just the ability to axiomatize your theory: if you can't either axiomatize it or encode it in some existing theory, then you have to solve the problem of whether, for this proof system, the proof system is interpolating -- whether your logic has the correct property. So there is a pretty wide range of things you can do, but you might wind up having to come up with some original result in order to handle a particular theory that you want to introduce. But one of the things that I've found is that just the ability to give axioms is very nice in terms of guiding the system towards doing certain kinds of deductions. >> Tom Ball: Okay. Let's thank Ken again. [applause]