>> Shaz Qadeer: So it's my pleasure to introduce Akash Lal to you. Akash is a graduate student at the University of Wisconsin, and he's about to wrap up his dissertation on interprocedural analysis of concurrent programs. Very exciting stuff. He was advised by Professor Tom Reps over there. He is going to join Microsoft Research India starting in the fall, and he's going to tell us about his dissertation work today.

>> Akash Lal: Thank you for the introduction. Microphone on? I think it is. All right. Let's get started. So this is a modified version of my job talk. I got rid of most of the fluff, and I don't think I need to motivate the need for verification. I'm sure everyone here has had some disturbing experience with software failure. My recent experience was that I was playing Diablo, which is a computer game, and I was about to kill the big boss when the program crashed. And that was pretty disturbing. I had to work my way up all over again.

So not only is software everywhere, software is also getting concurrent. Intel just released an 80-core chip for teraflop computations that might be coming to desktops soon. And as software gets more concurrent, the bugs are getting more complicated, and we need verification of concurrent programs. As you all know, testing is not very effective when it comes to concurrent programs, because if you execute the same trace with one interleaving the bug doesn't show up, but with a different interleaving there's a crash. So even when you fix the input to a concurrent program, there is still a lot of non-determinism, and you're never sure whether you've tested the program enough to have ruled out the bugs.

So not only is testing hard, verifying concurrent software is also hard. I'm going to tell you the Bluetooth driver story, which is catching on in the verification community. In 2004, Qadeer and Wu did the KISS case study: they found a bug, an assertion violation, in a version of the Bluetooth driver, and they released a fixed version for it. In 2006 the same assertion violation was found in the fixed version. In 2008 the same assertion violation was found in the fixed fixed version. Now, this story has to be taken with a grain of salt, because every time a new fixed version was proposed, some part of the story was lost. The bug essentially violated an OS invariant which they were not aware of and which was implicit in the previous work. But in any case, we know reasoning about concurrent programs is hard.

So what I'm going to be talking about is my thesis work on verification of concurrent programs and my contributions to it. I know that in the abstract I promised verification of machine code as well, but I left that out just to spend more time here, and I can talk about verification of machine code offline.

All right. So let's look at the problem of concurrency a little harder, a little more closely. In terms of testing, we can say concurrency is hard because even with a fixed input there can be an exponential number of behaviors in the program. That's why testing is hard: you need to run the program so many times. But in verification, whether of sequential programs or concurrent programs, we have to reason about an unbounded number of behaviors anyway. So what's an exponential factor on top of an unbounded number? The way to answer that question is to look at how we actually verify programs.
We typically use some sort of abstraction: you consider an over-approximation of the program and you try to verify that over-approximation. And what happens is that even with the common abstractions we have used for sequential programs, the presence of concurrency makes the analysis harder. If you take sort of the simplest kind of abstraction, which is finite abstractions, in the sequential case everything is polynomial, but in the concurrent case it becomes either PSPACE-complete or undecidable. And what that means is that even the abstractions we have been using, the ones we are familiar with, we can't use when it comes to concurrent programs.

So the solution we choose is that we're not going to do full verification of concurrent programs. We're going to be somewhere in between sequential and concurrent, which is what we call context bounded concurrency. Context bounded analysis is the analysis of concurrent programs when you fix a bound on the number of context switches that can happen between different threads. What Shaz had proposed in his previous work was to bound the number of context switches, but we have a slightly different notion, which is bounding the number of execution contexts. These two notions coincide for the case of two threads, and I'll start distinguishing them when I talk about multiple threads.

So there's a history to context bounded analysis. I think Shaz started it with his tool KISS, which did verification of concurrent programs with two context switches, and they found a lot of bugs in drivers. And they had a result which said that for finite data abstractions, the problem of context bounded analysis is decidable. And subsequently they have this wonderful tool CHESS that does systematic enumeration of all interleavings given a test harness, and they found many bugs within a few context switches. So this leads to the question: what can we do if we don't fix the test harness, if we don't fix the test input? Can we still do verification under a context bound?

So my work starts here. The first thing we showed was that this problem is NP-complete. Then we gave decidability under more complicated abstractions, including infinite-state abstractions. But except for KISS, all this work was still theoretical, and we subsequently extended it to do practical verification of concurrent programs. These two are the works I'll focus on. I'm not going to be talking about the first one, which involved an interesting mix of automata and matrices, but what it showed, the decidability result, is that the world of context bounded analysis is not that much different from sequential analysis. And subsequently we showed that we can in fact do context bounded analysis using sequential analysis, and this is what I'll be talking about.

So just to summarize, what context bounded analysis has done for us is that it's added a third column to our table, saying that full concurrency was too hard, but context bounded analysis has a better hope of scaling. All right. So let's just get down to what we could do with context bounded analysis. We tried to test two hypotheses. First, we wanted to show that most bugs can be found in a few context switches, because only then would context bounded analysis really be useful. And also, in terms of performance, that context bounded analysis can compete well with full verification techniques.
So we did our study in this tool called DDVerify, which is like [inaudible] except that it works on concurrent programs. It has a predicate abstraction front end that produces a concurrent Boolean model, so every thread in the program is abstracted to a Boolean program. And the way they've structured it right now, this Boolean model is fed to the model checker SMV, which either says that the program is correct or returns a counterexample, and there's a refinement loop there. What we did was we took out the model checker SMV, put in our own model checker that does context bounded analysis, and fed it some context bound K. And we compared what SMV could achieve with what we could achieve.

And this is what we found. Every dot here is a different driver example. This is the time our tool took, and this is the time that SMV took; it's a log-log plot. And every dot is classified into two categories: either SMV said that the program was correct, meaning that there were no bugs in that program, or SMV said that the program is buggy. So the verdict of whether the program is correct or buggy was as SMV said. And what we find is, first of all, the median speed-up was about 30 times. Okay, so for all the cases that are buggy, this number says the number of execution contexts within which we found the very same bug. And what this shows is that our tool with a bound of K equals 7 was able to find all the bugs that SMV could find. So in this case all bugs occurred in six context switches or fewer, so we were able to find them, and with a bound of K equals 7 we were 30 times faster than SMV.

>>: So one question. For the correct ones, what was the K you fed to --

>> Akash Lal: K equals 7.

>>: You didn't have a K, right?

>> Akash Lal: No. SMV is full verification.

Okay. So that's what we could do. Now let's get down to how we built our tool that does context bounded analysis. And the way this is structured is that we take a context bound K, take in the concurrent program, and we do a reduction into a sequential program such that checking properties of this sequential program is sufficient to conclude properties of this concurrent program under that context bound. Yes?

>>: And the samples that you have, how many threads do they have?

>> Akash Lal: Most of them have two threads, but they can have more.

>>: And you expect this -- so 30 times speed-up.

>> Akash Lal: Uh-huh.

>>: Do you expect that number to increase, your speed-up to increase, as you increase the number of threads, or decrease?

>> Akash Lal: Well, that's a good question. It could really depend on the program, but with more threads I would expect our speed-up to increase. I will give you some evidence later. Yes?

>>: [inaudible] this data K equals 7, why 7? Did you try 6 and then there are lots of [inaudible]?

>> Akash Lal: Yes. That's why K equals 7, because I knew I could find all bugs, and I had to give an odd number. Yeah, that's fine. So --

>>: [inaudible].

>> Akash Lal: So the programs that DDVerify produces, the Boolean programs, have all procedures inlined; all the Boolean programs are single-procedure. And they had to do this because SMV could not handle procedures. But our tool actually can handle procedures, so there's hope of extending DDVerify so that it produces more complicated models.
>>: So, for example, what you did was you took the [inaudible] procedure [inaudible] and then the context bounded analysis --

>> Akash Lal: Yes.

>>: You did not change the translation so that --

>> Akash Lal: No. [inaudible] so I could not do that.

>>: Did you do any comparison with [inaudible]?

>> Akash Lal: I have. I will show them later on. Anything else? So I have some more results later on.

All right. So we do this reduction to sequential programs, and what that means is that we can now borrow all the cool stuff that we did in the sequential world and apply it to the concurrent world. In whatever I describe next, we only focus on safety properties, and we do only assertion checking. If we have other sorts of properties -- like, for example, that an array should never be cleared between an insert and an increment of the length -- then we instrument the program so that what we verify are just assertions. So this is just to say that we are only going to worry about assertion checking, and other properties can be reduced to it.

Okay. So let's start off by considering just the case of two threads. The model of a program is that these two threads have their own local memory, and they communicate through shared memory. And in whatever I describe next, it will be useful to think of a thread as just describing a transition system: it takes in a shared state s1 and its own local state l1, runs for a while, and changes the shared state to s2 and the local state to l2. So every thread is just describing a transition system.

Now, in the case of two threads, the execution will proceed with control alternating between the two threads. First T1 goes, then T2, then T1, then T2, and so on. And the problem with the analysis of concurrent programs is that when you execute one thread -- say T1 executes and changes (s1, l1) to (s2, l2) -- the shared state has to go here to T2, which needs to know where it has to start from, but the local state has to be saved and then restored when T1 gets execution back again. And whenever there's a split in the way the data is flowing, you risk losing correlations between the two. And that happens because we're not propagating a single state; what we're propagating are sets of states, the set of all reachable states. So when we feed (s1, l1) to T1, it can reach this huge number of states (si, li). The set of si has to be passed to T2, but the set of li has to be saved and then restored. But what happens here is that you lose the correlation between which si (or which sj prime) is related to which li.

And the way that's typically solved is that you start pairing local states. You put the local state of T2 into your state, so you say the state is going to be (s1, l1, m1), and now you don't have any forks. But now you've started pairing local states, and that gives you a state space explosion: you have exponential growth with the number of threads. And that's what makes concurrent analysis hard, because you have to consider the local states of all of the threads.

So the way we're going to solve our problem is slightly different. What we say is that this fork is bad, so we're not going to do a fork. Instead, we're going to analyze T1 first, be done with it, and then we'll start analyzing T2. And the way we're going to simulate context switches in the execution of T1 is that we're just going to guess what T2 did.
So during the execution of T1, if we want to do a context switch, we just guess a new shared state s3: we guess that the execution of T2 is going to change the shared state from s2 to s3, and when we finally get to T2 we're going to verify whether the guess was correct or not. So what we're going to do is guess K minus 1 global states. We say s1 is the initial state of the shared memory, where the program is supposed to start execution, and s2 to sK are guesses; basically they're just arbitrary shared states. And K here is not the number of context switches; K is the number of chances that each thread gets to run, so the number of context switches is 2K minus 1.

All right. So the execution of T1 will look as follows. We start the execution of T1 with (s1, l1) -- we use the first state, s1 -- we let it run for a while, and at some non-deterministically chosen point in time we just use our next guess, s2. So we change the shared state from s1 prime to s2, we let it run again, then we use our next guess, s3, and so on, until we exhaust all our guesses. And then we pass control to T2. And now for T2, these primed states are going to act as the guesses: you start with s1 prime, let it run for a while, then change the state to s2 prime, let it run for a while, and so on.

And what this means is that right here we've guessed that the first time T2 gets a chance to run, it has to change the shared state from s1 prime to s2. But it actually changes it from s1 prime to s1 double prime. So we have a check at the end that says s1 double prime should be equal to s2, meaning all the guesses that T1 made were actually satisfied by T2. And if we pass all these checks, then we have these arrows equating these states, and if you get rid of these arrows, what we have is a valid execution of the concurrent program. And the good thing here is that the local states are short-circuited: they did not have to be saved and restored, they were just reused during the execution.

So this is the strategy we're going to use, a guess-and-check strategy. The next insight we use is that guess and check is not something that's new to the verification of programs. In fact, it's done all the time for the verification of sequential programs. The problem for verification of sequential programs is that you need to guess an input for which the sequential program reaches a bad state: you have to guess an input and check if a bad state was reachable. But in fact, the tools that do analysis of sequential programs don't do a guess and a check; they actually find the input for which a bad state is reachable. So we're going to use this intuition for our guess-and-check strategy. We push all the guesses into the input state, and then we ask a sequential analysis tool to find the input for which a bad state is reachable, instead of just guessing it.

So what this means is that the final sequential program we produce is going to have K copies of the shared memory as input. If the original concurrent program had four shared variables -- w, x, y, z -- and let's say K equals 3, then the input to the sequential program is going to be three copies of the shared memory and an extra counter i that's going to count from 1 to 3. And the way we change the code is as follows.
So let's say st is a program statement. Then tau(st, i) is a syntactic renaming of that statement where every global variable g is changed into its i-th copy, and the local variables remain unchanged. And the way we transform each statement is that, given a statement st, we produce this block of code in the sequential program. What that does is it says: if the value of i is 1, then execute the statement on the first copy of the shared memory; if the value of i is 2, then execute it on the second copy of the shared memory; and so on, until i equals K. And this is the code that simulates the context switch: at a non-deterministically chosen point in time we increment the value of i and start using the next copy of the shared memory. And if i reaches the bound, then we set it back to 1 and jump to the next thread. The way assertions are transformed is that we add a new Boolean variable, "no error", which just records the value of the assertion at the current value of i.

So what we do is we take the two threads, produce the sequential versions using the transformation I described on the previous slide, and then the concurrent program in which T1 and T2 execute in parallel becomes this sequential program in which you execute T1s, then you execute T2s, then you have a checker, and then you assert that there was no error in the execution. The sequential program executes as follows. The value of i starts at 1, so T1s does exactly what T1 did, but on the first copy of the shared memory; then it reaches some point at which it increments the value of i. When i becomes 2, it starts operating on the second copy of the shared memory; then i becomes 3 and it operates on the third copy. Then we pass control to T2s and reset the value of i to 1. T2s operates on the first copy of the shared memory, then on the second copy, then on the third copy. And then we pass control to a checker that checks that this state was equal to that state. So in this case T1 assumed that T2 was going to change the shared state from this guy to that guy; it actually changed it from this to this, so you check whether the two are the same. The checker just checks this equals that and this equals that, and if the checker passes, then this is a reachable state of the concurrent program and we can do our assertion checking on it.

So the good things about this reduction: the local state of the program does not increase. The local state of the sequential program is only the sum of the sizes of the local states of the original threads. A naive analysis would have a product here, so we got a product down to a sum. In terms of static analysis, the global state of our sequential program has K times the number of variables, because we had to introduce K copies of the shared memory. In terms of model checking, the state space grows exponentially with K, because K times the number of variables means the state space gets raised to the K-th power. So this sort of means there's no free lunch: the hardness is inherent in the context bound K, and we're sort of bound to have an algorithm that's going to be exponential in K. And the main idea we've used here is that we've reduced the control non-determinism that comes from concurrency into data non-determinism, and data non-determinism is something that's already handled in the sequential world. All right.
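To make this concrete, here is a minimal C sketch of what the transformation just described might look like for K = 3 and a single shared variable g. The names (nondet, maybeSwitch, noError) and the array encoding are illustrative, not taken from the actual tool; the real transformation is the syntactic if-on-i dispatch the talk describes, but an array indexed by i expresses the same thing compactly in C.

    #include <stdlib.h>

    #define K 3

    static int nondet(void) { return rand() % 2; }  /* nondeterministic choice */

    int g[K + 1];      /* copies 1..K of the shared variable g; g[2..K] start
                          as unconstrained inputs -- these are the guesses    */
    int i = 1;         /* which copy the current execution context uses      */
    int noError = 1;   /* records the value of every transformed assertion   */

    /* Inserted before every statement: at a nondeterministically chosen
       point, advance to the next copy of shared memory. This simulates a
       context switch. When i reaches K, the harness resets it to 1 and
       control falls through to the next thread's sequential version.        */
    void maybeSwitch(void) {
        if (nondet() && i < K) i++;
    }

    /* tau(g = g + x, i): the same statement, acting on the i-th copy of g;
       the local variable x is untouched.                                    */
    void step(int x) {
        maybeSwitch();
        g[i] = g[i] + x;
    }

    /* assert(g >= 0) becomes: record the condition on the current copy.     */
    void checkNonNegative(void) {
        maybeSwitch();
        noError = noError && (g[i] >= 0);
    }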
So the situation changes slightly when we have multiple threads. Let's consider the case when we have n threads and K context switches. The problem in that case is that there are too many thread schedules that can happen in such a program. You can have T1 go first, then T5, then T2, then T4; or you can have T1, T3, T4, T2; and so on. You can have n times (n minus 1) to the K thread schedules. And just enumerating each of them is going to make even the best-case complexity exponential in K, and that's something we want to avoid.

So our solution is that we're going to consider only one thread schedule, and that's going to be a round-robin thread schedule of length n times K. In this round-robin schedule every thread gets K chances to run, so it's K execution contexts per thread. And the property of exploring just this one round-robin schedule is that it considers strictly more behaviors than all thread schedules with K context switches or fewer. The reason is as follows. Suppose we have this thread schedule, which is not round-robin and has two context switches. This thread schedule is simulated by a round-robin schedule of length two times three, which is this one. But in this round-robin schedule, T2 decides to do nothing when it gets a chance to execute, then T1 here decides to do nothing, and T3 here decides to do nothing. In this way a round-robin schedule can simulate other schedules which are not round-robin. So if we consider one round-robin thread schedule of length n times K, then we consider all behaviors with K context switches or fewer, but we also get other behaviors that have more than K context switches, because this schedule has nK minus 1 context switches.

And the other good thing about using this round-robin thread schedule is that producing the sequential program that simulates it is easy, and it just looks like that: you run T1 first, then T2, then T3, and so on. The reason why this works is as follows. When T1s runs, it operates on its K copies of the shared memory, and because in the round-robin schedule T1 always passes the state on to T2, we just have to run T2 next. So when we run T2 next, it just picks up what T1 left it at. In terms of this timeline, what we did here was the execution of T1 for its three execution contexts, the three chances that it got to run, and we know T2 has to start off from where T1 left off, so T2 goes next. And similarly T3 goes next, and T3 always picks up the state from T2. So the one saving was that the number of guesses we had to make was not proportional to the number of context switches. We only had to guess where T1 starts off its next execution context. So the guesses are only going to be this state and that state, because we know where T2 is going to start off and we know where T3 is going to start off. So even though we consider a round-robin schedule of length n times K, the sequential program has only K times more variables; it only needs K copies.

>>: It's a little confusing. In your second bullet you go from T1 to Tn, but then you have to again execute T1, right?

>> Akash Lal: Here?

>>: Yeah.

>> Akash Lal: Yeah. So in this concurrent program, yes.

>>: Huh?

>> Akash Lal: In the concurrent program, yes. But not in the sequential program.
The reason is that when we run T1, we run all its execution contexts at that time. So in this picture let's say the value of K is 3.

>>: Yes.

>> Akash Lal: So T1 gets three chances to run. T1s operates on three copies of the shared memory, so it already does all its execution contexts when it gets to run. And when it's done, T2 does all its execution contexts, and then T3 does all its execution contexts. So what we have is something similar to a round-robin schedule in which every thread got three chances to run, for three threads.

>>: So but to simulate all executions with K context switches, you have to give each thread n times K chances to run, right?

>> Akash Lal: Just K chances to run, not n times K.

>>: Just K chances to run?

>> Akash Lal: Right. Because if you have K context switches, a thread can at most get K chances to run.

>>: Sure. Okay. That's right.

>> Akash Lal: So we've given every thread K chances to run, and we're considering a round-robin schedule in which all of them get K chances. Yes?

>>: Every time you get a chance to run, you could be starting [inaudible] and ending [inaudible]. So I don't see why you have to run them nK times.

>> Akash Lal: Um --

>>: Because you stop non-deterministically at any point in time and then you --

>> Akash Lal: So one thing is that each of these threads knows where it's going to start. Like, for example, T2 knows that it has to start from a state that T1 left it at. So T2 is picking up everything that T1 did automatically. So --

>>: I'm just -- let's forget the second and third executions of T1.

>> Akash Lal: Uh-huh.

>>: Is there any behavior that could appear in the second part which could not appear in the third part? I think that's [inaudible].

>> Akash Lal: Well, I mean, it could do more here, right? So, okay, this is in terms of the global state, but what is kept is the local state. When we switch to using the next copy of the shared memory, we're reusing the local state that we left off at. So the local state constrains how T1 goes forward. Even though the global state is sort of reset here, when you start using the next guess, the local state is saved; you have to reuse the same local state. If there were no local state, what you're saying would be correct: all three would be isomorphic to each other, just repeating the same thing. But it's the local state that ties them together, constrains them to one particular behavior.

>>: But every possible initial local state for the third [inaudible] could have been a possibility [inaudible] of the second.

>>: I don't think that -- he's not doing that from the local states. What you're saying is true only for local states. He starts from [inaudible].

>>: I know. But let's say you start from statement one and execute to five, and then in the second [inaudible] you execute from five to eight. In the very first shot you could have gone from one to --

>> Akash Lal: Right. But the difference is that the global state is reset, there's a new global state at statement five. So in some --

>>: In one case, execution is executed from a different state. In the other case, one [inaudible] is executed without any interruption. Right?

>> Akash Lal: Maybe. So --

>>: Okay. I guess there could be -- I mean, if you want to compute some of these, you could do something better, some [inaudible].

>> Akash Lal: We are going to compute some of these. I will come to that.
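Continuing the earlier C sketch, the sequential harness for three threads under the round-robin schedule might look as follows. This reuses K, g[], i, noError, and nondet() from the earlier sketch; T1s, T2s, T3s stand for the transformed threads, and the names are again illustrative rather than taken from the tool.

    #include <assert.h>
    #include <stdlib.h>

    void T1s(void);  /* the transformed threads, as sketched earlier; */
    void T2s(void);  /* each runs all K of its execution contexts on  */
    void T3s(void);  /* copies g[1..K] before the next thread starts  */

    int gInit;           /* the real initial shared state */
    int gStart[K + 1];   /* gStart[j]: guessed shared state at the start
                            of T1's j-th execution context (for j >= 2) */

    void harness(void) {
        gStart[1] = gInit;
        g[1] = gInit;
        for (int j = 2; j <= K; j++) {
            gStart[j] = rand();        /* guess where T1 resumes */
            g[j] = gStart[j];
        }

        i = 1; T1s();   /* T2 starts exactly where T1 stopped in each  */
        i = 1; T2s();   /* context, and T3 where T2 stopped, so no     */
        i = 1; T3s();   /* guesses are needed between the threads      */

        /* Checker: after the last thread, the j-th context must end in
           the state T1's (j+1)-th context was guessed to begin in.     */
        for (int j = 1; j < K; j++)
            if (g[j] != gStart[j + 1]) return;   /* wrong guess: prune */

        assert(noError);  /* assertions count only on valid behaviors */
    }

The only guesses are T1's resume states; the later threads pick up the shared state in place, which is why the number of guesses depends on K but not on the number of threads n.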
In some sense what's happening here is that when we're running T1, we're considering a vector of global states. When we give it three chances to run, we're asking it to tell us the three sequences of global state changes that it can make. So when we summarize the behavior of T1s, we're going to summarize it as a string of length six, which says that there is an execution of T1 that goes from this state to that state; and if there's a context switch that leaves it at this state, then it changes the state to that one; and if there's a context switch that changes the state here, then it can leave it at that one. I will come back to that a bit later.

All right. Just to summarize what we did for multiple threads: we compared two things, considering K context switches and all thread schedules with K context switches, versus what we said, which is K execution contexts per thread and round-robin thread scheduling. With the latter we get strictly more behaviors; we only consider one thread schedule, so we avoid the exponential factor; and the complexity remains the same, because in either case we had to make K guesses and not more.

All right. Some more results. Another property of the reduction with multiple threads is that, because the number of guesses depends only on K and not on n, the size of the sequential program grows only linearly with the number of threads. So you sort of expect that if you have more threads, your analysis time should also grow only linearly. And there is some evidence of that here. In this graph we took a Bluetooth driver -- I think it's a fixed fixed fixed version of the Bluetooth driver. We vary the number of threads, and we give each thread four chances to run. So we're systematically adding more and more threads, and the time grows linearly, at least for a little bit. Then it jumps up; we don't know why it jumps up. But at least it was linear for some time. And this sort of says there's no free lunch: in this case we fixed the number of threads at three and we vary the number of execution contexts per thread, and the time grows exponentially with the number of execution contexts per thread.

So this is a comparison with SPIN. There is a benchmark suite called BEEM, B-E-E-M, that has a whole bunch of examples meant for explicit-state model checkers like SPIN. We took some common mutual exclusion algorithms, and this one is, I think, a network protocol. The benchmarks in each case have a correct and a buggy instance. And what we found is that in the buggy versions we were always able to find the bug with a small number of execution contexts per thread, either two or three. And we compared favorably with SPIN.

>>: So this number of shared is the number of Boolean shared variables?

>> Akash Lal: Boolean shared variables.

>>: So the Msmie has 23 Boolean shared variables at 20 threads, and the SPIN version is at 31 seconds?

>> Akash Lal: There's not that much concurrency here. So one of the --

>>: [inaudible].

>> Akash Lal: I think so.

>>: Okay.

>> Akash Lal: I think so. One thing to take away here -- another thing to take away -- is that SPIN, because it's explicit-state, has a termination guarantee: if the set of states is finite, it's going to terminate. But we don't have any such guarantee when we do context bounded analysis; we don't check whether we've already visited a state before.
So, I mean, if you keep increasing the bound K, the time is going to keep growing even though it's exploring the same states again and again. So that's something we need to put in in the future.

>>: So I missed one thing. Are you actually [inaudible] the analysis on the sequential program that you run?

>> Akash Lal: Right. Okay. So that's a good point. We do symbolic model checking. We used [inaudible], which uses [inaudible] to construct function summaries, and it uses BDD-based analysis.

>>: [inaudible] when you could have used a [inaudible].

>> Akash Lal: We could have used a [inaudible], but we just supported one thing. Yeah. So this comparison isn't quite fair, because we're doing symbolic model checking here whereas SPIN is explicit-state.

Okay. So one of the downsides of this reduction is all these guesses that we make. Most of the time the guesses lead us to behaviors that are actually not possible in the concurrent program. So we're going to explore a lot of redundant behaviors when we explore T1s and T2s, and the checker is going to rule out most of them; only a few will be valid concurrent behaviors. So the way we're going to solve this is that we're not going to analyze the threads in this order; we're going to analyze them side by side. And the way we do that is by constructing thread summaries. For every thread we construct thread summaries, and we build them for T1s and T2s at the same time.

A summary is going to look as follows. A summary of T1 will have a value k, which is the number of execution contexts we've considered so far. G1 and G2 are vectors of global states, shared states, and the length of those vectors is equal to k. So if the value of k is 3, then the summary is going to look as follows: the summary says that T1s, when started in this state, can leave us in this state. The local state is not copied, so we have just one local state, but there are k copies of the global state. So this is what a summary means.

And these summaries are processed in one of two ways. Either they're processed in an intrathread fashion: if there is a statement st in T1s -- and we know every statement operates on the last copy of the shared memory -- and st can take (g, l2) to (g', l3), meaning it can take this state and modify it to that state, then this is how a summary is updated: if the summary says we can go from (G1, l1) to (G2 . g, l2), then we also get a summary from (G1, l1) to (G2 . g', l3).

>>: So you have G2. G2 is a vector, but g is just a single --

>> Akash Lal: So G2 is a vector of two in this case. And this bracket thing is a coupling operator.

>>: Oh, okay.

>> Akash Lal: So we say that if g is the last component of this vector, and the statement st changes g to g prime --

>>: Let me ask my question again.

>> Akash Lal: Yeah.

>>: Is the size of G1 one greater than the size of the G2 vector?

>> Akash Lal: Yes.

>>: I see. Okay.

>> Akash Lal: So this is all that you need to know: it's just an intrathread update of the summary. Okay? And the way the eager analysis works is that it updates the summary as follows. It says that if, for a value of k, the summary says we can go from (G1, l1) to (G2, l2), then we increment the value of k, meaning that we will consider the next execution context, and we extend our vectors of global states by any state g.
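Written out, the two rules just described might look as follows. The notation is mine, reconstructed from the talk (assuming amsmath for the arrows): the dot appends a state to a vector, G1-bar has length k, and G2-bar-dot-g has length k, so the first vector is one longer than the bare second vector, as in the exchange above.

    % A summary of T1s: <k, G1, l1, G2 . g, l2>, meaning that when T1's
    % j-th context starts in shared state G1[j] and T1 starts in local
    % state l1, it can reach local state l2 with g the current shared state.

    % Intrathread step: execute statement st on the current (last) copy.
    \[
    \frac{\langle k,\ \bar{G}_1,\ \ell_1,\ \bar{G}_2 \cdot g,\ \ell_2 \rangle
          \qquad (g, \ell_2) \xrightarrow{\,st\,} (g', \ell_3)}
         {\langle k,\ \bar{G}_1,\ \ell_1,\ \bar{G}_2 \cdot g',\ \ell_3 \rangle}
    \]

    % Eager context switch: extend both vectors with an arbitrary guess g''.
    \[
    \frac{\langle k,\ \bar{G}_1,\ \ell_1,\ \bar{G}_2,\ \ell_2 \rangle}
         {\langle k+1,\ \bar{G}_1 \cdot g'',\ \ell_1,\ \bar{G}_2 \cdot g'',\ \ell_2 \rangle}
    \qquad \text{for every shared state } g''
    \]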
So this rule applies for all g. This is like making a guess: for the next execution context we make all possible guesses. So this is what the eager analysis looks like. But in our lazy analysis we're not going to apply this rule for all g; we're only going to apply it for those g that actually occur on a valid concurrent behavior.

And the way we do that is as follows. The rule is going to look the same as in the eager analysis. All I've done here is expand the vector so that the first component is init, the initial state of the shared memory, and we're going to update the first vector with g and the second vector with the same g. But the way we pick this g is going to be different now. So this summary says that T1 can go from (init, l1) to (G2, l2). And now we're going to look at the summary of T2, what T2 can do. In this case the summary says that it can go from (G2, l3) to (D . g, l4). And what we've done is we have an implicit checking step: the G2 used here has to be the same as the G2 used there, which is one of the steps that the checker needs to do. The other thing is that this D equals that D, because what that means is that there is a valid concurrent behavior here that goes from init to the first component here, which is the same as the first component there, which goes there, and so on. So this condition enforces that we're going to pass the checker, which means that we have a valid concurrent behavior, and in that case this g is going to be that g. So we extend our summary using just that g. This is the only rule that changes when we go from the eager analysis to the lazy analysis, and it enforces that all the guesses we make are actually valid guesses, so we don't need a checker at the end.

>>: Can you [inaudible], for example, run the summary computation for the different threads?

>> Akash Lal: Yes.

>>: On separate -- in separate threads [inaudible]?

>> Akash Lal: Yes.

>>: The second [inaudible]?

>> Akash Lal: Computation [inaudible]. Yeah, so that holds more true for the eager analysis, because there is absolutely no communication between the summaries of the different threads until you get to the checker phase. So you can construct the summaries of each of the threads independently.

>>: I see.

>> Akash Lal: Whereas in the lazy analysis, you sort of have to communicate every time you increment the value of k.

>>: Okay.

>>: You wouldn't get that if you were using a sequential -- I mean, a tool for analyzing a sequential program that would be compositional?

>>: Not quite, because it's case sensitive.

>> Akash Lal: Yeah. So --

>>: [inaudible] I don't know.

>> Akash Lal: Yeah. So --

>>: [inaudible].

>> Akash Lal: Right. So we couldn't figure out a way of doing this using the reduction to a sequential program, because once we produce the sequential program, we're sort of restricted to analyzing the program in program order, in some sense. But there has been subsequent work: there's going to be a paper at CAV that gives a lazy reduction, so the analysis of the sequential program that they produce actually corresponds to our lazy analysis. But it's a slightly different reduction. Yes?

>>: How is this different from [inaudible]?
>> Akash Lal: This is [inaudible] increasing the context [inaudible]. Yeah. But you sort of do it in lockstep, in that these two summaries are being computed side by side: when you increment the value of k, you increment it for each of them and then you go forward.

>>: Another nice thing about this is that it is, like what someone was saying, completely [inaudible].

>> Akash Lal: Uh-huh.

>>: You can just keep chugging along --

>> Akash Lal: Right.

>>: -- and you get to reuse whatever happened in previous contexts. And the other thing is, if you just construct a sequential program, you run it for K; then if you didn't find a bug and you want to run it for K plus 1, now you have lost all that work. You have to transform the program again and rerun it.

>> Akash Lal: Right. That's true.

>>: [inaudible].

>> Akash Lal: Yeah, you can do that, but if you're producing a new sequential program, then you're sort of redoing the sequential analysis again. If you have a smart sequential analysis that knows how the program changed, then it's okay.

>>: [inaudible].

>>: But the behavior that you observe at K plus 1 would be a superset of what you observed for K.

>> Akash Lal: Yes.

>>: And so in that sense it's [inaudible].

>> Akash Lal: Yes.

>>: So it could be engineered.

>> Akash Lal: Yes, it could be. Definitely.

>>: It's not like K plus 1 is a completely different program.

>> Akash Lal: Right.

>>: [inaudible].

>> Akash Lal: All right. Let me get a little bit into future work, some of the things I would like to be working on at [inaudible]. So I told you the Bluetooth driver story. What I didn't tell you is why those bugs were missed. This was KISS, so it did analysis for two context switches. This was a tool that could go up to more context switches; they considered four and found a new bug. This work considered two stopper processes, whereas both of the previous works considered just one stopper process, and they were able to find another bug. So this is one limitation context bounded analysis is always going to have: there's always the potential of missing bugs.

And what I would like to do -- my hypothesis is that even for doing full verification, context bounded analysis is a good approach to getting there. There are a few observations that lead me to believe this. The first is a study that Shaz did in early '07, which said that with context bounded analysis you cover a large portion of the interesting behavior in a few context switches. So if you have a handle on most of the behaviors, can you make a guess about the set of all behaviors? And the second observation we made is that mutual exclusion invariants can often be found just by looking at behaviors with two context switches; I'll give you an example on the next slide. The hypothesis here is that you use context bounded analysis to get to know something about how the program behaves, then generalize from that and get an abstraction for doing full verification of the concurrent program.

So let's look at how CBA can help with mutual exclusion. This is a program in which the access to x is always protected by the lock lck. And if you look at this, the sort of invariant we want to prove is -- or the negation of the invariant is: a data race can occur whenever a different thread, when it's started with the lock acquired, can eventually access x.
So if you start a different thread with the lock acquired and it accesses x, that means there can be a data race in this program. Or, you start the thread with lock equal to zero, and it's able to access x while keeping lock zero. This is the negation of the invariant that says you can only access x when the lock is acquired.

>>: I don't understand what you're saying here. Is that [inaudible]?

>> Akash Lal: Definitely one thread, yeah. So we know what this thread looks like, and we want to know when there can still be data races on this thread, not knowing what the other threads are. So these are the obligations on the other threads. There will be a data race if the other thread, when started with the value lock equals 1, is able to access x.

>>: Why would there be a data race on that thread?

>> Akash Lal: Because in the execution of T1 you can get to this point where the lock is acquired, then you switch to the other thread, and because it satisfies this, it's going to access x; then you access x here, and you have a data race.

>>: Okay.

>> Akash Lal: And similarly, the other case is that you context switch here, where the value of lock is zero; the second thread accesses x but keeps the value of lock at zero; then you can come here and access x. So these are the things that you would want to find out looking at this program. And context bounded analysis allows you to find them, as follows.

We consider T1s -- we know the source code of T1 -- with K equals 2. We're giving T1 two chances to run, and we're going to consider the set of its behaviors from the start of the execution until the point when it accesses x. And because K equals 2, T1s operates on two copies of the shared memory. In this case the only shared thing is lock, so we have two copies of the variable lock, and initially the value of lock is zero. And then we want to fill in these question marks. So basically we start computing the summary from the start to the access of x, and we look at what the summary looks like.

If you run the analysis, you're going to find two things in the summary. One is this, which says the value of lock is zero and remains zero; there's a context switch at lock equals zero; and then lock becomes one, which is actually this invariant. Because it's saying that this thread can get to an access of x when there was a context switch that did not change the value of lock. So there will be a data race when the other thread can access x starting with lock equal to zero and keeping lock equal to zero. This basically puts an assumption on the next thread: if the next thread can do this, then there will be a data race. And the other thing we find in the summary is this: the execution of T1s goes from zero to one, then there's a context switch in some state, and that's supposed to be this. So basically it means that if we start the other thread with lock equal to one, then no matter how it changes the value of lock, if it's still able to access x, there's going to be a data race.

So the thing to take away here is that by looking at T1s you learn the obligations on what the other threads can do to reach an error. You get an over-approximation of what things can force you to an error, and once you have these invariants, the idea is to construct this sort of predicate-abstraction-based proof where we just abstract the other threads using these invariants.
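As a sketch, the example program and the two derived obligations might look like this in C. The acquire/release encoding is illustrative (and not atomic), not the code from the slide:

    /* Sketch of the example: T1's accesses to the shared variable x are
       always protected by the lock lck (0 = free, 1 = held). The acquire
       here is a non-atomic stand-in, just to show the shape of the code. */
    int lck = 0;
    int x;

    void T1(void) {
        while (lck != 0) { /* spin */ }   /* acquire(lck), idealized */
        lck = 1;
        x = x + 1;                        /* every access to x is under lck */
        lck = 0;                          /* release(lck) */
    }

    /* The two obligations on an unknown environment thread that the K = 2
       summary of T1s yields. There is a data race exactly if the
       environment can do one of:
         1. started with lck == 1, still reach an access of x; or
         2. started with lck == 0, reach an access of x while leaving
            lck == 0, i.e., access x without acquiring the lock.
       If neither is possible, the accesses to x are mutually excluded.    */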
It's sufficient to check these invariants on the other threads, and then we know that there are no data races.

>>: But what I don't understand is that you are trying to find some assumption on the environment that is going to help you prove the property locally.

>> Akash Lal: Yes.

>>: What prevents you from [inaudible]?

>> Akash Lal: I don't know the answer to that. The summary we construct here captures the set of all behaviors of T1 -- we're considering all behaviors of T1. And the things we find from it, these two things, are sufficient to rule out all bugs for two context switches.

>>: So what you're saying is that maybe you are trying to find the most [inaudible].

>> Akash Lal: Right.

>>: And then you're saying there's a minimum constraint, the weakest constraint on it somehow.

>> Akash Lal: Right.

>>: Okay. I see.

>> Akash Lal: So the thing to take away is that when you do find this weak invariant, it sort of can give you the right thing. The only thing it ensures is that if you enforce it on the other thread, then you're ruling out bugs for two context switches. But the hope is that the invariants are still strong enough to allow you to verify for any number of context switches. So basically you find assumptions using an under-approximation, and those assumptions just happen to hold for any number of context switches.

>>: [inaudible] write an assumption pretending that there are lots of context switches. I mean, when you come up with it, right, you will have, like, x0, x1, x2, something like that, some finite number of variables, but [inaudible] --

>> Akash Lal: Right.

>>: -- [inaudible].

>> Akash Lal: In this example there's just one context switch here, because you're considering just one run. But you look at what's important here -- lock equals one, lock equals zero -- and then you use those predicates to abstract the program, and hopefully that should work.

>>: Okay.

>> Akash Lal: Maybe. So this was a simple example in which we had a simple variable lock, but the same thing applies in a more complicated setting where, in this case, there are two variables, old and state, and together they enforce mutual exclusion on the access of variable x. This is Dekker's mutual exclusion, and basically there's sort of an implicit lock: the lock is acquired when the value of old is 0 and the value of state is 1. So the same reasoning also applies in more complicated settings where you're not really sure what the locking invariant is.

All right. So something else I want to do, which has already been done partially, is to apply context bounded analysis to low-level code. One thing that's missing in the reduction, that does not allow us to do this, is that you still need to know what is shared and what is local. If you assume that everything is shared, then you're not going to have an analysis that scales, so you need knowledge of how much of the state is local. And for programs like binaries, it's not really clear what's shared and what's local.

One of the other things that bugs me a lot is writing stubs for the operating system. All static tools will have some stubs for modeling the operating system, modeling the environment, or some such thing, and they are never published. You never know which ones are the right ones. But they're extremely relevant for the performance of the tool.
So what I'd like to do is to automate stub generation in some sense -- to sort of have a benchmark of which stubs are relevant for which kinds of properties, so that everyone can use the same stubs. And the challenges here are as follows. For example, suppose this is the code for writing to a file -- the actual operating system code that writes to a file -- and suppose we're only interested in the property of knowing the size of that file. From this code, and knowing the property, we would like to produce this code, where there are only these increments to update the length of the file. If you're only interested in whether a program produces a UNIX or a DOS file, then you're just interested in whether the input buffer contains a carriage return, and in that case it's a UNIX file or a DOS file, whichever is the case.

So the challenge here is that this is sort of like doing partial evaluation with respect to a property, and also instrumentation. In this case there was a new flag [inaudible] saying whether the file was a UNIX file or not. So you want to do instrumentation of the property and you want to do partial evaluation. And the goal is to produce a program that's as deterministic as possible. So it's not just abstraction; you want certain other properties as well, so that the stubs are actually useful when you use them.

So the big thing, I think, is to know what we want to do after we do verification. There has been a lot of work on ranking the bug reports that a verification tool produces. We know that a user can't go through all bug reports, so there are heuristics that sort bug reports according to relevance, and then you look at the top few. So there's been a lot of work in that area. But there has not been enough work on what happens when a verification tool comes back and says that, yes, your property does hold. In that case the user just says, well, I already knew that the property held, and moves on. So there's nothing much gained in that setting.

What we would like to do is have a way of making that information persist through different verification runs. Even when you have proved a program to be correct, you want that fact to persist in the program, either in the form of some language that humans can understand or in the form of a language that machines can understand, so that the next time you do verification of the same program, you know something about what was there in the original program. And this raises interesting questions, like: do we actually know how to write proofs for different programs? Is there a good way of writing them? Most of the tools right now just do an exploration of all reachable states, and even though they can prove properties, it's not really a proof of the program; it's not something that you can write down. So there has to be some work on minimizing proofs, finding something that corresponds to loop invariants for concurrent programs, and so on. In certain settings, when you know that all a proof is doing is proving mutual exclusion, then it's just easier to write it down as saying: look, there's mutual exclusion there. And what I believe is that if you know how to write proofs of concurrent programs, this can also influence future development practices for writing concurrent programs, and also the design of languages that allow you to write better concurrent programs.

And that's it. Thank you.

>> Shaz Qadeer: Questions? More questions?
>> Akash Lal: Yes.

>>: I note that the agenda on your last slide -- this one, the previous one -- goes against all of your talk here, where you were reducing concurrent analysis to sequential analysis. So what's wrong with that for being [inaudible]?

>> Akash Lal: Well, the thing is, we're not able to produce a proof of the concurrent program. Again, what we're doing -- the analysis is only exploring the set of reachable states, and there is no way of having that persist, no way of communicating that to the user. Even when I'm debugging the tool, that proof is completely useless to me; I have no idea whether the set is actually correct or incorrect. If there's a way, once you do prove that a program is correct, to tinker around with the proof and find out some nice properties of what it's actually trying to establish, then there might be a better hope of --

>>: But there's a larger problem, which is, I think, an economic problem. I think that to some extent, in the program analysis community, we have managed to put a lot of value on bug reports. And therefore there's this whole, you know, research and industry that has evolved around it.

>> Akash Lal: Uh-huh.

>>: But to my knowledge, I don't know if anybody has managed to put an economic value on proofs.

>> Akash Lal: Right.

>>: I mean, I think there are a lot of people who believe that, but I don't think that in the widespread software engineering community anybody believes that there's any value in proofs.

>> Akash Lal: Would you say the same thing holds for sequential programs as well?

>>: No. I mean, there's a broader issue [inaudible] concurrency.

>> Akash Lal: Okay.

>>: Because, you know, the attitude is exactly this: your tool comes up and says, you know, verified. What now? The programmer already [inaudible] was correct.

>> Akash Lal: Right.

>>: What extrapolation do you get [inaudible] the programmer?

>>: Because the [inaudible]. So in that case we have no feedback. The tool wasn't able [inaudible].

>> Akash Lal: But there is one difference between sequential and concurrent programs, which is that at least, you know, it's sort of well established that for sequential programs what you need is pre- and postconditions and loop invariants. And some [inaudible]-based tools are going to give you that. And that's helpful, and other analysis tools build on that: they know all they need to find is pre- and postconditions and loop invariants. But there's no corresponding notion for concurrent programs. Once we can get a handle on what the proofs look like, maybe we can make more progress there.

>>: [inaudible].

>>: [inaudible].

>>: No, this is quantitative information.

>>: That's right. I guess in your analysis the [inaudible] of your analysis is not that, oh, your program works pretty good. [laughter]

>>: [inaudible] the only thing I feel proofs are going to enable is automated programs [inaudible].

>>: Right. So -- but so far what you have said is that proofs are good as [inaudible] consumed by other program analyses, right? And that is good. But the number of people who write program analyses and explore them is very small, right? Most people write software that does something, and there's a larger class of people who don't even write software; they just use software. And I don't know, I mean, can we somehow sell proofs to them? That's what I [inaudible].
>>: The other economic argument is [inaudible] all of programming, but I'm sure there are specific areas where people are ready to pay money to get some sort of certification of correctness -- like, if you think of a pacemaker, or if you think of security-critical components of a system. So I think the economic argument is, you know, if we actually could do this, then people would pay for it, I think.

>>: Yeah, so the -- I mean, the application that I was [inaudible] and you are willing to back it up with some warranty that this app is certifiably not [inaudible].

>>: But what prevents somebody from fraud -- from defrauding users? You know, I can just build some [inaudible] and say, oh yeah, I've certified it.

>>: But that's a certification problem. I mean, if the developers or the [inaudible] agency understand what proofs are and [inaudible] -- and if there is regulation, that won't happen. For example, for the pacemaker, the FDA approves the pacemaker, and if you have a pacemaker with software in it, then that software has to be certified. Currently the certification process basically relies on software engineering principles: basically you have to follow some particular model of software development, and it's not [inaudible]. But one can easily [inaudible] that, you know, you switch from process to proofs. And the [inaudible] is definitely one area [inaudible].

>>: And also, I mean, [inaudible] of the next step. We'd love to have a verification tool or something close to it, [inaudible] because testing, when you stop testing [inaudible] and so [inaudible], including SAGE, but it costs a fortune, and you never know when to stop. It's a huge problem for [inaudible] something even approaching verification. It's true, I agree with your comment, we're still very early. But we need -- and there's more value [inaudible], I agree with you, but --

>>: Actually, I think it's a little worse than that. For example, I think that as a community there is no widespread notion -- there's no common ground for saying what a proof is. Basically every tool that people come up with will have its own notion of proof. And no two proof-driven tools will ever agree, as far as I know, on what a proof is, because there will be all sorts of random assumptions [inaudible], because you need those [inaudible]. So I think that to make what [inaudible] was talking about a reality, there has to be a foundation, so that everybody, no matter what technique you use -- there has to be some notion of what a proof is.

>>: It could be a certificate. I declare whatever. You have your black box, your tool. You say there are no bugs of that type in this stuff. That could be the [inaudible] proof. And then whoever wants can challenge that, right? That's your certificate. Now, how about a proof? Now, I give you, let's say, a book basically [inaudible] that nobody would ever look at, or I could [inaudible]. So I don't know --

>>: What is the certifying agency going to look at?

>>: [inaudible].

>>: A machine-checkable proof.

>>: But, I mean, of course, it's [inaudible]. I was telling you about that lock-free list [inaudible]. So I don't think -- we cannot write a proof. Forget about the proof; I don't think anyone can [inaudible] a proof of why that [inaudible] is correct. And so when there is concurrency [inaudible], we really don't understand the correct proofs of really intricate [inaudible]. Mutual exclusion is -- I think that's from [inaudible], and now maybe we understand how correct the proofs are.

>> Shaz Qadeer: Okay. [applause]