>> Shaz Qadeer: Hello, and welcome to everybody. It's my great pleasure to welcome Zvonimir Rakamaric here. You all -- I'm sure you all know Zvonimir. He has been an intern several times over at MSR. And he is a graduate student at the University of British Columbia. He's going to graduate soon. He's very well known for his work on static analysis and program verification, and for coming up with project names that are very violent. He has worked on projects named HAVOC, SMACK, STORM. And any other such violent names? >> Zvonimir Rakamaric: No, that's it. >> Shaz Qadeer: That's it. He's also an MSR fellow. Pardon? >> [inaudible] >> Shaz Qadeer: So today he's going to tell us about some of his recent work on analysis of concurrent programs. >> Zvonimir Rakamaric: Thanks for the introduction, Shaz. So actually maybe 70 percent of this talk will be on analysis of concurrent programs, and 30 percent on other stuff from my Ph.D., which is on analysis of sequential programs. So as we all know, today software is everywhere. It's running on my laptop, on my cell phone. It's running in the car I took to get down here, and so on. And also software has errors. Software systems are generally large, complex, and prone to errors, and they're also getting larger and more complex every day, especially with the recent emergence of multicores and many-cores, which requires switching to highly concurrent software. They're also getting more error prone and harder to get right. Software errors are also very costly: according to the U.S. National Institute of Standards & Technology, software bugs cost the U.S. economy an estimated $59.5 billion each year. As another example, the damages of the Code Red worm, which was caused by a buffer overrun bug in a Windows Web server, I think, are estimated at around $2.6 billion. More importantly, software bugs sometimes also cause loss of human lives.
And because of that, obviously, improving software quality and reliability is a major software engineering concern. So on a more personal note, while working in industry, I realized the need for better development, maintenance, and understanding of programs. So, therefore, when I decided to go back to grad school, I made that the main topic of my research. For my master's thesis I worked on a logic and decision procedure for verification of heap-manipulating programs. The logic contained constructs for unbounded reachability in linked data structures. I implemented the decision procedure integrated into an SMT solver, which enabled theory combination, and using that I verified a number of example heap-manipulating programs. So that was my master's thesis work. We can talk about it offline if you're interested. Here I'll concentrate on the stuff I did for my Ph.D. thesis, which is on checking system software, and more particularly concurrent system software. So system software, it's important and critical since it's the foundation beneath all general-purpose application programs. Typically it's written in C. And it's also hard to get right and hard to verify, mainly because it performs low-level memory operations such as dynamic memory allocation and pointer manipulation, and also it's highly concurrent, with shared-memory communication being the main programming paradigm. So here is the outline of my talk. After this introduction I'll talk about the work I did on modeling memory in software verifiers and on inference of frame axioms, and that's the part I mentioned which is related to verification of sequential programs. And then the rest of the talk will be on analysis of concurrent programs. And I'll give some conclusions and future work in the end. So before I start, I think you'll understand the material I'll present later better if I give you an overview of the tool flows I built in the past three years.
So I've been working on three tools: SMACK, HAVOC, and STORM. They're all built on top of Boogie. Boogie is a verification condition generator from Microsoft Research which takes BoogiePL programs -- BoogiePL is a simple procedural language -- and based on the input, Boogie generates a VC which is handed over to an SMT solver, usually Z3, and based on the SMT solver's output, Boogie either verifies the program or returns an error trace. So I developed two independent tool flows. The one on the left is based on LLVM and GCC. I used it to try out some of my ideas at UBC. And the one on the right is based on Microsoft infrastructure, and that's the one I was mainly working on during my internships at Microsoft. So let's see just a couple of slides of the work I did on memory models. So why do we even need memory models? Well, because available memory in today's systems is humongous, and a faithful representation of it in a verifier simply doesn't scale. Because of that, software verifiers rely on memory models that provide a level of abstraction, with the usual tradeoff between precision and scalability. And, also, using memory models we translate away the complexities of the source language, since system code written in C is really messy because of all the low-level operations it performs on the heap. In this work I considered two well-known memory models: the monolithic memory model, which splits memory into disjoint objects, where each object is identified by its unique reference; and Burstall's memory model, which in addition to splitting memory into disjoint objects also splits memory based on the types of memory locations. So I implemented both memory models as part of my tool SMACK, and I did an experimental comparison between them. And it turns out that Burstall's is consistently better than monolithic. It gives around 3x speedup on easier benchmarks and solves harder ones that the monolithic model cannot even solve. But, on the other hand, we also pay a price.
Burstall's assumes that the memory is strongly typed, which is potentially unsound for low-level C code which can perform type-unsafe operations. Yes. >> [inaudible] >> Zvonimir Rakamaric: Excuse me? >> What does it mean, solves harder ones? >> Zvonimir Rakamaric: Harder examples. That monolithic memory cannot. So here is my contribution in this area. The question, then, is how to ensure soundness of Burstall's in the presence of type-unsafe operations while preserving scalability. And the intuition is that most parts of C code are usually type safe. And my contribution is a completely automatic technique that gives us a sound and scalable memory model. So the idea behind this technique is to eagerly run a lightweight pointer analysis up front that gives us conservative type information and is also very fast and precise enough. And then, based on the type information that's returned to us by the pointer analysis, we can use Burstall's model in the parts of the code where we are type safe and then move to monolithic in the parts where we are type unsafe. And, again, I implemented this technique in SMACK and showed scalability on a number of experiments. >> So you -- this means that you were able to statically partition the heap into two pieces, one that you're going to represent using Burstall's model and the other one you're going to represent using the monolithic model? >> Zvonimir Rakamaric: It means that I can find out when you have to unify types, like memory maps. >> I see. I see. So you start out with Burstall's and then you keep unifying them until you become sound. >> Zvonimir Rakamaric: Exactly. >> I see. >> Zvonimir Rakamaric: But you can do it -- I can do it eagerly. So it's not kind of refinement. Like I can know that up front. So I unify them up front. So it's kind of like that annotation [inaudible] which does unification, just you can do it up front without any user kind of guidance. So this is just a brief overview.
Again, I can talk more about it offline. I want to concentrate this talk on the work I did on concurrency. Now, let's see the research I did on inference of frame axioms. So modular verification in software analysis is key to scalability. However, the problem is that it requires user-provided annotations in the form of preconditions, postconditions, loop invariants, and frame axioms. So frame axioms are used to define what is not being changed by a piece of code, and that's relatively easy to do if there are no loops, no recursion, and if we have only scalar variables, but it gets way harder in the presence of unbounded data structures. In practice, in modular verifiers the user typically specifies what might be changed by some piece of code. So, for instance, in Spec#, HAVOC, and SMACK we have a modifies clause; in JML we have assignable; in Caduceus we have assigns; and so on. These clauses are typically very complex and difficult to write. And that is especially true for system code which performs low-level pointer operations. So here is a simple example which illustrates why we need a modifies clause. We have two global integer pointers -- oops. Sorry -- x and y. We have procedure foo which sets *y to 5. Then we have procedure bar which allocates x and y. It sets *x to 1. It calls foo and then it tries to check whether *x is still 1. Now, if we want to check this assertion modularly, which means without analyzing the code of foo, we have to know that foo doesn't modify *x. And in modular verifiers, this is accomplished by putting this modifies clause on foo. So we are going to assume that modifies clause when checking this assertion, and we are going to check it when we check foo. So now this is a simple example and a very straightforward modifies clause.
But if we have a little bit bigger example which operates on arrays and changes elements of arrays, the modifies clause can get way trickier, as you can see from this expression below, where we have array constructors which generate -- yes? >> So I'm just not familiar with this notation. So when you say for foo that it modifies y, can you assume from that that it doesn't modify anything else? >> Zvonimir Rakamaric: Yes. >> Do you have to assume that modifies is complete? >> Zvonimir Rakamaric: So modifies has to be an overapproximation of the memory locations being modified by foo. So it's maybe not the most precise, but, yeah, it doesn't modify anything else. So, yeah, in this expression you have array constructors, which generate pointers belonging to an array, and then you have pointer increments, unions, and so on. So things get very hairy. And the goal of this research is to -- >> [inaudible] that you cannot change what y is pointing to or the contents of what y points to? >> Zvonimir Rakamaric: So modifies y means that you change the location pointed to by y. You can change the location. >> [inaudible] >> Zvonimir Rakamaric: Right. Right. >> So if you wanted to express that y has been changed, then you will write modifies [inaudible]? >> Zvonimir Rakamaric: And y. Yes. >> I see. >> Zvonimir Rakamaric: Okay. So here I proposed a novel algorithm that uses pointer analysis and source code to automatically generate these complex frame axioms. The algorithm is a three-step process. In the first step, from the data structure graphs that you get as an output of data structure analysis, which is the pointer analysis I use, I generate modify sets by marking a set of potentially modified locations in a data structure graph for each function and loop.
Then in the second step, from these modify sets I generate modifies clauses, which are expressions similar to the one I showed previously, by encoding these sets into the specification logic, traversing the data structure graphs starting from the roots. And the third step is a straightforward step of generating frame axioms from modifies clauses. >> Question. >> Zvonimir Rakamaric: Go ahead. >> Don't you have to have a whole program to do this? >> Zvonimir Rakamaric: You do. >> But you never do. You never ever have the whole -- >> Well, sometimes you do. For a C program? Are you kidding me? >> So there's a [inaudible]? >> We have a source code for the library. >> You're not online [inaudible]. [multiple people speaking at once] >> Zvonimir Rakamaric: So you can assume the worst case, that the code you don't know can modify anything, which will be terrible. But then you can always go in and write modifies clauses manually. >> [inaudible] in practice? >> Zvonimir Rakamaric: In practice I wrote them manually. So I'll walk you through these three steps on an example. So here on this picture we have a data structure graph which is going to be an output of the pointer analysis I'm using. And this is basically an overapproximation of the points-to relation of this loop, of a procedure. So in the graph we have two pointer variables, A1 and A2, which are pointing into objects H1 and H2. H1 and H2 are arrays. They represent an unbounded number of elements, and each element has two fields, F1 and F2, at offsets zero and four. Now, we want to mark in this graph the locations that are modified by this loop down there. So the first statement of the loop is going to modify field F2 of the array pointed to by A1. The second one is going to modify field F1 of the array pointed to by A2. So, therefore, our modify set has two elements. Each element is an object-[inaudible] pair. So we have H1, 4 and H2, 0.
And note that each of these elements represents an unbounded number of locations belonging to these arrays. So in the second step, based on the modify set, we are going to generate expressions in our annotation language that are going to be modifies clauses representing this set. We start doing that by traversing these graphs starting from the roots. So for H1, 4 we start from A1. We hit an array object, so we use the array constructor to generate pointers belonging to this array. Then we have to increment this set of pointers to get to field F2, and that's it. For the second one we start from A2. Again we hit an array. In this case we increment by zero, and our modifies clause is going to be the union of these two guys. Now, the third step is a straightforward, typical conversion of these modifies clauses into frame axioms. Nothing smart going on there; they just blow up into this quantified formula. So I implemented this algorithm again in SMACK. And I tried it on a buffer-overflow benchmark suite which was proposed at ASE 2007. So the suite has 12 programs, 22 vulnerabilities, which amounts to roughly 290 test cases, both faulty and patched, and 18,000 lines of code. I assessed the precision of the automatically generated modifies clauses as well as the verification runtime overhead. So this is the first table, where I assess the precision. So we have a total of 289 test cases. And now when I go in -- >> Sorry. What do you mean by test cases? You were doing some static analysis, right? >> Zvonimir Rakamaric: Oh, I see. I mean benchmarks. >> Oh, okay. >> Zvonimir Rakamaric: Sorry. Sorry about that. By test cases I mean benchmarks. >> Okay. Is it a single program, Apache? >> Zvonimir Rakamaric: So each -- well, they're kind of carved out of Apache. So it's not the whole Apache. So all 290 of them are 18,000 lines of code. So it's not the whole Apache, just -- >> I see.
So there are libraries and functions and -- >> Zvonimir Rakamaric: So these guys -- this paper was published at ASE 2007, so these guys browsed the Internet, found bugs, and went in and kind of carved them out into these self-contained benchmarks. >> But if you do that, don't you lose all the context, so that when you do your static analysis all that code that they carved away you don't have to deal with, right? >> Zvonimir Rakamaric: Well, so they did -- so the benchmarks are self-contained. It's like you have main in everything, right? So exactly. You don't have to deal with that other code. Now, so there are two things related to that. One is that, well, DSA -- like, the pointer analysis I'm using -- is very fast. So it can go through the whole Apache in a few seconds. So this is not a problem. But the verification, like SMACK, is going to choke. Like even if DSA goes through and I manage to generate modifies clauses, there's no way I'll verify the whole Apache. >> One thing is -- I mean, there's probably -- each benchmark has lots of procedures, right? So when you're verifying, are they also annotated? SMACK is a modular verification tool. >> Zvonimir Rakamaric: So, okay -- >> [inaudible] modifies if you need -- >> Zvonimir Rakamaric: Yes. Yes. Exactly. >> -- [inaudible] how do you -- >> Zvonimir Rakamaric: So, no, I manually write those. >> You're writing those manually. >> Zvonimir Rakamaric: So, yeah, in this manual step, I wrote everything manually. So I wrote preconditions, postconditions, and modifies clauses, everything manually. And if I do that, SMACK can check 226 out of 289. So there are around 60 which SMACK wouldn't check. And it's mainly because of -- like, it doesn't understand bit-vector arithmetic, and for some of them you need that to handle them. >> [inaudible] couldn't find a known defect because of its limitations [inaudible] couldn't check. >> Zvonimir Rakamaric: Yes. >> Okay. So it could analyze it, but it couldn't find the defect.
>> Zvonimir Rakamaric: Yeah, yeah, sure. >> So it was an unfound due to -- >> Zvonimir Rakamaric: I'm not sure it was just missing bugs; maybe it was also reporting false positives which I couldn't get rid of. So it's kind of like I couldn't get the expected answer. Like either -- >> [inaudible] >> Zvonimir Rakamaric: Yeah. >> So [inaudible] for each test case there is a known defect. >> Zvonimir Rakamaric: Exactly. >> Okay. >> Zvonimir Rakamaric: And there is a fixed version. >> And there's a fixed version. And the challenge is to find the known defect and find it -- >> Zvonimir Rakamaric: Well, here I use it only to kind of show that this algorithm I'm doing can handle these guys and that the modifies clauses it generates automatically are good enough to check them. >> Okay, okay. So this is really now the comparison that you were looking at, then -- >> Zvonimir Rakamaric: Exactly. Between kind of these two numbers. >> Okay. >> Zvonimir Rakamaric: 203 and 226. So when I manually write the modifies clauses, I could check 226 of them. Now, when I use the algorithm that I described, I can check 203 of them. So for 20-some of them, because the modifies clauses are not precise enough, it's not going to be able to either prove the property or -- yeah. >> One more question. So the modifies clauses that you wrote manually, is the tool checking that they're correct? >> Zvonimir Rakamaric: Yes. Well, even the automatically generated ones are going to be checked. Because they're kind of the best guess. There are no guarantees that they are overapproximations. >> One more question. So can you also comment on how much annotation overhead you save by automatically inferring modifies clauses? Because clearly you are still manually writing some other annotations. So, for example, why is it justified to go after modifies clauses? Why not try to infer, you know, preconditions, postconditions, and things like that.
>> Zvonimir Rakamaric: Modifies clauses are typically the biggest ones that I've seen. >> I see. >> Zvonimir Rakamaric: Well, especially when you kind of call a few procedures. So like the top-level procedure has to capture everything that's being modified by the whole kind of chain of calls. So you will have to go down and see what the -- so they're the biggest ones. They were the biggest -- >> So, I mean, can you estimate -- can you give me like, you know, okay, of all the annotations that were present in the final code that SMACK verified, what fraction were the inferred modifies clauses? >> Zvonimir Rakamaric: I'm not sure. I mean, it would be a really rough guess. I don't think we want to. >> [inaudible] had this example of the modifies y. >> Zvonimir Rakamaric: Okay. >> If the x and y condition is the same thing, we would find X true, right? >> Yes. >> Zvonimir Rakamaric: Sure. >> But to do a run [inaudible] this is something -- do you run some points-to, global points-to analysis before checking the modifies clauses? >> Zvonimir Rakamaric: Well, the pointer analysis I use is like a global analysis. But the other thing -- yeah, so I do. >> So I'm not going to let you off as easy [inaudible] so your programs were approximately 18,000 lines of code you say? >> Zvonimir Rakamaric: No, no, no. Programs are way smaller. So these are small benchmarks. >> These benchmarks, in total. >> Zvonimir Rakamaric: In total. So they're like a few hundred lines of code each. >> Divide 18,000 by 289 and get the average. >> Okay. I want to talk about the total right now. So you ran 18,000 lines of code through. >> Zvonimir Rakamaric: Sure. >> Of original code. Right? >> Zvonimir Rakamaric: Of original -- well, original benchmark code. >> Of the original benchmarks. So in the manual case, how many lines of annotations did you have to have? >> Zvonimir Rakamaric: Oh. If you count modifies clauses -- >> Yeah, yeah.
>> Zvonimir Rakamaric: -- probably 50 percent of the size of the code. >> Okay. So another 9,000 lines of code approximately. >> Zvonimir Rakamaric: I guess so. >> Same order of magnitude. >> Zvonimir Rakamaric: Let's say. Yeah. >> Okay. So and then in the automatic case, if you remove the modifies clauses, how many lines do you get? >> Zvonimir Rakamaric: Well, that's another 50 percent down. So from 9,000 you go to maybe 4-. >> 4,000. Thank you. >> Zvonimir Rakamaric: All very approximate. So here are the running times. We see almost no overhead: if you use this automatic algorithm, the running times go up only a little bit. So, to conclude, I've shown my novel algorithm for inference of complex frame axioms. The algorithm is completely automatic, and it can handle unbounded data structures such as arrays. I used it on a number of benchmarks. I showed that it's precise enough in practice. And the verification runtime overhead is small. Okay. Now we'll go to the analysis of concurrent programs. So the main goal of this line of research is to statically and precisely find concurrency errors in real systems code. So the key points are that we want to do it statically to achieve high coverage, and precisely to have no false alarms or a really low rate of false alarms. Then we would like to check general concurrency errors, which means violations of any user-provided assertions, as opposed to having a specialized static analysis which finds only one kind of error, such as data races or violations of locking properties and so on. And last but not least, we would like to apply our techniques on real systems code that is written in C, uses shared memory, and is big and messy. So an appealing approach to this problem is to transform context-bounded analysis of concurrent programs into analysis of sequential programs. And when we do that, we can leverage the huge body of research that has already been done on the analysis of sequential programs.
So this approach was first proposed in a tool called KISS, which was done by Shaz and Wu in 2004. And that could handle up to two context switches. And then recently, in CAV 2008 and 2009, more general transformations were proposed that can handle up to N context switches. However, those transformations were applied on small, manually constructed Boolean programs. So the main contribution of this work is to handle real code and not these Boolean models. In that direction, we defined a context-bounded translation for C programs that can handle the heap, and also our method is based on VC generation and SMT solvers, which makes it automatic and very precise. Also, we introduce field abstraction, which is a technique that greatly improves scalability, and our prototype implementation is applied to real system code. So the translation we are proposing, if we go back to my tool flow, is done in two steps. In the first step we translate concurrent C into concurrent BoogiePL using HAVOC, and in the second step we translate concurrent BoogiePL into sequential BoogiePL, which can then be handed over to Boogie. I'll show you these two steps in more detail in the next slides. So let's see step one first. System code written in C is messy mainly because of the heap. It creates dynamic objects on the heap, performs low-level pointer manipulation, and so on. And therefore we are going to translate away the complexities of C into BoogiePL scalars and maps, which are then much easier to deal with. And that is done using a C checker called HAVOC. So HAVOC translates away the heap based on its memory model, which is very similar to the Burstall's memory model I talked about in the beginning. In HAVOC we have one heap map per structure field and pointer type. So if you have field F, we are going to have map memF; field G, map memG; type T*, map memT*; and so on. And those maps map addresses into values. In Boogie the maps are handled using the theory of arrays with select and update axioms.
So here is a simple example. If in C code we assign 1 to field F of pointer x, then after this C code is processed by HAVOC, it is going to be translated into this statement, which updates map memF with 1 at the location x plus the offset of field F. Yes. >> Can you go back one slide. So looking at the last slide of Boogie code, you have a map for F and you have an offset for F. >> Zvonimir Rakamaric: Yes. >> What's -- I guess what's the assurance that I never create memG of x plus offset of F? Like what happens if I get those out of sync? >> Zvonimir Rakamaric: So you can add checks in HAVOC which will add assertions in the Boogie code, which will ensure that you cannot do that. And there is a paper on how to prove these assertions and the whole technique of how to do that. In this line of work we are not using those. So we can be unsound. >> Okay. But for now we just trust that your translation won't introduce these mistakes. >> Zvonimir Rakamaric: For now we are -- well, in the actual implementation currently we can be unsound. Because we are not checking those. There is a knob or switch in HAVOC that can introduce those checks. We haven't really tried it. I don't know how the scalability will be if we do that. >> I mean, this is checking kind of like type safety, and the idea is that you would try to check that separately using techniques that are very different from these kinds of techniques. The goal of STORM is to check concurrency errors; you don't want to mix up type checking with that, because you want to divide and conquer in that [inaudible]. >> Why do you need the offsets of F [inaudible]? Why not say memF of x? Why do you need those offsets of F there?
>> Zvonimir Rakamaric: Well, one of the reasons is definitely if you want to actually check this -- so if you want to introduce this check that he was asking about, you have to prove that at this offset you actually have field F. So you have to -- you need to know the offsets. And also you can have pointer arithmetic and things like that. But, yeah, that's a good question. Actually I got the same one from my supervisor. So now that we have concurrent BoogiePL, in step two we are going to use a translation based on the paper published by Lal and Reps in CAV 2008 to translate this concurrent BoogiePL into sequential BoogiePL. So they published a translation that works for scalars. We lifted it to support field maps as well, and therefore memory. So I'm going to illustrate the translation on a simple example which translates just one concurrent trace. The trace has two threads, thread 1 and thread 2, and one shared variable, G. It also has three context switches, which means that we have four execution segments total. So the main idea behind this translation is to avoid storing local state, since local state is typically way bigger than the shared one. And it's going to be done by introducing unconstrained symbolic values instead of still-unavailable future global values. And then later on, when the future [inaudible] become available, we are going to constrain them. So I'm going to show how this works on an example. Let me talk about the notation first. We have two threads, thread 1 and thread 2. We have four execution contexts, which are these stickers marked with 1, 2, 3, and 4. We have one shared variable, G, and G1 and G1 prime and so on are marking the values of G at the beginning and end of the respective execution context. And black arrows represent context switches between these two threads. So now a straightforward way of encoding this concurrent execution into a sequential one would be to introduce gotos instead of context switches.
So here we will simply, you know, introduce a goto with which we jump to thread 2, it will jump back, and so on. Now, if we do that, note that here we have to restore the values of locals after execution context 1, which means that we have to store them after execution context 1. And that's exactly the thing we are trying to avoid, since storing local state, which is way bigger than shared state, is not a good idea. So how do we go about doing this translation so that we don't have to store local state? Well, we simply stack these execution contexts sequentially one after another. So now, in this part here, when we move from 1 to 3, local state gets preserved and nothing has to be saved. But the problem here is that if you look at these Gs that represent the value of the shared variable G, they're not aligned properly, corresponding to the concurrent execution on the left. So at this point, for instance, our G3 here has to be equal to G2 prime and not G1 prime. And the trick that I talked about, introducing unconstrained symbolic constants, comes into play here. So G2 prime is available to us only in the future. And therefore at this point we are going to set G to an unconstrained symbolic constant V3 that is going to represent this unavailable future value. And then down here, once that G2 prime becomes available, we are going to constrain this V3 with G2 prime. And this is the main idea behind the Lal-Reps translation. We need some more stitching. So at this point, G2, if we look at the concurrent execution, has to be equal to G1 prime. So we introduce this assignment. And over here G4 has to be equal to G3 prime; again, we have to introduce another assignment. Now, if we stitch the code like this, in the sequential execution we are guaranteed that G4 prime here after the sequential execution is going to be equal to this G4 prime here after the concurrent one. And this is kind of the high-level idea behind the [inaudible].
So I showed it only on one concurrent trace. But this can be generalized to whole programs. So I'm going to show, again at a high level, how this is done for the whole program. In the program we are assuming that we allow N contexts per thread, and again we have one shared variable G. And our concurrent program has two threads, T1 and T2, and in the end we want to check assertion F. So the sequential program is going to have some initialization in the beginning. Then we're going to have the code of a properly translated thread T1, followed by the code of a properly translated thread T2. Some stitching at the end. And, again, checking our same assertion. So in the initialization we are going to introduce copies of our shared global, one copy per context. And also we are going to introduce those unconstrained [inaudible] constants that I was talking about, those Vs from the previous slide. We also need a context counter K, which is going to be initialized to one in the beginning and which is going to be updated every time we switch to another context. And in the beginning we have to constrain -- we have to set our shared global to the unconstrained symbolic constants we introduced. So each statement of every thread is going to be blown up into this switch case where, based on the value of K, we are going to call procedure schedule, which performs a context switch, and in the statement we are going to replace our shared G with the appropriate copy. And now schedule, as I said, mimics a context switch. So in schedule, if we still haven't reached the maximum number of contexts that we have available, we are nondeterministically going to perform a context switch and increment K; or, if we have reached our maximum number of contexts, we are going to reset K to 1 and jump to the next thread.
In the end we have to introduce those assumes that I talked about in the previous slide, which are going to constrain the unconstrained symbolic [inaudible] we introduced with the values that are now available to us, and then we can check the assertion in the end. So this is how the whole-program transformation of Lal-Reps works. Now, Lal and Reps did the translation for scalars, so we lifted it to handle the heap. And because we have this memory model with maps that I talked about, which enables a translation very similar to the one for scalars, we basically introduce these copies for the maps and also constrain them appropriately, the same thing you would do for scalars. I would like to note that map constraints are only assumed equalities, which means that we don't need extensionality axioms inside the theorem prover, which improves the performance. So we implemented this translation and tried it on a number of smaller benchmarks, and it worked great. But then as we tried to scale it to more realistic examples, it basically blew up. And the main reason for that is that realistic examples are going to have a lot of memory maps, even hundreds, and precisely tracking the state of all of these memory maps and introducing copies of each of them is simply not going to scale. So our answer to this problem is an algorithm we called field abstraction. And I want to illustrate how it works on an example. So the main idea is to split the set of fields into two sets, tracked fields and untracked fields. And we are going to precisely track the values of tracked fields while, on the other hand, abstracting untracked fields. So each read from an untracked field is going to be replaced with a nondeterministic value, and each write is going to be skipped. So in this example we have three statements which read and write fields F and G. And our set of tracked fields is going to contain only field F. So the algorithm is going to be applied to each statement. 
So starting from the first one: the first statement reads field F, which is in our set of tracked fields, so we cannot do anything there. The second one reads field G, so we can abstract it with a nondeterministic value. The third one writes to G, so we can simply skip it. So we can see that applying this algorithm greatly simplifies the code. And, indeed, when we ran it on our benchmarks, we managed to check actual real device drivers. However, note that in this case we require user input, because the user has to provide a set of tracked fields which is good enough for discharging the assertions that we have in the example. Now, the question is whether we can do something to generate this set of tracked fields automatically, without requiring any user input. And, again, the answer to that is an algorithm based on the CounterExample-Guided Abstraction Refinement, or CEGAR, framework. So here is how the algorithm works. We start with an empty set of tracked fields. We enter our usual CEGAR loop. We abstract our program P based on this set of tracked fields into the abstraction A. We check the abstraction. If we manage to check it, we are done. If not, we get a counterexample and check whether it's a real error. If it is, we return an error trace. If it's not, we analyze the counterexample, and based on this analysis we're going to add new fields which we believe are relevant into our set of tracked fields. And then once we do that, we repeat our CEGAR loop. So the important thing to note is that this loop is always going to terminate, since in the limit we are going to add basically all of the fields and check our concrete program. So I implemented all of this in a prototype implementation called STORM and applied it to four Windows device drivers. To be able to do that I created a harness to close off the drivers and to mimic the behavior of the OS. So the harness I created is going to create a driver request that gets processed concurrently by multiple routines in different scenarios. 
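The field abstraction and the refinement loop above can be sketched in a few lines. This is a toy rendering with my own statement encoding; `check` and `relevant_fields` stand in for the actual model checker and counterexample analysis.

```python
def abstract(program, tracked):
    """Field abstraction: keep tracked fields precise, replace untracked
    reads with a nondeterministic value (havoc), drop untracked writes."""
    out = []
    for op, field, target in program:
        if field in tracked:
            out.append((op, field, target))     # tracked: keep precise
        elif op == "read":
            out.append(("havoc", None, target)) # untracked read -> nondet
        # untracked write: skipped entirely
    return out

def cegar(program, check, relevant_fields):
    """Grow the tracked set until the abstraction verifies or the
    counterexample is real. Terminates: tracked only grows, and in the
    limit the abstraction is the concrete program."""
    tracked = set()
    while True:
        ok, cex = check(abstract(program, tracked))
        if ok:
            return ("verified", tracked)
        if cex.is_real:
            return ("error", cex)
        tracked |= relevant_fields(cex)         # refine and repeat

# The three-statement example from the slide, with tracked = {"f"}:
prog = [("read", "f", "x"), ("read", "g", "y"), ("write", "g", "z")]
assert abstract(prog, {"f"}) == [("read", "f", "x"), ("havoc", None, "y")]

# Toy checker: the assertion is dischargeable once "f" is tracked.
class Cex:
    is_real = False
def check(a):
    return (any(f == "f" for _, f, _ in a), Cex())
assert cegar(prog, check, lambda cex: {"f"}) == ("verified", {"f"})
```

The toy run mirrors the mqueue experiment in miniature: the loop stops as soon as the tracked set is rich enough to discharge the property.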
So, for instance, we'll have a dispatch routine in parallel with cancellation, or a dispatch in parallel with cancellation and in parallel with completion, and so on. And the property I checked for all these drivers is that a driver request cannot be used after it has been completed, which is a simple use-after-free-like property. So here are some running times when we varied the number of contexts we allow per thread while providing the tracked fields manually. The drivers range from roughly 4,000 to 30,000 lines of code. And if you look at the running times for 5 contexts per thread, they are a few hundred seconds, so scalability currently is not an issue. Also I would like to note that in the process I found a bug in the usbsamp example. I'll illustrate what the bug looks like in later slides. Another set of experiments is where we tried out our CEGAR algorithm; one thing to take a look at is how precise our CEGAR algorithm is and how well it works. So if we take a look, for instance, at the mqueue example, mqueue has a total of 72 fields. And when I went in and manually provided the most precise set of tracked fields necessary for discharging this example, I came up with a set of seven fields. And when I ran our CEGAR algorithm, CEGAR generated sets of fields which had eight or nine elements, which is very close to this seven. So the heuristic we have for choosing [inaudible] fields works extremely well. Running times jumped to a few thousand seconds, which is understandable because we have to repeat the CEGAR loop quite a few times. So, as I mentioned, I found a bug in the usbsamp example. Usbsamp is a sample driver in the Windows Driver Development Kit. It's actually an example of how to write device drivers, and as such is going to be copy-pasted by driver vendors that download WinDDK from the Internet. This driver has been checked using existing tools. The bug was confirmed and fixed, and it requires at least three context switches to be discovered. 
So therefore SLAM, for instance, which checks only sequential code, or KISS, which can check only up to two context switches, couldn't find this bug. So here is a very simplified code excerpt showing what the bug looks like. To see the bug, we need two threads. Thread 1 is executing the dispatch routine; thread 2 is executing the cancellation routine. So thread 1 gets to execute first. It executes this statement that marks the request as cancellable and sets its cancel routine. Now the cancel routine gets to run. The cancel routine is going to check whether the request has still not been completed and whether the cancel routine was set. So this check is going to pass. And now a context switch. We are back in our dispatch routine, which completes the request. And the third context switch takes us to the cancellation routine, which tries to access the request which was just completed, and this is a bug. So to conclude, I introduced a method for reducing context-bounded analysis of concurrent C code to analysis of sequential code. The method can handle the heap and is based on VC generation and SMT solvers, which makes it automatic and very precise. I also showed an algorithm called field abstraction which greatly improves scalability, and I implemented these ideas in a tool called STORM, which is an assertion checker for concurrent system code. I applied STORM in a number of experiments on real-life system code, and in the process I found a bug. So these were basic results from maybe a year ago. And then because of the promise that this line of research showed -- yes? >> [inaudible] can you give some kind of insight as to why the bug you found wasn't found with testing? Seems like [inaudible] was it very infrequent or if it was ->> Zvonimir Rakamaric: So I'm not sure -- okay. So very often you will have to have these context switches in very, very specific places. 
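The interleaving just described can be replayed in a tiny Python model (my own simplified rendering, not the actual driver code); the assertion in the cancel routine plays the role of STORM's use-after-complete check.

```python
# Simplified model of the usbsamp-style race between dispatch and cancel.
class Irp:
    def __init__(self):
        self.completed = False
        self.cancel_routine = None

def cancel_guard(irp):
    # cancel's check: request not yet completed and a cancel routine is set
    return (not irp.completed) and irp.cancel_routine is not None

def cancel_body(irp):
    # cancel accesses the request here; completed means use-after-complete
    assert not irp.completed, "use after complete"

def dispatch_mark_cancellable(irp):
    irp.cancel_routine = cancel_body   # mark request cancellable

def dispatch_complete(irp):
    irp.completed = True               # complete the request

# The buggy schedule needs three context switches:
irp = Irp()
dispatch_mark_cancellable(irp)  # thread 1 (dispatch) runs first
assert cancel_guard(irp)        # switch 1: thread 2's guard passes
dispatch_complete(irp)          # switch 2: dispatch completes the request
try:
    irp.cancel_routine(irp)     # switch 3: cancel touches a completed IRP
    found_bug = False
except AssertionError:
    found_bug = True
assert found_bug
```

Any schedule with fewer switches either completes before the guard (guard fails, cancel backs off) or cancels before completion (no violation), which is why a two-context-switch bound misses the bug.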
Like I've seen examples where you have two statements and the context switch has to happen right in between these two statements to see the bug. So I can imagine that this is very hard to hit using testing. Yeah, the other thing is cancellation is kind of an [inaudible] occurrence event. So, again, it has to happen exactly at a certain point for this bug to occur. >> So is something like CHESS able to find this? >> Zvonimir Rakamaric: So CHESS would definitely help you with introducing context switches at the right places. But, again, you have these asynchronous inputs, which I don't know how -- like CHESS, I'm not sure, but as far as I know, I don't think it helps you if you want to generate those. So let's say, you know, interrupts can happen, right, so you want to be able to generate these interrupts at all these points during your execution, and I don't think CHESS helps you with that. >> [inaudible] >> Zvonimir Rakamaric: Well, that's another thing on top of everything. Let's say you managed to somehow get it to work on kernel code. >> This is kernel code. >> Zvonimir Rakamaric: This is kernel code. Right. I don't know whether the technical issues are related to making CHESS work on kernel code. The thing that we're saying is, imagine that you can actually get it to work. >> [inaudible] pull this out of the kernel like you did, right? You didn't run the drivers for this in the kernel. >> Zvonimir Rakamaric: No, I don't run anything. Everything is static. But CHESS actually has to run. [multiple people speaking at once] >> So it's very hard to extract something like this out of the kernel and make it actually ->> Zvonimir Rakamaric: Runnable. Yeah, I don't know. [multiple people speaking at once] >> In principle there are lots of techniques that could have [inaudible] but you form [inaudible]. 
>> Because otherwise we can apply that to basically a lot of things that ->> I mean, in general I think there's one point that I would like to make, which is that for this kind of code, doing things statically actually helps a lot because it gives you a lot more [inaudible] forget about like [inaudible] things like that. It gives you a lot more control over the execution and the environment in which you try to execute the unit. >> Right. >> Whereas if you just depend on dynamic testing, you need a lot more like testing [inaudible]. >> Right. You have to set up the context. >> So if you look at, for example, the way we stubbed this thing, it was maybe a handful of lines, maybe a hundred or two hundred lines, right? >> Right. >> [inaudible] amount of infrastructure that you would need to create a real executable test harness would be significantly larger. >> Zvonimir Rakamaric: Okay. Since I want to leave time for questions, I should probably speed up. So, yeah, I came back for another internship, and the next ten slides are kind of the latest news, the current state of STORM. So we added a bunch of new features that enable easier modeling of OS features in stubs and in the harness. We added support for function pointers, ghost maps, thread IDs, atomic blocks, dynamic thread creation, and so on. So I'm now going to show this on a few examples. Here is how we, for instance, use function pointers in stubs. So drivers can set completion routines, which amounts to setting a field of an IRP with a pointer to a completion routine, and then in our code, which mimics a call to a completion routine, we can simply read this field and use the function pointer to call this routine. Before we supported function pointers, modeling this in a harness was way trickier. We had to introduce global variables to store these completion routines, and it was very hairy. Then we introduced ghost maps, thread IDs, and atomic blocks. 
Here I show how we model spinlocks using those. So we introduce a ghost map called Lock, which is used for tracking the state of a lock. Then using thread IDs we can actually add these assertions that check that spinlocks are being used appropriately. And we also need atomic blocks to be able to specify these stubs. Another important feature that we added is support for dynamic thread creation. And this is actually another feature that is not supported by any other published translation as far as I know. In the syntax we introduce a new keyword called async. And async can be used to annotate procedures. And the semantics is that it creates a new thread in which it runs the procedure that it annotates. So down here you can see the translation of this async at a kind of BoogiePL-like level. So we have our context counter K, we have our thread IDs, and a thread ID count which is used to create a fresh thread ID for the thread we are creating. And async is going to be blown up into this expression. We are essentially going to store the old values of the thread ID and K, create a new thread ID for foo, and then restore the values. And because of the nature of the translation, this is a very nice and elegant way to support dynamic thread creation. So, again, dynamic thread creation helps a lot when modeling important asynchronous OS features such as interrupts, deferred procedure calls, timers, and so on. For instance, IoCallDriver is a stub which is called by a driver when the driver wants to pass a driver request to a lower-level driver, and the lower-level driver can synchronously or asynchronously complete the request. And in the stub, this is accomplished by a simple nondeterministic if statement which either calls CompletionRoutine synchronously, or adds this async in front of CompletionRoutine, which calls it asynchronously. 
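The ghost-map modeling of spinlocks can be sketched like this. It's a Python approximation of the idea only: the names and signatures are mine, and where STORM's stubs would use an atomic block and nondeterministic thread IDs, here the stub bodies just run sequentially.

```python
# Ghost map: lock -> thread ID of the current owner, or None if free.
lock_owner = {}

def acquire_spinlock(lock, tid):
    # in STORM's stub this whole body would sit inside an atomic block;
    # the assertion catches a recursive acquire by the owning thread
    assert lock_owner.get(lock) != tid, "double acquire by same thread"
    lock_owner[lock] = tid

def release_spinlock(lock, tid):
    # the assertion catches a release without a matching acquire
    assert lock_owner.get(lock) == tid, "release by non-owner"
    lock_owner[lock] = None

# Correct usage passes the assertions...
acquire_spinlock("DevLock", 1)
release_spinlock("DevLock", 1)
# ...while a release without a matching acquire trips one.
try:
    release_spinlock("DevLock", 2)
    misuse_caught = False
except AssertionError:
    misuse_caught = True
assert misuse_caught
```

The point of the ghost map is exactly this: the lock discipline becomes ordinary assertions over auxiliary state, which the same assertion checker can then discharge.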
Again, it's a very straightforward way to do it; before, we had, again, global auxiliary variables to help us achieve that, and so on. So while I was here, in the three months we improved the scalability of STORM by maybe an order of magnitude. Also, in addition to the completion property, we check spinlock properties as well. Because of all that, we managed to push way more drivers through STORM. So we have more examples from WinDDK plus a number of drivers from a Win8 enlistment. Also, I played a little bit with non-Windows code. I ran STORM on the Xen hypervisor timer module and checked some properties there as well. I don't have running times yet. This was still running when I was leaving Microsoft, and I can't access this code anymore, but it's a few thousand seconds if you run STORM with its CEGAR loop. Because we ran it on way more drivers, we also found more bugs. So, yeah, I was looking through the logs as I was leaving, and the number is maybe ten new bugs. Three of those bugs got confirmed. One of them is related to concurrency. It's kind of similar to the one I showed: it requires three context switches and cannot be discovered using sequential checking. Two of them can be found in sequential cases as well, and those two are related to the spinlock properties we're checking. We still have to explore why SLAM didn't find them. I have no idea. I mean ->> [inaudible] >> Zvonimir Rakamaric: I think both of them are, maybe, you know, some error where the lock doesn't get initialized and then it still exits. Something like that. It seems like SLAM should be able to find those. I don't know why it didn't. >> [inaudible] >> Zvonimir Rakamaric: Seven of those bugs are false bugs. One is due to the need for bit-vector arithmetic. We are planning to add support for that soon, I guess. >> [inaudible] >> Zvonimir Rakamaric: It's use-after-free. You know, once you complete the request, you shouldn't use it. 
And spinlock properties: well, when you release a lock, you have to have acquired it before, and similarly when you acquire a lock. Yeah. So ->> So there could be many more properties you could potentially check. >> Zvonimir Rakamaric: Yes. >> You're actually limited -- you're only looking at a fairly small number compared to, for example ->> Zvonimir Rakamaric: Yes. Well, SLAM checks like hundreds of them. So we're still trying to see which properties are interesting to check in a concurrent kind of setting. But, yeah, you are right. We are planning to add more properties. >> Are you doing annotations? Are you putting annotations into those drivers? >> Zvonimir Rakamaric: No annotations. Well, okay. So we have to put assertions; STORM checks assertions. But all the assertions that we put are actually in the harness. So we didn't touch the driver code at all. Now, we can do that because currently we are inlining everything and unrolling loops. So we need a better kind of story for that. But there are no annotations. It's a very push-a-button approach. Six of those seven false bugs are due to the harness. In particular, I was missing some initialization which had to be added, and if I don't have that, then I get the false bug. So I would like to note that none of them are actually in [inaudible] itself. Most of them are because the harness is not precise enough. This is kind of the latest news that I have from last week, related to using STORM in the context of the beta software. So there is a company in Vancouver called Critical System Labs, which is involved in the development of a so-called formal methods supplement for the new certification process for flight software. And they are supposed to give guidance to this consortium on which formal tools and techniques to use to check flight software. 
So they have this ongoing case study where they were collecting all the tools that they could find that can check concurrent C programs. Well, they're actually looking into Safe C, which is a subset of C. And then they are going to recommend the tools to this group that is working on this supplement. And STORM is currently part of this case study. So I'm going to, you know, give them support, and we'll see how far we'll push STORM in their direction. Now, to conclude, I showed my research that is addressing key challenges in system software verification, which is how to deal with the heap, modularity, and concurrency. My research is very much driven by applications, and it goes from theory to results usable in practice. I build prototype implementations, use them to do case studies on real-life code, and in the process find real bugs. There is a bunch of areas for future work. In the context of memory models, for a long time I have had this idea of so-called eager heap separation, where you would further improve memory modeling by using region or alias information from a pointer analysis. Then for STORM, more case studies are of course always important, so I would like to try STORM on something other than device drivers, for instance a filesystem, or maybe the USB driver stack, where problems related to hardware/software co-verification will have to be solved, and so on. Also, currently, as I said, we are inlining all procedures and unrolling loops, and as we go along we've seen blowups happening because of that. Some form of lazy inlining is crucial for the scalability of this technique. And we have ideas, which have to be tried out, for how to perform this lazy inlining inside the theorem prover. Then, as I said, modular verification is the key to scalability. And how to use some kind of lightweight contracts in the presence of concurrency and of the transformation I described is also a research area I would like to explore. 
Furthermore, there is a project at Microsoft called ConcurrencySAL which adds kind of lightweight annotations to Windows code. And, yeah, that's kind of an ongoing project. And I believe we could leverage the annotations that ConcurrencySAL introduces into the code to improve the scalability of STORM. And also, on the other hand, we could potentially use STORM to more precisely check these concurrency annotations, and by doing that filter the false alarms that they are getting. There is also an idea to use similar infrastructure to check parallelizations of sequential code by building on the symbolic diff project that [inaudible] is working on. And, well, [inaudible] this is kind of the grand master plan. So in the course of developing STORM, we would also develop something we currently call a Static Concurrency Exploration Platform. It would sit on top of Boogie and take concurrent BoogiePL as an input. So it would extend BoogiePL with constructs such as async and atomic that are needed for concurrency. And then many tools that currently translate code written in C or C++ or C# could use this infrastructure we will build to translate concurrent code into concurrent BPL, and then we will handle that. And, yeah, with this final slide I would like to close my talk. Thank you. [applause] >> Zvonimir Rakamaric: Yes? >> Shaz Qadeer: Questions?