>> Shaz Qadeer: Hello, and welcome to everybody. It's my great pleasure to welcome
Zvonimir Rakamaric here. You all -- I'm sure you all know Zvonimir. He has been an
intern several times over at MSR. And he is graduate student at the University of British
Columbia. He's going to graduate soon. He's very well known for his work on static
analysis and program verification, and for coming up with project names that are very
violent. He has worked on projects named HAVOC, Smack, STORM. And any other
such violent names?
>> Zvonimir Rakamaric: No, that's it.
>> Shaz Qadeer: That's it. He's also an MSR fellow. Pardon?
>> [inaudible]
>> Shaz Qadeer: So today he's going to tell us about some of his recent work on analysis
of concurrent programs.
>> Zvonimir Rakamaric: Thanks for the introduction, Shaz. So actually maybe 70
percent of this talk will be on analysis of concurrent programs, and 30 percent on other work
from my Ph.D., which is on analysis of sequential programs.
So as we all know, today software is everywhere. It's running on my laptop, on my cell
phone. It's running in the car I took to get down here and then so on.
And also software has errors. Software systems are generally large, complex, and prone
to errors, and they're also getting larger and more complex every day, especially with the
recent emergence of multicores and many cores which requires switching to highly
concurrent software. They're also getting more error prone and harder to get right.
Software errors are also very costly. According to the U.S. National Institute of Standards &
Technology, software bugs cost the U.S. economy an estimated $59.5 billion each year.
As another example, the estimated damages of the Code Red worm, which was caused by a
buffer overrun bug in a Windows Web server, I think, are around $2.6 billion.
More importantly, software bugs sometimes also cause loss of human lives. And because
of that obviously improving software quality and reliability is a major software
engineering concern.
So on a more personal note, while working in industry, I realized the need for better
development, maintenance, and understanding of programs.
So, therefore, when I decided to go back to grad school, I made that the main topic of my
research. For my master's thesis I worked on a logic and decision procedure for
verification of heap-manipulating programs. The logic contained constructs for
unbounded reachability in linked data structures. I implemented the decision procedure and
integrated it into an SMT solver, which enabled theory combination, and using that I
verified a number of example heap-manipulating programs.
So that was my master's thesis work. We can talk about it offline if you're interested.
Here I'll concentrate on the stuff I did for my Ph.D. thesis, which is on checking system
software and more particularly concurrent system software.
So system software is important and critical since it's the foundation beneath all
general-purpose application programs.
Typically it's written in C. And it's also hard to get right and hard to verify, mainly
because it performs low-level memory operations such as dynamic memory allocation
and pointer manipulation, and also it's highly concurrent, with shared-memory
communication being the main programming paradigm.
So here is the outline of my talk. So after this introduction I'll talk about the work I
did on modeling memory in software verifiers and on inference of frame axioms, and
that's the part I mentioned which is related to verification of sequential programs. And
then the rest of the talk will be on analysis of concurrent programs. And I'll give some
conclusions and future work in the end.
So before I start, I think you'll understand the material I'll present later better if I give you
the overview of the tool flows I built in the past three years.
So I've been working on three tools: SMACK, HAVOC, and STORM. So they're all
built on top of Boogie. Boogie is a verification condition generator from Microsoft
Research which takes BoogiePL programs -- BoogiePL is a simple procedural language --
and based on the input, Boogie generates a VC which is handed over to an SMT solver,
usually Z3, and based on the SMT solver's output, Boogie either verifies the program or
returns an error trace.
So I developed two independent tool flows. The one on the left is based on LLVM and
GCC. I used it to try out some of my ideas at UBC. And the one on the right is based on
Microsoft infrastructure, and that's the one I was mainly working on during my
internships at Microsoft. So let's see just a couple of slides of the work I did on memory
models.
So why do we even need memory models? Well, because available memory in today's
systems is humongous, and a faithful representation in a verifier simply doesn't scale.
Because of that, software verifiers rely on memory models that provide a level of
abstraction with the usual tradeoff between precision and scalability.
And, also, using memory models, we translate away complexities of source language
since system code written in C is really messy because of all the low-level operations it
performs on heap.
In this work I consider two well-known memory models: the monolithic memory model,
which splits memory into disjoint objects, where each object is identified by its unique
reference; and Burstall's memory model, which in addition to splitting memory into
disjoint objects also splits memory based on the types of memory locations.
So I implemented both memory models as part of my tool SMACK, and I did an
experimental comparison between them. And it turns out that Burstall's is consistently
better than monolithic. It gives around 3x speedup on easier benchmarks and solves
harder ones that monolithic memory cannot even solve.
But, on the other hand, we also pay the price. Burstall's assumes that the memory is
strongly typed, which is potentially unsound for low-level C code which can perform
type unsafe operations. Yes.
>> [inaudible]
>> Zvonimir Rakamaric: Excuse me?
>> What do you mean by harder ones -- solves harder ones?
>> Zvonimir Rakamaric: Harder examples. That monolithic memory cannot solve.
So my contribution is in this area. So the question, then, is how to ensure soundness of
Burstall's in the presence of type unsafe operations while preserving scalability.
And the intuition is that most parts of code written in C are usually type safe. And
my contribution is a completely automatic technique that gives us a sound and scalable
memory model.
So the idea behind this technique is to eagerly run a lightweight pointer analysis up front
that gives us conservative type information and is also very fast and precise enough. And
then based on the type information that's returned to us by the pointer analysis, we can
use Burstall's model in the parts of the code where we are type safe and fall back to the
monolithic model in the parts where we are type unsafe.
And, again, I implemented this technique in SMACK and showed scalability on a number
of experiments.
>> So you -- this means that you were able to statically partition the heap into two pieces,
one that you're going to represent using Burstall's model and the other one you're going to
represent using the monolithic model?
>> Zvonimir Rakamaric: It means that I can find out when you have to unify types, like
memory maps.
>> I see. I see. So you start out with the Burstall's and then you keep unifying them until
you become sound.
>> Zvonimir Rakamaric: Exactly.
>> I see.
>> Zvonimir Rakamaric: But you can do it -- I can do it eagerly. So it's not a kind of
refinement. Like I can know that up front. So I unify them up front. So it's kind of like
that annotation [inaudible] which does unification, except you can do it up front without any
user guidance.
So this is just a brief overview. Again, I can talk more about it offline. I want to
concentrate this talk on the work I did on concurrency.
Now, let's see the research I did on inference of frame axioms. So modular verification in
software analysis is key to scalability. However, the problem is that it requires
user-provided annotations in the form of preconditions, postconditions, loop invariants,
and frame axioms.
So frame axioms are used to define what is not being changed by a piece of code, and
that's relatively easy to do if there are no loops, no recursion, and if we have only scalar
variables, but it gets way harder in the presence of unbounded data structures.
In practice, in modular verifiers the user typically specifies what might be changed by some
piece of code. So, for instance, in Spec#, HAVOC, and SMACK, we have a modifies
clause. In JML we have assignable, in Caduceus we have assigns and so on.
So these clauses are typically very complex and difficult to write. And that is especially
true for system code which performs low-level pointer operations.
So here is an example -- a simple example which illustrates why we need a modifies clause.
So we have two global integer pointers -- oops. Sorry -- x and y. We have procedure foo
which sets *y to 5. Then we have procedure bar which allocates x and y. It sets *x to 1.
It calls foo and then it tries to check whether *x is still 1.
Now, if we want to check this assertion modularly, which means without analyzing the
code of foo, we have to know that foo doesn't modify *x. And in modular verifiers, this is
being accomplished by putting this modifies clause on foo.
So we are going to assume that modifies clause when checking this assertion, and we are
going to check it when we check foo.
So now this is a simple example and a very straightforward modifies clause. But if we
have even a little bit bigger example, which operates on arrays and changes elements of
arrays, the modifies clause can get way trickier, as you can see from this expression below,
where we have array constructors which generate -- yes?
>> So I'm just not familiar with this notation. So when you say for foo that it modifies y,
can you assume from that that it doesn't modify anything else?
>> Zvonimir Rakamaric: Yes.
>> Do you have to assume that modifies is complete?
>> Zvonimir Rakamaric: So modifies has to be an overapproximation of the memory
locations being modified by foo. So it's maybe not the most precise, but, yeah, it doesn't
modify anything else.
So, yeah, in this expression we have array constructors that generate pointers belonging
to an array, then we have pointer increments, unions, and so on. So things get very
hairy. And the goal of this research is to --
>> [inaudible] that you cannot change what y is pointing to or the contents of what y
points to?
>> Zvonimir Rakamaric: So modifies y means that you change the location pointed to by
y. You can change the location.
>> [inaudible]
>> Zvonimir Rakamaric: Right. Right.
>> So if you wanted to express that y has been changed, then you will write modifies
[inaudible]?
>> Zvonimir Rakamaric: And y. Yes.
>> I see.
>> Zvonimir Rakamaric: Okay. So here I propose a novel algorithm that uses
pointer analysis and source code to automatically generate these complex frame axioms.
The algorithm is a three-step process. In the first step, from the data structure graphs that
we get as output of data structure analysis -- which is the pointer analysis I use -- I
generate modify sets by marking the set of potentially modified locations in a data structure
graph for each function and loop.
Then in the second step, from these modify sets I generate modifies clauses, which are
expressions similar to the one I showed previously, by encoding these sets into specification
logic while traversing the data structure graphs starting from the roots.
And the third step is a straightforward step of generating frame axioms from modifies
clauses.
>> Question.
>> Zvonimir Rakamaric: Go ahead.
>> Don't you have to have a whole program to do this?
>> Zvonimir Rakamaric: You do.
>> But you never do. You never ever have the whole --
>> Well, sometimes you do.
>> For a C program? Are you kidding me?
>> So there's a [inaudible]?
>> We have a source code for the library.
>> You're not online [inaudible].
[multiple people speaking at once]
>> Zvonimir Rakamaric: So you can assume the worst case, that the code you don't
know can modify anything, which would be terrible. But then you can always go in and
write the modifies clauses manually.
>> [inaudible] in practice?
>> Zvonimir Rakamaric: In practice I wrote them manually.
So I'll walk you through these three steps on an example. So here on this picture we have
a data structure graph which is going to be an output of the pointer analysis I'm using.
And this is basically an overapproximation of the points-to relation of this loop, of a
procedure.
So in the graph we have two pointer variables, A1 and A2, which are pointing into objects
H1 and H2. H1 and H2 are arrays. And they represent an unbounded number of elements,
and each element has two fields, F1 and F2, at offsets zero and four.
Now, we want to mark in this graph the locations that are modified by this loop down
there. So the first statement of the loop is going to modify field F2 of the array
pointed to by A1. The second one is going to modify field F1 of the array pointed to by
A2.
So, therefore, our modify set has two elements. Each element is a pair -- an
object-offset pair. So we have (H1, 4) and (H2, 0).
And note that each of these elements represents an unbounded number of elements
belonging to these arrays.
So in the second step based on the modifies set we are going to generate expressions in
our annotation language that are going to be modifies clauses representing this set.
So we start doing that by traversing these graphs starting from the roots. So for (H1, 4) we
start from A1. We hit an array object, so we use the array constructor to generate the
pointers belonging to this array. Then we have to increment this set of pointers to get to
field F2, and that's it.
For the second one we start from A2. Again we hit an array. In this case we increment by
zero, and our modifies clause is going to be the union of these two guys.
Now, the third step is a straightforward, typical conversion of these modifies clauses into
frame axioms. Nothing smart going on there; they're just going to be blown up into this
quantified formula.
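Schematically, the frame axiom produced in this third step has the following shape (the notation is illustrative, not HAVOC's or SMACK's exact syntax): for a memory map mem and the modifies clause M of a procedure or loop,

```latex
\forall p.\; p \notin M \;\Longrightarrow\; \mathit{mem}'[p] = \mathit{mem}[p]
```

where for the running example M is roughly the union of the two incremented array constructors, \((\mathrm{Array}(H_1) + 4) \cup (\mathrm{Array}(H_2) + 0)\), so every location outside the modify set keeps its old value.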
So I implemented this algorithm again in SMACK. And I tried it on a buffer-overflow
benchmark suite which was proposed at ASE 2007. So the suite has 12 programs, 22
vulnerabilities, which amounts to roughly 290 test cases, both faulty and patched, and
18,000 lines of code.
I assessed the precision of the automatically generated modifies clauses as well as the
verification runtime overhead.
So this is the first table where I assess the precision. So we have a total of 289 test cases.
And now when I go in --
>> Sorry. What do you mean by test cases? You were doing some static analysis, right?
>> Zvonimir Rakamaric: Oh, I see. I mean benchmarks.
>> Oh, okay.
>> Zvonimir Rakamaric: Sorry. Sorry about that. By test cases I mean benchmarks.
>> Okay. Is it a single program, Apache?
>> Zvonimir Rakamaric: So each -- well, they're kind of carved out of Apache. So it's
not the whole Apache. So all 290 of them together are 18,000 lines of code. So it's not the
whole Apache, just --
>> I see. So there are libraries and functions and --
>> Zvonimir Rakamaric: So these guys -- this paper was published at ASE 2007, so
these guys browsed the Internet, found bugs, and went in and kind of carved them out into
these self-contained benchmarks.
>> But if you do that, don't you lose all the context so that when you do your static
analysis all that code that they carved away you don't have to deal with, right?
>> Zvonimir Rakamaric: Well, so they did -- so the benchmarks are self-contained. It's
like you have main in everything, right? So exactly. You don't have to deal with that
other code.
Now, so there are two things related to that. One is that, well, DSA -- like the pointer
analysis I'm using -- is very fast. So it can go through the whole Apache in a few seconds.
So this is not a problem. But the verification, like SMACK, is going to choke. Like even
if DSA goes through and I manage to generate modifies clauses, there's no way I'll verify
the whole Apache.
>> One thing is -- I mean, there's probably -- each benchmark has lots of procedures,
right? So just verifying -- are they also annotated? SMACK is a modular verification tool.
>> Zvonimir Rakamaric: So, okay --
>> [inaudible] modifies if you need --
>> Zvonimir Rakamaric: Yes. Yes. Exactly.
>> -- [inaudible] how do you --
>> Zvonimir Rakamaric: So, no, I manually write those.
>> You're writing those manually.
>> Zvonimir Rakamaric: So, yeah, in this manual step, I wrote everything manually. So
I wrote preconditions, postconditions, and modifies clauses, everything manually. And
if I do that, SMACK can check 226 out of 289. So there are around 60 which
SMACK wouldn't check. And it's mainly because -- like it doesn't understand
bit-vector arithmetic. And for some of them you need that to handle them.
>> [inaudible] couldn't find a known defect because of its limitations [inaudible] couldn't
check.
>> Zvonimir Rakamaric: Yes.
>> Okay. So it could analyze it, but it couldn't find the defect.
>> Zvonimir Rakamaric: Yeah, yeah, sure.
>> So it was an unfound defect due to --
>> Zvonimir Rakamaric: I'm not sure it was only just missing bugs; maybe it was
also reporting false positives which I couldn't get rid of. So it's kind of like I couldn't get
the expected answer. Like either --
>> [inaudible]
>> Zvonimir Rakamaric: Yeah.
>> So [inaudible] for each test case there is a known defect.
>> Zvonimir Rakamaric: Exactly.
>> Okay.
>> Zvonimir Rakamaric: And there is a fixed version.
>> And there's a fixed version. And the challenge is to find the known defect and find
it --
>> Zvonimir Rakamaric: Well, here I use it only to kind of show that this algorithm I'm
proposing can handle these guys and that the modifies clauses it generates automatically
are good enough to check them.
>> Okay, okay. So this is really now the comparison that you were looking at is then --
>> Zvonimir Rakamaric: Exactly. Between kind of these two numbers.
>> Okay.
>> Zvonimir Rakamaric: 203 and 226. So when I manually write the modifies clauses, I
could check 226 of them. Now, when I use the algorithm that I described, I can check 203
of them.
So for 20 of them, because the modifies clauses are not precise enough, it's not going to
be able to either prove the property or -- yeah.
>> One more question. So the modifies clauses that you wrote manually, is the tool
checking that they're correct?
>> Zvonimir Rakamaric: Yes. Well, even the automatically generated ones are going to be
checked. Because they're kind of a best guess. There are no guarantees
that they are overapproximations.
>> One more question. So can you also comment on how much annotation overhead that
you save by automatically inferring modifies clauses? Because clearly you are still
manually writing some other annotations.
So, for example, why is it justified to go after modifies clauses? Why not try to infer,
you know, preconditions, postconditions, and things like that?
>> Zvonimir Rakamaric: Modifies clauses are typically the biggest ones that I've seen.
>> I see.
>> Zvonimir Rakamaric: Well, especially when you kind of call a few procedures. So
like the top-level procedure has to capture everything that's being modified by the whole
kind of chain of calls. So you will have to go down and see what the -- so they're the
biggest ones. They were the biggest --
>> So, I mean, can you estimate -- can you give me like, you know, okay, so all the
annotations that were present in the final code that SMACK verified, what fraction were
the inferred modifies clauses?
>> Zvonimir Rakamaric: I'm not sure. I mean, it would be a really rough guess. I don't
think we want to.
>> [inaudible] had this example of the modifies y.
>> Zvonimir Rakamaric: Okay.
>> If the x and y condition is the same thing, we would find X true, right?
>> Yes.
>> Zvonimir Rakamaric: Sure.
>> But to do a run [inaudible] this is something -- do you run some global points-to
analysis before checking the modifies clauses?
>> Zvonimir Rakamaric: Well, the pointer analysis I use is like a global analysis. But
the other thing -- yeah, so I do.
>> So I'm not going to let you off as easy [inaudible] so your programs were
approximately 18,000 lines of code you say?
>> Zvonimir Rakamaric: No, no, no. Programs are way smaller. So these are small
benchmarks.
>> These benchmarks, the total.
>> Zvonimir Rakamaric: That's the total. So they're like a few hundred lines of code each.
>> Divide 18,000 by 289 and get the average.
>> Okay. I want to talk about the total right now. So you ran 18,000 lines of code
through.
>> Zvonimir Rakamaric: Sure.
>> Of original code. Right?
>> Zvonimir Rakamaric: Of original -- well, original benchmark code.
>> Of the original benchmarks. So in the manual case, how many lines of annotations
did you have to have?
>> Zvonimir Rakamaric: Oh. If you count modifies clauses --
>> Yeah, yeah.
>> Zvonimir Rakamaric: -- probably 50 percent of the size of the code.
>> Okay. So another 9,000 lines of code approximately.
>> Zvonimir Rakamaric: I guess so.
>> Quarter magnitude.
>> Zvonimir Rakamaric: Let's say. Yeah.
>> Okay. So then in the automatic case, if you remove the modifies
clauses, how many lines do you get?
>> Zvonimir Rakamaric: Well, that's another 50 percent down. So it's -- from 9,000 you
go to maybe 4-.
>> 4,000. Thank you.
>> Zvonimir Rakamaric: All very approximate.
So here are the running times. We don't really see almost any overhead if you use this
automatic algorithm; the running times go up only a little bit.
So, to conclude, I've shown a novel algorithm for inference of complex frame
axioms. The algorithm is completely automatic and it can handle unbounded data
structures such as arrays. I used it on a number of benchmarks. I showed that it's precise
enough in practice. And the verification runtime overhead is small.
Okay. Now we'll go to the analysis of concurrent programs. So the main goal of this line
of research is to statically and precisely find concurrency errors in real systems code.
So the key points are that we want to do it statically to achieve high coverage and
precisely to have no false alarms or really low rate of false alarms.
Then we would like to check general concurrency errors, which means violations of any
user-provided assertions, as opposed to having a specialized static analysis which finds
only one kind of error, such as data races or violations of locking properties and so on.
And last but not least, we would like to apply our techniques to real systems code that is
written in C, uses shared memory, and is big and messy.
So an appealing approach to this problem is to transform context bounded analysis of
concurrent programs into analysis of sequential programs. And now when we do that we
can leverage on the huge body of research that has already been done on the analysis of
sequential programs.
So this approach was first proposed in a tool called KISS which was done by Shaz and
Wu in 2004. And that could handle up to two context switches. And then recently in
CAV 2008 and 2009 more general transformations were proposed that can handle up to N
context switches. However, those transformations were applied on small, manually
constructed Boolean programs.
Well, the main contribution of this work is to handle real code and not these Boolean
models. So in that direction, we defined a context-bounded translation for C programs
that can handle heap, and also our method is based on VC generation and SMT solvers
which makes it automatic and very precise.
Also, we introduce field abstraction, which is a technique that greatly improves scalability,
and our prototype implementation is applied to real system code.
So the translation we are proposing, if we go back to my tool flow, is done in two steps. In
the first step we translate concurrent C into concurrent BoogiePL using HAVOC, and in
the second step we translate concurrent BoogiePL into sequential BoogiePL, which can then
be handed over to Boogie. So I'll show you these two steps in more detail in the next
slides.
So let's see step one first. So system code written in C is messy mainly because of heap.
So it generates dynamic objects on the heap, performs low-level pointer manipulation and
so on. And therefore we are going to translate away the complexities of C into BoogiePL
scalars and maps, which are then much easier to deal with. And that is done using a
C checker called HAVOC.
So HAVOC translates away heap based on its memory model, which is very similar to
the Burstall's memory model I talked about in the beginning.
So in HAVOC we have one heap map per structure field and pointer type. So if you have
a field F, we are going to have a map memF; for a field G, a map memG; for a type T*, a
map memT*; and so on. And those maps map addresses into values.
In Boogie the maps are handled using theory of arrays with select and update axioms.
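The select and update axioms mentioned here are the standard theory-of-arrays ones:

```latex
\mathrm{select}(\mathrm{update}(m, i, v),\, i) = v
```
```latex
i \neq j \;\Longrightarrow\; \mathrm{select}(\mathrm{update}(m, i, v),\, j) = \mathrm{select}(m, j)
```

The second axiom is what lets the prover conclude that a write to one address leaves the value at every other address of that map unchanged.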
So here is a simple example. If in C code we assign 1 to field F of pointer x, then in the
BoogiePL that HAVOC produces from this C code, this is going to be translated into this
statement, which updates map memF with 1 at the location x plus the offset of field F. Yes.
>> Can you go back one slide. So looking at the last slide of Boogie code, you have a
map for F and you have an offset for F.
>> Zvonimir Rakamaric: Yes.
>> What's -- I guess what's the assurance that I never access memG at x plus the offset of
F? Like what happens if I get those out of sync?
>> Zvonimir Rakamaric: So you can add checks in HAVOC which will add assertions in
the Boogie code, which will ensure that you cannot do that.
And there is a paper on how to prove these assertions and a whole technique of how to do
that. In this line of work we are not using those checks. So we can be unsound.
>> Okay. But for now we just trust that your translation won't introduce these mistakes.
>> Zvonimir Rakamaric: For now we are -- well, in the actual implementation currently
we can be unsound. Because we are not checking those.
There is a knob or switch in HAVOC that can introduce those checks. We haven't really
tried it. I don't know how the scalability would be if we do that.
>> I mean this is kind of like type safety, and the idea is that you were
trying to check that separately using techniques that are very different from these kinds of
techniques. So the goal of STORM -- the goal of STORM is to -- just put two [inaudible]
to check concurrency errors; you don't want to mix up type checking with that because
you want to divide and conquer in that [inaudible].
>> Why do you need the offsets of F [inaudible]? Why not just say memF of x? Why do
you need those offsets of F there?
>> Zvonimir Rakamaric: Well, one of the reasons is definitely if you want to actually
check that this -- so if you want to introduce this check that he was asking about, you
have to prove that at this offset you actually have field F. So you have to -- you need to
know the offsets. And also you can have pointer arithmetic and things like that.
But, yeah, that's a good question. Actually I got the same one from my supervisor.
So now that we have concurrent BoogiePL, in step two we are going to use a translation
based on the paper published by Lal and Reps at CAV 2008 to translate this concurrent
BoogiePL into sequential BoogiePL.
So they published a translation that works for scalars. We lifted it to support field maps
as well and therefore memory.
So I'm going to illustrate the translation on a simple example which translates just one
concurrent trace. So the trace has two threads, thread 1 and thread 2, and one shared
variable, G. It also has three context switches, which means that we have four execution
segments total.
So the main idea behind this translation is to avoid storing local state, since local state is
typically way bigger than the shared one. And that is done by introducing
unconstrained symbolic values instead of the still-unavailable future global values. And
then later on, when the future values become available, we are going to constrain them.
So I'm going to show how this works on an example. So let me talk about the notation
first. So we have two threads, thread 1 and thread 2. We have four execution contexts,
which are these stickers marked with 1, 2, 3, and 4. We have one shared variable, G, and
G1, G1 prime, and so on are marking the values of G at the beginning and end of the
respective execution context.
And black arrows represent context switches between these two threads. So now a
straightforward way of encoding this concurrent execution into a sequential one would be
to introduce gotos instead of context switches.
So here we would simply, you know, introduce a goto with which we jump to thread 2, it
would jump back, and so on.
Now, if we do that, note that here we have to restore the values of locals after execution
context 1, which means that we have to store them after execution context 1. And that's
exactly the thing we are trying to avoid since local state -- storing local state, which is
way bigger than shared state, is not a good idea.
So how do we go about doing this translation so that we don't have to store local state?
Well, we simply stack these execution contexts sequentially one after another. So now in
this part here when we move from 1 to 3, local state gets preserved and nothing has to be
saved.
But the problem here is that if you look at these Gs that represent the value of the
shared variable G, they're not aligned properly, correspondingly to the concurrent
execution on the left.
So at this point, for instance, our G3 here on the left has to be equal to G2 prime and not
G1 prime. And the trick that I talked about, about introducing unconstrained symbolic
constants, comes into play here.
So G2 prime is available to us only in the future. And therefore at this point we are going
to set G to an unconstrained symbolic constant, V3, that is going to represent this
unavailable future value.
And then down here once that G2 prime becomes available, we are going to constrain this
V3 with G2 prime. And this is the main idea behind the Lal-Reps translation.
We need some more stitching. So at this point, G2, if we look at the concurrent execution,
has to be equal to G1 prime. So we introduce this assignment. And over here G4 has to
be equal to G3 prime. Again, we have to introduce another assignment.
Now, if we stitch the code together like this into a sequential execution, we are guaranteed
that the G4 prime here after the sequential execution is going to be equal to this G4 prime
here after the concurrent one.
And this is kind of the high-level idea behind the [inaudible].
So I showed it only on one concurrent trace. But this can be generalized to whole
programs.
So I'm going to show, again at a high level, how this is done for the whole program. So in
the program we are assuming that we allow N contexts per thread, and again we have one
shared variable G. And our concurrent program has two threads, T1 and T2, and at the
end we want to check an assertion F.
So the sequential program is going to have some initialization in the beginning. Then
we're going to have the code of a properly translated thread T1, followed by the code of a
properly translated thread T2, some stitching at the end, and, again, a check of our
same assertion.
So in the initialization we are going to introduce copies of our shared global, one copy
per context. And also we are going to introduce those unconstrained symbolic
constants that I was talking about, those Vs from the previous slide.
We also need a context counter K, which is going to be initialized to one in the beginning
and which is going to be updated every time we switch to another context.
And in the beginning we have to constrain -- we have to set our shared global to the
unconstrained symbolic constants we introduced.
So each statement of every thread is going to be blown up into this switch case where
based on the value of K we are going to call procedure schedule, which means context
switch, and in the statement we are going to replace our shared G with the appropriate
copy.
And now schedule, as I said, mimics a context switch. So in schedule, if we have still not
reached the maximum number of contexts that we have available, we are
nondeterministically going to perform a context switch and increment K; or, if we have
reached our maximum number of contexts, we are going to reset K to 1 and jump to the
next thread.
In the end we have to introduce those assumes that I talked about in the previous slide,
which are going to constrain the unconstrained symbolic constants we introduced with
the values that are now available to us, and then we can check the assertion in the end.
So this is kind of how the whole program transformation of Lal-Reps works.
Now, Lal and Reps did this translation for scalars, so we lifted it to handle the heap. And
because we have this memory model with maps that I talked about, which enables a
translation very similar to the scalar one, basically we introduce these copies for the maps
and also constrain them appropriately, the same thing that you would do for scalars.
I would like to note that map constraints are only assumed equalities, which means that
we don't need extensionality axioms inside the theorem prover, which improves the
performance.
So we implemented this translation and tried it on a number of like smaller benchmarks.
And it worked great. But then as we tried to scale it to more realistic examples, it
basically blew up. And the main reason for that is that realistic examples are going
to have a lot of memory maps, even hundreds, and precisely tracking the state of all of
these memory maps and introducing these copies for each of them is simply not going
to scale.
So our answer to this problem is an algorithm we called field abstraction. And I want to
illustrate how it works on an example.
So the main idea is to split the set of fields into two sets, tracked fields and untracked
fields. And we are going to precisely track the values of tracked fields while, on the
other hand, abstracting untracked fields.
So each read from an untracked field is going to be replaced with a nondeterministic
value and each write is going to be skipped.
So in this example we have three statements which read and write fields F and G. And
our set of tracked fields is going to be only field F.
So the algorithm is applied to each statement. Starting from the first one: the first
statement is reading field F, which is in the set of tracked fields, so we cannot do
anything there. The second one is reading field G, so we can abstract it with a
nondeterministic value. The third one is writing to G, so we can simply skip it.
So we can see that applying this algorithm greatly simplifies the code. And, indeed,
when we ran it on our benchmarks, we managed to check actual real device drivers.
However, note that in this case we require user input, because the user has to provide a
set of tracked fields which is good enough for discharging the assertions that we have in
the example.
Now, the question is whether we can do something to generate this set of tracked fields
automatically, without requiring any user input. And, again, the answer to that is an
algorithm based on the CounterExample-Guided Abstraction Refinement, or CEGAR,
framework.
So here is how the algorithm works. We start with an empty set of tracked fields. We
enter our usual CEGAR loop. We abstract our program P based on this set of tracked
fields into the abstraction A. We check the abstraction. If we manage to check it, we are
done. If not, we get a counterexample and check if it is a real error. If it is, we return an
error trace. If it is not, we analyze the counterexample.
And based on this analysis, we are going to add new fields which we believe are relevant
into our set of tracked fields. And then once we do that, we repeat our CEGAR loop.
So the important thing to note is that this loop is always going to terminate since in the
limit we are going to add basically all of the fields and check our concrete program.
So I implemented all of this in a prototype implementation called STORM and applied it
on four Windows device drivers. To be able to do that, I created a harness to close
off the drivers and to mimic the behavior of the OS.
So the harness I created is going to create a driver request that gets processed
concurrently by multiple routines in different scenarios. So, for instance, we will have a
dispatch routine in parallel with a cancellation routine, or a dispatch in parallel with
cancellation and in parallel with completion, and so on.
And the property I checked for all these drivers is that a driver request cannot be used
after it has been completed, which is like a use-after-free property.
So here are some running times when we varied the number of contexts we allow per
thread while providing the set of tracked fields manually. The drivers range from 4,000
to 30,000 lines of code roughly. And if you look at the running times for 5 contexts per
thread, they are a few hundreds of seconds, so scalability currently is not an issue.
Also I would like to note that in the process I found a bug in the usbsamp example. I'll
illustrate what the bug looks like in the later slides.
Another set of experiments where we tried out our CEGAR algorithm, so one thing to
take a look at is how precise our CEGAR algorithm is and how well it works. So if we
take a look, for instance, at the mqueue example, mqueue has a total of 72 fields. And
now when I went in and manually provided the most precise set of tracked fields necessary
for discharging this example, I came up with a set of seven fields.
And when I ran our CEGAR algorithm, CEGAR generated sets of fields which have
eight or nine elements, which is very close to this seven. So the heuristic we have for
choosing tracked fields works extremely well.
Running times jumped to a few thousands of seconds, which is understandable because
we have to repeat the CEGAR loop quite a few times.
So as I mentioned, I found a bug in the usbsamp example. Usbsamp is a sample driver in
the Windows Driver Development Kit. It is actually an example of how to write device
drivers, and as such is going to be copy-pasted by driver vendors that download the
WinDDK from the Internet.
This driver has been checked using existing tools. The bug was confirmed and fixed, and
it requires at least three context switches to be discovered. So therefore SLAM, for
instance, which checks only sequential code, or KISS, which can check only up to two
context switches, couldn't find this bug.
So here is a very simplified code excerpt showing what the bug looks like. So to see the
bug, we need two threads. Thread 1 is executing dispatch routine; thread 2 is executing
cancellation routine.
So thread 1 gets to execute first. It executes this statement that marks the request as
cancelable and sets its cancel routine. Now the cancel routine gets to run. The cancel
routine is going to check whether the request has still not been completed and whether
the cancel routine was set. So this check is going to pass.
And now a context switch. We are back in our dispatch routine, which completes the
request. And the third context switch takes us to the cancellation routine, which tries to
access the request which was just completed, and this is the bug.
So to conclude, I introduced a method for reducing context-bounded analysis of
concurrent C code to analysis of sequential code. The method can handle the heap and is
based on VC generation and SMT solvers, which makes it automatic and very precise.
I also showed an algorithm called field abstraction, which greatly improves scalability,
and I implemented these ideas in a tool called STORM, which is an assertion checker for
concurrent system code.
I applied STORM in a number of experiments on real-life system code, and in the
process I found a bug.
So these were basic results from maybe a year ago. And then because of the promise that
this line of research showed -- yes?
>> [inaudible] can you give some kind of insight as to why the bug you found wasn't
found with testing? Seems like [inaudible] was it very infrequent or if it was --
>> Zvonimir Rakamaric: So I'm not sure -- okay. So very often you will have to have
these context switches in very, very specific places. Like I've seen examples where you
have two statements and the context switch has to happen right in between these two
statements to see the bug. So I can imagine that this is very hard to hit using testing.
Yeah, the other thing is cancellation is kind of an [inaudible] occurrence event. So,
again, it has to happen exactly at a certain point for this bug to occur.
>> So is something like CHESS able to find this?
>> Zvonimir Rakamaric: So CHESS would definitely help you with introducing context
switches at right places. But, again, you have these asynchronous inputs, which I don't
know how -- like CHESS, I'm not sure, but as far as I know, I don't think it helps you if
you want to generate those.
So let's say, you know, interrupts can happen, right, so you want to be able to generate
these interrupts at all these points during your execution, and I don't think CHESS helps
you with that.
>> [inaudible]
>> Zvonimir Rakamaric: Well, that's another thing on top of everything. Let's say you
managed to somehow get it to work on kernel code.
>> This is kernel code.
>> Zvonimir Rakamaric: This is kernel code. Right. I don't know whether the technical
issues are related to making CHESS work on kernel code. What we're saying is: imagine
that you can actually get it to work.
>> [inaudible] pull this out of kernel like you did, right? You didn't run the drivers for
this in the kernel.
>> Zvonimir Rakamaric: No, I don't run anything. Everything is static. So but CHESS
actually has to run.
[multiple people speaking at once]
>> So it's very hard to extract something like this out of the kernel and make it actually --
>> Zvonimir Rakamaric: Runnable. Yeah, I don't know.
[multiple people speaking at once]
>> In principle there are lots of techniques that could have [inaudible] but you form
[inaudible].
>> Because otherwise we can apply that to basically a lot of things that --
>> I mean, in general I think there's one point that I would like to make, is that for this
kind of code, doing things statically actually helps a lot because it gives you a lot more
[inaudible] forget about like [inaudible] things like that. Gives you a lot more control
over the execution and the environment in which you try to execute the unit.
>> Right.
>> What happens then if you just depend on dynamic testing, you need a lot more like
testing [inaudible].
>> Right. You have to set up the context.
>> So if you look at, for example, the way we stubbed this thing was like maybe a
handful of lines, maybe a hundred or two hundred lines, right?
>> Right.
>> [inaudible] amount of infrastructure that you would need to create a real executable
test harness would be significantly larger.
>> Zvonimir Rakamaric: Okay. I want to leave time for questions, so I should probably
speed up. So, yeah, I came back for another internship, and the next ten slides are kind
of the latest news, what the state of STORM is currently.
So we added a bunch of new features that enable easier modeling of OS features in stubs
and in the harness.
So we added support for function pointers, ghost maps, thread IDs, atomic blocks,
dynamic thread creation and so on. So I'm now going to show this on a few examples.
So here is how we, for instance, use function pointers in stubs. So drivers can set
completion routines, which amounts to setting a field of an IRP with a pointer to a
completion routine, and then in our code, which mimics a call to a completion routine,
we can simply read this field and use the function pointer to call this routine.
So before we supported function pointers, modeling this in a harness was way trickier.
We had to introduce global variables to store these completion routines. And it was very
hairy.
Then we introduced ghost maps, thread IDs, and atomic blocks. Here I show how we
model spinlocks using those. So we introduce a ghost map called Lock, which is used
for tracking the state of a lock. Then using thread IDs we can actually add these
assertions that check that spinlocks are being used appropriately. And we also need
atomic blocks to be able to specify these stubs.
Another important feature that we added is support for dynamic thread creation. And this
is actually another feature that is not supported by any other published translation as far
as I know.
So in the syntax we introduce a new keyword called async. And async can be used to
annotate procedure calls. And the semantics is that it creates a new thread in which it
runs the procedure that it annotates.
So down here you can see the translation of this async on the kind of BoogiePL-like
level. So we have our context counter K, we have our thread IDs and thread ID count
which is used to create a fresh thread ID for the thread we are creating. And async is
going to be blown up into this expression.
We are essentially going to store the old values of the thread ID and K, create a new
thread ID for foo, and then restore the values.
And because of the nature of the translation, this is a very nice and elegant way to
support dynamic thread creation.
So, again, dynamic thread creation helps a lot when modeling important asynchronous
OS features such as interrupts, deferred procedure calls, timers, and so on.
For instance, IoCallDriver is a stub which is called by a driver when the driver wants to
pass a driver request to a lower-level driver, and the lower-level driver can synchronously
or asynchronously complete the request.
And in the stub this is accomplished by a simple nondeterministic if statement, which
either calls CompletionRoutine synchronously, or adds this async in front of
CompletionRoutine, which calls it asynchronously.
Again, a very straightforward way to do it; before, we had, again, global auxiliary
variables to help us achieve that, and so on.
So while I was here, in the three months, we improved the scalability of STORM by
maybe an order of magnitude. We also check, in addition to the completion property,
spinlock properties as well. Because of all that, we managed to push way more drivers
through STORM. So we have more examples from the WinDDK plus a number of
drivers from a Win8 enlistment.
Also I played a little bit with non-Windows code. I ran STORM on the Xen hypervisor
timer module and checked some properties there as well.
I still don't have running times. This was still running when I was leaving Microsoft, and
I can't access this code, but it's a few thousands of seconds if you run STORM's CEGAR
loop. Because we ran it on way more drivers, we also found more bugs.
So, yeah, I was looking through the logs as I was leaving, so the number may be ten new
bugs. Three of those bugs got confirmed. One of them is related to concurrency. It is
kind of similar to the one I showed: it requires three context switches and cannot be
discovered using sequential checking. Two of them can be found in the sequential case
as well, and those two are related to the spinlock properties we are checking.
We still have to explore why SLAM didn't find them. I have no idea. I mean --
>> [inaudible]
>> Zvonimir Rakamaric: I think both of them are, maybe, you know, some error path
where the lock doesn't get initialized and then it still gets used. Something like that. It
seems like SLAM should be able to find those. I don't know why it didn't.
>> [inaudible]
>> Zvonimir Rakamaric: Seven of those bugs are false bugs. One is due to the need for
bit-vector arithmetic. We are planning to add support for that soon, I guess.
>> [inaudible]
>> Zvonimir Rakamaric: It's use-after-free. You know, you complete the request, you
shouldn't use it. And spinlock properties: when you release a lock, you have to have
acquired it before, and similarly when you acquire a lock. Yeah. So --
>> So there could be many more properties you could potentially check.
>> Zvonimir Rakamaric: Yes.
>> You're actually limited -- you're only looking at a fairly small number compared to,
for example --
>> Zvonimir Rakamaric: Yes. Well, SLAM checks like hundreds of them. So we're still
trying to see like which properties are interesting to check in a concurrent kind of setting.
So that's -- but, yeah, you are right. We are planning to add more properties.
>> Are you doing annotations? Are you putting annotations into those drivers?
>> Zvonimir Rakamaric: No annotations. You have to put -- well, okay. So we have to
put assertions. So STORM checks assertions. Now, all the assertions that we put are
actually in the harness. So we didn't touch driver code at all.
Now, we can do that because currently we are inlining everything and unrolling loops.
So we need a better kind of story for that. But there are no annotations; it's a very
push-button approach.
Six of those seven bugs are due to the harness. In particular, I was missing some
initialization which had to be added, and if I don't have that, then I get a false bug.
So I would like to note that none of them are actually due to [inaudible] itself. Most of
them are because the harness is not precise enough.
This is kind of the latest news that I have from last week related to using STORM in the
context of flight software. So there is a company in Vancouver called Critical Systems
Labs, which is involved in the development of a so-called formal methods supplement for
the new certification process for flight software. And they are supposed to give guidance
to this consortium on which formal tools and techniques to use to check flight software.
So they have this ongoing case study where they were collecting all the tools that they
could find that could check concurrent C programs. Well, they're actually looking into
Safe C, which is a subset of C. And then they are going to recommend the tools to this
group that is working on the supplement.
And STORM is currently part of this case study. So I'm going to, you know, give them
support, and we'll see how far we'll push STORM in their direction.
Now, to conclude, I showed my research that is addressing key challenges in system
software verification, which is how to deal with the heap, modularity, and concurrency.
My research is very much driven by applications, and it goes from theory to results
usable in practice. I build prototype implementations, use them to do case studies on
real-life code, and in the process find real bugs.
There is a bunch of areas for future work. In the context of memory models, for a long
time I have had this idea of so-called eager heap separation, where you would further
improve memory modeling by using region or alias information from pointer analysis.
Then for STORM, more case studies are of course always important, so I would like to
try STORM on something other than device drivers, for instance a file system, or maybe
the USB driver stack, where problems related to hardware/software co-verification will
have to be solved, and so on.
Also currently, as I said, we are inlining all procedures and unrolling loops, and as we go
along we have seen blowups happening because of that. Some form of lazy inlining is
crucial for the scalability of this technique, and we have ideas of how to perform this
lazy inlining inside of a theorem prover that have to be tried out.
Then, as I said, modular verification is the key to scalability. And how to use some kind
of lightweight contracts in the presence of concurrency and of this transformation is also
a research area I would like to explore.
Furthermore, there is a project at Microsoft called ConcurrencySAL which adds kind of
lightweight annotations to Windows code. And, yeah, that's kind of an ongoing project.
And I believe we could leverage the already present annotations that ConcurrencySAL
introduces to the code to improve the scalability of STORM.
And also, on the other hand, we could potentially use STORM to more precisely check
these concurrency annotations, and by doing that filter out the false alarms that they are
getting.
There is also an idea to use similar infrastructure to check parallelization of sequential
code by building on the symbolic diff project that [inaudible] is working on.
And, well, [inaudible] this is kind of the grand master plan. So in the course of
developing STORM, we would also develop something we currently call a Static
Concurrency Exploration Platform. It would sit on top of Boogie and take concurrent
BoogiePL as an input. So it would extend BoogiePL with constructs such as async and
atomic that are needed for concurrency.
And then many tools that currently translate code written in C or C++ or C# could use
this infrastructure we would build to translate concurrent code into concurrent BPL, and
then we would handle that.
And, yeah, with this final slide I would like to close my talk. Thank you.
[applause]
>> Zvonimir Rakamaric: Yes?
>> Shaz Qadeer: Questions?