>> Shaz Qadeer: So it's my pleasure to introduce Akash Lal to you. Akash is a
graduate student at the University of Wisconsin, and he's about to wrap up his
dissertation on interprocedural analysis of concurrent programs. Very exciting
stuff. He was advised by Professor Tom Reps over there. And he is going to
join Microsoft Research India starting in the fall, and he's going to tell us about
his dissertation work today.
>> Akash Lal: Thank you for the introduction. Microphone on? I think it is.
All right. Let's get started.
So this is a modified version of my job talk. I got rid of most of the fluff, and I
don't think I need to motivate the need for verification. I'm sure everyone here
has had some disturbing experience with software failure.
My recent experience was I was playing Diablo, which is a computer game, and I
was about to kill the big boss when that program crashed. And that was pretty
disturbing. I had to work my way all the way up again.
So not only is software everywhere, software is also getting concurrent. Intel just
released an 80-core chip for teraflop computations that might be coming to
desktops soon.
And as software gets more concurrent, the bugs are getting more complicated
and we need verification of concurrent programs.
As you all know, testing is not very effective when it comes to concurrent
programs, because if you execute the same test with one interleaving the bug
doesn't show up, but with a different interleaving, there's a crash. So even
when you fix the input to a concurrent program, there is still a lot of
non-determinism, and you're not sure if you've tested the program enough to
have ruled out the bugs.
So not only is testing hard, but verifying concurrent software is also hard. So this
is -- I'm going to tell you the Bluetooth driver story, which is catching on in the
verification community.
In 2004 Qadeer and Wu did a case study, and
they found a bug, an assertion violation, in a version of the Bluetooth driver,
and they released a fixed version for it.
In 2006 the same assertion violation was found in the fixed version. In 2008 the
same assertion violation was found in the fixed fixed version.
Now, this story has to be taken with a grain of salt because every time a new
fixed version was proposed, some part of the story was lost.
So this bug essentially violated an OS invariant which they
were not aware of and which was implicit in the previous work.
But in any case, we know reasoning about concurrent programs is hard.
So what I'm going to be talking about is my thesis work on verification of
concurrent programs and my contributions to it. I know that in the abstract I
promised the verification of machine code as well, but I left that out just to spend
more time here, and I can talk about verification of machine code offline.
All right. So let's look at the problem of concurrency a little harder, a little more
closely.
So in terms of testing, we can say concurrency is hard because even with a fixed
input there can be an exponential number of behaviors in the program. That's
why testing is hard. You need to run the program so many times.
But in verification, whether you're doing verification of sequential programs or
concurrent programs, we anyway have to reason about an unbounded number of
behaviors. So what's an exponential factor on top of an unbounded number?
So the way to answer that question is to look at how we actually verify
programs. We typically use some sort of abstraction, which is, you consider an
over-approximation of the program and you try to verify that
over-approximation.
And what happens is even with the common abstractions that we have used for
sequential programs, the presence of concurrency makes the analysis harder
even for those abstractions. So if you take sort of the simplest kind of
abstractions, which is finite abstractions, in the sequential case it's all decidable,
but in the concurrent case it becomes either PSPACE-complete or undecidable.
And what that means is even with the abstractions we have been using, the ones
we are familiar with, we can't use them when it comes to concurrent programs.
So the solution we choose is that we're not going to do full verification of
concurrent programs. We're going to be somewhere in between being sequential
and concurrent, which is what we call context bounded concurrency.
So context bounded analysis is the analysis of concurrent programs when you fix
a bound on the number of context switches that can happen between different threads.
So what Shaz had proposed in his previous work was to bound the number of
context switches, but we have a slightly different notion, which is bounding the
number of execution contexts. These two notions coincide for the case of two
threads, and I'll start distinguishing them when I talk about multiple threads.
So there's a history to context bounded analysis. I think Shaz started it with his
tool KISS that did verification of concurrent programs with two context switches,
and they found a lot of bugs in drivers. And there was a result which said that for
finite data abstractions, the problem of context bounded analysis is decidable.
And subsequently they have this wonderful tool CHESS that does systematic
enumeration of all interleavings given a test harness, and they found many bugs
in a few context switches.
So this leads to the question: what can we do if we don't fix the test harness, if we
don't fix the test input -- can we still do verification under a context bound?
So my work starts here. The first thing we showed was that this problem is
NP-complete. Then we gave decidability under more complicated
abstractions, including infinite-state ones. But except for KISS, all this work was
still theoretical.
And we subsequently extended to do practical verification of concurrent
programs.
And of these, I'm not going to be talking about the first one, which
involved an interesting mix of automata and matrices. But what the
decidability result, the third one, showed us is that the world of context
bounded analysis is not that much different from sequential analysis. And
subsequently we showed that we can in fact do context bounded analysis using
sequential analysis, and this is what I'll be talking about.
So just to summarize what context bounded analysis has done for us is that it's
added a third column to our table saying that full concurrency was too hard, but
context bounded analysis has a better hope of scaling.
All right. So let's just get down to what we could do with context bounded
analysis.
So we tried to test two hypotheses. First, we wanted to show that most bugs
can be found in a few context switches, because only then would context
bounded analysis really be useful. And second, in terms of performance, that context
bounded analysis can compete well with full verification techniques.
So we did our study in this tool called DDVerify, which is like [inaudible] except
that it works on concurrent programs. So it has a predicate abstraction front end
that produces a concurrent Boolean model, so every thread in the program is
abstracted to a Boolean program.
And then the way they've structured it right now is that this Boolean model is
fed to the model checker SMV, and that either says that the program is
correct or it returns a counterexample, and there's a refinement loop there.
And what we did was we took out the model checker SMV and we put our own
model checker that does context bounded analysis and we fed it some context
bound K.
And we compared what SMV could achieve with what we could achieve.
And this is what we found.
So every dot here is a different driver example. This is the time our tool
took, and this is the time that SMV took. It's a log-log plot. And every dot is
classified into two categories: either SMV said that the program was correct,
meaning that there were no bugs in that program, or SMV said that the program
is buggy.
So the determination of whether the program is correct or buggy was as SMV said.
And what we find is, first of all, the median speed-up was about 30
times. Okay, so for all the bugs, all the dots that are buggy, all the cases that
are buggy, this number gives the number of execution contexts within which we
found the very same bug.
And what this shows is that our tool with a bound of K equals 7 was able to find
all the bugs that SMV could find. So what that means is, in this case all
bugs occurred in six context switches or fewer, we were able to find them, and with
a bound of K equals 7, we were 30 times faster than SMV.
>>: So one question. For the correct ones, what was the K you fed to --
>> Akash Lal: K equals 7.
>>: You didn't have a K, right?
>> Akash Lal: No. SMV is full verification.
Okay. So that's what we could do. Now let's get down to how we build our tool
that does context bounded analysis.
And the way this is structured is that you take a context
bound K, take in the concurrent program, and we do a reduction into a sequential
program such that checking properties of this sequential program is sufficient to
conclude properties of this concurrent program under that context bound.
Yes?
>>: And the samples that you have, how many [inaudible]?
>> Akash Lal: Most of them will have [inaudible], but they can have more.
>>: And you expect this -- so [inaudible] times speed up.
>> Akash Lal: Uh-huh.
>>: Do you expect that number to increase, your speed-up to increase as you
increase the number of [inaudible] or decrease?
>> Akash Lal: Well, that's a good question. It could really depend on
the program, but with more threads I would expect our speed-up to increase. I will
give you some evidence later.
Yes?
>>: [inaudible] this data K equals 7, why 7? Did you try 6 and then there are lots
of [inaudible].
>> Akash Lal: Yes. That's why K equals 7, because I knew I could find all bugs,
and I had to give an odd number. Yeah, that's fine. So --
>>: [inaudible].
>> Akash Lal: So the programs that DDVerify produces, the Boolean programs,
they have all procedures inlined. So all the Boolean programs are single
procedure. And they had to do this because SMV could not handle procedures.
But our tool actually can handle procedures. So there's hope of extending
DDVerify so that it produces more complicated models.
>>: So, for example, what you did was you took the [inaudible] procedure
[inaudible] and then the context bounded analysis --
>> Akash Lal: Yes.
>>: You did not change the translation so that --
>> Akash Lal: No. [inaudible] so I could not do that.
>>: Did you do any comparative with [inaudible]?
>> Akash Lal: I have. I will show them later, later on.
Anything else?
So I have some more results later on.
All right. So we do this reduction to sequential programs, and what that means is
that we can now borrow all the cool stuff that we did in the sequential world and
apply it to the concurrent world.
In whatever I describe next, we only focus on safety properties, and we do
only assertion checking. And if we have other sorts of properties -- like, for
example, that an array should never be cleared between an insert and an increment
of the length -- then we instrument the program so that what we verify are just
assertions.
So this is just to say that we are only going to worry about assertion checking,
and other properties can be reduced to it.
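Just as an illustration -- this is a minimal sketch with hypothetical names, not the instrumentation any particular tool performs -- that array property can be recorded with a ghost bit and checked as an assertion:

    #include <assert.h>

    /* Ghost bit: set between an insert and the increment of the length. */
    int pending = 0;

    void insert(int *a, int len, int v) {
        a[len] = v;
        pending = 1;              /* instrumentation */
    }

    void increment_length(int *len) {
        (*len)++;
        pending = 0;              /* instrumentation */
    }

    void clear(int *a, int len) {
        assert(pending == 0);     /* the property, now an assertion */
        for (int j = 0; j < len; j++)
            a[j] = 0;
    }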
Okay. So let's start off by considering just the case of two threads. The model of a
program is that these two threads have their own local memory, and they
communicate through shared memory.
And in whatever I describe next, it will be useful to think of a thread as just
describing a transition system. So it takes in a shared state S1, its own local
state, L1, runs for a while and produces this -- and changes the shared state to
S2 and changes the local state to L2. So every thread is just describing a
transition system.
Now, in the case of two threads the execution will proceed with the control
alternating between the two threads. So first T1 goes, then T2, then T1, then T2,
then T1, and so on.
And the problem with the analysis of concurrent programs is when you
execute one thread -- so T1 executes and changes S1, L1 to S2, L2 -- now the
shared state has to go here, to T2; it needs to know where it has to start from.
But the local state has to be saved and then restored when T1 gets execution
back again.
And whenever there's a split in the way the data is flowing, you risk losing
correlation between the two.
And that happens because what we're doing is that we're not propagating a
single state; what we're propagating are sets of states, the set of all reachable
states. So when we feed in S1, L1 to T1, it can reach this huge number of states
Si, Li.
And the set of Si has to be passed to T2, but the set of Li has to be saved and then
restored. But what happens here is that you lose the correlation between which Si
is related to which Li.
And the way that's typically solved is that you start pairing local states. So you
put the local state of T2 in your state space. So you say it's going to be S1,
L1, M1, and now you don't have any forks, but what happens is now you've started
pairing local states, and that gives you a state space explosion. So you have an
exponential growth with the number of threads. And that's what makes
concurrent analysis hard, because you have to consider the local states of all of
the threads.
So the way we're going to solve our problem is slightly different. What we say is
that this fork is bad. So we're not going to do a fork, and instead what we're
going to do is analyze T1 first, be done with it, and then
we'll start analyzing T2.
And the way we're going to simulate context switches in the execution of T1 is
that we're just going to guess what T2 did. So during the execution of T1, if we
want to do a context switch, we just guess a new shared
state, S3. So we just guess that the execution of T2 is going to change the
shared state from S2 to S3, and when we finally get to T2 we're going to verify
whether the guess was correct or not.
So what we're going to do is we're going to guess K minus one global states
where K is the number of context switches. So we say S1 is the initial state of
the shared memory where the program is supposed to start execution, and S2 to
SK are guesses. So basically they're just arbitrary shared states.
Sorry. K is not the number of context switches, K is the number of chances that
each thread gets to run. So the number of context switches is 2K minus one.
All right. So the execution of T1 will look as follows. You start -- we start
execution of T1 with S1 L1. So we use the first state, S1, we start it off, we let it
run for a while, and at some non-deterministically chosen point in time we just
use our next guess, S2. So we change the shared state from S1 prime to S2,
and we let it run again, then we use our next guess, S3, and so on, until we
exhaust all our guesses.
And then we pass control to T2. And now for T2, these S1 primes, these prime
states are going to act as the guesses. So you start with S1 prime, let it run for a
while, and then change it to S2 prime, let it run for a while, and so on.
And what this means is that right here we've guessed that the first time that T2
gets a chance to run, it has to change the shared state from S1 prime to S2. But
it actually changes it from S1 prime to S1 double prime.
So we have a check in the end that says S1 double prime should be equal to S2,
meaning all the guesses that T1 made, T2 actually satisfied. And
if we pass all these checks, then we have these arrows equating these states,
and if you collapse those arrows, what we have is a valid
execution of the concurrent program.
And the good thing here is that the local states are short-circuited. They did not
have to be saved and restored. They were just reused during an execution.
So one thing that -- so this is the strategy we're going to use, a guess and a
check strategy. The next insight we use is that guess and check is not
something that's new to verification of programs. In fact, it's done all the time for
the verification of sequential programs.
So the problem for verification of sequential programs is that you need to guess
an input for which the sequential program reaches a bad state. So you have to
guess an input and check if a bad state was reachable.
But in fact, the tools that do analysis of sequential programs don't do a guess and
a check, they actually find the input for which a bad state is reachable. So we're
going to use this intuition to do our guess and check strategy.
And when we do that, we're pushing all the guesses into the input state, and then
we're going to ask sequential analysis tool to find the input for which a bad state
is reachable instead of just guessing it.
So what this means is that the final sequential program that we produce is going
to have K copies of the shared memory as input.
So if the original concurrent program had four variables -- w, x, y, z -- and let's
say K equals three, then the input to the sequential program is going to be three
copies of the shared memory and an extra count of I that's going to count from 1
to 3.
And the way we change the code is as follows. So let's say
st is a program statement; then tau of st, i, is a syntactic renaming of that
statement where every global variable g is changed into its ith copy,
and the local variables remain unchanged. And the way we transform
each statement is, given a statement st, we produce this block of code in the
sequential program.
And what that does is it says if the value of i is 1, then I operate on the first copy of
the shared memory; if the value of i is 2, then I operate on the second copy of the
shared memory; and so on, until i equals K.
And this is the code that simulates the context switch. So at a non-deterministic
point in time we increment the value of i and start using the next copy of the
shared memory.
And if i reaches the bound, then you set it to one and you jump to the next
thread.
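To make the shape of this concrete, here is a minimal sketch of the transformation for K equals 3 and a single shared variable x. The helper names (nondet, contextSwitch, step) are hypothetical; a real tool would generate this code mechanically:

    extern int nondet(void);        /* unconstrained input */

    int x1, x2, x3;                 /* the K = 3 copies of shared variable x */
    int i = 1;                      /* which copy is currently live */

    /* tau(st, i) for the statement "x = x + 1", dispatched on i */
    void step(void) {
        if (i == 1)      x1 = x1 + 1;
        else if (i == 2) x2 = x2 + 1;
        else             x3 = x3 + 1;
    }

    /* Simulate a context switch: at a non-deterministically chosen point,
     * start using the next copy of the shared memory. */
    void contextSwitch(void) {
        if (nondet() && i < 3)
            i = i + 1;
    }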
The way the assertions are transformed is that we add a new Boolean variable,
noError, and that basically just records the value of the assertion under the current
value of i.
So what we do is we take the two threads, produce the sequential versions using
the transformation I described on the previous slide, and then the concurrent
program in which T1 and T2 are executing in parallel is going to be this
sequential program in which you execute T1S, then you execute T2S, then you
have a checker, and then you assert that there was no error in the execution.
So the way the sequential program executes is as follows. So the value of i
starts with one. In that case T1S is going to do exactly what T1 did, but on the
first copy of the shared memory, then it reaches some point at which it
increments the value of i. So i becomes two, then it's just going to start operating
in the second copy of the shared memory. And then i becomes three. It
operates on the third copy. Then you pass control to T2, reset the value of i to
one. T2S operates on the first copy of the shared memory, and it operates on
the second copy, and it operates on the third copy.
And then you pass control to a checker that's going to check that this state was
equal to that state. So in this case T1 assumed that T2 is going to change the
shared state from this guy to that guy. It will actually change it from this to this,
so you check whether these two are the same or not.
So this checker just checks this equals that and this equals that, and if the
checker passes, then this is a reachable state of the concurrent program and we
can do our assertion checking on this reachable state.
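Putting the pieces together, here is a rough sketch of the whole sequential harness for two threads and K equals 3. This is my own reconstruction of the shape of the reduction, not the tool's actual output; State, the helpers (nondetState, assume, stateEq), and T1s/T2s are hypothetical names:

    #include <assert.h>

    #define K 3

    typedef struct { int w, x, y, z; } State;    /* the shared variables */

    extern State initialState(void);
    extern State nondetState(void);              /* an arbitrary shared state */
    extern int   stateEq(State a, State b);
    extern void  assume(int cond);               /* prune infeasible runs */
    extern void  T1s(void), T2s(void);           /* the transformed threads */

    State g[K + 1];        /* g[k]: the k-th copy of the shared memory */
    State guess[K + 1];    /* guessed starting states of copies 2..K */
    int   i = 1;
    int   noError = 1;     /* cleared by a failing (transformed) assertion */

    void harness(void) {
        g[1] = initialState();
        for (int k = 2; k <= K; k++)
            g[k] = guess[k] = nondetState();     /* the K - 1 guesses */

        i = 1; T1s();      /* T1's K execution contexts, on g[1..K] */
        i = 1; T2s();      /* T2 resumes each copy where T1 left it */

        /* Checker: what T2 did to copy k must equal the guess that T1's
         * copy k+1 started from; only then is this a real interleaving. */
        for (int k = 1; k < K; k++)
            assume(stateEq(g[k], guess[k + 1]));

        /* Assertion checking on a reachable state of the concurrent program. */
        assert(noError);
    }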
So the good thing about this reduction is that the local state of the program does
not increase. The local state of this sequential program is only the sum of the
sizes of the local states of the original threads. A naive analysis would have a product
here. So you got a product down to a sum.
In terms of static analysis, the global state of our sequential program has K times
the number of variables, because we had to introduce K copies of the shared
memory. In terms of model checking, the state space grows exponentially with
K, because K times the number of variables means that your state space is
raised to the Kth power.
So this sort of means that there is no free lunch.
The problem is NP-complete in K, and we're sort of bound to have an algorithm
that's going to be exponential in K.
And the main idea that we've used here is that we've reduced the control
non-determinism that comes from concurrency into data
non-determinism, and data non-determinism is something that's already handled
in the sequential world.
All right. So the situation changes slightly when we have multiple threads. So
let's consider the case when we have n threads and K context switches.
So the problem in that case is that there are too many thread schedulings that
can happen in such a program. So you can have either T1 goes first, then T5,
then T2, then T4 or you can have T1, T3, T4, T2, and so on.
You can have n times (n minus 1) to the K thread schedulings. And just
enumerating each of them is going to make even the best-case complexity
exponential in K, and that's something we want to avoid.
So our solution is that we're going to consider only one thread schedule, and that's
going to be a round-robin thread schedule of length n times K.
So in this round-robin schedule every thread gets K chances to run. So it's K
execution contexts per thread. And the property of exploring just this one
round-robin schedule is that it considers strictly more behaviors than all thread
schedules with K context switches or fewer.
And the reason for this is as follows. Suppose we have this thread schedule,
which is not round-robin and has two context switches. This thread schedule is
simulated by a round-robin schedule of length two times three, which is this one.
But in this round-robin schedule, what happens is T2 decides to do nothing when
it gets a chance to execute, then T1 here decides to do nothing and T3 here
decides to do nothing.
So this way round-robin schedule can simulate other schedules which are not
round-robin.
So what we have is, if we consider one round-robin thread schedule of length n
times K, then we consider all behaviors with K context switches or fewer, but we
can also get other behaviors that have more than K context switches, because
this schedule has nK minus one context switches.
And the other good thing about using this round-robin thread schedule is that
producing the sequential program that simulates this
round-robin schedule is easy. And it just looks like that: you just run T1
first, then T2, then T3, and so on.
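Continuing the earlier two-thread sketch (same hypothetical names), the n-thread harness just runs the transformed threads in program order; each thread resumes every copy exactly where the previous one left it, so only the last thread's outputs need to be checked against T1's guesses:

    extern void T3s(void);   /* ... and so on, up to Tns */

    void harnessN(void) {
        g[1] = initialState();
        for (int k = 2; k <= K; k++)
            g[k] = guess[k] = nondetState();

        i = 1; T1s();
        i = 1; T2s();        /* picks up each copy where T1 left it */
        i = 1; T3s();        /* picks up each copy where T2 left it */

        /* Only the last thread's output for copy k must match the guess
         * for copy k+1; every other hand-off holds by construction. */
        for (int k = 1; k < K; k++)
            assume(stateEq(g[k], guess[k + 1]));

        assert(noError);
    }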
And the reason why this works is as follows. When T1S runs, it operates on its K
copies of the shared memory, and because in the round-robin schedule T1
always passes on the state to T2, we just have to run T2 next.
So when we run T2 next, it just picks up what T1 left it at. So in terms of this
timeline, what we had to do here was we sort of did the execution of T1 for its
three execution contexts, the three chances that it got to run, and T2 we know
has to start off from where T1 left off.
So T2 has to go next. And similarly, T3 has to go next. And T3 always picks up
the state from T2.
So the nice thing was that the number of guesses we had
to make was not proportionate to the number of context switches. We
only had to guess here where T1 started off its next execution context. So the
guesses are only going to be this state and that state, because we know where
T2 is going to start off and we know where T3 is going to start off.
So even though we considered a round-robin schedule of length n times K, the
sequential program has only K times more variables. It only needs K more
copies.
>>: It's a little confusing. In your second bullet you go from T1 to Tn, but then
you have to again execute T1, right?
>> Akash Lal: Here?
>>: Yeah.
>> Akash Lal: Yeah. So in this concurrent program, yes.
>>: Huh?
>> Akash Lal: In the concurrent program, yes. But not in the sequential
program.
The reason is when we run T1, we're going to run it for -- we run all its execution
contexts at that time. So in this picture let's say the value of K is three.
>>: Yes.
>> Akash Lal: So T1 gets three chances to run. So T1S operates on three
copies of the shared memory. So it already does all its execution contexts when
it gets to run. And when it's done, then T2 does all its execution contexts, and
then T3 does all its execution contexts.
So what we have is a round-robin -- similar to a round-robin schedule in which
every thread got three chances to run for three threads.
>>: So but to simulate K, all executions of K context switches, you have to give
each thread n times K chances to run, right?
>> Akash Lal: Just K chances to run, not n times K.
>>: Just K chances to run?
>> Akash Lal: Right. Because if you have K context switches, a thread can at
most get K chances to run.
>>: Sure. Okay. That's right.
>> Akash Lal: So we've given every thread K chances to run and we're
considering a round-robin schedule in which all of them get K chances.
Yes?
>>: Every time you get a chance to run, you could be starting [inaudible] and
ending [inaudible]. So I don't see any -- I don't see why you have to run them nK
times.
>> Akash Lal: Um --
>>: Because you stop non-deterministically at any point in time and then you --
>> Akash Lal: So one thing is each of these threads knows where it's going to
start. So like, for example, T2 knows that it has to start from the state that T1 left
it at. So T2 is picking up everything that T1 did automatically. So --
>>: I'm just -- let's forget the second and third execution of T1.
>> Akash Lal: Uh-huh.
>>: Is there any behavior that could appear in the second part which could not
appear in the third part? I think that's [inaudible].
>> Akash Lal: Well, I mean, it could do more here, right?
So, okay, so this is in terms of the global state. But what is kept is the local state.
So when we switch to use the next copy of the shared memory, we're reusing the
local state that it left off at.
So the local state is constraining how T1 is going forward. Even though the
global state is sort of reset here when you start using the next guess, the local
state is saved. You have to reuse the same local state.
So if there were no local state, what you're saying is correct, that all these three
would be isomorphic to each other, just repeating the same thing twice. But it's
the local state that sort of ties them together, constrains them to one particular
behavior.
>>: But every possible initial local state for the third [inaudible] could have been
a possibility [inaudible] of the second.
>>: I don't think that -- he's not doing that from the local states. What you're
saying is true only for local states. He starts from [inaudible].
>>: I know. But let's say you start it from statement one and execute it to five,
and then second [inaudible] you execute it from five to eight. In the very first shot
you could have gone from one to --
>> Akash Lal: Right. But the difference is that there will be a state reset, a new global
state, at statement five. So in some --
>>: In one case, execution is executed from a different state. In the other
case, one [inaudible] is executed without any interruption. Right?
>> Akash Lal: Maybe. So --
>>: Okay. I guess there could be -- I mean, if you want to compute some of these,
you could do something better, some [inaudible].
>> Akash Lal: We are going to compute some of these. I will come to
that.
In some sense what's happening here is that when we're running T1, we're
considering a vector of global states. So when we give it three chances to run,
we're asking it to tell us what are the three sequences of global state changes that
it can make.
So when we're going to summarize the behavior of T1S, we're going to
summarize it as a string of length six, which says that there is an execution of
T1 that goes from this state to that state, and if there's a context switch that
leaves it at this state, then it changes the state to that one, and if there's a
context switch that changes the state here, then it can leave it at that one. So I
will come to that a bit later.
All right. Just to summarize what we did for multiple threads: we compared
two things, considering K context switches and all thread schedules with
K context switches, versus what we said, which is K execution contexts per thread
and round-robin thread scheduling. Then we get strictly more behaviors. We only
considered one thread schedule, so we avoid this exponential factor. And the
complexity remains the same, because in either case we had to make K guesses
and not more. So it's [inaudible].
All right. Some more results.
So another property of the reduction with multiple threads is that, because the
number of guesses is only dependent on K and not on n, the size of the
sequential program only grows linearly with the number of threads. So you sort
of expect that if you have more threads, your analysis should only grow linearly.
The analysis time should only grow linearly.
And that's sort of -- there is some evidence here. So in this graph we took a
Bluetooth driver. I think it's a fixed fixed fixed version of the Bluetooth driver.
These are the number of threads. We vary the number of threads, and we give
each thread four chances to run.
So we're systematically adding more and more threads, and the time grows
linearly, at least for a little bit. Then it jumps up. We don't know why it jumps
up. But at least it was linear for some time.
This sort of says there's no free lunch. In this case we fixed the number of
threads to be three and we vary the number of execution contexts per thread.
And the time grows exponentially with the number of execution contexts per
thread.
So this is a comparison with SPIN. So we -- so there is a benchmark suite called
BEEM, B-E-E-M, that has a whole bunch of examples meant for explicit state
model checkers like SPIN. So we took some common mutual exclusion
algorithms, and this is, I think, a network protocol.
And the benchmarks in each case has a correct and a buggy instance. And what
we found is that we found in the buggy version we always were able to find the
bug with a small number of execution contexts per thread, either two or three.
And we compared [inaudible] with SPIN.
>>: So this number of shared is the number of Boolean shared variables?
>> Akash Lal: Boolean shared variables.
>>: So the Msmie has 23 Boolean shared variables at 20 threads and SPIN
version at 31 seconds?
>> Akash Lal: There's not a lot of concurrency here. So one of the --
>>: [inaudible].
>> Akash Lal: I think so.
>>: Okay.
>> Akash Lal: I think so. One thing to take out here is -- another thing to take
out is -- SPIN, because it's explicit-state, has a termination guarantee: if the set
of states is finite, it's going to terminate. But we sort of don't have any such
guarantee when we do context bounded analysis. We sort of don't check if we
already visited a state before.
So, I mean, if you keep increasing the bound K, the time is going to keep growing
even though it's exploring the same states again and again. So that's something
we need to put in in the future.
>>: So I missed one thing. So are you actually [inaudible] the analysis on the
sequential program that you run?
>> Akash Lal: Right. Okay. So that's a good point.
We do symbolic model checking. So we used [inaudible] that does -- that uses
[inaudible] to construct function summaries, and used BDD-based analysis.
>>: [inaudible] when you could have used a [inaudible].
>> Akash Lal: We could have used a [inaudible], but we just supported one
thing.
Yeah. So sort of this comparison isn't fair, because we're doing symbolic model
checking here, whereas SPIN is explicit-state. So -- okay.
So one of the downsides of this reduction is all these guesses that we make.
And these guesses, most of the time, will lead us to behaviors that are
actually not possible in the concurrent program.
So we're going to explore a lot of redundant behaviors when we explore T1S and
T2S, but the checker is going to rule out most of them, and only a few will be
valid concurrent behaviors.
So the way we're going to solve this is that we're not going to analyze them in this
order; we're going to analyze them side by side.
And the way we do this is by constructing thread summaries. So for every thread
we construct thread summaries, and we're going to build them for T1S and T2S at
the same time.
So a summary is going to look as follows. So a summary of T1 will have a value
of K, which is the number of execution contexts we've considered so far. G1 and
G2 are vectors of global states, shared states, and the length of those vectors is
equal to K. So if the value of K is 3, then the summary is going to look as
follows.
You're going to start from -- the summary says that T1S, when started in this
state, can leave us in this state. So the local state is not copied, so we have just
one local state. But there are K copies of the global state.
So this is what the summary means. And the way these summaries
are processed is either in an intrathread fashion -- so if there is
a statement st in T1S, and we know every statement operates on the last copy of
the shared memory, so if st is something that can take (g, l2) to (g', l3), meaning
that it can take this state and modify it to that state, where g is the last component
of G2 -- then this is how a summary is updated: the summary can go from
(G1, l1) to (G2 with its last component g replaced by g', l3).
>>: So you have G2. G2 is a vector, but G is just a single --
>> Akash Lal: So G2 is a vector of 2 in this case. So this bracket thing is a
coupling operator.
>>: Oh, okay.
>> Akash Lal: So we say that if g is the last component of this vector, so if g is
this state, and the statement st changes g to g prime --
>>: Let me ask my question again.
>> Akash Lal: Yeah.
>>: Is the size of G1 one greater than the size of G2 vector?
>> Akash Lal: Yes.
>>: I see. Okay.
>> Akash Lal: So particularly this is just an -- this is all that you need to know.
It's just an intrathread update of the summary. Okay?
And the way the eager analysis works is that it updates the summary as follows. It
says that if, for a value of k, the summary says we can go from
(G1, l1) to (G2, l2), then we increment the value of k, meaning that we will consider
the next execution context, and we're going to extend our vectors of global states
by any state g here.
So this rule applies for all g. So this is like making a guess. For the next
execution context we make all possible guesses. So in this case, this is what the
eager analysis will look like.
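Just to pin the shape down, here is one consistent way to write that rule; this is my own reconstruction from the slide, so treat the exact bookkeeping as approximate. A dot appends a component to a vector:

    <k, (G1, l1) -> (G2, l2)>   ==>   <k+1, (G1.g, l1) -> (G2.g, l2)>,   for every shared state g

The new execution context is guessed to start at g, and nothing has executed in it yet.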
But what we're going to do in our lazy analysis is that we're not going to do
this update for all g. We're only going to do it for those g that actually occur on
a valid concurrent behavior.
And the way we do it is as follows. So the rule is going to look the same as
before, as far as the eager part goes. All I've done here is that I've expanded the
vector so that the first component is init, which is the initial state of
the shared memory, and we're going to extend this vector -- I should use
the pointer -- we're going to extend the first vector with g and we're going to
extend the second vector with the same g.
But the way we pick this g is going to be different now. So this summary says that T1
can go from (init, l1) to (G2, l2).
And now we're going to look at the summary of T2, what T2 can do. So in this
case the summary says that it can go from (G2, l3) to (D.g, l4). And what we've
done is we have an implicit checking step. We've said that this G2 equals that
G2. So the G2 used here has to be the same as the G2 used there, which is one of
the steps that the checker needs to do.
The other thing is that this D equals that D, because what that means is that
there is a valid concurrent behavior here that goes from init to the first
component here, which is the same as the first component there, which goes
there, which is the same as this one. It goes there, that, that, and so on.
So this condition enforces that we're going to pass the checker, which means
that we have a valid concurrent behavior, and in that case this g is going to be
that g. So we extended our summary using just that g.
So it's similar -- so this is the only rule that changes when we go from an eager
analysis to a lazy analysis, and this enforces that all the guesses we made are
actually valid guesses, so we don't need a checker in the end.
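Again as my own reconstruction, with the bookkeeping simplified, the lazy version couples the two threads' summaries instead of quantifying over all g:

    T1: <k, (init.D, l1) -> (G2, l2)>      T2: <k, (G2, l3) -> (D.g, l4)>
    ---------------------------------------------------------------------
                 T1: <k+1, (init.D.g, l1) -> (G2.g, l2)>

Reading it off: T1's contexts start at init followed by T2's outputs D; T2's contexts start at T1's outputs G2; and T2's last output g is the only state used to extend T1's summary, so every guess is realizable by construction.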
>>: Can you [inaudible] for example, run the summary computation for the
different threads?
>> Akash Lal: Yes.
>>: On separate -- in separated threads [inaudible]?
>> Akash Lal: Yes.
>>: The second [inaudible]?
>> Akash Lal: Computation [inaudible]. Yeah, so that holds more true for the
eager analysis, because there is absolutely no communication between the
summaries of the different threads until you get to the checker phase. So you
can construct the summaries of each of the threads independently.
>>: I see.
>> Akash Lal: Whereas, in the lazy analysis, you sort of have to communicate
every time you increment the value of K.
>>: Okay.
>>: You wouldn't get that if you were using a sequential -- I mean a tool or -- for
analyzing a sequential program that would be compositional?
>>: Not quite, because it's case sensitive.
>> Akash Lal: Yeah. So --
>>: [inaudible] I don't know.
>> Akash Lal: Yeah. So --
>>: [inaudible].
>> Akash Lal: Right. So we couldn't figure out a way of doing this using a reduction
to a sequential program, because once we produce the sequential program, we're sort
of restricted to analyzing the program in program order in some sense.
But there has been subsequent work. There's going to be a paper at CAV
that gives a lazy reduction. So the analysis of the sequential program that
they produce actually corresponds to our lazy analysis. But there's a slightly
different reduction.
Yes?
>>: How is this different from [inaudible]?
>> Akash Lal: This is [inaudible] increasing the context [inaudible]. Yeah. But
you sort of do it in lockstep, in that you know that these two summaries are being
computed side by side. When you increment the value of K, you
increment each of them and then you go forward.
>>: Another nice thing about this is that it is, like what someone was saying,
completely [inaudible].
>> Akash Lal: Uh-huh.
>>: You can just keep chugging along --
>> Akash Lal: Right.
>>: -- and you get to reuse whatever happened in previous contexts. And the
other thing is, if you just construct a sequential program, you run it for K. Then if
you didn't find a bug and you want to run it for K plus 1, now you
have lost all work. You have to transform the program again and run it.
>> Akash Lal: Right. That's true.
>>: [inaudible].
>> Akash Lal: Yeah, you can do that, but if you're producing a new sequential
program, then sort of you're relearning the sequential analysis again. If you have
a smart sequential analysis that knows how the program changed, then it's okay.
>>: [inaudible].
>>: But the behavior that you observe at K plus one would
be a superset of what you [inaudible] observed for K.
>> Akash Lal: Yes.
>>: And so in that sense it's [inaudible].
>> Akash Lal: Yes.
>>: So it could be engineered.
>> Akash Lal: Yes, it could be. Definitely.
>>: It's not like K plus 1 is a completely different program.
>> Akash Lal: Right.
>>: [inaudible].
>> Akash Lal: All right. Let me get a little bit into future work. Some of the
things I would like to be working on at [inaudible].
So I told you of the Bluetooth driver story. What I didn't tell you is why those
bugs were missed.
So this was KISS, so it did analysis for two context switches. This was a tool that
goes up to more context switches, so they considered four and found a new bug.
This considered two stopper processes, whereas both these previous works
considered just one stopper process. And they were able to find another bug.
So this is one limitation context bounded analysis is always going to
have: you're always going to miss bugs. There's always the potential of
missing bugs.
And what I would like to do -- my hypothesis is that even for doing full
verification, context bounded analysis is a good approach to getting there.
So there are a few observations that sort of lead me to believe this. The first is a
study that Shaz did in early '07, which was to say that doing context
bounded analysis, you cover a large portion of the interesting behavior in a few
context switches.
So if you have a handle on most of the behaviors, can you make a guess about the
set of all behaviors?
And the second observation we made is that mutual exclusion
invariants can often be found just by looking at behaviors with two context
switches. And I'll give you an example in the next slide.
But the hypothesis here is that you're going to use context bounded analysis, get
to know something about how the program is behaving, then generalize from that
and get an abstraction for doing full verification of the [inaudible] program.
So let's look at how CBA can help in mutual exclusion. So this is a program in
which the access to x is always protected by the lock lck. And if you look at this, the
sort of invariant we want to prove is -- or, the negation of the invariant is that a data
race can occur whenever a different thread, when it's started with the lock
acquired, can eventually access x.
So if you start a different thread with the lock acquired and it accesses x, that
means there can be a data race in this program. Or you start the thread with lock
equals zero, and it's able to access x while keeping lock zero. So this is sort of
the negation of the invariant that says you can only access x when the lock is
acquired.
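As a rough sketch of the kind of code being discussed -- the function names and the modeling of acquire/release as atomic steps are my own, hypothetical choices:

    extern void assume(int cond);

    int lck = 0;    /* shared: 0 = free, 1 = held */
    int x = 0;      /* shared data, meant to be protected by lck */

    /* Lock operations, modeled atomically for the analysis. */
    void acquire(void) { assume(lck == 0); lck = 1; }
    void release(void) { lck = 0; }

    void T1(void) {
        acquire();      /* lck: 0 -> 1 */
        x = x + 1;      /* the access to x */
        release();      /* lck: 1 -> 0 */
    }

    /* Negated invariant, as obligations on any other thread T2:
     *  (a) started with lck == 1, T2 can still reach an access of x; or
     *  (b) started with lck == 0, T2 can access x and leave lck == 0. */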
>>: I don't understand what you're saying here. Is that [inaudible].
>> Akash Lal: Definitely one thread, yeah. So we know what this thread looks
like. And we want to know when can there still be data races on this thread, not
knowing what the other threads are.
So these are the obligations on the other threads. So there will be a data race if the
other thread, when it's started with the value lock equals 1, is able to access
x.
>>: Why would there be data race on that thread?
>> Akash Lal: Because in the execution of T1, you can get to this point where the lock
is acquired, then you switch to the next thread, and then because it satisfies this,
it's going to access x, and then you access x here and then you have a data
race.
>>: Okay.
>> Akash Lal: And similarly, the other thing here is that you context switch here,
when the value of lock is zero; then the second thread accesses x but keeps the
value of lock equal to zero; then you can come here and then you can access x.
So these are the things that you would want to find out looking at this program.
And context bounded analysis allows you to find this, and that is as follows.
So we consider T1S -- we know the source code of T1 -- and we consider T1S
with K equals 2. We're giving T1 two chances to run, and we're going to
consider the set of its behaviors from the start of the execution until the point
when it accesses x.
And because K equals 2, T1S operates on 2 copies of the shared memory. In
this case the only shared thing is lock. So we have two copies of the variable
lock, and so initially the value of lock is zero. And then we want to fill in these
question marks.
So basically we can start doing the summary from start to access of x, and we
look at what the summary looks like.
So if you run the analysis, you're going to find two things in the summary. One is
this, which says the value of lock is zero and it remains zero; there's a context
switch; lock is zero; and then lock is one, which is actually this invariant.
Because it's saying that this thread can get to an access of x when there was a
context switch that didn't change the value of lock.
So there will be a data race when the other thread can access x starting in lock
equals zero and keeping lock equal to zero. So this basically is putting in an
assumption on the next thread. If the next thread can do this, then there will be a
data race.
And the other thing we find in the summary is this: that the execution of T1S goes
from zero to 1, then there's a context switch in some state, and that's supposed
to be this. So basically it means that if we start the other thread with lock
equals one, no matter how it changes the value of lock, if it's still able to access
x, then there's going to be a data race.
So the thing to take out here is that by looking at T1S, you know the obligations --
what the other threads should be able to do to reach an error. So you get an
over-approximation of what things can force you to an error, and once you have these
invariants, the idea is to construct this sort of predicate abstraction
based proof where we just abstract the other threads using just these invariants.
It's sufficient to check these invariants on the other thread, and then we know
that there are no data races.
>>: But what I don't understand is that you are trying to find some assumption on
the environment that is going to help you prove the property locally.
>> Akash Lal: Yes.
>>: What prevents you from [inaudible]?
>> Akash Lal: I don't know the answer to that. So the summary we construct
here captures the set of all behaviors of T1. Right? So we're considering all behaviors
of T1. And the things we find from here, these two things, are sufficient to
rule out all bugs for two context switches.
>>: So what you're saying is that maybe you are trying to find out the most
[inaudible].
>> Akash Lal: Right.
>>: And then you're saying there's a minimum constraint, the weakest constraint
on it somehow.
>> Akash Lal: Right.
>>: Okay. I see.
>> Akash Lal: So the thing to note here is when you do find this weak invariant, it
sort of can give you the right thing. But the only thing that it ensures is that if you
do enforce this on the other thread, then you're ruling out bugs for two context
switches. But the hope is that they're still [inaudible] enough that they're going to
allow you to verify for any number of context switches.
So basically you find assumptions using an
under-approximation, and those assumptions just happen to hold for any
number of context switches.
>>: [inaudible] write an assumption pretending that there are lots of context
switches. I mean, when you come up with it, right, you will have, like, X0, X1, X2,
something like that, some finite number of variables, but [inaudible] --
>> Akash Lal: Right.
>>: -- [inaudible].
>> Akash Lal: In this example there's sort of just one context switch here,
because you're considering just one run. But you sort of look at what's important
here -- lock equals one, lock equals zero -- and then you use those predicates to
abstract the program, and hopefully that should work.
>>: Okay.
>> Akash Lal: Maybe.
So this was a simple example in which we had a single variable lock, but the
same thing applies in a more complicated setting where, in this case, there are
two variables, old and state, and they together enforce mutual exclusion on the
access of variable x.
So this is Dekker's mutual exclusion, and basically there's sort of an
implicit lock, and the lock is acquired when the value of old is 0 and the value of
state is one.
So the same reasoning also applies in more complicated settings when you're
not really sure what the locking invariant is.
All right. So something else I want to do, which has already been done
partially, is to apply context bounded analysis to low-level code. And one
thing that's missing in the reduction, that does not allow us to do this, is that you
still need to know what is shared and what is local. Because if you assume that
everything is shared, then you're not really going to have an analysis that's going
to scale. So you need to know about as much state as possible that is
local. And for programs like binaries, it's not really clear what's shared and
what's local.
So one of the other things that bugs me a lot is writing stubs for the operating
system. All static tools will have some stubs for modeling the operating system,
modeling the environment or some such thing, and they are never published.
You never know which ones are the right ones. But they're extremely relevant for
the performance of the tool.
So what I'd like to do is to automate stub generation in some sense so that
everyone -- to sort of have a benchmark of which stubs are relevant for which
kind of properties so that everyone can use the same stubs.
And the challenge here is as follows. So, like, for example, let's
suppose this is the code for writing to a file. This is the actual operating system
code that writes to a file.
And suppose we're only interested in the property of knowing the size of that
file. From this code, and knowing the property, we would like to produce this
code, which only has the increments that track the length of the file.
If you're only interested in whether a program produces a UNIX or a DOS file,
then you're just interested in whether the input buffer contains a carriage return, and
in that case it's a UNIX file or a DOS file, whichever is the case.
So the challenge here is, this is sort of like doing partial evaluation with respect
to a property, and also instrumentation. So in this case this was a new flag that
was [inaudible] saying whether the file was a UNIX file or not. So you want to do
instrumentation of the property and you want to do partial evaluation. And the
goal is to produce a program that's as deterministic as possible.
So it's not just abstraction; you want to have certain additional properties so
that the stubs are actually useful when you use them.
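For instance, here is a purely illustrative sketch, with hypothetical names, of what two such generated stubs might look like after specializing the write path for the two properties above:

    /* Property: the size of the file. */
    long file_len = 0;
    void write_stub_len(const char *buf, long n) {
        (void)buf;
        file_len += n;                /* all that survives partial evaluation */
    }

    /* Property: UNIX vs. DOS file. The instrumented flag records whether
     * any written buffer contained a carriage return. */
    int saw_cr = 0;
    void write_stub_dos(const char *buf, long n) {
        for (long j = 0; j < n; j++)
            if (buf[j] == '\r')
                saw_cr = 1;           /* instrumentation of the property */
    }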
So the big thing, I think, is to know what we want to do after we do verification.
There has been a lot of work in ranking the bug reports that a verification tool
produces. We know that a user can't go through all bug reports, so there are
heuristics that sort bug reports according to relevance, and then you look at
the top few.
So there's been a lot of work in that area. But there has not been enough work to
say what happens when a verification tool comes back and says that, yes, your
property does hold.
In that case what happens is the user just says, well, I already knew that the
property held and moves on. So there's nothing much gained in that setting.
And what we would like to do is have a way of making the information persist
through different verification runs. Even when you have proved a program to be
correct, you want that proof to persist with the program, either in the form of some
language that humans can understand or in the form of a language that
machines can understand, so that the next time you do verification of the same
program, you know something about what was there in the original program.
And then this has interesting things, like do we actually know how to write proofs
for different programs, is there a good way of writing them. Most of the tools right
now just do an exploration of all reachable states, and even though they can
prove properties, it's not really a proof of the program. It's not something that you
can write down.
So there has to be some work on minimizing proofs, finding something that
corresponds to loop invariants for concurrent programs, and so on.
In certain settings, when you know that all a proof is doing is proving mutual
exclusion, then it's just easier to write it as saying that, look, these things are --
there's a mutual exclusion there.
And what I believe is that if you know how to write proofs of concurrent programs,
this can also influence future development work for writing concurrent programs
and also for designing languages that allow you to write better concurrent
programs.
And that's it. Thank you.
>> Shaz Qadeer: Questions? More questions?
>> Akash Lal: Yes.
>>: I note that the agenda on your last slide -- this one, the previous one -- goes
against all of your talk here, where you were reducing concurrent analysis to
sequential analysis. So what's wrong with that for being [inaudible]?
>> Akash Lal: Well, the thing is, we're not able to produce a proof of the
concurrent program. Again, what the analysis is doing is only
exploring the set of reachable states. And there is no way of having that
persist, no way of communicating that to the user. Even when I'm debugging
the tool, that proof is completely useless to me. I have no idea whether the set is
actually correct or incorrect.
And if there's a way, once you do prove that a program is correct, you can
just tinker around with the proof and find out some nice properties of what it's actually
trying to establish, then there might be a better hope of --
>>: But there's a larger problem, which is, I think, an economic problem. I think
that to some extent in the program analysis community, we have managed to put
a lot of value on bug reports. And therefore, there's this whole, you know, like,
research and industry that has evolved around it.
>> Akash Lal: Uh-huh.
>>: But to my knowledge, I don't know if anybody has managed to put an
economic value on the proofs.
>> Akash Lal: Right.
>>: I mean, I think that there's a lot of people who believe that, but I don't think
that in the widespread software engineering community anybody believes that
there's any value in proofs.
>> Akash Lal: Would you say the same thing holds for sequential programs as
well?
>>: No. I mean, there's a broader issue [inaudible] concurrence.
>> Akash Lal: Okay.
>>: Because, you know, the attitude is exactly this. Your tool comes up and
says, you know, verified. What now? The programmer already [inaudible] was
correct.
>> Akash Lal: Right.
>>: What extrapolation do you get [inaudible] the programmer?
>>: Because the [inaudible]. So in that case we have no feedback. The tool
wasn't able [inaudible].
>> Akash Lal: But there is one difference between sequential and concurrent
programs, which is that at least, you know, it's sort of well established that for sequential
programs, what you need is pre- and post-conditions and loop invariants. And some
extrapolation [inaudible]-based tools are going to give you that. And that's
helpful. And other analysis tools build on that. They know all they need to find is
pre- and post-conditions and loop invariants.
But there's no such corresponding notion in concurrent programs. Once we can
get a handle on what the proofs look like, maybe we can make more progress
there.
>>: [inaudible].
>>: [inaudible].
>>: No, this is quantitative information.
>>: That's right. I guess in your analysis the [inaudible] of your analysis is not
that oh, your program works pretty good [laughter].
>>: [inaudible] the only thing I feel proofs is going to enable is automated
programs [inaudible].
>>: Right. So -- but so far what you have said is that proofs
are good as [inaudible] consumed by other program analyses, right? And that is
good. But the number of people who write program analyses and explore them
is very small, right? Most people write software that does something, and there's
a larger class of people who don't even write software, they just use software.
And I don't know, I mean, can we somehow sell proofs to them? That's what I
[inaudible].
>>: The other economic argument is [inaudible] all of programming, but I'm sure
there are specific areas where people are ready to pay money to get some sort
of certification of correctness. Like if you think of a pacemaker or if you think of
security critical components of a system. So I think the economic argument is,
you know, if we actually could do this, then people would pay for it, I think.
>>: Yeah, so the -- I mean, the application that I was [inaudible], and you are
willing to back it up with some warranty that this app is certifiably not [inaudible].
>>: But what prevent somebody from fraud -- from defrauding users? You know,
I can just build some [inaudible] and say oh, yeah, I've certified it.
>>: But that's a certification problem. I mean, if the developers or the [inaudible]
agency understand what proofs are and [inaudible]. And if there is regulation,
that won't happen. For example, for the pacemaker, the FDA approves the
pacemaker, and if you have a pacemaker with software in it, then that software
has to be certified. Currently the certification process basically relies on software
engineering principles: basically you have to follow some particular model of
software development, and it's not [inaudible]. But one can easily [inaudible]
that, you know, you switch from process to proofs. And the [inaudible] is
definitely one area [inaudible].
>>: And also, I mean, [inaudible] I mean the next step. We'd love to have a
verification tool or something close to it, and [inaudible] because testing -- when do you
stop testing [inaudible] and so [inaudible], including SAGE, but it costs a
fortune, and you never know when to stop. It's a huge problem for [inaudible]
something even approaching verification. It's true. I agree with your comment.
We're still very early. But we need -- and there's more value [inaudible], I agree
with you, but --
>>: Actually, I think it's a little worse than that. For example, I think that as a
community, there is no widespread notion -- there's no common ground for
saying what a proof is. Basically every tool that people come up with will have its
own notion of proof. And no proof-driven tool will ever
agree, as far as I know, on what a proof is. Because there will be all sorts of
random assumptions [inaudible] because you need those [inaudible]. So I think
that to make what [inaudible] was talking about a reality, there has to be a foundation
so that everybody, no matter what technique you use, there has to be some notion
of what a proof is.
>>: It could be a certificate. I declare whatever. You have your black box, your
tool. You say there are no bugs of that type in this stuff. That could be the
[inaudible] proof. And then whoever wants can challenge that, right? That's your
certificate. Now, how about a proof? Now, I give you, let's say a book basically
[inaudible] that nobody would ever look at, or I could [inaudible]. So I don't know --
>>: What is the certifying agency going to look at?
>>: [inaudible].
>>: Machine checkable proof.
>>: But, I mean, of course, it's [inaudible]. I was telling you about that lock-free list
[inaudible]. So I don't think -- we cannot write a proof. Forget about the proof. I
don't think anyone can [inaudible] a proof of why that [inaudible] is correct. And
so when there is concurrency [inaudible] we really don't understand the correctness
proofs of really intricate [inaudible]. Mutual exclusion is -- I think that's from
[inaudible], and now maybe we understand what the correct proofs are.
>> Shaz Qadeer: Okay.
[applause]