>> Shaz Qadeer: Okay. Welcome everybody. It's my pleasure to introduce
Alastair Donaldson. He's visiting us today and tomorrow from Oxford University.
Alastair got his PhD from the University of Glasgow and then he worked for a few
years in industry at Codeplay Software, working on multicore compilation.
After that he decided to come back to academia. He's now a post-doctoral fellow at the
University of Oxford. And he's going to tell us today about some work he's been
doing on verification of concurrent programs.
>> Alastair Donaldson: Thank you very much for the introduction. During the
talk please feel free to ask questions and interrupt me. I'm sure that's what you
do anyway. So, yeah. That's what I like.
So this is work on applying predicate abstraction to replicated concurrent
programs and trying to do that efficiently by taking advantage of symmetry. And
it's joint work with Daniel Kroening, Alexander Kaiser and mostly Thomas Wahl at
the University of Oxford.
So as you very well know, one of the major formal methods success stories has
been the SLAM project: taking a load of really well understood formal
verification techniques and theorem-proving techniques, combining them with novel
techniques for making this work on real C programs, and building a tool that
can be used to verify device drivers.
But despite the success, I think it's fair to say that there's not been very much
progress on applying predicate abstraction to shared-variable concurrent
programs. And the reason for this is state-space explosion. So abstraction
may be more expensive if you have multiple threads, and also the verification of a
concurrent Boolean program becomes intractable as the number of threads
increases.
So in this work we've been contributing to this situation by designing a scalable
predicate abstraction and CEGAR-based model checker which is geared towards
verifying replicated C programs. And we achieve scalability by exploiting the
replicated structure of these programs using symmetry reduction, and in
particular building on recent work by my collaborators, Thomas Wahl and Daniel
Kroening and also Gerard Basler who was an intern here some years ago on
doing symmetry reduction for symbolic model checking using a technique called
symbolic counter abstraction.
So I'm not going to tell you about symbolic counter abstraction during this
presentation, but that's the technology that makes this work well. As well as the
novel abstraction technique that I'll show you.
>>: Counter [inaudible].
>> Alastair Donaldson: Sorry?
>>: What does counter --
>> Alastair Donaldson: Oh, counter abstraction. Well, I'm going to -- I will
explain that later.
>>: [inaudible] counter that comes up or --
>> Alastair Donaldson: So this is -- it's this thing I showed you on the board
where you count the number of processes in each state.
>>: Okay.
>> Alastair Donaldson: So --
>>: [inaudible].
>>: [inaudible].
>>: It's counting abstraction. Okay.
>> Alastair Donaldson: Yeah. Well, it's -- I guess maybe that would be a better
name. In the literature it's referred to as counter abstraction. And it's an exact
method. So it doesn't lose precision, despite being an abstraction it doesn't lose
precision. And I'll discuss that a bit more later.
So these are the kind of programs we check -- you don't need to read this program
in too much detail. But what I mean by a replicated program is we have some loop
that launches an unknown number of threads, and we say that this number is going
to be bounded. So our model checker would just stop if it launches more than five
threads, if you say the bound is five. And then all threads run the same program.
This is an example of building a lock using test-and-set instructions. And I'll talk a
little bit more about this example later on.
So all threads are going to run the same program and you'll see there's no use of
thread identity in the code for these threads. So this is the sort of programs we
are thinking about. So a little more specifically. In our model of computation we
assume no recursion. And in this work we inline all procedures. And I'll talk later
about ways we might lift these restrictions to some extent. So, yeah, sorry.
>>: Okay. Why should we do symbolic [inaudible] concurrent programs and
tools that do [inaudible].
>> Alastair Donaldson: So --
>>: [inaudible].
>> Alastair Donaldson: If your program has data like integers, then explicit state
exploration won't be able to provide you with full state-space coverage.
>>: And then the interplay between data and concurrent [inaudible].
>> Alastair Donaldson: Well, I suppose that would depend on individual
examples, but -- so I mean if I gave this to like spin, unless you did a manual
extraction to explicitly remove all data, then spin would just get stuck exploring
like data value after data value after data value. It wouldn't necessarily find any
useful bugs or show that no bugs exist.
So if there is lots of interplay between control flow and data values then actually
predicate abstraction probably wouldn't work very well either on these examples.
The idea is predicate abstraction gets rid of the data problem.
>>: [inaudible] SLAM is partially successful [inaudible] all these new techniques
but also because it had device drivers as the motivating example. So I guess
one question is -- once your [inaudible] example of this sort of [inaudible] system --
>> Alastair Donaldson: Yeah, so --
>>: Template.
>> Alastair Donaldson: So the examples where we found our approach [inaudible]
are actually lock-free data structures. So in this case, building a lock using atomic
instructions. So we -- yeah, we can check that these assertions won't fail, and
we don't care too much about the actual context in which this lock is being used,
right?
>>: So the idea is that your main routine is really sort of a harness where you're
simulating an unknown number of clients of essentially a passive library, a library
that might do synchronization but it's not creating threads itself?
>> Alastair Donaldson: Exactly. Well, the benchmarks that we've tried this
technique on are exactly that situation. So if you had a larger concurrent
program with many other threads doing different things, then you might need to
extract something from that concurrent program for this technique to tell you
something meaningful.
>>: I see.
>> Alastair Donaldson: Yeah. So okay. So we're assuming this sort of program
structure. We inline all procedures. And then we have some restrictions on
pointers. So we allow pointers -- within a thread we allow pointers between the
variables of the thread. We allow pointers from a thread's local variables into the
shared state. And we allow pointers within the shared state. But in this
work we don't allow pointers from the shared state into a thread's local state.
The reason being that this would break our asynchronous
model of computation, where threads proceed by modifying their own local state
and the shared state. If we allowed these kinds of pointers, then a thread could use
such a pointer to directly manipulate another thread's state. And by barring these
sorts of pointers, we also bar pointers between the local states of threads,
because you could only get such a pointer by communicating it through the
shared state.
So we haven't found this restriction to be a problem in practice. This would
correspond to giving away the address of a stack variable to the shared state,
which some programmers will do. But it's, I would say, generally regarded as
quite bad practice.
>>: Since the local state is only stack states --
>> Alastair Donaldson: So in this work, yeah, local state is stack state. We
would regard the heap as being shared state.
>>: Specifically you [inaudible].
>> Alastair Donaldson: Yeah, you could allocate some of your private space.
>>: [inaudible].
>> Alastair Donaldson: So you could allocate your -- you could allocate memory
on the heap and use that in a thread-local way. So with our current tool, the tool
would just assume it's shared state. So there has been like some interesting
research on trying to do analysis of programs to determine which variables are
shared and which variables are local, depending on the way they're used.
>>: [inaudible] synchronous program you [inaudible] that's allocated [inaudible]
thread to thread [inaudible].
>> Alastair Donaldson: So you pass something from one thread to another?
>>: Yes.
>> Alastair Donaldson: And then would you then regard it once you passed it
would you --
>>: [inaudible] to that thread.
>> Alastair Donaldson: Then you would still regard it as part of the original
thread's local state.
>>: If it's local to [inaudible].
>> Alastair Donaldson: Okay. But once you pass it to another thread, did you
regard it as being local to that other thread. Yes. So I mean I'm not sure if you
could model that directly in C and have our tool work successfully on it. But
essentially that's not the problem we're trying to avoid. We're trying to avoid the
problem of one thread being able to directly change the state of another thread.
>>: But he can model it, right, because that will be allocated on the heap and --
>>: [inaudible] shared state.
>>: It would be shared state.
>> Alastair Donaldson: Yes. So we would just treat it as shared state.
>>: Yeah.
>> Alastair Donaldson: So, yeah, maybe after the presentation we could talk
through an example that would do that and --
>>: [inaudible].
>> Alastair Donaldson: No, no. I'm delighted to be. So, yeah. [laughter]. One
of the things I would -- so Tom asked the question about what kind of examples
are we looking at? And we have a class of examples where this method is
useful. We would be very interested in trying to expand that class of examples.
And if you have ideas on that, that would be great.
Okay. So then -- and then also we're assuming a strong memory model, and we're
assuming statement-level granularity, so threads interleave at statement
level, which is obviously unrealistic. You can avoid this problem by preprocessing
your input essentially into three-address code. The strong memory model
problem we're looking forward to dealing with in future work. We have a new
researcher, Jade Alglave, who is an expert in memory models. So I'm
hoping to collaborate with her on extending this to --
>>: [inaudible].
>> Alastair Donaldson: Yeah. Okay. So now I'm going to do a very quick recap
of the Cartesian abstraction. I'm not sure if this is necessary for this audience. I
wasn't sure if there would be many more people here. But I'll go through it fairly
quickly.
So if we have a set of predicates phi 1 to phi M, what we're going to do is
abstract the program. This is sequential predicate abstraction. We're going to
abstract a program to produce a Boolean program where variables B1 up to BM
track these predicates.
So by F(psi) I denote the best approximation of the expression psi over our
predicates. So F strengthens psi to the weakest thing we can express over the
phi_i, such that F(psi) implies psi.
And then choose(A, B) means: if A is true then 1, else if B is true then 0,
else star, where star is the nondeterministic expression.
And the effect of an assignment statement st on a predicate phi is
abstracted as a choose between the weakest precondition for phi to hold after
the statement has been executed, strengthened over our predicates, and the
weakest precondition for not phi to hold, strengthened over our predicates. So
this is like the best thing we can say about phi's new value with our current
predicates. But considering phi in isolation from other predicates.
So we turn a statement into parallel assignment to our Boolean variables doing
this choice for each predicate. So in this work we are not actually restricted to
the Cartesian abstraction. We could have phrased this work in terms of
existential abstraction generally, but our motivation for this is that we've written a
paper on this work that we've sent to CAV, and we'd like this paper to be
readable and compatible with the seminal work on predicate abstraction from 10
years ago, which introduced the Cartesian abstraction. So we'd like someone
to be able to read that paper and then read our paper, with the notation from
that paper being enough to understand what we've done.
Okay. So the goal of our work is we have a template program P and an integer
N and we want -- what we want to do is to check that the parallel composition of
N copies of P is correct. And by correct I mean that no assertions in the template
can be violated. So one approach to doing this using existing techniques would
be to build the program PN directly, abstract it and then check the abstraction.
So let's think about how this would work. Suppose we had this very simple
program here and I'm going to write my programs in this simple form where I
define shared and local variables and then have some statements, rather than
giving you whole fragments of C for reasons of space on the slides.
So this program, which is obviously incorrect for more than one thread, has a
shared variable S and a local variable L. We assert that they're different and
increment S. So clearly for more than one thread this program is incorrect. And
if we had this predicate that S and L are different, then all we could do is
expand this program by multiplying out the threads.
So we have the single-shared-variable and then we have two instantiations of the
template. So we get a local variable L1, a local variable L2. And two different
assertions. And we also would expand the predicate. So we have a predicate
that says S doesn't equal L1 and a predicate that says S doesn't equal L2.
And then we could apply predicate abstraction directly to this program to get the
program alpha P squared. So this is the program with two threads. In practice we
would use a Boolean variable B1 to track the predicate S not equal to L1, and a
Boolean B2 to track the corresponding predicate for L2. And then we would get
this parallel assignment corresponding to the updates to S.
So if S is not equal to L1, then we don't know whether it will still not be equal to
L1 after the assignment. But if they are equal, then they definitely won't be equal
after the assignment. So we get 1 here. So does this make sense so far? It's a
straightforward application of regular predicate abstraction.
And then we could check this. So it would be easy to turn this program into a
sequential Boolean program by simulating concurrency with nondeterminism.
And we could use a model checker like Bebop, or we could use SMV with a
suitable transformation from Boolean programs into the SMV language to check
this program for two threads.
So we would find verification fails.
>>: [inaudible].
>> Alastair Donaldson: So this thing of turning a Boolean program into an SMV
program. So you just model the program as a variable, yeah, and then you just
have like separate variables for all the threads. So all the --
>>: So it would be basically a big gigantic --
>> Alastair Donaldson: Yeah, a big gigantic loop.
>>: Loop.
>> Alastair Donaldson: Monolithic loop.
>>: [inaudible].
>> Alastair Donaldson: Yeah, yeah.
>>: There's no recursions.
>>: I see. No loop?
>> Alastair Donaldson: Yeah.
>>: Okay.
>> Alastair Donaldson: Well, you could have loops but --
>>: You can have loops but all procedure [inaudible].
>> Alastair Donaldson: So in this [inaudible] we've --
>>: [inaudible].
>> Alastair Donaldson: Yeah. If you wanted to do smart things with procedure
calls then this might be more tricky.
>>: So the code is exactly the same, the two threads update two shared
variables.
>> Alastair Donaldson: So in the original program, the two threads both update
the shared variable S. Yeah. And that means that -- after one thread does the
update, then the assertion will fail for the other thread.
Yeah? Okay. So --
>>: [inaudible].
>> Alastair Donaldson: Yeah. So I mean if both threads do the assertion, then
things will be okay.
>>: So verification failed means counterexample --
>> Alastair Donaldson: Counterexample, yes. So here we would get a
counterexample in the abstract program. And if we simulate this counterexample,
we'll find it's genuine.
>>: So it's a success?
>> Alastair Donaldson: So it's a success, yes. Yes. Okay. So what are the pros
and cons of this work? So, yeah, the pros are it works in principle, right? So this
is a way we can do verification of these kinds of programs. And we can use
existing techniques essentially directly.
But the problem is it doesn't scale. And we've got experiments later that show
this. So there are two problems. One is the scalability of the abstraction. So
suppose we've got K predicates over our template program P. Then potentially
we're going to end up with a separate version of each predicate for every thread.
So if you had a predicate just over shared variables then we wouldn't multiply that
predicate for every thread. But for the predicate S not equal to L, we've got two
predicates corresponding to that predicate. Right. So when we perform
abstraction, even if we're abstracting thread 1, we have the predicates related
to all the threads available to us when computing the abstraction.
And in this example, we made use of that. So if I go back to the code -- yeah,
here you can see that in thread 1 we're using both B1 and B2, which are the
versions of the predicate for thread 1 and thread 2. Okay?
So abstraction is expensive. And if we multiply the number of predicates it
becomes more expensive. And I guess a less important problem but maybe still
worth noting is that if we had different values of the number of threads that we
care about, then we would have to do different abstractions and check them
separately. We can't like do one abstraction and use it for multiple thread counts.
>>: But given your assumptions you should be able to just have a template
[inaudible].
>> Alastair Donaldson: So you mean do abstraction at the template level?
>>: Well, I mean you have this one piece of source code that you know is going
to be the main for all threads.
>> Alastair Donaldson: Yeah.
>>: Right. So the local variables it's all parametric.
>> Alastair Donaldson: Yeah. So but the thing is that if you multiply the program
and then multiply the predicates, then you abstract a thread with respect to --
>>: This is where you do the explicit composition.
>> Alastair Donaldson: So in this work, we're not doing parameterized model
checking, right. We're not trying to show this for an [inaudible]; we're trying to
show this for the program where up to some fixed number of threads are
launched.
>>: Oh, so I -- so you have a parallel composition that you actually perform to
create the program?
>> Alastair Donaldson: Yeah. That's the -- that's not what I'm going to propose
we actually do, but that's -- this is what we could do.
>>: This is in the context of that.
>>: Yes. Those assumptions.
>> Alastair Donaldson: And then the other problem is that it's not feasible to
check this program alpha P to the N for large thread counts. So we get
[inaudible] explosion because of concurrent thread interleavings.
So we refer to this method as symmetry-oblivious predicate abstraction. So
we're basically ignoring symmetry here. And what I'm going to show you next I'm
going to propose a method that takes advantage of symmetry to do this in a
better way.
So, a potentially more natural approach. Well, this template program P is not an
executable program, but it's a program nevertheless. So what if we could
abstract P directly at the level of the template, to get an abstraction alpha
prime P? I say alpha prime, not alpha, because the abstraction we do isn't
going to be exactly the same as what we would have done in the previous slides.
So this will hopefully be cheaper to compute because we haven't blown up the
number of predicates that we're abstracting over.
What we'd like to do is to do this so that when we then take the parallel
composition of the resulting Boolean program N times we get something that
overapproximates the parallel composition of the template N times. But because
we're working at the template level we should then be able to exploit recent
techniques on model checking replicated Boolean programs that exploit
symmetry to do the model checking efficiently.
And in addition we then can just abstract this program P once to get alpha prime
P and then we can try alpha prime P with various thread counts. So we don't
have to abstract a separate program for each thread count we're interested in.
So I wouldn't be telling you about this unless the answer to this question was yes,
you can do this, and this is what we call symmetry aware predicate abstraction.
And it's what I'll tell you about during the rest of the talk.
So any questions up to this point?
>>: [inaudible].
>> Alastair Donaldson: Yeah?
>>: [inaudible] basic question. My understanding that you were doing bounded
verification [inaudible].
[brief talking over].
>>: [inaudible] the loops you unroll them?
>> Alastair Donaldson: No, no, we don't unroll loops.
>>: [inaudible].
>> Alastair Donaldson: In a Boolean program there can be loops, right. Yeah.
So we're bounding the number of threads that get created, but we're not bounding
the number of context switches. We're not bounding the depth that we search to.
Yeah. But we don't have recursion.
Okay. So a quick overview of symmetry reduction, in case you're not familiar
with it. So for verifying this replicated program, suppose N was 9, right, and let's
just ignore shared state for this -- for the purposes of this example.
Suppose we've checked some state where the threads are in this configuration
so 1 and 2 are in state A, 3, 4, and 5 are in state B, et cetera. So if we've
checked that this state is safe, then because these threads are isomorphic we
don't need to check, for example, the state where the identities of processes 2
and 3 have been flipped. Because these states are permutations of one
another, if we've checked one, we don't need to check the other. Okay.
And similarly, we wouldn't need to check this state here where our processes 6
and 7 have been flipped. Or this one, for example. And actually the model
checker, the symmetry-exploiting model checker that we use as a back end for
our work, which I'm not going to give you details of, uses a technique
called counter abstraction. So this whole equivalence class of
states would be represented by this counter abstract state here. So we say there
are four processes in local state A, four in local state B, and one in local state C.
So it's an abstraction because we abstract away identity. But an
important point about counter abstraction -- yes, so symmetry reduction can give
you a very large reduction in the size of the state space you need to search.
So symmetry reduction gives you a bisimilar quotient structure. So you're
checking something bisimilar to the unreduced structure. And counter
abstraction is a method of implementing symmetry reduction. And it also gives
you bisimulation. So although we call it abstraction, we're not introducing any
further overapproximation. So in this work we go from a program template to a
concurrent -- to a Boolean program template. We expand that and then we
model check it using counter abstraction. We lose precision when we do the
abstraction. But we don't lose more precision because we're using this technique
called counter abstraction in the model checking phase.
When I gave a rehearsal of this talk, someone raised that point, which I guess,
because I'm so into symmetry, I didn't think about.
Okay. So the idea is we're going to take this template program P, abstract it to
get alpha prime P, expand this, and we're going to do it in such a way that our
expansion simulates the original program's expansion. But actually we're going
to check the symmetry quotient of our expanded Boolean program and because
this is bisimilar this is a sound thing to do.
All right. So I nearly had a heart attack when I looked through the POPL
proceedings and saw a paper with this title, Predicate Abstraction and
Refinement for Verifying Multithreaded Programs. And I thought, oh, no,
someone's doing something very similar to us. And then I read the paper and it's
a very nice paper, but it concludes with this very encouraging statement saying:
Another technique to fight state explosion is to factor out redundancy due to
thread replication as proposed in counter abstraction implemented in the model
checker BOOM, which is our tool. We view these techniques as paramount in
obtaining practical multithreaded verifiers.
So, yeah, this heart attack turned into a good feeling of, yeah, other people think
that this is a good thing to be working on.
All right. So before I tell you about our approach, I just want to give you an
indication of how much we had to change the CEGAR loop to make this work.
So almost everything I'm going to tell you is related to computing the abstraction.
We have a novel technique to do predicate abstraction at the template level.
Then we had to adapt our BOOM model checker quite significantly into what we
call B-BOOM because it needs to perform broadcasts as we'll see in what
follows. I'm not going to tell you the details of how we adapted it into B-BOOM,
but this is a significant piece of work.
Checking feasibility of counterexamples required almost no modification. And at
the moment our tool only refines the abstraction by adding predicates. So this is
also very straightforward. And what we're working on now is constrain-style
refinement to make the abstraction with a given number of predicates more
precise. And this is actually not so straightforward in our template level setting,
but on the plane here I think I figured out how to do it. And it involves significant
work but is absolutely doable. So this is the state of things. Well, this last bit isn't
the state of things, but that's the state of things.
Okay. So let's think about how we would do this template level predicate
abstraction. So let's consider this simple example here where we have a local
variable L, a shared-variable S, a shared-variable T, which I think I should
remove because I don't use it in the example, yeah, and we're not actually -- I
don't care about this program, but the -- I suppose we've got two predicates.
Yeah. Sorry. We do have a predicate S is equal to T. That's why I need T. So
we've got this predicate S is equal to T and a predicate L is equal to four.
And what we want to do is to turn this into a Boolean program. So we can
abstract the statements directly using Cartesian abstraction. And clearly we're
going to need a Boolean to represent each of these predicates. So now the
question is because we're going to expand this to a concurrent Boolean program,
the Boolean variables need to have a scope. So they need to be local or they
need to be shared so in this example, what do we want to do for these
predicates? Well, I think it's pretty obvious that we would want the predicate S
equals T over shared variables to be a shared-variable, and we want the
predicate just over local state L equals four to be represented by a local variable.
Okay?
But what about this example here, where we have variables S and L? S is
shared and L is local. And I'll use that convention throughout the talk. And then
we have a predicate -- yeah, this is the example we saw before that's incorrect
for more than one thread. And we have this predicate S is not equal to L.
So we build the Boolean program as I showed you before, except we're doing it
just at the template level. And we have this Boolean variable that tracks the
predicate S not equal to L. And now the question is: should we make this variable
shared or local? And it's not immediately clear what we should do -- or it wasn't
clear to me initially -- because this variable refers to both shared and local state.
So what if we make it local? Well, the problem is that now if you look at this
Boolean program for any number of threads we would say this program is
correct, right, because this predicate is initially true. And then we can't set it to
false. Right? So if it's true, it will remain true.
So this is not a sound thing to do because we would deduce the -- our original
program was correct if we regarded this as a sound abstraction when we know
it's not. So clearly we cannot just represent these mixed predicates using local
variables.
What about representing them -- oh, yeah, in this example, if we made this a
shared variable then we would correctly deduce that the Boolean program is
wrong. Okay?
What if we instead decided to represent these mixed predicates using shared
variables? Well, this is an example where this wouldn't work either. It's a bit
more of an intricate example. We have shared-variable S, a shared Boolean flag
F and a local integer L. And a thread can either go into this condition here where
if the flag is true we assert that S and L are different, okay? So if you imagine
that one thread skips over here and performs this update, then S and L will be
different -- sorry. Let me just think -- S and L will be the same for any other
threads, right? No. I'm getting myself confused here. Yeah.
>>: [inaudible].
>> Alastair Donaldson: Let me look at the abstraction. Okay. Basically I think I
might have made a mistake in this example, but it's very easy to construct an
example where if you represent these mixed products using shared variables
then you get unsoundness in the other direction. So if you don't mind, I'm going
to skip over this, because I don't want to spend ages figuring out the details. I
think I've made a --
>>: [inaudible] you introduce the Cartesian abstraction introduces --
>> Alastair Donaldson: I want to use an example where if I make this mixed
predicate S not equal to L be shared, then I will incorrectly claim that the abstract
program is correct when the concrete program is incorrect.
I think this example does it, but I'm sorry, I prepared this [inaudible] at the PPoPP
conference over lunch, and yeah. I'll maybe look at it after the talk, and if
anyone's interested I'll go through the details. Okay.
And in this example, declaring this predicate to be local would work if the example
were correct. Okay. So the idea was to establish with these examples that taking
either one of these strategies -- either making a mixed predicate always local
or making a mixed predicate always shared -- neither strategy works. Right.
And now you might ask, well, okay, these examples were very contrived. If I did
this in practice, would I maybe get reasonable results for benchmarks? So these
are benchmarks that I'll tell you a bit about later. But what I want to show you
here is that if we try this approach of declaring mixed predicates to always be
local, or mixed predicates to always be shared, then we get very strange results.
So here what I'm saying is that unsafe means this is a buggy version of the
example. So a correct verification result will be to say that the example is
unsafe.
So if we declare mixed predicates as local variables, we frequently get the model
checker telling us that these unsafe examples are safe, which is unsound. And if
we also declare mixed predicates as shared variables then in one case here we
find that the model checker tells us that the example is safe when it's not.
By no difference found what I mean is that we don't manage to get a conclusive
verification result and adding -- and we can't find any further predicates to add
using our predicate discovery strategy. And I say that this is an erroneous result
because with the symmetry aware approach we don't get a no difference found,
we actually get the right verification result. And a quick observation is that we'll
never have -- we'll never get told that a safe example is unsafe, right, because
we get a counterexample which we have to simulate over the original program,
and we're never going to find that a safe program has a counterexample. Yeah?
>>: The -- suppose that you take the local state and you replicate it to make
[inaudible] instead of having two local variables you have two shared variables.
>> Alastair Donaldson: Yeah.
>>: And you just arrange it in such a way that in the program one thread only
accesses the [inaudible] and the other one only accesses the other one.
>>: Well, in that case all of your variables are shared --
>> Alastair Donaldson: Yeah.
>>: So in that case you don't have a mixed predicate, so it's clear that all of the
Booleans should be --
>> Alastair Donaldson: Should be shared. So that's precisely what we did in --
the first [inaudible] I showed you where we expanded the threads out separately,
yeah, treated the thread's local variables as being different variables, expanded
the predicates to be distinct predicates, yeah. Then we do the abstraction on a
thread-by-thread basis. Then everything is shared. And that's exactly what you
proposed.
But we're not doing that at the template level and therefore we can't exploit
symmetry in our model checking.
>>: But is that sound?
>> Alastair Donaldson: Yes, that's the sound thing to do, yeah.
>>: So, right. But -- how -- I mean, how can the other one not be sound? I mean
that is if you don't take -- if you're just pretending that things are more shared
than they are, I guess I'd like to see the other example [inaudible].
>> Alastair Donaldson: Okay. Yeah.
[brief talking over].
>>: Well, what's missing here is you haven't really defined your notion of your
abstraction of an individual process. Because if you take this process, and it has
a fixed vocabulary of predicates as a [inaudible].
>> Alastair Donaldson: Yeah.
>>: [inaudible] the program. Fixed set of globals that it knows about. Now that
you start composing these guys together --
>> Alastair Donaldson: Yeah.
>>: -- your global state now has N times that many global variables. If you have
made the shared predicates global --
>> Alastair Donaldson: Okay. So --
>>: [inaudible] vocabulary has changed, right, between an instance of one
process and now my composition so you haven't said what happens is if I
execute process one, what happens to the global variables it doesn't know
about? I mean, are they staying the same when process one transitions?
>> Alastair Donaldson: So you're talking about the Boolean programs having
these global variables, right?
>>: You're saying you're locally abstracting one -- one thread.
>> Alastair Donaldson: Yeah.
>>: To Boolean program. That Boolean program has a vocabulary, has a set of
local variables and a set of shared variables.
>> Alastair Donaldson: Yeah.
>>: But its set of shared variables is not the complete set of shared variables.
>> Alastair Donaldson: So in this case, I'm proposing that it would be. So here,
this would be the Boolean program for an arbitrary thread. And this predicate
here is -- would just be a shared-variable and when you compose this program
many times you only ever have two shared variables. So --
>>: So the one variable -- so you could have one copy of the variable that stands
for all.
>> Alastair Donaldson: Yeah.
>>: Even though, in fact, it's representing --
>> Alastair Donaldson: And that's why this doesn't work.
[brief talking over].
>> Alastair Donaldson: And this is meant to be an example just to simulate that.
Yeah.
>>: Because you've got all these different local variables, so how can this one
shared predicate --
>> Alastair Donaldson: Exactly.
>>: [inaudible].
>> Alastair Donaldson: Or you need something else.
[brief talking over].
>> Alastair Donaldson: So maybe I could show you this -- maybe I could now
show you the solution rather than --
>>: [inaudible] or something --
>>: Or something --
>>: Okay. Okay. So that's --
>> Alastair Donaldson: So --
>>: All equals true, all equals false or [inaudible].
>> Alastair Donaldson: Yeah. So basically if you made these predicates, these
mixed predicates be local variables then you don't communicate when you
should communicate. And if you make them just be represented by a simple
shared-variable, then a communication that's specific to one thread just is applied
to all threads. So neither of these approaches make sense.
Okay. So now you might ask -- I'm going to show you how we can deal with
mixed predicates in a sound way. But you might first ask, do we need them at
all? Might it be possible to rewrite our program so we never have these mixed
predicates? And it's very easy to construct an example where we do need mixed
predicates. So this example is rather contrived. The idea is that only one
thread will be able to get into this loop here. And this thread will increase these
variables S and L and assert that they're equal, right. S is a shared-variable, L is
a local variable. So clearly we want the mixed predicate S equals L to prove this
program correct. And it can be shown that over a set of non-mixed predicates
you won't be able to compute an invariant strong enough to show this program is
correct if you assume unbounded integers. With machine integers you would
need a very, very large number of predicates tracking the value of every -- you
know, every possible integer. Okay.
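For reference, the contrived example being described might look like the following C sketch. The variable names, the flag, and the bounded loop are editorial guesses, not the slide's exact code (the talk's loop is unbounded; it is bounded here so the sketch runs):

```c
#include <assert.h>

int s = 0;        /* shared variable */
int entered = 0;  /* shared flag: at most one thread enters the loop */

void worker(int iters) {
    int l = 0;    /* local variable */
    if (!entered) {
        entered = 1;
        for (int i = 0; i < iters; i++) {
            s++;
            l++;
            /* provable only with the mixed predicate s == l: over
               unbounded integers, no set of shared-only or local-only
               predicates yields a strong enough invariant */
            assert(s == l);
        }
    }
}
```

Only the mixed predicate s == l gives an inductive invariant here; shared-only or local-only predicates would have to enumerate concrete integer values.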
And then in practice, also, let's have a look at this example of building an acquire
lock using test and set. So in this example we do a test and set on this lock
variable to get back a condition, right? And if the condition is locked, then we
know that the lock was already held so we do this exponential backoff and we do
this. All right. And what we want to assert is, once we have successfully
acquired the lock, the condition should not be equal to locked; it should say not
locked -- this should be locked.
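A rough C sketch of the acquire-lock operation being described; the constant names, the backoff bound, and the use of C11 atomics are editorial assumptions:

```c
#include <assert.h>
#include <stdatomic.h>

enum { UNLOCKED = 0, LOCKED = 1 };

atomic_int lock_var = UNLOCKED;   /* shared */

void acquire(void) {
    int backoff = 1;
    int cond;                     /* local: result of the test-and-set */
    for (;;) {
        cond = atomic_exchange(&lock_var, LOCKED);  /* test and set */
        if (cond != LOCKED)
            break;                /* lock was free: we now hold it */
        for (volatile int i = 0; i < backoff; i++)
            ;                     /* exponential backoff */
        if (backoff < 1 << 16)
            backoff <<= 1;
    }
    /* after a successful acquire, cond must not be LOCKED; proving
       this needs a predicate relating the shared lock_var to the
       local cond -- a mixed predicate */
    assert(cond != LOCKED);
}

void release(void) {
    atomic_store(&lock_var, UNLOCKED);
}
```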
And this is something we would represent very naturally by a mixed predicate.
Lock is a global variable, cond is a local variable. So these mixed predicates not
only do we need them in theory, but in practice they're useful for these sorts of
examples. All right. So now I'm going to explain how we handle mixed
predicates in a sound way in our symmetry aware predicate abstraction
technique. And I'm going to show you the technique if we assume no pointers
first. And then I'll show you how pointers can be slotted in. This makes the
presentation much easier and doesn't lose anything.
So suppose we have a program P and a set of predicates over the variables of P.
We want to translate this into a Boolean program B so that BN approximates PN.
And we want Boolean variables B1 up to BN as usual.
Our approach is to say that the Boolean variable BI is a shared-variable if and
only if the predicate phi I is shared. So if the predicate phi I only refers to the
shared state then the Boolean variable is a shared-variable. Otherwise we're
going to make the Boolean variable local. So in particular, we are
going to track mixed predicates in local variables. So clearly we need something
else up our sleeve, because I've shown you that that alone would be unsound.
So let's suppose we have an assignment V becomes equal to E, a predicate phi
with an associated Boolean variable B, and we want to work out what the effect on this
predicate should be.
So first of all, if V doesn't occur in phi then the variable B won't change. Because
remember, I'm considering no pointers here. Otherwise we need to update B in
one of three ways. So suppose V and phi are both shared, so V is a
shared-variable and phi is a shared predicate. Then we can -- so for example if
our statement is incrementing S, we have a predicate S is equal to 12, then we
would just update B according to standard predicate abstraction. So we would
do this: B becomes, if B then zero, else star. Okay. So because this predicate is
shared, it's in a shared-variable, it's in one place. This will be visible to all
threads. So this is what we want. Okay?
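The update "B becomes if B then zero else star" can be read as a choice over weakest preconditions. The following C simulation of that semantics is editorial; the choose and nondet helpers are illustrative, not SatAbs code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* '*' in a Boolean program: an unconstrained nondeterministic bit */
bool nondet(void) { return rand() & 1; }

/* choose(pos, neg): 1 if pos holds, 0 if neg holds, '*' otherwise */
bool choose(bool pos, bool neg) {
    if (pos) return true;
    if (neg) return false;
    return nondet();
}

/* Abstraction of  s++  w.r.t. the predicate  b <=> (s == 12):
   WP(s == 12) is s == 11, whose best strengthening over {s == 12}
   is false; WP(s != 12) is s != 11, whose best strengthening is b
   itself (if s == 12 then certainly s != 11).
   Hence  b := b ? 0 : *  */
bool update_b(bool b) {
    return choose(false, b);
}
```

So if the predicate held before the increment it certainly fails afterwards; if it did not hold, the abstraction loses the information and the bit becomes nondeterministic.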
Let's suppose that V is local and phi is either local or mixed. So V is a variable
local to a thread, phi is a predicate over just the local variables or over a
combination of variables. And we know V occurs in this predicate phi. Then it's
sufficient just to update the predicate for the thread that executed the update
because we've changed the truth of the predicate for that thread but we clearly
haven't changed the truth of the predicate for other threads because we didn't do
a shared update.
So for example, the analogous example with L instead of S. So if we had the
predicate L equals 12, then if this is true it will become false. If we had the predicate L
is equal to some shared variable S, if it was true for this thread it would become false. But
clearly it's not going to become false for any other threads or come true for any
other threads.
So the interesting case is when we have V being a shared-variable and phi a
mixed predicate. So phi is a predicate over shared and local variables, so its
truth will be thread dependent. And V is a shared variable. So by updating V,
we are going to change the truth of this predicate potentially for every thread.
But in the C program, in the high-level program, one thread is going to actually
execute this statement, right? So something that would only change the shared
state in the C program is going to change the state of many threads in the
Boolean abstraction of this program.
And we handle this using what we call a notify-all update. So if we had this thing,
S plus plus, and a predicate that says L and S are equal, then what we do is for
the local thread we output a normal update to say that this thread's Boolean B
representing this predicate gets updated in the usual way. But we also need to
tell all the passive threads to update their local variable for this predicate
appropriately.
So we do this by introducing a new Boolean program construct called a
broadcast. So let's think about what a regular update on a local variable looks
like. Suppose thread I executes some update. Then this causes thread I's
program counter to increase and thread I's local state to change.
What I'm going to introduce is a broadcast update which is a thread executing a
statement that has no effect on the thread's local state but changes the state of
all passive threads. And once I've -- I'll show you what this looks like and then I'll
show you how we use it.
So we use this syntax square brackets V to mean this L value is going to be
changed in all passive threads but not in the active thread. So suppose we had
this state here where the active thread is I, okay, so its program counter is going to
change but its local state isn't going to change. And the local states of all the
other threads are going to change if we do a broadcast. So V in brackets. So I'll
call -- I'll call this passive V. So passive V becomes a [inaudible] something
where V is a local variable. This causes the active thread to step forward where
it is in the program and for the states of all passive threads to be updated.
Yeah?
>>: [inaudible].
>> Alastair Donaldson: This means some R value. So this is just the right-hand
side of the assignment.
>>: Can it refer to local variables at [inaudible].
>> Alastair Donaldson: So should it not. So we'll come to that. So, yes, the
answer is -- in principle it could refer to the local variables of any threads. And
we find it useful to allow it to refer to the variables of the active thread and the
variables of the passive thread. Yeah.
>>: So you're saying -- I'm confused. So why do you have this notion of that
active thread not changing its local state?
>> Alastair Donaldson: I think that will become clear when I show you an
example in the next slide.
>>: Okay.
>> Alastair Donaldson: Okay. But tell me if it's not. So this broadcasts
something to all passive threads. So --
>>: I'm sorry but fundamentally each [inaudible].
>> Alastair Donaldson: Yeah.
>>: [inaudible] understand exactly what [inaudible] passive threads are about to
execute some statement but they're all basically disabled and you're going to --
you're going to do -- but you're going to sort of tell them to update their
predicate corresponding to phi.
>> Alastair Donaldson: Yeah. So we're not going to tell them to update, we're
just going to update it for them.
>>: We're going to update?
>> Alastair Donaldson: Yes, synchronously. So at once. You can imagine like a
loop that executes atomically and just changes the states of all the other threads.
>>: Okay.
>> Alastair Donaldson: Yes?
>>: On your previous slide I think you had -- yes, notify all updates and shared
V. [inaudible] V is shared and then [inaudible].
>> Alastair Donaldson: Okay. So this is -- so on this slide by V I mean a variable
in the C program.
>>: Yes.
>> Alastair Donaldson: And on this side I mean a Boolean variable -- a variable
in the Boolean program. So I really should have said B not V there to make this
clearer. Thanks for [inaudible].
>>: B?
>> Alastair Donaldson: Yes, it could be V but it would be clearer if I had said B
not V. So, yeah, here V is a local variable; on the previous slide V is shared.
>>: Can we assume that for any given program the number of [inaudible] is a
dot, dot, dot is finite?
>> Alastair Donaldson: The number of different values of dot, dot, dot. In a
Boolean program, yes, because -- well, I mean syntactically, no.
>>: No, but the dot, dot, dot is an expression not on a Boolean variables.
>> Alastair Donaldson: Okay. Well it -- on the variables --
>>: It's an R value as you said. It's an R value from the original program.
>> Alastair Donaldson: No, no. This is in the Boolean program we're going to do
this. Yeah.
>>: A boolean value?
>> Alastair Donaldson: Yes. So the thing is -- so this is in an abstract
program we're going to introduce this statement to -- this is a Boolean program
variable. And we're going to update it and all other threads using Boolean
program variables from well we'll come to -- from which that will come from. I
think when I show you this in the context of the abstraction hopefully it will be
clear.
So I'm going to introduce a bit of notation. So suppose phi is a predicate. And if
I put phi in square brackets, what I mean syntactically is a formula just like
phi but with every local variable L replaced with L in square brackets. So this is phi
in the context of a passive thread. So this is like another thread's take on phi.
So for example if we had the predicate S is equal to L, if I put that in square
brackets, I mean S is equal to passive L.
>>: So now that's really mixed because S is a C program variable and
bracketed L is a Boolean --
>> Alastair Donaldson: No, no, no. This would be a predicate over the C
program. So S is a shared-variable and L is a local variable in the C program
yeah.
>>: So, okay. So -- okay.
>> Alastair Donaldson: Yeah. So here phi refers to a predicate over the original
program.
>>: I know that. But I thought that the brackets --
>> Alastair Donaldson: So, yeah. So in a -- so we're going to use the brackets
on both levels. We're going to use the brackets of the C program level during our
abstraction and we're going to use the brackets in the Boolean program syntax to
represent broadcasts.
>>: Okay. Okay. So you're overloading with brackets.
>> Alastair Donaldson: Yeah.
>>: Okay.
>>: But which L are you referring to? I mean L is local to each one --
>> Alastair Donaldson: Yeah, yeah, yeah. So this is just -- at the moment if you
can just think about this as a piece of syntax so we mean in some passive
thread. And then I hope it will become clear what I mean when I show you how
we use it.
So yeah, if we have an assignment V becomes equal to E, a predicate phi and a
variable B, then if V is shared and phi is mixed, we're going to do one of these
notify-all updates. So what we generate is a parallel assignment. So we say that
B -- this is the Boolean variable corresponding to phi in the active thread -- gets
updated according to the usual predicate abstraction rules, and simultaneously B in
every other thread gets updated, right, and this is where this phi in square
brackets gets used.
So we're updating B in another thread because the version of phi for that
thread -- its truth may be changed by this shared update. So we need to update it to
reflect the weakest precondition for it to hold in this other thread after the
assignment of V -- not in the other thread but in the active thread, right -- being
updated with R value E.
>>: So why can't I just think of this [inaudible].
>> Alastair Donaldson: As what?
>>: [inaudible]. It's sort of like.
>> Alastair Donaldson: Because we're not talking about -- because we're not
talking about a specific -- yeah. So I wanted to -- we could have used -- we could
have used V underscore J. We could have written for all I, V underscore I. But
we wanted something we could write in our programs.
>>: I wonder if he just needs two vocabularies.
>>: Yeah.
>>: One to talk about the thread that's moving and one to talk about the thread
that's being [inaudible].
>> Alastair Donaldson: Not the thread that's being affected but like any thread
that's being affected, yes.
>>: But the point is you're thinking about only pairwise interactions of these
two thread local states. Right? And then you're sort of taking the Cartesian
product of all those pairwise interactions and you're saying okay, this thread
that's moving interacts with this thread in the following way and that thread in the
following way and that thread in the following way. Those interactions may be
correlated but we lose those correlations.
>> Alastair Donaldson: Yeah. So maybe a clearer way of writing it would be
something like saying B equals blah, blah, blah and for all I, BI equals blah, blah,
blah, blah, blah, right? So we played with various different notations for this, but,
yeah, maybe --
>>: Okay.
>> Alastair Donaldson: Okay. And yet these statements are actually done
simultaneously.
So let's look at this on an example. Suppose we have this assignment S
becomes equal to L; S is shared, L is local, and we have a mixed predicate that
tracks whether S and L are equal. And there's a corresponding Boolean variable
B.
So we generate this, okay. So let's simplify it. So let's look at the top line. The
weakest precondition for S to be equal to L after the assignment S equals L is
L and L being equal, which is true. Okay, so this turns into B equals 1. Right?
So clearly if a thread says S becomes equal to L then for that thread S and L will
be equal.
So what about the other threads? Well, the weakest precondition for -- well, so
this formula, S is equal to L in a passive thread, means that the
shared variable S is equal to the passive thread's L. And the weakest
precondition for that to hold after the assignment is that the active and passive
thread have the same value of L. Right? Okay.
So now how do we compute this? So this relates to your question as to what the
R value should be, right, what we should allow on the right-hand side of this
assignment, which threads should we allow or should we be able to read from --
>>: [inaudible].
>> Alastair Donaldson: Yeah. So which predicate should we give to the F
operator? So the obvious predicates we have at our disposal are the predicate
phi so S is equal to L and the predicate passive phi, S is equal to passive L. So
what we might try is saying let's just give the F operator -- because we're
updating a variable in a passive thread -- let's just give the operator the predicates of
the passive thread.
So in this case, what we would be trying to do is to strengthen L is equal to
passive L just over the predicate S is equal to passive L. And it's hopefully clear
that the best we can do here is false, right, and same for the negation.
So in this case we would say passive V becomes a [inaudible] everywhere. So
we would say we've updated the shared-variable so let's just kill all the
information in other threads about that shared-variable. And that would clearly
be a sound thing to do. But perhaps not a very useful thing to do. So we can do
better if we allow the F operator to range over variables -- predicates in the
passive thread and predicates in the active thread. So in this case we would be
computing the strengthening of L is equal to passive L but over two predicates so
S is equal to passive L and S is equal to L. And now we can do significantly
better, right?
So we can say that L will certainly be equal to passive L if S is equal to passive L
and S is equal to L, okay? Because they're both equal to S.
And we can say that these things won't be equal if we have either S being equal
to passive L and S not being equal to active L, or vice versa. So the
strengthening of the expression is the conjunction of these predicates and the
strengthening of the negation is the exclusive or of the predicates. And this is a
much more precise thing to say. So we say here B is a choice between these
two things.
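Putting the two strengthenings together, the notify-all update for S := L with predicate S == L can be simulated concretely. This C sketch is editorial: the thread count and the decision to leave a bit unchanged when the result is star are illustrative choices, not the tool's behavior:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define N 4
bool b[N];   /* b[i] tracks the mixed predicate  s == l_i  for thread i */

/* Abstraction of active thread a executing  s := l_a.
   Active thread: WP(s == l, s := l) is true, so b[a] := 1.
   Passive thread j: afterwards s == l_j iff l_a == l_j, which
   certainly holds if b[a] && b[j] held before (both were equal to s)
   and certainly fails if exactly one of them held; otherwise '*',
   resolved here by leaving b[j] unchanged. */
void abstract_s_becomes_l(int a) {
    bool old[N];
    memcpy(old, b, sizeof old);      /* parallel-assignment snapshot */
    for (int j = 0; j < N; j++) {
        if (j == a)
            b[j] = true;             /* b := 1 in the active thread */
        else if (old[a] && old[j])
            b[j] = true;             /* both equalled s before */
        else if (old[a] != old[j])
            b[j] = false;            /* exactly one equalled s */
        /* else: '*' */
    }
}
```

Note how the broadcast reads both the active thread's bit and each passive thread's bit, which is exactly why the F operator is given both vocabularies.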
And I wonder if maybe the interplay between applying the brackets operator to
Boolean variables and applying brackets operator to the C program variables is
now a bit clearer. So during abstraction our abstraction engine is given these
bracketed variables in the program. But what it will generate is something over
passive and active predicates. Yeah? So I mean, I wonder if it might have been
clearer to use two different -- maybe to not overload these things. I think they're
kind of naturally related and for me overloading this works.
>>: If I could just think of this as being sort of the ordinary, you know, Cartesian
predicate abstraction of two processes, the active and the passive. I mean, can I
just think of it that way and then say okay now for my whole collection of
processes I'll just take the conjunction of all those things from all of the [inaudible]
for all of the active passive variables?
>> Alastair Donaldson: I don't think it's the same because I think if you just took
the Cartesian abstraction of a pair of processes you won't correctly simulate
the effect of updating all processes.
>>: You would be updating all the processes individually and then take the
cross-product of all those updates.
>> Alastair Donaldson: So if you considered each pair separately and then
combined them?
>>: Right, you consider each -- you pick one to be active and now consider all
the pairs consisting of the active one and the passive one, right, predicate
abstract and do a Cartesian abstraction going forward, all right. And now you're
going to wind up with states for each active successor state and you're going to
wind up with corresponding passive successor states for each of the other
processes take the Cartesian product of all that. Is that what you got?
>> Alastair Donaldson: I don't think I fully understand what you mean, but I
think that you might get something equivalent to this thing of expanding the
program and doing the most precise abstraction you can.
>>: Well, the most -- this couldn't be the most precise --
>> Alastair Donaldson: No, absolutely.
>>: [inaudible] because it would lose all the correlations between the passive.
>> Alastair Donaldson: Yes, within the passives, yes. And it does do this.
>>: You're capturing the correlation between active and passive for each passive
individually.
>> Alastair Donaldson: Yeah. Yeah. Okay.
>>: Correlation between the passive is being lost.
>> Alastair Donaldson: Yeah.
>>: That seems to me the information that's being lost in this --
>> Alastair Donaldson: That is exactly the information that's being lost. So you
can come up with examples. When I said we haven't implemented CONSTRAIN-style
refinement in our tool yet, that's because you get counterexamples that say
that this update is -- this transition is infeasible because of information in other
passive threads. Yeah. And we would need some notation and theory for
including that in our Boolean language.
Okay. So the answer is I think so. But I'd like to talk more about it. So, yeah,
this is the approach we take. And it may seem kind of arbitrary. This is saying
let's update the passive thread considering no other threads, and here we're
saying let's update the passive thread considering one other thread.
We could consider two or three threads. But this seems very natural because
the active thread did the update, so hopefully the active thread state is going to
be useful in determining how we should update the passive thread state. But it's
not always enough.
All right. So the price here is that the F operator now computes over twice as many
predicates in the worst case. So we give the operator the predicates of the active
thread and the predicates of the passive thread. But it's not as bad as what I
showed you before, where we have N times M, where N is the number of threads.
So given an assignment V equals E, the way we do this in practice, so I showed
you how we update with respect to one predicate. In practice what we do is we
compute the indices of those predicates for which we need to do a broadcast.
So we work out -- this is when we're figuring out the
abstraction -- we work out the indices of the predicates that are mixed and that
have this variable V in them.
So for all predicates we do the usual thing of simultaneously updating all the
active thread's Boolean variables using standard predicate abstraction, and then
for the predicates with these indices we do a broadcast. And this is one big
parallel assignment. Yeah? Does that make sense? So this is just putting
together what I showed you before. Okay.
So what about pointers? Now, I'll briefly show you how we can adapt this to take
pointers into account. So I'm not going to tell you anything interesting about how
to do concurrent alias analysis, which is a difficult problem.
But let's assume we have an alias analysis procedure that's concurrency-aware,
and let's also assume that it will reject the program conservatively if the situation
of having a pointer from shared state to local state can occur. Because like I
told you in the beginning, we restrict our attention to not consider those
programs.
So otherwise the procedure will yield a relation points-to-D, so we write X
points-to-D Y if X may point to Y at program point D, right?
>>: [inaudible] just treats all statements [inaudible].
>> Alastair Donaldson: Yeah, yeah.
>>: So fundamentally --
>> Alastair Donaldson: Maybe you don't --
>>: [inaudible] analyses if they're imprecise enough simulate the concurrency
already, depending on which one you pick. If you want too much precision, then
-- if you want a lot of precision like full sensitivity, then you lose -- yeah, you lose
the concurrency. But it turns out that if you're very imprecise you simulate
concurrency.
>> Alastair Donaldson: Anyway.
>>: Anyway.
>> Alastair Donaldson: So we've done something very [inaudible] to make our
analysis concurrent. And I don't know too much about concurrent analyses like
[inaudible] analysis, for example, but we're hoping to look into that in the future. I
don't know whether it scales.
But let's suppose we have this black box. So the locations of the variable V we
just say is the singleton set containing V. Okay? Here I'm assuming that pointers
point to variables. I'm not thinking about records and arrays. But with some tedious
work this could be generalized to that situation. And we say the locations of a
dereference are the variable you're dereferencing and anything it could point to.
So locations of an expression are anything you would need to read to evaluate
the expression. And this can be defined recursively for more -- for compound
expressions.
And then we say that the -- so capital Loc of phi is the union of the locations of
phi at the various program points. So before we said a predicate was shared if it only involved
shared variables, mixed if it involved both local and shared. So now we just
generalize this. So we say that a predicate is shared if its locations are a
subset of the shared variables, and local or mixed otherwise.
So this is the obvious generalization to take pointers into account. And then to
abstract assignments we -- so this is our definition of loc and we have a
corresponding definition of targets. So the targets of a variable are just the
variable and the targets of a dereference are just what the variable can point to.
So this captures what you could change by writing through this L value. Okay?
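The loc and targets definitions just described can be sketched over a simple may-point-to relation. This C encoding (bit sets, array sizes, names) is editorial:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NVARS 8
/* pts[d][x][y]: at program point d, variable x may point to variable y */
bool pts[2][NVARS][NVARS];

/* loc of a plain variable v is just {v}; loc of a dereference *v at
   point d is {v} plus everything v may point to -- everything you
   would need to read to evaluate *v. */
void loc_deref(int d, int v, bool out[NVARS]) {
    memset(out, 0, NVARS * sizeof(bool));
    out[v] = true;
    for (int w = 0; w < NVARS; w++)
        if (pts[d][v][w]) out[w] = true;
}

/* targets of a variable v is {v}; targets of a dereference *v at d is
   only what v may point to -- what writing through *v could change. */
void targets_deref(int d, int v, bool out[NVARS]) {
    memset(out, 0, NVARS * sizeof(bool));
    for (int w = 0; w < NVARS; w++)
        if (pts[d][v][w]) out[w] = true;
}
```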
So now when do we need to do a broadcast? Well, suppose we have an
assignment psi becomes equal to E, and I write psi because psi could be a
variable or it could be a dereference. And we assume that you've preprocessed
your program so that you don't have multiple dereferences as an L value.
So given a predicate phi, we want to know: do we need to
broadcast to other threads regarding the predicate phi? Well, we need to do this
when phi is a mixed predicate, using this pointer-aware definition of mixed, and
when there is some variable V which is shared and belongs to the locations of phi
at this program point and to the targets of psi.
So: a variable V which is relevant to phi at this program point and which could be
changed by assigning to psi at this program point. So I hope it -- I hope it's clear
that if you -- with these new definitions then everything I showed you before
would now work for pointers. The challenge would be of course doing the alias
analysis.
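The broadcast condition just stated amounts to a small check; a C sketch (the bit-set encoding and function name are editorial):

```c
#include <assert.h>
#include <stdbool.h>

/* Broadcast is needed for predicate phi at the assignment  psi := e
   at program point d iff phi is mixed (pointer-aware definition) and
   some shared variable lies both in loc_d(phi) and in targets_d(psi). */
bool needs_broadcast(bool phi_is_mixed,
                     const bool loc_phi[],  /* loc_d(phi) as a bit set */
                     const bool tgt_psi[],  /* targets_d(psi)          */
                     const bool shared[],   /* which variables are shared */
                     int nvars) {
    if (!phi_is_mixed)
        return false;
    for (int v = 0; v < nvars; v++)
        if (shared[v] && loc_phi[v] && tgt_psi[v])
            return true;               /* relevant to phi, writable via psi */
    return false;
}
```

Keeping this test as tight as possible matters because, as noted later in the talk, broadcasts are expensive for the symbolic counter-abstraction model checker to realize.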
So now I'll briefly tell you about how we've hooked into the rest of the CEGAR
process. So we built a tool which we call SymSatAbs. This is based on Daniel
Kroening's SatAbs model checker.
So we've implemented this new predicate abstraction scheme in the model
checker. And then to do the actual checking we've extended BOOM which is this
symmetry capable Boolean program model checker. We've extended it to be
able to do these broadcasts which required some nontrivial modifications. And
they're quite computationally expensive to perform. Which is why we've been
careful in this work to try to work out as precisely as possible when you must do
broadcast and not do a broadcast unless you have to. Because the way this
symbolic counter abstraction works these broadcast are quite expensive to
realize.
Simulation required only trivial modifications. And then refinement. So with
predicate discovery, if we discover a predicate by doing, say, weakest precondition
on a counterexample trace, like L3 is less than S -- so the local variable of process 3 is
less than shared variable S -- then we simply add the generic predicate L is less
than S to our set of predicates. So this adds the predicate for all different
threads.
And we never get predicates like L1 being less than L2, because in our program
we could never compare local variables of different threads. So this is actually
very straightforward.
And then transition refinement, CONSTRAIN-style transition refinement -- this is in
progress. The problem is you may get a spurious transition between two
abstract states, but these states will refer to very specific thread IDs. And then in
your Boolean program you want to add a CONSTRAIN to rule out this transition.
And doing that in -- well, you can't add a CONSTRAIN about specific threads. If
you did, you would destroy symmetry, which is the thing that makes our approach
scale. So we're working out a way to be able to write a generic CONSTRAIN,
which will involve some quantification -- involve extending our Boolean
programming language a bit and extending the B-BOOM model checker
accordingly.
Okay. So I'll tell you briefly about experiment results. I'm nearly at an hour. Can
I go on a little bit longer? Yeah. Okay.
So these examples are mostly working on lock-free data structures. So we've
got a lock-based and a CAS-based counter, a pseudorandom number generator,
a lock-based and a CAS-based stack that supports concurrent pushes and pops.
Then these concurrent lock implementation examples like I showed you earlier.
And also some examples on finding the maximum element in an array.
So relatively small but not completely trivial C programs and we have simple
properties that we're checking specified as assertions in the source code. And
for each version we've injected a bug. So we have a correct version and a buggy
version.
The way we've evaluated this experimentally is by comparing our
symmetry-aware approach against the symmetry-oblivious approach that I mentioned to
you at the beginning of the talk where we just expand the threads out.
So what I'm claiming is that our approach gives you faster abstraction times
because you abstract over fewer predicates, just the predicates of -- mostly just
the predicates of one thread. And when you need to do a broadcast the
predicates of two threads, the active thread and some passive thread.
And then model checking is faster because of the symmetry reduction. So
mixed predicates were necessary for all of these examples except the
lock-based pseudorandom number generator and the lock-based stack
implementation. And the intuition here is that to build these lock-free versions
you need to do these tests that check whether some local variable is still equal
to a shared variable. And that's the kind of test that is a good source of mixed
predicates. So it's not really a surprise that in these versions, where you
imagine you have a lock as a primitive language construct, you don't
necessarily need mixed predicates.
>>: How do you model a lock?
>> Alastair Donaldson: How do we model a lock in SatAbs? So we actually
cheat. We actually have something called CPROVER atomic, right, which just
makes this like atomic in this [inaudible]. So that's how we model a lock in these
benchmarks.
>>: Have you decided when you're [inaudible].
>> Alastair Donaldson: Then the code you want to be executed until [inaudible].
>>: [inaudible] I guess what I'm asking, do you model locks as a Boolean
variable --
>> Alastair Donaldson: Yeah.
>>: As a Boolean variable. Okay. One more question. When you say you
verified a cas-based stack.
>> Alastair Donaldson: Yeah.
>>: Were the cas operations just used to implement the lock or were they being
used to, you know, flip or [inaudible].
>> Alastair Donaldson: So I'm not actually sure. This comes from an open
source IBM implementation. And our student, Alex Kaiser, he worked on this. I
believe it's not just implementing a lock on the stack, I believe it's an actual
lock-free stack. I haven't looked into the details. But you can -- I know where it is
online. And we can look at it.
So this stack example was written in Java. And he ported it to C, which is what
our model checker works in. Tantalizingly, he found a bug in the IBM
implementation, but untantalizingly, the bug manifests with only one thread. So
it's not very interesting for us to say we found this bug.
>>: [inaudible].
>> Alastair Donaldson: So it's from this book concurrency design patterns.
Yeah. So --
>>: [inaudible].
>> Alastair Donaldson: Yeah. So we experiment on a three gigahertz Intel
Xeon. We have a timeout of one hour. And we do the Cartesian abstraction with
a maximum cube length 3.
So in blue I've indicated which method performs best. So you can see here we
have the number of predicates required for the symmetry oblivious method which
you can see grows with the number of threads. The number of predicates
required by the symmetry aware method which is independent of the number of
threads. Because this is the number of like generic predicates. Then we have
the time taken for abstraction and symmetry oblivious and symmetry-aware
approaches respectively. And the time taken for model checking.
So here model checking is performed with B-BOOM, our extension of the BOOM
tool. And here we took the best time between SMV and BOOM with no
symmetry reduction. So you can see that we get significant speedups on many
of the examples we can verify. So we show the largest thread count here that we
can verify with each approach. So in this example I showed a mixture of thread
counts. But 10 was the largest thread count we could verify without symmetry,
whereas we could go up to 20 but not further with symmetry.
So we can check interesting thread counts, though obviously beyond a certain
point checking more threads isn't that interesting. So the important thing is for some of
these cases we could only check very small thread counts without exploiting
symmetry. But we can check larger thread counts with symmetry. Are there any
questions about the experiments?
>>: So it seems that there are now two differences between symmetry oblivious
and symmetry aware. And now one of them is that you're doing this active
passive abstraction of the transition relation in symmetry aware but not symmetry
oblivious.
>> Alastair Donaldson: Yes, symmetry aware is -- computes a more precise
abstraction. Which is why it takes longer.
>>: Yeah. And the second one is of course that in the symmetry aware you are
displaying symmetry.
>> Alastair Donaldson: Yeah.
>>: But even in -- even without using -- even without using your active passive
approximation.
>> Alastair Donaldson: Yeah.
>>: That the system is still symmetric.
>> Alastair Donaldson: Yeah.
>>: And that you could still apply symmetry reduction techniques even without
that.
>> Alastair Donaldson: Okay. So we have a little bit. So with this symmetry
oblivious approach, if you expand the threads and then abstract thread one to get
a Boolean program, then you know the Boolean program for thread two is going
to be the same, but you just switch their IDs. Right?
So what we did with these experiments is we actually didn't implement that, but
we divided the abstraction time by the number of threads to be fair. Because it
would be possible to compute the precise abstraction for one thread and then
generate the abstract versions for the other threads; that would just require
some very tedious implementation work. So --
>>: Correct. But you could also use sort of the traditional symmetry reductions,
you know, on the -- on the version that is still exact and doesn't use your active
passive --
>> Alastair Donaldson: We end up with this --
>>: [inaudible].
>> Alastair Donaldson: Okay. Yeah. So the trouble is that we aren't aware of a
way of doing that in practice -- like a model checker that can take a set of
processes and do symbolic symmetry reduction. So, yeah, we could give this to
something like SPIN, say, and do symmetry reduction, or Murphi. But then that
would be hopeless because of the problem of nondeterministic Boolean
variables, right? So with a large number of Boolean variables the state space
would be --
>>: So you're saying --
>> Alastair Donaldson: Infeasible.
>>: So you're saying there's no symbolic implementation of that symmetry --
>> Alastair Donaldson: Yeah.
>>: I'm not saying that symmetry reduction would necessarily work well in that --
>> Alastair Donaldson: But it's something we --
>>: I'm trying to say that there's sort of two [inaudible], as everyone was saying,
going on here, right.
>> Alastair Donaldson: Yes. So one is computing an efficient abstraction --
>>: Right, one is the abstraction you're performing and the other is the symmetry
quotient?
>> Alastair Donaldson: Yes.
>>: Which could in principle be applied on both although you don't have the tools
to apply them [inaudible].
>> Alastair Donaldson: Right.
>>: So you're saying that I should really multiply like the abstraction like the 13
on the first row I should really multiply it by six to get the true time?
>> Alastair Donaldson: Well, to get the true time that we actually spent. So we
left this running like over a week or whatever. And that's how long it actually took
because we just abstracted thread one and then we abstracted thread two and
thread three. But if I had sat down for like two days or something or less -- or
maybe three hours, I don't know how long -- I could implement a tool that could
take the abstraction from one thread and produce, just using syntactic changes,
abstractions for the other threads.
>>: Okay.
>> Alastair Donaldson: So it would be pretty unfair of us to report that multiplied
time given that this is like an implementation trick that we just haven't had time to
do ourselves.
So we had to use -- well, we used a timeout longer than one hour for those
examples. We multiplied the timeout for abstraction by the number of threads.
>>: [inaudible].
>> Alastair Donaldson: Because --
>>: Okay.
>> Alastair Donaldson: Yeah. I showed you these comparisons of the unsound
approaches before, showing they're no good, and I just wanted to show you that
the symmetry-aware approach is good. It gives you the right results in all
cases.
Okay. So future work -- some of it is in progress.
So this CONSTRAIN-style refinement is important, because since we don't have
it, we had to give quite a list of predicates manually for our examples. So if we
give fewer predicates, then SatAbs can't find any more useful predicates and
the abstraction isn't precise enough. So I'm looking forward to solving that
problem.
The abstraction time is very high because we don't take advantage of procedural
information, right? So when you're abstracting one procedure, we're considering
predicates from all other procedures. But the challenge with concurrency,
which I would love to talk to you guys about, is that when you're abstracting over
these passive threads, they could be in any procedure. So you don't know
where they are. Yeah. So I don't know if there are like any cool analyses to
deduce that two procedures can't be mutually occupied, or, you know, I don't
know if that really happens in practice in interesting concurrent programs.
We need a -- we've got a very crude concurrent alias analysis. It would be nice
to get a smarter one to deal with heap-manipulating programs. There's this
large-block encoding which has been successful in the BLAST model checker. It
would be good to try that, but with concurrency it suddenly seems less effective,
right, because you can't just munge together a block of statements if some
shared-variable access is in the block of statements. So we would need to do
this large-block encoding much more conservatively than it can be applied in
sequential programs.
As I said, we're going to look at trying to analyze programs with weak memory
models. And I'm not so interested in parameterized verification personally. I
think, you know, for me showing that something works up to an interesting
thread count is kind of enough. But my colleagues are very interested in
parameterized verification and have techniques from CAV last year which in
principle could be applied directly in our current setting, to use this cutoff
detection method to show that once you've checked up to a certain number of
threads, any larger thread count will be safe. For some technical reasons we
can't yet apply that. Like, in theory there's no reason why we shouldn't apply it,
but there are some technical reasons, which I don't fully understand, why we
can't yet. But, yeah, this is something that they're definitely going to look into.
Finally, just to summarize, well, maybe I don't need to belabor the point --
>>: I'm sorry. Because when you do counter abstraction with acceleration for
certain types of finite-state parameterized systems you can -- I mean
[inaudible] results, right, counter abstraction --
>> Alastair Donaldson: Yeah.
>>: Acceleration, right?
>> Alastair Donaldson: So this is [inaudible] right?
>>: Yeah.
>> Alastair Donaldson: Yeah. But that's extremely expensive, [inaudible]
exponential. And at CAV last year my colleagues had a more efficient
technique where you check your system for thread count after thread count
after thread count, and in successive checks you look for something -- I'm not
quite sure what it is -- but it's something that tells you that's enough, you don't
need to go any further. Right?
But implementing it -- the trick -- the problem is implementing it symbolically is
nontrivial. So they have an explicit-state algorithm for doing it. And I think
they're working on a symbolic version but don't yet have it.
Okay. So in summary, instead of expanding P and abstracting the resulting
program, which would be very expensive both to abstract and to check, we're
abstracting at the template level, which is scalable, then expanding this but
checking it by exploiting symmetry. And because symmetry reduction gives you
a bisimulation, this is a sound thing to do.
We published this as a technical report, as well as having submitted the paper
to CAV. And, yeah, I would love to talk to you all about it now, or you can e-mail
me if you're interested in getting hold of the tools and trying them.
[applause]