>> Nikhil Swamy: Okay. Well, I'm really happy to have David Van Horn visit us today from
Northeastern. David's been doing some great work on adding contracts to Racket and verifying
the behavior of these contracts, and he's going to tell us about some of these today.
>> David Van Horn: Okay. Thanks. Yeah. So thanks for having me. I will say so there are
many reasons why it's nice to be here, but one of them is that my mother has a software company
in Texas, and she does records management work, so she's been a long -- I mean, she's been SQL
Server work and stuff like that for like 20 years. And, you know, if I tell her I'm going to give a
PLDI talk, she's like, oh, that's nice. But then when I said I'm going to Microsoft, she took a step
back and thought I was somebody special. So thanks for that.
Okay. So I'm going to talk about this work on trying to verify behavioral software contracts. It's
joint work with Sam Tobin-Hochstadt at Northeastern.
>>: [inaudible].
>> David Van Horn: Yeah, I think so. Yeah. This is why you should not be friends with your
colleagues on Facebook. But anyway. So I'm just back from this DARPA PI meeting in San
Diego, and the project that we're working on has this thesis statement which is that language
mechanisms that capture design knowledge can be leveraged to qualitatively improve reasoning
about programs. And I think that that's a thesis that's really been supported by the work here
with the RiSE group, and so I'm really happy to be here talking about this, because I
think we have sort of similar goals.
So I'm going to talk about this thesis in the context of the particular language mechanism which
is contracts and a particular sort of automated reasoning, which is verification. So contracts give
you a way of writing down specifications as part of your program in the programming language
that you're programming in, and I want to be able to verify those things.
Okay. So the goal here is to have some sort of automated modular verification of higher-order
programs with contracts. And, right, I know that there's been a lot of successful work here on
verification of programs with contracts of a slightly different sort. So I'm hoping I can tell you a
little bit about the sort of higher order bit here.
So I want to start by telling this story in a simplified setting where I'm not going to talk about
contracts, but I'm going to talk about types, okay, and then we'll ramp it up to contracts. And I
know Francesco has seen this, a lot of this talk, at OOPSLA, so I've tried to add some more
details so that at least he gets something out of it.
>>: [inaudible].
>> David Van Horn: What's that?
>>: Well, Romans used to say [inaudible] repeated [inaudible].
>> David Van Horn: Yeah. Okay. So -- so I'm going to try to get across the essential ideas here
using this simplified model of Plotkin's PCF language. And I'll talk about PCF not because I
actually care about PCF, but just because I can boil everything down to its essence here.
So let me just give you some examples so you get your footing with this language. So here's -- here is a program up at the top, and I've just illustrated the steps that it takes to compute an
answer, so this is integer division 10 over this conditional expression, so we're branching on
whether 7 is 0 or not. Of course it's not, so we're going to pick 3 here, and we get 10 over 3,
which is 3. Okay.
Okay. So the point here is that we've got some basic operations and conditionals. Let's look at
another example. So here you'll notice that the thing that ends up in the denominator position is 0,
and so the point here is that we've got errors in our language as well, so you can divide by 0 and
get an error here.
All right. We also have functions in this language. So here's the 10 over X function, and it's
being applied to what's eventually going to be 3, so we plug this in for X and get 10 over 3 again
and back to 3.
All right. And not only do we have functions over numbers, but we also have functions over
functions. So here is the apply to 3 function, it's taking in as its argument the 10 over X function,
and so we eventually get down to 3 here. All right.
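[For reference, a rough transcription of these four examples into plain Racket; quotient stands in for the talk's integer division, and the literal branch values are guesses where the slide isn't read aloud:]

    (quotient 10 (if (zero? 7) 2 3))                     ; 7 is not 0, so this is (quotient 10 3), which is 3
    ;; (quotient 10 (if (zero? 0) 0 3))                  ; denominator ends up 0: divide-by-zero error
    ((lambda (x) (quotient 10 x)) 3)                     ; the 10-over-x function applied to 3, again 3
    ((lambda (f) (f 3)) (lambda (x) (quotient 10 x)))    ; the apply-to-3 function given 10-over-x, again 3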
Okay. So the way that I'm going to approach this verification problem is by doing a sort of
symbolic execution. And the way that I'm going to do it is by extending PCF to do symbolic
execution, so I'm going to call this language symbolic PCF, which I'll abbreviate with a hat over PCF.
Okay. So here's the idea with this symbolic PCF language. So I want to be able to take a PCF
program and rip it open and pull out a piece of it. Okay. So here's the apply to 3 function. And I
want to be able to abstract it to some symbolic thing.
And so while I'm telling you the simplified story with types, the thing that I'm going to abstract it
to are types.
Okay. So this is a symbolic value in this language. It's really a -- it's sort of a black hole, so it's
an unknown value, but it -- what we do know is the specification, which is its type. Okay. So
it's a thing that takes a natural to natural function and it's going to produce a natural.
Okay. And I want to be able to
plug that back into my PCF program, seal it up, and now I
have a symbolic PCF program and I want to be able to run that. Okay.
And as I'm thinking about the design of the semantics, I want to -- I'm going to have this design
principle in mind, which is sort of what are the sets of programs that we're talking about in this
symbolic one. Okay.
So if we think about that symbolic thing, what I'm really talking about is all possible
concretizations of this. So I could have picked any value that could be abstracted to that and
plug it in, and now I'm back to just a regular old PCF program, and I want to be able to run that,
and I want the semantics to be sound in the following sense that all of the concretizations are
approximated by my symbolic PCF semantics. Okay.
All right. So let's look at some examples here. So this is just the example we saw before. And
now let's start abstracting parts of it.
So what if I abstract the numerator here. So I'm doing some number over 3 here. All right. So
you just stare at this for a minute, and you figure out, well, it's got to be -- it's some number over
3, that's just some natural number. All right. So the rule here is that we extend our interpretation
of primitive operations to say that some symbolic natural number over some number that's not
zero, that just gives you back a symbolic natural number.
>>: [inaudible] so you do writing this, your transitions as relations and your delta is the
transition relation [inaudible]?
>> David Van Horn: Yeah, exactly.
>>: And that's --
>> David Van Horn: Yeah.
>>: You've added an element to the delta transition?
>> David Van Horn: Right. So we're --
>>: [inaudible].
>> David Van Horn: Yeah. We're just doing reduction semantics, and this is our relation for
interpreting these things, and I've written the relation that relates these two things here. Okay?
All right. So here I've abstracted in a different place; namely, the thing that we branch on here.
Okay. So I'm doing 10 over -- well, if this is 0 then 2, otherwise 3, but of course this represents all
possible natural numbers, so it could either be 0 or not 0. All right. But that's easy to model as
well. You just take both. Right? You come to a fork in the road and you take it.
So here we get 10 over 2, which is 5, on one branch and 10 over 3, which is 3, on the other branch. Okay. So
the relation here is just -- is straightforward, right? If you're branching on a symbolic natural
number, you take both branches. Okay.
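[A toy sketch in Racket, not the talk's Redex model, of the two rules just described: a primitive operation applied to a symbolic natural returns a set of possible results, and branching on a symbolic natural takes both branches. The struct and function names are invented:]

    (require racket/set)

    (struct sym-nat () #:transparent)                  ; an unknown natural number

    ;; delta for quotient: returns the *set* of possible outcomes
    (define (delta-quotient m n)
      (cond [(and (number? n) (zero? n)) (set 'divide-by-zero)]    ; anything over 0 is an error
            [(and (number? m) (number? n)) (set (quotient m n))]   ; both known: just compute
            [(number? n) (set (sym-nat))]                          ; (quotient • k) with k != 0: some natural
            [else (set (sym-nat) 'divide-by-zero)]))               ; (quotient m •): either outcome

    ;; branching on a symbolic natural: come to a fork in the road and take it
    (define (branch-if0 test then-thunk else-thunk)
      (if (sym-nat? test)
          (set-union (then-thunk) (else-thunk))
          (if (zero? test) (then-thunk) (else-thunk))))

    ;; (delta-quotient (sym-nat) 3)   ; => a set containing just a fresh symbolic natural
    ;; (delta-quotient 10 (sym-nat))  ; => a set with a symbolic natural and 'divide-by-zero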
Okay. Here's another example where I've abstracted the thing that ends up in the --
>>: So the --
>> David Van Horn: Yeah.
>>: The [inaudible] or the delta cannot -- cannot [inaudible]?
>> David Van Horn: Yeah. So this language -- so the language does have side effects in that
there are errors. But, okay, so let's think of -- so this could be abstracting an expression that
causes an error.
>>: Okay.
>> David Van Horn: And so the way that I think about it is if this is some expression that causes
an error, that's really that expression's fault. Right? It's not this larger -- it's not the program I
know about that's at fault for that error. So I'm only going to be concerned with errors that blame
the thing I know about. I can make that a little more precise later.
>>: Okay.
>> David Van Horn: Yeah. But errors are really the important thing that we're going to be
concerned about in these examples. Okay. So --
>>: Just to understand the [inaudible] semantics.
>> David Van Horn: Yeah. Yeah. So -- so one way to think about it is that these -- these
abstract values, and so the value -- there's no error for values of type nat.
>>: Okay.
>> David Van Horn: Only computations. Okay? If this doesn't get cleared up, let me know,
because it's important.
>>: [inaudible].
>> David Van Horn: Yeah. Okay. So this is the thing that ends up in the denominator position.
And so now we're doing 10 over some natural number, and so of course we could be -- we could
get a divide by 0 error, we could get some natural number here. Okay. And so the point here is
that there is an error that's really the fault of this program. Right? And I don't care about the
errors of the things I don't know about; I care about the errors of the program I know about. And
there is one here.
Okay. But the relation is simple. So some number over a symbolic natural number, you either
get some natural number or a divide by 0 error here.
Okay. So here's really the sort of first interesting example, right, which is that we've abstracted a
function here. Okay. So I'm applying a function that takes naturals and produces naturals and
I'm giving it the number 7. Okay. So I don't know what this function does. But I know that if
it's given a natural number it produces a natural number, so it's sort of obvious what this should
reduce to, which is just a natural number, right? It should just reduce to the range of this
function here.
So the relation here is similarly straightforward, where if we're applying a symbolic function that
takes some inputs and it's applied to some values here, some arguments, well, then we just get
back a symbolic result that is -- corresponds to the range of the function. And here you should
think of -- you know, thinking about errors, right, it could be that this function causes an error
when given 7, but that would be that function's fault. It's not 7's fault. And I'll talk more about
fault and who's to blame later.
>>: So you also have -- nontermination is in effect?
>> David Van Horn: So we don't -- we don't really consider nontermination. So if this -- if this
program doesn't terminate, that's -- I mean, we're going to give an answer that corresponds to
what happens if it did terminate.
>>: Sorry, just a quick question.
>> David Van Horn: Yeah.
>>: What is [inaudible] a precondition? You have precondition in your language.
>> David Van Horn: So --
>>: [inaudible]?
>> David Van Horn: So in this simplified setting, there's no -- there's no preconditions.
>>: Oh, okay.
>> David Van Horn: But we'll get there.
>>: I suppose you have preconditions in just the typing, though.
>> David Van Horn: In the type. Right. So -- but all of those -- all of --
>>: Expressions are [inaudible].
>> David Van Horn: Yeah, exactly. So -- so we assume everything is typed, and so all of
those -- so if I had applied this to the string thread --
>>: [inaudible] assuming there's no errors.
>> David Van Horn: Yeah. Yeah. So the only -- the only error that we have right now is divide
by 0. And we're going to scale that up later. Okay?
Okay. So I can write down the semantics. This is in the Redex language that comes with
Racket, so we've got this little domain-specific language for writing down reduction semantics. And
it's nice -- it fits on a slide. But of course there's a problem, which is that it's unsound.
Okay?
So let me show you how it's unsound. So here's another example where the thing that I've
abstracted is a higher order function. Okay. And this is really the problematic case. Because the
rule that I showed you earlier just says that if I apply a symbolic function to some input then
what I get back is a symbolic value corresponding to the range here. So this is what it says I
should get here. Okay.
And right. That's fine. So if we think about how to concretize this, right, that's fine if we pick
something like the apply to three function or the -- but if we pick the apply to 0 function, things
go wrong, right, because here is -- here is just a straightforward PCF program where I've plugged
this thing for this unknown component up here. And what I get is going to be 10 over 0. And I
get some error. And you'll notice that that's not -- that's not represented up here.
And also who's to blame for this error is really the program that I knew about in the first place,
right, because I'm passing this 10 over X function off into some unknown context with the
specification that it takes -- you know, that this function takes natural numbers and gives you
back natural numbers, but there are natural numbers for which it causes an error, right? So it's
really -- this thing hasn't lived up to its specification here.
All right. And the semantics didn't account for it.
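[For concreteness, the breaking concretization, written in plain Racket with quotient for the talk's integer division:]

    (define ten-over (lambda (x) (quotient 10 x)))     ; the known component, typed nat -> nat

    ((lambda (f) (f 3)) ten-over)                      ; => 3, matches what the naive rule predicts
    ;; ((lambda (f) (f 0)) ten-over)                   ; divide-by-zero: the error the naive rule misses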
>>: You're saying [inaudible] add in contracts to prevent the situation?
>> David Van Horn: So first I'm going to revise the semantics to find this kind of error. And
then, yeah, if I have a contract language, then I can put preconditions on this sort of thing and
avoid this error as well.
All right. So right. So getting back to the thesis of that project, if we give programmers the
ability to write down richer specifications, then we can do a better job reasoning about it.
Because if types are our specification language here, we're just stuck with that this might -- this
might divide by zero. Okay.
>>: So maybe [inaudible] specification languages, it's --
>> David Van Horn: Yeah, you can just check it.
>>: -- more easily decidable.
>> David Van Horn: Yeah, right. Yeah. So it's both -- it's more difficult to reason about, but
you can also do other, you know, more powerful kinds of reasoning. Okay?
All right. So thinking about this example, what I really want to sort of focus on is what happens
to this function as it goes off into this black hole here. Okay? So at this call site, you know, this
function is really being sucked into this black hole here. And what can happen now that we're in
this sort of unknown land with this function, and really -- so what can happen here is anything so
long -- you know, we can do anything with this function so long as we play by the rules of the
language and for now that means playing by the rules of the type system.
So you could apply this function to 7 or you could apply it to 8 or 625. But if you apply it to 0,
that's the problematic input that's going to give the error here.
Okay. So it's obvious from looking at this function that 0 is the input that you're sort of looking for
to make this cause an error.
But in general you want to consider all possible natural numbers here. And it would take a while
to iterate through all of them, but we have a way of applying this function to all of the natural
numbers all at once, which is just apply it to the symbolic natural number.
Okay. So in order to account for this stuff that happens in these unknown contexts, where the
only thing that we're concerned about are what are the possible errors that could occur, we
extend the semantics with this havoc relation. And the choice of names here is not accidental;
this is sort of inspired by the havoc function from Boogie, which is all about
sort of putting things in an arbitrary heap context, right, in a first-order language. And here we're
talking about behavioral things where it's all about putting these behavioral objects in an
arbitrary context that can explore the behavior.
Okay. So when you apply a symbolic function to some inputs, we just pick out the type
corresponding to each input. And what havoc is going to do is basically do everything it can to
this value at this type. Okay. And in particular so at this type the thing that you can do with this
function is apply it to a symbolic natural number.
Okay. So emerging from this, now that we have -- so we extended the semantics with this havoc
relation, and so every time that you apply a symbolic function, there's sort of two paths. One
is the good path, which is we just get the range, right, and there's -- which is sort of the no errors
occur when this thing goes off to the unknown context, and the other path is let's try to find all of
the errors in the (nat -> nat) -> nat context, okay, and it finds the divide by zero because we
applied it to an arbitrary natural number, and our semantics already says that 10 over some
natural number is an error. Okay. So now we're back into a sound semantics. And it only got
slightly larger. We had to add this -- we added this rule here.
Okay. All right. So we now have the soundness property that says all of the concretizations are
approximated by the symbolic PCF semantics and it's got this nice sort of verification condition
it implies, which is that if you're error free in the symbolic semantics, then for anything that you
can plug in, you're error free for the original PCF semantics. Okay?
So, in other words, you can verify the program that you know about, and you could plug
something in that has an error in it, but it would be that component's fault. All right.
>>: [inaudible].
>> David Van Horn: Yeah.
>>: I didn't see how that was captured in your statement of verification, if you do plug in some
value ->> David Van Horn: Okay.
>>: -- that you can have an error, but that the blame [inaudible].
>> David Van Horn: Yeah. So really I need to talk about contracts to get the blame nailed down
in detail. I'm sure that the details of the blame in the typed setting could be worked out, but
they're definitely worked out for the contract stuff.
>>: [inaudible].
>> David Van Horn: Yeah, sure.
>>: So the [inaudible] mistake, so you're saying all the concretization, so the error
concretization, I was expecting the -- so you're saying [inaudible] approximated by PCF, that this
[inaudible] not missing any PCF program, you're not missing [inaudible] PCF, that's what you're
saying?
>> David Van Horn: Yeah.
>>: Okay.
>> David Van Horn: Yeah.
>>: Okay.
>> David Van Horn: So basically [inaudible].
>>: It's the other way around. Okay.
>> David Van Horn: Yeah. If you started with a PCF program and pulled out parts of it and ran
it, you would have approximated the original one.
>>: [inaudible] you have an alpha.
>> David Van Horn: Yeah.
>>: Essentially [inaudible] approximate for each PCF program [inaudible]?
>> David Van Horn: I don't know.
>>: Approximated in different things? Okay. [inaudible] which is the order you automate?
How you order problem?
>> David Van Horn: So I'll show you in a minute. Yeah. I'll show you some of the details.
Yeah.
>>: [inaudible] respect to some other.
>> David Van Horn: Okay. Yeah. Okay. Yeah. So just hold on to that --
>>: [inaudible] so PCF programs. You essentially say this, quote, PCF program is approximated
[inaudible] approximates all these sets.
>> David Van Horn: Yeah. Exactly.
>>: And now here [inaudible] you have the empty set and all the [inaudible] programs.
>> David Van Horn: Yeah.
>>: This is your essentially [inaudible] what is the order in the abstract? What is the order in the
abstract when you say that one program approximate more? And that's not -- seems that you
have some -- you should not have [inaudible] structure, I guess, you should have some other
order here, because now [inaudible].
>> David Van Horn: Well, I'll show you the order.
>>: Okay.
>> David Van Horn: Okay. All right. So now let me try to sort of redo this with a language
with behavioral contracts in it. So we're going to look at contract PCF, which I'll call CPCF.
And the language is extended in the following way. So to our set of terms, we've got two new
forms. So one is a monitoring form, which says that we've got some contract here and we're
going to monitor this expression with that specification. And if this expression doesn't live up to
its contract, then we blame. Okay. And in the simplified setting I'm going to not tell you about
how the blame is assigned, but I'll actually show that in this talk. So we'll get there. But for now
you sort of don't need to worry about it.
Okay. And now the language of contracts is -- well, it includes just arbitrary expressions which
encode predicates. So we can write down a predicate and say that this expression must satisfy
the predicate. And that's sort of the Eiffel-style contract system.
And then to extend this to higher order contracts, because there's no predicate that we can write
down for a higher order thing and hope to be able to apply it and check it at runtime, we have a
constructor for functions. Okay. So this is a function that takes input satisfying these contracts
and produces some outputs satisfying this contract. Okay? So you build up contracts -- higher-order contracts -- out of these predicates.
Okay. So here's a simple example. I want to check that 7's positive. So the way that you do the
check is you just apply the predicate. If the predicate holds, you get the value back. Otherwise,
you get blame. Okay. So we get 7 here. If you check that 0 is positive, well, you're going to get
blame here. Okay.
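[In Racket proper, a first-order check like this can be written with the contract form; the blame labels 'server and 'client here are illustrative, not from the talk:]

    (require racket/contract)

    (contract positive? 7 'server 'client)     ; => 7: the predicate holds, you get the value back
    ;; (contract positive? 0 'server 'client)  ; contract violation, blaming 'server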
And now the crucial thing is how you do the higher-order -- monitoring higher-order things. So
here I've got some function where I've omitted the body here. And I want to -- I wanted to have
the specification that it takes inputs that are prime and produces outputs that are even. Okay.
And I can't just check that against this function at this point in the program.
So the way -- so the sort of Findler and Felleisen insight here is that the thing that you do is you
produce a new function, okay, so you delay the checks and produce the new function that when
it's applied is going to check that its input was prime.
Then it's going to apply the original function that we had here. And after which it's going to
check that the result is even.
So you've got these higher-order things that you need to enforce, and the way that you enforce
them is that you drive them down to a lower level and delay them. Okay? So this is how
contracts work in Racket, for example.
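[A hand-rolled sketch of that wrapping; dom and rng are the domain and range predicates, and the error calls stand in for real blame. In Racket itself you would just write the contract (-> prime? even?) and let the contract system build this wrapper for you:]

    (define ((monitor-arrow dom rng) f)
      (lambda (x)
        (unless (dom x) (error 'blame "caller broke the domain contract"))   ; check the input now
        (define result (f x))                                                ; run the original function
        (unless (rng result) (error 'blame "function broke the range contract"))
        result))

    ;; Example use, with prime? from math/number-theory:
    ;; (require math/number-theory)
    ;; (define wrapped ((monitor-arrow prime? even?) (lambda (n) (* 2 n))))
    ;; (wrapped 7)    ; => 14
    ;; (wrapped 8)    ; blame: caller broke the domain contract (8 is not prime)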
Okay. And now I'm just going to sort of replay the same story with the symbolic semantics for
the contract PCF language. And it's the same idea where I want to be able to rip out pieces of the
program and abstract them, but not just to their type, but also to their type and some set of
contracts. Okay.
>>: That works out? That transformation you just said works out even for higher-order
functions?
>> David Van Horn: Yep. Yep.
>>: I guess because you just keep distributing the down until you get to the point where you
[inaudible]?
>> David Van Horn: Right. Yeah. So when you've got a contract violation -- I mean, when a
contract fails, it's always sort of at first order. Right? It's --
>>: [inaudible].
>> David Van Horn: Yeah. Yeah. So the real contribution of the Findler and Felleisen work
was how do you enforce these, which is the driving down, and also how do you account for
who's to blame, because that gets more complicated.
>>: Sure.
>> David Van Horn: Yeah. Okay. So yeah. So here I've just attached some contract to this
symbolic thing. So the symbolic values are sets of contracts or a type and a set of contracts.
>>: But I suppose this driving contracts down to the [inaudible] it works, it depends on your
contract language, right? I mean, if you were trying to state the equivalences between functions
and so on, this doesn't work.
>> David Van Horn: Yeah. So there's some -- there's -- yeah. There's some question about like
what does it mean to be correct with respect to some contract. Because like let's say I have a
program that I say takes even numbers and produces odd numbers, but I never call it. Right? So
it could be the function that always returns 4. And if I said it always produces a prime number,
well, there's no -- it never breaks because we never got there. So by delaying those checks, you
can -- you'll never get there if you don't actually force that thing to happen.
>>: So you're saying it's equivalent to runtime checking.
>> David Van Horn: Yeah. I mean, it's a runtime checking.
>>: [inaudible].
>> David Van Horn: Yeah.
>>: [inaudible] reduction semantics, it's always --
>> David Van Horn: Yeah.
>>: -- getting it down to [inaudible].
>> David Van Horn: Right.
>>: Its equivalent.
>> David Van Horn: Yeah.
>>: Even if what you think of it is -- you're thinking of it as static checking. It's not.
>> David Van Horn: Right. But I'm going to -- the property I'm going to sort of verify is the
runtime part of it. So if I say that -- if I verify that this thing satisfies this contract in my
program, what I mean is that at runtime it never produces something that's not prime.
>>: It might not know that [inaudible]. You're not going to know that unless you actually -- might not know that unless you actually run the program, because it might get delayed to the
point where [inaudible].
>> David Van Horn: Yeah, but I -- so that's true, but I think our approach to verifying these
things is just to actually run them but in this -- in this sort of symbolic semantics.
>>: Which is [inaudible] just approximating [inaudible] PCF programs [inaudible] say, okay,
abstract when I abstract my [inaudible]. If I can prove it, then by the soundness result I know there
is no wrong program --
>>: Yeah, yeah, yeah, if you can prove it [inaudible].
>>: [inaudible].
>>: So I have probably a related question. So what about -- can you express if the function is
monotonic, for instance, or you want as input --
>>: That was my point actually [inaudible] so my point is that if you are interested in checking
safety properties, then you can do this by pushing the contracts down [inaudible]. But if you
want to check properties, things like out of the program that's lambda F, lambda G --
>> David Van Horn: Yeah.
>>: -- and I want to -- my contract says I want F to be extensionally equivalent to G --
>> David Van Horn: Yes.
>>: -- or I want F to be monotonic, then these are not properties of a single execution of a
program.
>> David Van Horn: Right.
>>: And then you can't just push the things out.
>> David Van Horn: Yeah. So that's a -- that's just something that the contract language is not
going to let you express. Now, you can express -- so I've shown just a really simple contract
language here, and I'll show some more of the bells and whistles of what are actually in the
contract language. So you can express dependency. So you can say I take F and G and X and
the result is that F of X is equal to G of X. And that will check it for any particular X that you
give it, but not for all X. So you can express things like that.
But, yeah, you can't express properties that can't be enforced other than running the program and
checking those things. Yeah.
>>: [inaudible] you still have incompleteness [inaudible] so there are things you can [inaudible].
>>: The problem is completeness has come up way earlier than even [inaudible].
>> David Van Horn: Yeah. There's a kind of incompleteness just in the contract language, right,
forgetting about static analysis or anything. You just can't -- you can't write down the contract
that expresses those sorts of things. Like the function's monotonic.
>>: [inaudible] incomplete even in the limitations of what your contract language is.
>> David Van Horn: Yeah. Yeah, yeah, of course. Sure. Yeah.
Okay. So here's a simple example where I'm checking that some natural number is positive.
Okay. And so of course you can get blame or it could be that it was positive. But there's an
important thing going on here. So when we check this predicate against this symbolic thing,
you'll notice that the way that it reduces is we check the predicate -- I mean we apply the
predicate. But then in the case that it holds, we've added that predicate to the set of contracts that
this symbolic value satisfies.
Okay. So when you get down to the end here, if that predicate holds, we've learned something
about the thing we were talking about here. Okay? So the symbolic values remember the
contracts that they've satisfied. All right?
And then that's going to be important sort of downstream where you may check this again. And
if you remember that this is a positive natural number, then of course you just get back a positive
natural number. And there's no blame here. Nothing goes wrong.
Okay. So the contracts influence the computation. And that's also true here when we think
about our primitive operation. So 10 over a positive natural number, we just extend this relation
to say that if you're doing 10 over something that is positive, that you're just going to get back a
natural number here and not a possible divide by 0.
>>: [inaudible].
>> David Van Horn: So here I'm saying -- yeah, so that's sort of a shortcoming of the relation
that I worked on here. I'm sure you could -- you could add that to the relation; I just didn't.
Yeah.
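[A sketch of that refinement step: symbolic values carry the set of predicates they have passed, and the primitive operation consults it. The names are invented, in the same spirit as the sym-nat sketch above:]

    (require racket/set)

    (struct sym-val (type contracts) #:transparent)       ; an unknown value plus what we know about it

    (define (remember v pred)                             ; a successful check refines the value
      (sym-val (sym-val-type v) (set-add (sym-val-contracts v) pred)))

    (define (delta-quotient/refined m n)
      (if (and (sym-val? n) (set-member? (sym-val-contracts n) 'positive?))
          (set (sym-val 'nat (set)))                      ; 10 over a *positive* natural: no error path
          (set (sym-val 'nat (set)) 'divide-by-zero)))    ; otherwise keep the divide-by-zero branch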
>>: So it's [inaudible] now you're refining your abstract domain by adding essentially those
contracts.
>> David Van Horn: Right.
>>: Now it's a matter of [inaudible] gets more precise because now you know that [inaudible]
cannot be 0.
>> David Van Horn: Right.
>>: And so you get more precise. So my question is that can you get the [inaudible] by just
taking [inaudible] abstract domain [inaudible] types the one [inaudible] for creating a more
precise domain? So this looks a lot [inaudible].
>> David Van Horn: Maybe -- I really don't know about that. Yeah.
>>: But just seems --
>> David Van Horn: Yeah.
>>: -- this is something you can build [inaudible] these basic blocks automatically, you can --
>> David Van Horn: Yeah.
>>: -- repeat this one, and so that's why you [inaudible].
>> David Van Horn: Yeah.
>>: You refine your [inaudible] domain.
>> David Van Horn: Yeah. Yeah. I suspect that that might be the case, but I don't know. Yeah.
Okay. And so here's that -- here's that higher-order example that we saw earlier, except the
difference is that I've attached a contract to the 10 over X function which says that it's got to be
given a positive number. Okay.
And now, okay, so the first thing that happens is that we drive down the contracts, and now this
function goes off to this unknown context. And what happens here is remember havoc gets to do
whatever it wants to do with that value, but it always has to play by the rules of the language,
which in this case means that it's got to respect the contract.
So it takes this function which we know consumes positive numbers, and it's going to apply it to
a positive natural number, in which case there's no error here, right? So the important thing is
what's not on the slide, which is there's no error and we always get a natural number.
And so we can verify that the 10 over positive X function is -- can't be blamed. All right? And
that's the kind of -- the kind of thing that we want to verify in this work.
All right. And we have the same soundness and verification property here. But now when
we talk about error-free symbolic CPCF programs, that we're not just talking about divide by 0
errors, we're talking about blame which is all about arbitrary properties that a programmer wrote
down. Okay. So it's a much richer sense of being error free here.
Okay. Whoops. So now let me show you a little bit more about the details. Okay. So here's our
semantics for the symbolic PCF. And it's basically just the stuff that I've shown you before.
We've got this delta operation for interpreting our primitives. So the way we do conditionals is
we ask if the false predicate holds on the value, and if false is in there, we take this branch. And
the important thing is that this predicate on an arbitrary Boolean gives you back both true and
false, in which case you'll take both branches here.
The delta's also used here, so this will include things like if you -- if you add 1 to some actual
natural number, you get N plus 1, but if you add 1 to some unknown natural number, you just get
back a natural number.
So this -- so here's our function application with a symbolic function. One thing I should say is
we -- we really do have dependent function contracts here. So this is saying this -- we've got a
function that takes an input satisfying C1 and we produce something that satisfies C2 with X
being replaced by the actual input here. So this is how you can express something like I
always -- you know, I take in some natural number and I give you back that number squared or
something like that. So we can express that.
And what this is saying is that we've got a set of contracts attached to this function and what you
get back is some unknown thing that is refined by the set of range contracts here with the input
replacing the X.
>>: So how do you express this in your [inaudible]? You had contracts that were [inaudible]? I
didn't see any binders there.
>> David Van Horn: Yeah, so I just didn't show that. But yeah. So the real way that you would
express it is the right-hand side of the arrow is a function, a function to contracts. Yeah. Yeah.
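[In stock Racket, a dependent contract like "returns the square of its input" is written with ->i, where the range really is a function of the argument; that is the same idea as the model's range-as-a-function-to-contracts:]

    (require racket/contract)

    (define/contract (square n)
      (->i ([n exact-nonnegative-integer?])
           [result (n) (lambda (r) (= r (* n n)))])   ; the range contract depends on the argument n
      (* n n))

    (square 5)   ; => 25, and the dependent range check runs at the call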
>>: Recursion? Can you terminate [inaudible]?
>> David Van Horn: So yeah. It's -- you can write programs that don't terminate. Yeah. So we handle recursion.
>>: [inaudible] just the easiest one [inaudible]?
>> David Van Horn: I mean, let's see. So you could just write -- I mean, the simplest example is
just --
>>: [inaudible].
>> David Van Horn: So you -- so we have a recursive binding, so you can say let X be X.
That's the simplest nonterminating program.
>>: Can your contracts [inaudible]?
>> David Van Horn: So your contracts -- yeah. Your contracts can diverge. And then there's
also the really interesting question of what happens when there's an error in your contract. And
for what I'm going to show you, that's sort of swept under the rug. But there's work on
accounting for that and who's to blame when a contract itself fails. And all we have to do is take
that semantics and write it down. So we can handle that straightforwardly.
Okay. And what else to look at here? Okay. And then here's our havoc function. And so it's
indexed by the type. And all we do for base types is just say that this doesn't produce anything.
Okay. And but for function types, what we produce is a thing that takes in this value and then
applies it to an unknown thing corresponding to its input type. And then we run havoc at the
output type on the result of that. Okay. So that's our havoc construction here.
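[A schematic version of that havoc construction, indexed by simple types written as lists like '(nat -> nat); it reuses the sym-val placeholder from the earlier sketch, and the 'no-behavior base case is an invented stand-in:]

    (require racket/match racket/set)

    (define (havoc type)
      (match type
        [(list t1 '-> t2)                        ; function type: poke it with an unknown input
         (lambda (f) ((havoc t2) (f (sym-val t1 (set)))))]
        [_                                       ; base type: nothing more can go wrong
         (lambda (x) 'no-behavior)]))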
Okay. Let's look at how contracts are monitored. And this will give you some more details
about how the blame stuff works. Okay. So monitor is the little monitor form that I had in the
concrete syntax earlier, and it's got these labels on it. Okay. And we've got this way of sort of
tracking the blame. And you'll see it when we check. So flat is a predicate here. So when we
check a predicate and if it fails, we're going to blame F. And the way to interpret this is that F
broke the contract with G. And the H here is really about sort of the origin of where these things
came from. It's not very important.
But -- and really the important thing here is that as far as the blame calculus goes is that when
you are checking -- when you're monitoring a function, the positions swap. Okay. So it's
contravariant for functions. So the who's to blame switches as you do a higher-order monitor.
Okay. What else to look at? Okay. So you'll notice here that I've got this judgment that says
that V -- the way to read this is that V -- we can prove that V satisfies C. Okay. So we're really
parameterized by some theorem prover here. And I should really say theorem prover in quotes,
because -- so for the simple system, this is what our theorem prover consists of, is if a value
remembers some set of contracts and the contract that you're trying to prove is in that set, then
you say it proved it. Okay. But this is I think a big opportunity for -- so this is -- we're trying to
monitor some value with a contract, but if you can just prove it, then you can eliminate the
monitor here.
So the way that we eliminate it is by remembering that you've already satisfied it. But you could
plug in a much more sophisticated theorem prover here and everything would get better. Okay.
Right. So here's the remembering. So we check a predicate and we remember the predicate that
we were checking here. Okay. All right. And these other rules just have the side condition that
you weren't able to prove it, so you actually have to enforce it.
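[So for the simple system the "theorem prover" is literally set membership over the contracts a value remembers; a one-line sketch, again reusing the sym-val placeholder from above:]

    (define (proves? v c)
      (and (sym-val? v) (set-member? (sym-val-contracts v) c)))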
Okay. So here's the soundness which says that if E refines E prime and E reduces to some
answer, then there exists some other answer that A refines such that your abstraction gets there.
Okay.
So I have to tell you a little about this part. And so the basic things of this refinement judgment
are just that any -- a value of some type refines an unknown value at that type. Okay. So
contracts can refine values, so you can sort of arbitrarily add contracts on the left here.
And if you want to check some expression so that the monitors of the checks can refine
expressions here and say that we can get some unknown thing that we know satisfies this
because we're going to check it here. And there are a few more things, but not really important
here. But that's sort of the basics of this ordering here. Okay?
So now let me tell you about how we scale this up to a more realistic language. So symbolic
Racket. So Racket's a LISP dialect that's in development at Northeastern and other places. So
we've got a rich language, a rich language of contracts, and we've developed an interactive
verification environment.
So when I say a rich language, what I mean is that we've got a real module system. It's an
untyped language, which just makes the problem harder. We've got real data structures, lots of
base types, lots of primitive operations.
The contract language, we've got the blame that I sketched for you, the dependent functions.
We've got data structure contracts. You can do conjunction and disjunction of contracts and you
can write recursive contracts. And I'll tell you -- I'll give you some of the details there. Okay?
And the reason why we scaled this up is going back to the sort of thesis of this project, is that this
is a really rich specification language. And the idea is that we should be able to reason better
about programs by having all of these things. So it was important for us to be able to really scale
it up to a full contract language.
>>: [inaudible] negation?
>> David Van Horn: What did we do for negation? So I think that we handle -- I can't
remember actually right now. I can't remember if there's anything hard about negation or not.
Yeah, I'd have to look.
>>: It seemed like there should be, but --
>> David Van Horn: Yeah, I don't -- I think -- I think it's easy enough that I thought it wasn't
even worth putting on the slide, but I can't recall.
So here's an example where we're defining a contract for a list of natural numbers. Okay. So it's
a recursive contract which says that it's either -- so it either satisfies the empty predicate, or it's a
pair constructed out of the nat predicate and a list. Right? This is the recursive thing here.
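[The same data definition written with stock Racket contract combinators, which have a recursive-contract facility for exactly this; the name list-of-nat/c is mine:]

    (require racket/contract racket/list)

    (define list-of-nat/c
      (flat-rec-contract lon
        empty?
        (cons/c exact-nonnegative-integer? lon)))

    ;; (contract list-of-nat/c '(1 2 3) 'server 'client)   ; => '(1 2 3)
    ;; (contract list-of-nat/c '(1 -2 3) 'server 'client)  ; contract violation, blaming 'server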
All right. And now we've got a module here that's providing two things, insert and nums. And
you'll notice there's no definitions in this module. So this is how the symbolic things get
introduced, because insert and nums are going to be treated symbolically. And insert is a
function that takes a natural number and a sorted list and it produces a sorted list, okay, and nums
is just some arbitrary list of natural numbers.
Now, the module that we're sort of -- where we're interested in verifying is an insertion sort
module. So it requires the stuff which we don't have definitions for, so they get treated
symbolically. It defines a fold, sort of the usual fold left higher-order function here, which folds
this function over a list and some base value. Okay. And then so the way that you write
insertion sorts is you just fold insert over the list and the empty list as your base value.
And we provide sort with this contract which says you give me a list and I'll give you back a sorted
list. Okay. So we're not verifying sort of full correctness of this thing, but -- because it could
give you back any sorted list, but -- but this is the kind of contract that you might think about
writing down and then erase it because it's going to be too expensive to actually enforce at
runtime. Okay.
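[A plain-Racket reconstruction of the insertion-sort module; the helper contracts and a concrete insert are filled in here, standing in for the opaque, symbolically-treated imports in the demo, and sort is renamed to avoid colliding with Racket's built-in sort:]

    (require racket/contract racket/list)

    (define nat/c exact-nonnegative-integer?)
    (define (sorted? xs)                                  ; flat predicate: nondecreasing list
      (or (empty? xs) (empty? (rest xs))
          (and (<= (first xs) (second xs)) (sorted? (rest xs)))))

    (define/contract (insert n xs)                        ; treated as an unknown in the talk's tool
      (-> nat/c (and/c (listof nat/c) sorted?) (and/c (listof nat/c) sorted?))
      (cond [(empty? xs) (list n)]
            [(<= n (first xs)) (cons n xs)]
            [else (cons (first xs) (insert n (rest xs)))]))

    (define (fold f b xs)                                 ; the usual fold-left
      (if (empty? xs) b (fold f (f (first xs) b) (rest xs))))

    (define/contract (insertion-sort xs)                  ; "give me a list, get back a sorted list"
      (-> (listof nat/c) (and/c (listof nat/c) sorted?))
      (fold insert empty xs))

    (insertion-sort (list 3 1 2))                         ; => '(1 2 3), with the contracts checked at runtime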
And so what we can do is we can just type this into our REPL and say sort this arbitrary list of
numbers. And what you get back is a sorted list of numbers. And, again, what's important is
what's not on the slide. Right? There's no -- there's no cdr-of-empty errors,
there's no applying something that's not a function, right, so there's no kind of runtime-type
errors, anything like that. And also we have this added property here that we know that the list is
sorted that we get back. Okay.
>>: Can you [inaudible] contracts by other contracts? So could I [inaudible] list contract?
>> David Van Horn: So you would have to -- so polymorphism is difficult to enforce at runtime.
But there's been work --
>>: Not necessarily [inaudible].
>> David Van Horn: Yeah. So there's been work on it. So I think one of the lessons here is that
if you can write down a reduction semantics for the original language, we think that it's sort of
straightforward to turn it into this symbolic kind of thing. So if you can write down -- you know,
whatever kind of runtime enforcement mechanism you can write down --
>>: So I could say define contract, list C, and take another contract as a [inaudible]?
>> David Van Horn: Yeah.
>>: And [inaudible].
>> David Van Horn: Yeah. So in fact in the actual Racket language contracts are just values in
the language. So you can write -- you just write that as a function. Yeah. And we can do that.
And you could also write down things -- this could be a dependent contract. Like one thing that
you might want to add to this specification is that the result is always a permutation of the input.
And you could write that down. Our tool is not going to be up to the task of verifying that part
of the property. It will just verify this part, and then you'd have to leave in the runtime
mechanism for checking the permutation aspect of it.
>>: [inaudible] tell you that it was able to verify this part of the specification and [inaudible]?
>> David Van Horn: So in some sense, because you can look at what are the possible contract
failures. And so you should be able to see that it never fails with it being not a sorted list. It
would only fail with the permutation part.
>>: That's good.
>> David Van Horn: Okay. So this is out of this OOPSLA paper, and there's some more -- some more details there, if you're interested. I'm going to tell you a little bit about the
verification environment.
So we're able to verify programs like this. So this is a program that our first-year students write.
It's the snake game. So this guy is running around on the screen and trying to collect this food at
which point it grows and it dies if it runs into any of the walls and it dies if it runs into itself.
Okay.
So it's an interactive event-driven game where we're registering callback functions here to this
event loop. We've -- it's broken up into a bunch of modules. We've got some contracts on the
things that each of these modules provide. Okay. And if you wanted to verify this program, the
steps that you would go through is, okay, so we've got one module that's sort of providing all of
the image primitives. And it's really just requiring this underlying library, image library, and
wrapping them with contracts. Okay. So if you wanted -- and you'll notice that -- you know, so
we're writing this in Racket. And if you wanted to verify it, you would just change the language.
So this is our verified by abstract reduction language. You comment out the require to the
primitive library. So now this module exports a bunch of things but doesn't define them. So
they're going to be treated symbolically. And then you run it. You run it and you notice that
there are no errors.
Okay. So we can verify these contracts which are somewhat sophisticated. So a position is a
posn structure with two nonnegative integers in it, for example, and that a -- so we've got
these kind of ad hoc data definitions, things like the snake is a nonempty list of positions and so
on, and we can verify all of these properties in this program. Right. And those were the only
changes we had to make. All right.
>>: This [inaudible] written by some student or [inaudible]?
>> David Van Horn: So it's -- it -- it could have been written by some student.
>>: [inaudible].
>> David Van Horn: So the tool could -- I mean, I think it needs some work before it was
unleashed on users. In particular we don't give -- we don't give great feedback when -- so one
thing that we do a really bad job of, like if you leave a paren off or something.
>>: A what?
>> David Van Horn: A parenthesis off or something. You've got some syntax error, like you get
this awful, awful error message, okay, and Sam and I have gotten really good at interpreting
what went wrong there. So I would never -- at this point I wouldn't give this tool to students to
write.
>>: Suppose the syntax is okay.
>> David Van Horn: Yeah.
>>: How do you [inaudible] how do you [inaudible] the messages? How complex [inaudible]?
>> David Van Horn: I should have had an example, but they're -- they're -- they're similar to the
sort thing where you just get -- at the bottom in this sort of interaction window here, you get a
list of possible results.
>>: Okay.
>> David Van Horn: Which could include error results. So you run your thing, and if you get
some errors at the bottom, it will tell you -- it looks a lot like a contract violation in the real
Racket language. So it says --
>>: [inaudible] give you, for instance, the [inaudible] how you propose feedback for the user or
[inaudible].
>> David Van Horn: So we have a couple --
>>: It's very complicated, so to understand what's [inaudible].
>> David Van Horn: Right. So let's see. So -- so one thing -- so the sort of immediate feedback
is just the list of possible results are printed down here.
Now, another thing you can do is you can add a keyword up here that says you want to see the
traces. And at which point it produces windows that look a lot like my first slides with the -- those diagrams. And you can also do a different tool for -- to get an algebraic stepper. So you
can step through the thing and watch where it's splitting and so on. So we actually have some
decent tool support for exploring the semantics here.
>>: Do that [inaudible] come out of [inaudible]?
>> David Van Horn: Yeah. So we're leveraging -- so all of those things are -- we're really just
leveraging the Redex toolkit for our language here. Yeah.
>>: When you do the symbolic execution and explore all these different paths, are you
guaranteed that it will terminate if the program [inaudible]?
>> David Van Horn: Yeah. So that's a good question. So no. Because in particular you could
just -- you could write down a program and not abstract anything and just run it. And of course
it may not terminate. So yeah. So that's a good question. I think -- I'm going to come to that.
Let's see. Nikhil, how much time do I have?
>> Nikhil Swamy: You should look to start wrapping up.
>> David Van Horn: Okay.
>> Nikhil Swamy: The room till 12, but --
>> David Van Horn: Okay. All right. So I think I'm going to skip over most of the formal
details of the sort of richer language here. Yeah. I think you sort of get the essence of it from the
example. So let me just skip through this.
Yeah. So this is really just --
>> Nikhil Swamy: In a sense [inaudible] it's just three of us, it depends on you guys, how much
time you have.
>>: I don't care if you want to continue.
>> Nikhil Swamy: You're okay too?
>>: [inaudible].
>> Nikhil Swamy: Okay. So then we have [inaudible].
>> David Van Horn: Till 12? Okay. All right. So let me just try to -- then let me just try to
give you the highlights here. So here's our -- here's our scaled-up language where a program is
really a set of modules and some top-level expression. And a module is just a name, a contract,
and a value. And this should also include that this is an unknown value. Looks like there's
something missing here.
When we've got a -- so module references are sort of syntactically distinct and they're labeled
with where they occur. So this is a reference to some module definition F, and it occurs in some
module or top-level called L. Okay. We've got lists in our language, we've also got pair
contracts, disjunction, conjunction, and recursive contracts. These things are not so simple to
handle.
So one thing that you'll get out of this part of the talk is what are the semantics of these things.
So here the way that programs reduce is we've got some module context which really just serves
to resolve the module references in our program, otherwise we've just got a -- just a step relation
here. Okay.
So here's our -- sort of the obvious reductions, okay, so function application, remember we're in
an untyped setting now, so you could apply something that's not a procedure and you get blame.
And here there -- it's a slightly weird blame label here because what you're saying is that this
expression broke the contract with the language in some sense, which is a runtime-type error,
right, because it wasn't a procedure.
We have really two judgments in this system. So one is the -- this value -- we can prove that this
value always satisfies this contract, and we also have a refute which says that this value never
satisfies this contract. And they're not just negations of each other. Because you can have
something that neither proves nor refutes. If it's some unknown value, for example, well, it
could be that it satisfies a contract or it could be that it doesn't.
>>: [inaudible] which is that you never execute this contract, you never -- so it was the case you
obtained before, you never are going to [inaudible] in general.
>> David Van Horn: So, I'm sorry, what was the question, though?
>>: Okay. So [inaudible] I think the statement you have written here is that when I have a
contract, I have some context and try to see which is the outcome.
>> David Van Horn: Right.
>>: And so you say, okay, [inaudible] means for every execution, if you reach this point, it
proves it, because I think you've got to prioritize by not having extra.
>> David Van Horn: Yeah. So these things are ->>: If I reach this point, it's true.
>> David Van Horn: Yeah.
>>: Then you have a -- the [inaudible] if I reach this point, other times I reach this point it's
false, suppose that you [inaudible] you divide by X. So every single time it's [inaudible].
>> David Van Horn: Yeah.
>>: Then you say I don't know.
>> David Van Horn: Yeah.
>>: And then there is a fourth case, which is I never reach this point.
>> David Van Horn: So that's handled at another level. So these things are only used when we
reach a point. So you sort of avoid making these judgments if you never reach them.
>>: Okay. Increase [inaudible].
>> David Van Horn: Yeah, yeah. But -- yeah, yeah.
>>: Okay.
>> David Van Horn: So it's either I know I satisfy it, I know I don't satisfy it, I don't know
either, or I never get here.
>>: Okay.
>> David Van Horn: Yep. Okay. And just to give you an example, I mean, this is how like the
primitives are interpreted with respect to this theorem prover. So if we know some value proves
that it's a cons, a pair, then the way we interpret the car, right, which is pulling out the left
component of the pair, is we project out the left component.
And you need this -- you can't say that this is literally a pair, because it could be, for example, a
symbolic -- it could be a symbolic cons and things like that, so you need this sort of projection
helper meta function here. But if I know it's a cons, I know I can project out of it. If I know it's
not a cons, I know that it's just a blame. If I don't know either, so I write it with this question
mark, I project and I blame. Okay?
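[Schematically, the car rule reads like this; prove-cons? and the projection are invented helpers written over the sym-val placeholder from the earlier sketch:]

    (define (prove-cons? v)                              ; prover/refuter: 'proved, 'refuted, or 'neither
      (cond [(pair? v) 'proved]
            [(and (sym-val? v) (set-member? (sym-val-contracts v) 'cons?)) 'proved]
            [(sym-val? v) 'neither]                      ; unknown: could be a pair, could not
            [else 'refuted]))

    (define (interp-car v)
      (case (prove-cons? v)
        [(proved)  (set (if (pair? v) (car v) (sym-val 'any (set))))]   ; known pair: just project
        [(refuted) (set 'blame)]                                        ; known non-pair: blame
        [(neither) (set (sym-val 'any (set)) 'blame)]))                 ; don't know: project and blame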
Module references. So there's really three kinds of module references, right? Remember this is
if -- this is a reference to F occurring in the module labeled here. So this is a self-reference. And
notice that there's no check. So we're referring to F and it's got some contract, but we don't
check the contract, because you think of the contracts as being established between module
boundaries. So if you got a self-reference, you don't check the contract. That's just the way that
the contract system works.
This is a reference to F occurring in G. So we take the contract on it and we check it when you
make that reference. And this is an external reference to something where there's no definition
available, so we check it against this abstract thing here.
Now, here's -- so here's how the contracts are checked. You really need to have this distinction
between a flat contract and a higher-order contract. So a flat contract is just a contract that -- I
mean is -- like predicates are flat contracts. There are things that you can just check at the point
that you want to monitor them. Okay.
So if C is a flat contract -- and for now -- okay, so what's flat? So a predicate's flat, a pair of flat
contracts is flat, a conjunction of flat contracts is flat, and so on. So if it's flat and we can prove
it, then you just eliminate it and remember that you've satisfied it.
>>: So, David, I'm confused by the previous slide, actually, so the one where you had this
checking self-references.
>> David Van Horn: Yeah. Here.
>>: And you were not going to do any check.
>> David Van Horn: Yeah.
>>: Is that the one?
>> David Van Horn: Yeah.
>>: So why is that -- why is that sound? I mean, if I have a module that says I'm going to
provide you with a positive integer and the module is F and the implementation is that it's just
going to return F dot F and run 0 instead.
>> David Van Horn: Yeah. So -- so what -- so what happens is that you -- I mean, the way to
think about it is that internally to your own module, you don't have to -- you can break your own
contract. And that's fine. But as soon as somebody else uses your module, there's going to be a
contract boundary established. At which point this doesn't -- this isn't the case anymore and it's
going to be checked here, checked here.
>>: And so you get blame at which point? When G calls F and tries to use [inaudible] the
supposedly positive integer that's actually 0?
>> David Van Horn: Right.
>>: And then you blame F?
>> David Van Horn: Yeah. So yeah. Yeah. That's right. So yeah. If F -- if F doesn't live up to
its contract externally, then it's going to be blamed. And internally it can do whatever it would
like. Right. So you could have -- you could, for example, provide some function that says you
have to give me the input 5, and then it recursively calls itself with arguments that aren't 5 all the
time, and that's fine, so long as externally you're always given 5, and likewise with the returns.
And this is -- I mean, this is just a design choice that comes from the way that contracts work in
this language. And you could make a different design choice if you wanted. And if you wanted
to check that contracts held within a module, you would just change this rule.
But there's no way to sort of break the module-level contracts. Okay. So this is when we can
prove that we can eliminate it. This is when we know that it fails so we blame. And this is when
we don't know so we have to enforce it. So what this FC function does is it compiles the contract
into a predicate.
So the way to think about a flat contract is it can be compiled into a predicate. Okay. And this is
the compilation for it. Okay. So if it's a recursive contract, you just move inside. I don't know
that there's much that's interesting here. So if you have -- so flat contracts are easy. So if you
have a conjunction, you just compile it to a function that checks the first one and then checks the
second one. Right.
So this is just turning all of these things into predicates. If you've got a pair, you turn it into a
function that asks if it's a cons and if it is, then it applies the first contract to the car, otherwise
it -- right, and then it applies the second part of the contract to the cdr. That's the easy part.
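[A sketch of that FC compilation, with an invented list representation for flat contracts ('pred, 'and/c, and 'cons/c tags):]

    (require racket/match)

    (define (FC c)                                    ; compile a flat contract to a predicate
      (match c
        [(list 'pred p)       p]
        [(list 'and/c c1 c2)  (lambda (v) (and ((FC c1) v) ((FC c2) v)))]
        [(list 'cons/c c1 c2) (lambda (v) (and (pair? v)
                                               ((FC c1) (car v))
                                               ((FC c2) (cdr v))))]))

    ;; ((FC (list 'cons/c (list 'pred number?) (list 'pred symbol?))) (cons 1 'x))   ; => #t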
Here -- so the distinction between flat and higher order is that a higher-order contract has to be
potentially partially delayed. Okay. So here is a dependent function contract checked against a
value where that value is a procedure, so we just do the sort of eta expansion here, driving the
contracts down, and we blame if it's not a procedure. And this is all -- we've sort of pushed all of
the work into delta telling us whether this thing is a procedure or not.
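Here is a loose Racket sketch of that delayed checking for a simple, non-dependent function contract; the names are invented, and real blame tracking with positive and negative parties is omitted.

    #lang racket
    ;; Wrap a procedure so the domain is checked at each call and the range at
    ;; each return; blame immediately if the value is not a procedure at all.
    (define (monitor-arrow dom? rng? f)
      (unless (procedure? f) (error 'blame "not a procedure"))
      (lambda (x)
        (unless (dom? x) (error 'blame "argument violates the domain contract"))
        (define y (f x))
        (unless (rng? y) (error 'blame "result violates the range contract"))
        y))
    ;; ((monitor-arrow positive? positive? add1) 3)  ; => 4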
Now, you can also have -- you know, so a pair of contracts can be higher order if one of the
components is higher order. So this is how to deal with sort of composite higher-order contracts.
So if you've got a pair, then you produce a new pair that monitors both components -- if you can
prove that this thing is actually a pair -- and it blames if it can't, and so on.
So conjunction of higher-order contracts is sort of straightforward where you monitor both
contracts here. Disjunction is the more interesting one. So the disjunction of two higher-order
contracts, it's really -- it's not clear what the semantics of that should be and who's to blame when
something goes wrong. Okay.
And so what Racket and other languages with these higher-order contracts do is they make a
restriction where they basically say that you can make a disjunction of contracts but only one of
them gets to be higher order. So you can have a flat contract and a higher order contract, but
they can't both be higher order. Because there really -- there's no good answer for what to do in
that case.
So we make the assumption that the left side is the flat one and the right side is the higher-order
one, and so we just compile the flat one and then monitor the higher-order one.
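In Racket-ish terms, the restriction looks roughly like this (example invented for illustration):

    #lang racket
    ;; Allowed: at most one higher-order disjunct.
    (define ok (or/c string? (-> integer? integer?)))
    ;; The semantics described here has no good answer for something like
    ;;   (or/c (-> integer? integer?) (-> string? string?))
    ;; since it's unclear which disjunct to monitor and whom to blame,
    ;; so that form is ruled out.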
>>: So suppose -- so taking C or D as being -- suppose I had [inaudible] or string error string,
then why can't you, you know, decompose it by saying I'm going to test int or string on my
argument, and depending on the one that succeeds, I'm going to remember what should be the
check that's done on the outside when the function returns.
>> David Van Horn: Yeah. Yeah. So what happens in that case. So I think that that -- at that
level it's -- you can work it out. But then when these things get more higher order, it becomes --
it becomes more problematic. I can -- I'll have to dig up the details for this, because --
>>: Is it just because you need a bit of state to figure out [inaudible]?
>> David Van Horn: So that -- yeah, that's one -- that's one aspect of it. There's more to it. I'm
just -- I can't -- I can't recall the details right now. But, again, this has nothing to do with sort of
the symbolic stuff. This is all just about the design of this contract language, which are design
decisions that we didn't make, right, we just wrote down the semantics of what Racket actually
does, and it makes this restriction where you can't do a disjunction of higher-order things.
Okay. And here is our applying unknown functions, and it's similar to before where we're just
getting the range and havoc in the other place, but here we have a sort of more complicated
havoc construction which has to decompose pairs and apply functions. Because both -- now we
have new kinds of behavioral values, right, you have functions and also lists can contain
behavior because they can contain functions. Okay.
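A very rough sketch of the havoc idea in Racket (abstract-arg is a made-up stand-in for the analysis's abstract values): exercise an unknown value in every way a client could, so any blame it can trigger gets explored.

    #lang racket
    (define (abstract-arg) '•)   ; stand-in for an unknown, abstract argument
    (define (havoc v)
      (cond [(procedure? v) (havoc (v (abstract-arg)))]      ; apply unknown functions
            [(pair? v)      (havoc (car v)) (havoc (cdr v))] ; decompose pairs
            [else           (void)]))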
And then the way that you run a program is you have to first -- so the important thing here is that
you do the sequencing where the first thing you do is you take all of the modules and you put
them in a havoc context. Because the module is providing something and it doesn't really know
how it's going to be used, so we have to explore all of the behavior there.
And this is just a ramped-up statement of the soundness theorem for this language. But it's the
same as before, basically. And then there's our corollary about, you know, if you don't see blame,
you know that -- no matter what you plug in, you can't blame the program you know
about. So that's what this is really saying here.
So if F is the name of the program that we -- that's concrete, meaning we know its definition, if
the program doesn't reduce to a blame of that module, then there's no way to plug something in
so that F is blamed. Okay. So this is really being precise about who's at fault, and really we're
reasoning about the program that we know about and ignoring the programs we don't know about.
Okay.
All right. So this was the original goal, right: automated modular verification of higher-order
programs with contracts. And, you know, I'll say that we've succeeded, but there's some caveats
here. So one is that there's lots of room for improvement.
And I think the biggest room for improvement, and it's something that, you know, I think you
guys probably know a lot more than me about, is how to do this part of the puzzle. Okay. So
maybe you could plug in Z3 here and get some interesting results. And it's all going to -- I mean,
as soon as you do something better, everything -- the whole thing just gets better and there's a
nice -- I predict that there's a virtuous cycle that's going to occur when you start doing a better
job about what theorem prover you plug in here. Because we have really just a simple theorem
prover that we wrote ourselves.
And then the other question is, you know, well, is it really automated because it's not decidable.
And, you know, so if you think automated equals decidable, then you might think that we've sort
of -- we haven't really achieved that. Okay.
So I was going to talk a little bit more about how to make it decidable. I'll just -- I'll just step
through that real quick and then we'll wrap up. So the way that we're going to do it relies on
work I've done with Matt Might and more recently with Ian Johnson. And it was really the -- the
sort of genesis for this work was thinking about how do you do program analysis in a
modular way where, you know, the sort of classic approach is you write down some whole
program analysis and then you pull your hair out thinking really hard about how to make it
modular and then you write it down.
But Matt and I have this sort of turn-the-crank style approach to going from, say, a reduction
semantics to a program analysis in a systematic way. And all of the stuff I've been showing you
is just a reduction semantics. So we've sort of solved the modularity problem up here, and now
we should be able to just turn the crank and get out a computable program analysis from it.
And so typically this is applied to whole program semantics, because, after all, what else is there,
right? And we've applied this to Racket in the past and Android and JavaScript. There's a group
at Harvard that's applying it to an intermediate language for Coq and there's a group at Oxford
that's applied it to Erlang.
So yeah. So this stuff is fairly useful and easy to apply. But another nice thing is that we can
take the semantics that I've been talking about and plug it into this pipeline and what you get out
is a sound and computable modular program analyzer.
Okay. I was going to tell you some more of the details here, but I'm just going to skip this stuff.
I think the big point here is that we're really just writing down interpreters and you iterate them
to a fixed point, and you get this -- you know, these big graphs and these state graphs and so on.
But they're always finite. But for me the big insight here is just this is just an interpreter and --
>>: So why do you say they are always finite? You can go forever [inaudible].
>> David Van Horn: I skipped a step. I skipped a step. So I've got this machine semantics, and
it's set up in such a way that the only approximation that I'm going to do is say that the heap is
finite. Okay. And what that does is that sort of collapses the whole state space down to a finite
space.
>>: [inaudible]?
>> David Van Horn: And then you have -- and then you have [inaudible]. Then you have to
deal with the base values as well. So making the heap finite is enough to make your function
space -- all of the set of functions that you can represent is now finite, which is sort of the piece that
I'm most concerned about. And then you have to do some -- you have to have some abstract
domain for the base values, like natural numbers and strings and so on. Okay.
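As a hedged illustration of that last point, a coarse abstract domain for base values might widen concrete constants to a handful of symbolic tokens, so only finitely many base values remain (the particular tokens here are invented):

    #lang racket
    (define (abstract-base v)
      (cond [(exact-nonnegative-integer? v) 'nat]
            [(integer? v)                   'int]
            [(string? v)                    'string]
            [else                           v]))   ; other kinds of base values would need their own abstraction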
So the components are -- sorry, let me back up just a step. So you've got this machine and it's
parameterized by functions that do push and bind. So push is like a stack push, but it's being
threaded through the heap here. So our stacks are always finite. And binding is doing variable
bindings, and that's also threaded through the heap here. So that is also --
>>: Sorry, I [inaudible]. So if you suppose that [inaudible] how you abstract it [inaudible].
>> David Van Horn: So you -- if you have the Fibonacci function, then you're going to make --
so here let's look at --
>>: You don't know what is the initial value, so essentially you cannot go forever, so at some
point you should [inaudible].
>> David Van Horn: So the way that we -- the way we would do it is you would say Fibonacci
of some natural number --
>>: Okay. So you -- the natural number is given [inaudible].
>> David Van Horn: Yeah, it's an abstraction of the input.
>>: Okay.
>> David Van Horn: Okay? So now you run it.
>>: Okay.
>> David Van Horn: And as your machine is running, it's going to have to allocate stuff. Like
every time it makes a recursive call, it's going to allocate something. So you know the recursion
bottoms out at some point because you run out of space. So here's a --
>>: So essentially you are being bounded [inaudible].
>> David Van Horn: In some sense. Yeah. Yeah.
>>: So you are not sound.
>> David Van Horn: No, it's sound. That's the difference.
>>: Okay. So at some point you should [inaudible].
>> David Van Horn: So okay.
>>: You cannot just stop N steps.
>> David Van Horn: Yeah, yeah, yeah. Okay.
>>: This would be unsound. So at some point you say okay, because if you're going [inaudible]
you should [inaudible] try to see --
>> David Van Horn: Okay. So right. So push and bind are going to -- yeah. So we've got this
finite heap, and now the other important thing is that the heap -- okay. So let's say you go to
write into some location in the heap and there's already something there. Okay. So what you do
is you just join them together. So you take a -- the heap maps to sets of things, and when you --
when you have a conflict, you just join them together. And now you've got a set of things there
[inaudible] --
>>: [inaudible].
>> David Van Horn: But the set of things you can put in there is finite, so they can't grow. They
can't grow forever. I can work out all of the details for you. Okay?
>>: [inaudible].
>> David Van Horn: But the sort of important point is that when you go to look something up in
the heap, there's now a set of values there and you just nondeterministically choose something.
So you go from a deterministic infinite state machine to a bounded nondeterministic machine.
Okay? And I'll tell you all about the details.
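The heap behavior just described might be sketched like this in Racket (helper names invented): writes join into a set, and reads hand back the whole set for the machine to explore nondeterministically.

    #lang racket
    ;; The abstract heap: an immutable hash from (finitely many) addresses to sets.
    (define (heap-write h addr v)
      (hash-update h addr (lambda (s) (set-add s v)) (set)))
    (define (heap-read h addr)
      (set->list (hash-ref h addr (set))))
    ;; (heap-read (heap-write (heap-write (hash) 'a 1) 'a 2) 'a)  ; => '(1 2) or '(2 1)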
>>: [inaudible].
>> David Van Horn: Yeah. So if you wanted to get an interpreter out of it, your push and bind
are just going to produce fresh addresses. And if you want an abstract interpreter, like if you
want to do something like 0 CFA, then you're just going to choose like -- so push is going to
choose the label of the call site and bind is going to choose the variable name and so on. Okay.
And then you just iterate these things. Because you know that it's finite, you just iterate them to
a fixed point. And you get things like this. All right?
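As a sketch (names invented), the two allocation policies might look like this; with finitely many variables and call sites, the 0CFA-style policy keeps the heap finite, which is what makes iterating to a fixed point terminate.

    #lang racket
    (define (alloc/concrete state)     (gensym))  ; fresh address: ordinary interpreter
    (define (bind-addr/0cfa variable)  variable)  ; reuse the variable name
    (define (push-addr/0cfa label)     label)     ; reuse the call-site label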
And the big insight for me was that we're just talking about interpreters. Okay? I mean, they're
abstract interpreters, but they're just interpreters. So you can bring the stuff that you know about
interpreters to this problem space. So it's just an interpreter. And one of the things -- so if the
heap is your sort of resource for precision, right, and the way that you get these -- the -- these
splits in the graph is that you -- you had two things sitting in the same heap location.
So you've got this finite resource, which is the heap, and you lose precision when multiple things
reside in the same heap location. But it's just an interpreter. So let's -- I don't know, let's write a
garbage collector. All right? And this is the kind of thing that I mean by -- observing that it's
just an interpreter means that you can take stuff that you know about interpreters and bring it
here. And this is just one idea.
So write a garbage collector. And what happens is this thing -- you get a much tighter
characterization of your program here because you're able to collect space that wasn't reachable.
And then when you go to write something into it later, it's the one thing that's there.
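A minimal sketch of abstract garbage collection over such a heap (reachable is a hypothetical helper that computes the set of addresses reachable from the roots through the heap):

    #lang racket
    ;; Keep only the heap entries reachable from the current roots; a later write
    ;; to a collected address then sees just the one new value.
    (define (abstract-gc heap roots reachable)
      (define live (reachable roots heap))
      (for/hash ([(addr vals) (in-hash heap)]
                 #:when (set-member? live addr))
        (values addr vals)))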
>>: But I'm still confused because I thought the point of keeping multiple things in the heap was
to -- was not -- was also for soundness. Right? So if you're just going to blow away stuff that was
there previously, you [inaudible].
>> David Van Horn: Yeah, but it's a sound blowing away. Because you run your garbage
collector and it says I can't -- I can't reach this in the future. I mean, it's sound in the same way
that garbage collection is sound, right? I know that this location isn't reachable, so blow it away.
And then later if I need to write something into that location, that's fine. And there just won't be
anything sitting there when you go to write into it.
>>: Okay. I guess I don't have the intuition [inaudible].
>> David Van Horn: Yeah.
>>: Just say [inaudible] am I locating now I know that this [inaudible].
>> David Van Horn: Yeah.
>>: [inaudible] because this is there but it's a [inaudible] because maybe something allocated
inside a procedure, and then this thing escapes, does not escape, so it can just get written
[inaudible].
>> David Van Horn: Yeah.
>>: [inaudible] heap analysis. You get rid of the original [inaudible] smaller.
>> David Van Horn: So another --
>>: There was a paper in ACS [inaudible] student maybe they were using some [inaudible]
technique for their JavaScript [inaudible].
>> David Van Horn: Okay.
>>: Just how they do some kind of [inaudible].
>> David Van Horn: Okay. I'll have to look at that. So I know this stuff from Matt Might's
work on abstract garbage collection. Yeah. Another insight is that this thing is just a finite state
machine. And you think okay, so I'm approximating this Turing complete language with a finite
state machine, which is a little bit like bringing a knife to a gunfight. And you can think about
richer models here. So one is like why not use a pushdown automaton.
And that has the nice property that now in your abstract space you're going to -- the stacks on
your pushdown automaton match the stacks in your program. So you can do a really precise job
of reasoning about the stack here.
And you get a -- I know it's light, but you get a nice -- a nicer, tighter approximation of the
program as well. And then a sort of natural thing that you might want to do is to use both of
them. And there's a technical problem that you've got to solve here, which is that if you've got a
pushdown automaton and you want to crawl the stack to do garbage collection, that's not something
you can typically do with a pushdown automaton.
>>: Sorry, I lost the intuition for this slide. So what's at the top?
>> David Van Horn: The top is just the original finite state approximation. One is using garbage
collection --
>>: [inaudible].
>> David Van Horn: The other is using a pushdown approximation, and now we'd like to --
>>: [inaudible] more precise than the other ones [inaudible].
>> David Van Horn: They're really kind of incomparable. And then you'd like -- you'd like to
use both of these techniques. And Matt and I have an ICFP paper on how to combine these.
>>: So what you prove is that essentially you can get [inaudible]?
>> David Van Horn: That's more precise than --
>>: It's more precise than [inaudible].
>> David Van Horn: Yeah.
>>: Because you have fewer states and --
>> David Van Horn: Yeah.
>> Nikhil Swamy: I think we should wrap up.
>> David Van Horn: Okay. So let me just -- let me wrap up. And there are other -- there are
other things that -- sort of optimizations that you can -- that you can start doing once you just
start seeing this as an interpreter, like writing a compiler rather than an interpreter, and it
improves the performance. So we get better memory use, the transitions are faster, and the overall
analysis time is dramatically improved here. So this is two to three orders of magnitude better.
Okay. So it really can be automated. Just by composing with this stuff.
All right. So I showed you about our verification environment. I talked briefly about our sort of
approach to making these things computable and fast. And there's some papers -- this OOPSLA
paper, this is the paper on combining pushdown analysis and garbage collection, and this is the
paper about making these things fast. That's just a draft that's on arXiv these days. So thanks.
>> Nikhil Swamy: Thanks a lot.
>> David Van Horn: Yeah.
[applause]