>> Tom Ball: Okay. I'm Tom Ball, and it's my great pleasure to have Roman Manevich back.
He's well-known to everyone at Microsoft Research. He was here in the early days of program
analysis at the Programmer Productivity Research Center, visiting as an intern all the way
back in 2003. He is a professor at Ben-Gurion University of the Negev in Israel, and he'll tell us
today about shape analysis and how to take that towards proving termination of programs.
Welcome back.
>> Roman Manevich: Thanks, Tom. Thank you for inviting me. It's good to be here again. So
this is joint work with my colleague Noam Rinetzky from Tel Aviv University and our co-advised student Boris Dogadov. So let's just start.
So what is the research question that we are interested in? We want to verify
termination for programs that manipulate linked data structures and allocate memory, so we're
talking about data structures like lists and trees but also unstructured ones
like arbitrary graphs. We assume you can allocate and even de-allocate memory, you
can have destructive updates, and this is, I guess, especially relevant for Microsoft because this
kind of low-level code and low-level manipulation of pointers takes place a lot in device
drivers. And actually, after talking to a few people here, I started on this kind of small
digression.
So I think some people are thinking that maybe automatic verification is kind of doomed;
it's nice to publish papers, but maybe it's not really going to make it out to the masses.
And then I started thinking maybe this is all kind of useless and it's all just academic. But then I
reassured myself, and I thought to myself, even if verification for the masses, like automatic
verification, never makes it, I think these kinds of techniques will still be useful. I think you can
still repurpose them in two types of ways. You can come up with high-level languages and
apply these kinds of automatic reasoning techniques for synthesis, and you can also use them for
sophisticated optimizations which current compilers actually don't do. So that's what I
think. I'm hoping that in the future the programming community will keep writing programs,
and at some point the level of abstraction will go up, and maybe at that point these kinds of
reasoning techniques will somehow resurface and still be useful.
So we are interested in whether a program terminates or not. And of course, as you know, this
is undecidable, so we're not going to try to solve it all the time, but of course that doesn't mean
that we can never decide whether any program terminates. This may be trivial to some of
you, but let's assume that maybe someone is watching this who doesn't have all the
background. So you can always consider trivial programs and easily verify termination for
them. One other thing that we do is that we allow the algorithm, so it's not a decision
procedure, we allow the algorithm to say yes, the program always terminates on every input,
or no, meaning that we managed to prove, either by counterexample or by some other kind of
argument, that there is an input for which the program just goes into an infinite loop, but it
can also kind of give up and say we're not really sure. And of course the idea is to limit the
number of these kinds of don't-knows to a minimum, especially for the types of programs
that people actually write.
So the style of programming, I think, is interesting, and in this work we actually make
use of a particular programming style for writing programs that [indiscernible] dynamically
allocated memory, and we use that to build our termination analysis, as you'll see later.
So I'll just summarize the work very briefly and then I'll start going into more details. If you
are familiar with termination analysis, to prove termination you have to come up with some
kind of ranking function, some kind of metric that decreases but always remains non-negative.
Usually when you deal with dynamically allocated structures, if you look at different works,
and I'll show you some examples, you immediately start thinking about different types of
ranking functions, but what's nice about this work is we have a very uniform type of ranking
function which is basically oblivious to the type of the structures. In some sense it works
whether the data structure is a list, a tree, or a general graph. We don't try to customize it for
the data structure. It just kind of works. I'll show you that later. So I think that's a nice aspect.
And regarding the particular class of programs that we want to handle, we assume that the
programs are written in some kind of a modular way, defined more precisely later, and we take
advantage of that.
And the last thing, which I think is really cool, is that our analysis is parametric in an underlying
shape analysis. So we haven't started from a given shape analysis and tweaked and changed it
to do termination analysis. It's actually independent, so we can plug in different types of shape
analyses. If you have a particular class of programs that manipulates certain types of data
structures and a certain shape analysis works for that, you can use that, and if that changes
you can change the shape analysis, you can mix them. We don't really care how the shape
analysis works, whether it works by doing a global fixed point analysis or by abstraction or
different stuff; we don't care, we can just use it. So it's kind of independent of that.
So what is shape analysis, in case you don't know? It's a static analysis of programs which takes
a program, and it's interesting if the program manipulates linked data structures, and some
property. It can be different types of properties: it can be kind of low-level memory safety like
no null dereference, no double-free, or it can be a slightly higher-level data structure invariant
such as: I'm manipulating a linked list, and an acyclic list should always stay acyclic, stuff like
that. Then, as you can see, what I want to know is whether the program satisfies the property,
and this is still undecidable, and we compromise in the usual way. So we can either say yes, the
program meets the specification, or no, it definitely doesn't, or sometimes we say okay, I'm not
sure. And of course we want to limit the cases for which we are not sure.
So there are quite a few different types of shape analysis out there, but I think we can classify
them in a really broad way. So the three on the top: separation logic is very popular, tree
automata, and something called tree rewriting systems. These are types of shape analysis
which are really good at handling data structures which are mostly inductive, so inductive or
maybe stepping out of the inductive nature just a little bit. And then there are lots of different
shape analyses that I haven't written down, but there's a framework called 3-valued shape
analysis and it has lots of instances. That kind of shape analysis can handle quite well data
structures which are unstructured, but you can also trick it into doing structured ones, and
you can also use it to handle inductive data structures.
If you look at all these types of shape analysis from a very specific angle, and I'll give you the
details later, a lot of them are really what I'm calling region-based shape analyses, and that
means that the way they perform abstraction is by taking the memory and collapsing sets of
objects together, producing some kind of shape descriptor that represents infinite sets of
memory configurations. This is something which is important to me because our analysis
actually relies on that property, and I'll define it formally later.
So that's what the shape analysis does. But actually, in this work we don't really care whether
the answer is yes, no, or don't know. Another thing the shape analysis can do is output other
types of information. One type of information is an abstract transition relation, which is very
standard and not really specific to shape analysis. An abstract transition relation is a finite
graph where the nodes are abstract states that represent potentially infinite sets of states at a
given program location, and the edges between them represent how the statements actually
take you from one abstract state to another. That's very standard, and you can ask a shape
analysis to output an abstract transition relation. It's really very reasonable. So it's a finite
graph like this.
A region transition relation is a transition relation at a lower level of granularity. Remember I
said that all these types of shape analysis are really region based, so if you transition from one
abstract state to another, and each abstract state has a finite set of these regions which
represent sets of objects, you can ask it to say okay, how do these regions map across a
program statement? If you're not sure exactly what that means I'll show you examples later; if
you understand it just roughly that's fine. All I want to say at this point is that no shape analysis
currently gives you a region transition relation, but it's really easy to get it. You can easily
tweak them to get it. We did it for TVLA, and it's easily done for separation logic and also for
the other types of shape analysis. So we need these two types of inputs, and we'll build our
termination analysis on top of them.
So what's a shape termination analysis? It's nothing more than a termination analysis that is
geared for programs that manipulate dynamically allocated memory. That's what we want to
implement. And just to give you an idea, here's a program implementing InsertionSort. It takes
an acyclic list of cells that have data values and it produces an output list which is a copy of the
input list but sorted. So you can see that it has dynamic allocation, it naturally has destructive
updates, it has loop nesting, which is kind of interesting, and it also deals with sortedness.
These kinds of things make this problem challenging.
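Since the slide itself isn't reproduced in this transcript, here is a minimal C sketch of the kind of list-based InsertionSort being described; all names and details are illustrative assumptions, not the actual code from the talk.

    #include <stdlib.h>

    typedef struct Node { int data; struct Node *next; } Node;

    /* Returns a sorted copy of the acyclic input list.
       (Allocation-failure handling is elided for brevity.) */
    Node *insertion_sort(const Node *input) {
        Node *out = NULL;                                      /* output list */
        for (const Node *x = input; x != NULL; x = x->next) {  /* outer loop */
            Node *cell = malloc(sizeof *cell);
            cell->data = x->data;
            if (out == NULL || cell->data < out->data) {       /* insert at head */
                cell->next = out;
                out = cell;
            } else {
                Node *y = out;           /* inner loop: find insertion position */
                while (y->next != NULL && y->next->data < cell->data)
                    y = y->next;
                cell->next = y->next;    /* destructive update */
                y->next = cell;
            }
        }
        return out;
    }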
So just roughly, if you look at it, we have the outer loop, and that outer loop goes over the input
list, and the inner loop looks for the right position to insert an element, and once it finds it, it
figures out how to tweak the output list, whether it's at the head or at some inner position, and
inserts it. So that's all it does. And, sorry, I keep flipping back and forth, but maybe I'll just
make a small point here. Usually the way you think about termination, you'd say roughly that
each loop has this pointer that kind of goes from beginning to end or stops at some point, and
you'd use a ranking function which is basically some kind of lexicographic ranking function
built on top of these individual functions, for either X or Y. That's what you usually do, but
that's not what we do.
So if you take kind of a system view of the analysis, this is how it looks. It has two stages. You
take a region-based shape analysis, and I don't care how it works as long as you can just run it,
and it gives you an abstract transition relation and the region transition relation. We'll use
those, and what we do is generate a finite system of constraints, and those constraints are in
the logic called Quantifier-free Presburger Arithmetic, which luckily for us is a decidable logic,
so you can feed this formula into a solver like Z3, and then if the solver says that the formula is
satisfiable, that means the program terminates for sure, but if it's not satisfiable then we kind
of give up. We don't know what to do. Ideally what we would do is somehow use that
information to refine the shape analysis, but we haven't done that yet. So ideally we would
somehow close the loop.
I think this is really nice because you can just run a shape analysis and we automatically
upgrade it to a termination analysis, in some sense for free, so you don't have to worry about
how that's done and you can change it. So what are the main results here? As I told you, we
have this parametric approach. We start with any shape analysis and then upgrade it to a
shape termination analysis, and this way you can handle lots of different types of programs
and shape invariants depending on which shape analysis you started from. So you can use it to
reason about inductive data structures and just general graphs. The other contribution is a
new type of ranking function which is really oblivious to the type of the structure and shape
invariants, and it works really well.
It's guaranteed to work for a certain class of programs. In some sense it's relatively complete
for a particular class of programs which is quite wide and seems to be quite natural. All the
programs we've seen so far we didn't have to change; they just naturally fit in this class. There
are programs outside that class, but it's a very natural one. And we have a nice optimization:
the analysis is actually modular, and I'll talk about it later, but modularity means that we can
basically prove termination for each loop, whether it's nested or not, and each recursive
function separately, so that makes it more efficient. We also handle recursion; we are one of
the few analyses that do it and actually document it. And we tested it on a small set of
benchmarks. These benchmarks are kind of modest, but they're varied and they're interesting.
So that's basically the whole thing. Now I'm going to go into technical details. Do you have any
questions so far?
>>: Can you actually produce an input for which the program [inaudible] terminates, or you
don't have that?
>> Roman Manevich: Can we show counterexamples? No. As you see, usually there are these
three answers: yes, don't know, and no, and when I show the system you'll see I just gave up
on the no.
>>: [inaudible]?
>> Roman Manevich: Yeah. We just give up. It's interesting; supposedly we could look at the
output of the solver and maybe find some kind of unsatisfiable core and try to map it back to
the program, but we haven't done it just because we didn't get around to it. I haven't thought
about that deeply. We started thinking about different aspects. But I think it could be done,
and it could be nice to maybe use the unsatisfiable core to somehow guide the refinement of
the shape analysis. That could be really interesting.
So now I'm just going to go into different types of details. The first thing is to show you our
new ranking function, which is really simple. This is the inner loop of the insertion sort. You
see what it's doing is just testing whether the next pointer of Y is not null, plus some other
condition that has to do with the data. If that holds it just advances Y, and that can only
happen a finite number of times. So now the question is: why does this program terminate?
Any ideas? It's not a trick question. It's really easy. It's actually trivial.
>>: [inaudible].
>> Roman Manevich: Because what?
>>: Because the list keeps getting shorter.
>> Roman Manevich: Yeah. So that's what I expected. That's the standard argument: it's the
suffix of the list from Y to null that keeps getting shorter, and I think that's the usual type of
ranking function that you think of. You try to build a ranking function that somehow matches
the shape of the data structure. That's what we usually do. But the problem with this kind of
thinking, and this is subjective, is that if you start thinking this way, then every time you change
your data structure you have to start looking for different types of ranking functions that have
to do with the size, with the number of unfoldings, with the height of a tree, and you have to
somehow search for ranking functions, to synthesize them from that type of thing. And usually
what you have to do is build into your analysis some additional type of information to keep
around that actually keeps track of these measures. Usually what people do is they enhance
their shape analysis to track these additional metrics and then map it all down to some kind of
numeric program, and termination for numeric programs, people have done that. So they just
reduce it to that. That's what usually happens, but that's not what we do. We do something
completely different.
What we say is, the reason this terminates is because this program runs in linear time. What
happens is that the number of memory accesses is actually linearly bounded by the running
time of the loop. That's what we're going to utilize. You can think of it as a counter that you
establish, and you always check that this counter is linearly bounded by the number of
iterations. What we do is almost like that.
So let me show you in more detail what we actually do. If you have a loop, let me define the
term: what I showed you earlier was an instance of a linear loop. A linear loop is nothing but a
loop that performs a linear number of iterations. So if the input data structure has N objects,
then the number of iterations of that loop is linearly bounded by N. That's what I'm calling a
linear loop. I don't care what happens inside, it may contain a nested loop, but the number of
iterations of that loop itself is linearly bounded.
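Stated as a formula (my notation, not the talk's): a loop is linear if there is a constant $c$ such that for every input state whose heap contains $N$ objects,
\[ \#\mathit{iterations} \;\le\; c \cdot N. \]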
So what can I do? We do something which is kind of non-standard for our ranking function, but
a standard way of reasoning for termination: we reduce it to a safety proof. This is what we do.
What we can do is actually what I told you: we can add some logical variables. So bound is the
bound on the number of iterations, there's a counter which we just keep increasing, and we
have to guess a constant factor which relates the size of the input heap to the number of
iterations, and then all we have to do is assert that the number of iterations is bounded from
above by this bound. The point is that if we can prove that, then we can prove that this loop
terminates.
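A minimal C sketch of this first instrumentation, on a hypothetical list-traversal loop; the names heap_size and c are illustrative assumptions:

    #include <assert.h>
    #include <stddef.h>

    typedef struct Node { int data; struct Node *next; } Node;

    void traverse_v1(Node *y, size_t heap_size, size_t c) {
        size_t bound = c * heap_size; /* logical variable: guessed linear bound */
        size_t iters = 0;             /* logical variable: iteration counter */
        while (y != NULL && y->next != NULL) {
            iters++;
            assert(iters <= bound);   /* proving this safety property proves
                                         that the loop terminates */
            y = y->next;
        }
    }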
But the problem with that is that this iteration counter, this numeric variable, doesn't relate at
all to the shape of the heap. So if you want to try to somehow relate them, then you'll go back
to the kind of reasoning of oh, let's reason about the suffix of the list and stuff like that that's
related to the shape of the data structure, and that's not what we want to do. So what do we
do instead? We do something which is similar but somehow different. Instead of using a single
counter, we give every object a mini counter. That's a counter that counts up to a fixed bound.
This is the second version of this instrumented program. You see we guess a bound s, which is
just a natural number, and then, this is a technical issue, if you don't want to think about it it's
fine, but suppose we take a snapshot of all the objects that are allocated at entry to the loop,
and then what we do is, on every iteration, we nondeterministically choose an object and we
increment its object counter. So suppose there's a logical field count for each object; you just
increment it, and you make sure that you choose correctly such that each of these object
counters doesn't go beyond this constant bound. You can see that if you can prove this, then
the loop actually terminates. But now it's actually nicer, because instead of having one counter
that doesn't relate at all to the shape of the heap, now we have all these counters that are kind
of dispersed, and you can choose which one to increment, and you have to choose in a manner
that somehow matches the accesses into the data structure. So that's the secret.
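And a matching C sketch of the second instrumentation, again with illustrative names; the per-object field count is a logical field, and the nondeterministic choice is resolved here by the simple heuristic of bumping the object currently being accessed:

    #include <assert.h>
    #include <stddef.h>

    typedef struct Node {
        int data;
        struct Node *next;
        size_t count;   /* logical per-object mini counter, initially 0 */
    } Node;

    void traverse_v2(Node *y, size_t s) {
        /* (the snapshot of objects allocated at loop entry is elided;
           only snapshotted objects may be counted, as explained next) */
        while (y != NULL && y->next != NULL) {
            y->count++;            /* choose an object and bump its counter */
            assert(y->count <= s); /* no mini counter ever exceeds the
                                      guessed constant bound s */
            y = y->next;
        }
    }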
The reason we have to take a snapshot of the allocated objects is that if you start counting
objects that get allocated inside the loop, then this argument would not be sound, because you
could always allocate more and count more, infinitely. That would be an unbounded counter,
and we don't want that. It's just a technical issue.
So here are the challenges. First, the question is how do we actually guess this bound? The
second one, and that's the second source of nondeterminism, is how do you choose the object
correctly? In the concrete world it's easy: you just choose an object for which the object
counter hasn't overflowed yet, and that's it. That's simple. But of course there's another
problem, which is that programs have an infinite state space. So you can start seeing that very
soon we'll get into abstraction, and then your choices for the object are going to be more
limited, so it will be more challenging.
So I was showing you what linear loops are, and if you understood that, you understood quite a
lot. Now what I want to do is define the problem mathematically, in the sense that I want to
write a system of numerical constraints such that if they are satisfiable then the program
terminates. And I'm going to use the same reasoning with [indiscernible] these object counters;
I'm just going to represent the problem differently. So remember, the goal is to write down a
system of constraints, and what the system of constraints expresses is the fact that the object
counters are always bounded, there's a way to choose between them such that they stay
bounded and you never overflow them, and you have to make sure that you increment these
counters on every iteration of the loop.
So what is a concrete transition relation? If you have an input state like this, you have this list,
this is a concrete state, and you advance Y, you get an output state, right? This gray type of
arrow just shows the edges of the transition relation. There's nothing special about that. The
other type of transition relation I told you about, earlier I called it a region transition relation,
but here it's all concrete, so I'm calling it an object transition relation. These red types of edges
just relate objects that correspond in the input and output heap. So if the heap doesn't change,
it's really just the identity. But if you have garbage collection then maybe it changes; but
suppose you have a way of connecting these. So this is kind of standard.
And now what I'm doing is assigning each object a variable, which is really the value of the
object counter. Now, here's a system of constraints. You see it has three types of constraints.
What they express is, first of all, that the values of the counters, as you go across the
statement, don't decrease, because we don't want that. That's the first type of constraint. The
second type of constraint says that the sum of these variables in the output, the sum of the
object counters in the output, is strictly more than the sum of the object counters in the input.
That means that you have to increment at least one of them, because otherwise how are you
going to measure progress? The last thing is that they all have to be bounded, because that
was the initial requirement. If we can't prove that they're bounded, then the loop doesn't have
to terminate.
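To make this concrete, here is a reconstruction of the per-edge constraint system in my own notation (the slide's exact symbols aren't reproduced in the transcript). Take a variable $x_u$ for each input object $u$, a variable $x'_v$ for each output object $v$, the object transition relation $\rho$, and the guessed bound $s$:
\[
\begin{aligned}
& x'_v \ge x_u \quad \text{for every edge } (u,v) \in \rho && \text{(counters never decrease)}\\
& \textstyle\sum_v x'_v \;>\; \sum_u x_u && \text{(at least one counter is incremented)}\\
& x_u \le s \ \text{ and } \ x'_v \le s \quad \text{for all } u, v && \text{(every counter stays bounded)}
\end{aligned}
\]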
So for each edge in the concrete transition relation you output this small constraint system,
which is finite, but, as you can probably guess, the problem is that there's an infinite number of
these edges, so for each edge it's finite but the whole constraint system is infinite. It has a nice
property: all these constraints fit in this decidable logic of Quantifier-free Presburger
Arithmetic, but the problem is it's infinite. So if it's satisfiable this program terminates, but it's
infinite, so we have to do something. And what do we do? We do the standard thing: we just
apply abstraction. That means we are going to soundly approximate everything that you've
seen so far, we're just going to do it systematically, and then we're going to compromise. That
means that sometimes we lose precision through the abstraction, and we would actually fail to
say that the program terminates even though it does terminate.
So now let me show you how we actually approximate everything we've done so far in a finite
way. Now the goal is to produce another constraint system; you'll see I've added a hat, which
means it's abstract, and it has to be finite. If this constraint system is satisfiable, that will mean
that the concrete system of constraints is satisfiable, but not the other way around, because
we lose something. We have to approximate it; otherwise we would solve the halting problem,
which we cannot.
So now what I want to show you is a very simple shape abstraction, which is actually quite
contrived. It was constructed just to be simple enough to show on the board. Here's an
example of a concrete state, and how does the abstraction work? All we do is look at simple
list segments which start at a variable and end either with another variable or null, and we
compress these simple path segments into these double-edge arrows. A double-edge arrow
means it's a simple path segment, and the number of objects that it represents can be zero but
it can also be more. So this abstract state that you see over here is an abstraction of the
concrete state. This is kind of simple. I'm not saying it's very useful, but it's quite simple. So
you see this double edge, this is an example: each of these nodes is a region that represents
exactly one object, and these edges are also regions, which represent zero or more objects. So
in this example this double edge represents both F and G. It can represent an unbounded
number of objects.
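As a small illustration of this toy abstraction (my own example, not the one on the board):

    concrete:  x -> o1 -> o2 -> o3 -> o4 -> null
    abstract:  x -> o1 ==> null

Here o1, being pointed to by the variable x, stays a singleton region, while the double edge ==> is a summary region standing for the interior objects o2, o3, o4 (in general, zero or more objects).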
So now let's see what happens when you take a step in the concrete system and in the shape
analysis. In the concrete system you see all these red edges, and when you apply an abstract
transformer, which is an approximation of the program semantics, you get all these red edges
which map regions in the pre-state to the post-state, but you can see it's not a one-to-one
relation. It's not one-to-one because, for example, some object that is represented by this
region gets pulled out and Y starts pointing to it, so it kind of splits, and this object that used to
be pointed to by Y gets merged into the prefix of the list, so it's not one-to-one. It can be like-
>>: So what is the step you actually took?
>> Roman Manevich: So I advanced Y. Y takes one step forward. You see Y used to point here
[inaudible] this object.
>>: Got it.
>>: So loop K is not related to-
>> Roman Manevich: This edge was broken down, and this edge got another member. This
object kind of got merged into this double edge. It's kind of a repartitioning. Okay?
>>: But it also, did it lose precision on the right-hand side?
>> Roman Manevich: On the right-hand side nothing happens. If I did it correctly it's supposed
to be one-to-one. It is, actually. So this edge [indiscernible] down to this edge, this object to this
object-
>>: So in the abstract you map edges to edges as well whereas you don't over here because
edges can represent regions?
>> Roman Manevich: Edges can map to an object or to a node, a node to an edge; it can
happen in all kinds of ways. So that's what happens. Then what we do is we take each of these
abstract transitions, and for each transition we take this little region transition relation and we
write a little system of constraints which has exactly the same shape as before, because it
expresses exactly the same thing, but now, because this relation is not one-to-one, you can see
variables appearing more than once. So, for example, you can see that T has to be greater than
or equal to L and also to M. This is something that cannot happen in the concrete program, but
it has exactly the same shape. That's what we are expressing.
>>: What's the meaning of L or T?
>> Roman Manevich: So with these L or T's-
>>: [inaudible] number of objects?
>> Roman Manevich: That's actually a good question, so maybe I haven't said that. What is a
region? I put variables on those regions. And what do they represent? They represent the
maximum of all the object counters of the objects that are represented by that region. A
region represents lots of objects; each object has a counter, and when you abstract them
together you have to take the maximum.
>>: It’s like joined.
>> Roman Manevich: It's a join. It's the join on the lattice of the naturals, where the join is the
maximum.
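In symbols: if a region $r$ represents a set of objects, the variable attached to $r$ stands for
\[ \hat{x}_r \;=\; \max_{o \in r} x_o, \]
the join of the object counters in the lattice of natural numbers ordered by $\le$.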
>>: This is, by the way, it seems related to array segmentation [inaudible] where you do
something like that. You have these abstract segments of the array and you join the data
values when they combine. It seems related.
>> Roman Manevich: At one point I-
>>: If you were doing sorting on arrays rather than lists then you might use that [inaudible].
>> Roman Manevich: At one point I got a little ambitious and I looked at the paper, I think
maybe it's published by now, that had this nice domain where you kind of partition arrays, and
I thought of using it, but then I couldn't quite convince myself that it's actually going to be
practical, and there's some integer reasoning that you have to do, so I thought, let's just settle
down and handle the linked lists, just go to my safe zone and handle that.
>>: Can I back up and ask a question? Your argument about termination here is that in a sense
that the length of the list is finite and you only visit each element of the list at most K times,
therefore the loop must terminate.
>> Roman Manevich: So when you say the length is finite, I don't have to think of the length.
Just think of the number of objects in the input-
>>: Yeah. The number of objects in the universe. But try to have a loop that creates objects;
then it will not terminate even though each object will be visited only once.
>> Roman Manevich: So remember, you have to be careful. Remember, one thing I did is I
took a snapshot of the objects that exist before the loop started, and I'm only counting those.
>>: I see. So it has to be some-
>> Roman Manevich: So when I'm showing you this I'm actually lying a little. I left out some
details because I wanted to go for simplicity. So, yes, you have to actually make sure that
you're not going to increment-
>>: Whatever you're doing, you have to find a finite number of counters, and then if they all-
>> Roman Manevich: Yeah. So rest assured we're doing it correctly, but I'm not showing it
because there are too many details.
>>: I am perfectly assured. I have full confidence in you. I just wanted to understand.
>> Roman Manevich: It's a very good point, yes. If we just used these constraints as is, it's not
sound. But if you make sure you only count objects that exist before the loop starts, you're
fine, because that set never changes. It's not going to grow. And you can also ask what
happens when you do de-allocation: if you just keep these objects around, they're fine, you can
use them. We may fail in some way depending on your abstraction, but it works.
So just to show an example, this is the transition relation for finding the position, and I'm only
showing a part of the region transition relation because it would completely clutter and cover
this picture, but you can kind of see that the program starts and it checks whether the guard
holds. You see there's a cycle here between these two states, so you only have to worry about
this part of the abstract transition relation, whether you actually terminate over there. If you
can prove that you exit this cycle, then you're fine. So this is the region transition relation
between S3 and S4, and there's also one going the other way around; I'm just not showing it.
For each of these directions you have to write two small systems of constraints. I'm just not
showing that. So that's it, and if you actually try to solve it, you'll see that you only have a finite
number of choices of where you want to increment the counter; actually here there's only one.
You have to increment the counter on the object that is currently being accessed, the one
pointed to by Y. That's actually a common pattern. Later, when I talk about the
implementation, I'll tell you how we actually use that.
>>: [inaudible]?
>> Roman Manevich: So let's suppose we increment V28. I haven't shown the red edges, but if
you go back, the red edges go from V14 to V28 and back. That means if you increment it, you'll
just keep incrementing forever; you just keep cycling and keep incrementing. It will never stay
within any bound. But what happens here if you increment V32? It just kind of joins V31. You
never see it again. So you increment the object counter of this object once, it gets swallowed
back into the prefix, and then you never see it again. You're not in danger of incrementing it
again and forever. But this one, if you start incrementing it, you have to keep incrementing it
again and again in the loop, and then there's no bound. So I don't know if you got the intuition.
>>: In the solution it goes from the [inaudible] for the constraints, so you actually get concrete
integers. You get [inaudible] for these variables, right?
>> Roman Manevich: Yes, you could get an actual satisfying solution, but we are actually fine
just knowing whether it's satisfiable or not. We don't actually care what the actual bound is;
we just want to know that there is a bound and there is a choice. In practice, I'll tell you that
usually the bound is extremely small. It's either one or two in all our examples, which is not
completely surprising, because usually the way you write programs, you have a loop, and in
that loop you don't just touch the same objects over and over and over again. If you want to
touch them many times, then you have another loop that does that. So if you want to think
about practical issues, then yeah, usually setting the bound to zero, one, or two works, and you
can just use [indiscernible] or something. You can just do it. It's really easy.
>>: So the classical way to show that something is finite is to put it in a one-to-one
correspondence with a finite set, right? This is sort of what you're doing here, right? You're
saying every piece of work that my program is going to do, like every loop iteration, is going to
be assigned to some object, and I have a finite number of those objects. I started with a finite
number, ergo the amount of work that I have to do is finite. But there could be many such
maps. You have to intelligently assign the work to the right place.
>> Roman Manevich: In the concrete system you can have a lot of possibilities. You can always
just choose any object for which the counter hasn't reached the bound and you're fine. But
once you abstract, you get more constraints and it gets trickier.
>>: That's right. And then you have the additional condition that you have to be able to prove
these conditions using the abstraction which can be hard.
>> Roman Manevich: So this is basically what we do. The rest is just more details. How do we
handle loop nesting and recursion? I'll just tell you about it really briefly. In the input programs
I've shown you there's actually loop nesting, and you'll see one thing I've done is written down
a small portion of the precondition assertion, and what I really want to have here is actually
the strongest assertion. I don't want to compute it; what this stands for conceptually is the
value that you get from the collecting semantics, the strongest assertion that you can put in
there. Conceptually I'm going to use it to define the class of programs that I'm handling. The
observation here is really that both loops are linear loops. The amount of work that's done in
total can be quadratic if you have nesting, but individually each loop only does a linear number
of iterations.
So we say a program is loop polynomial if basically each loop is linear. But it's only linear for
the precondition, that is, only for the right set of states, only the states that are actually
reachable from the outer scope, the scope of the entire program. If you put in just any kind of
precondition it may not be linear, but it's linear within the scope of the entire program. This is
what we call a loop polynomial program. And for a loop polynomial program, the run time is
polynomial, bounded by the loop nesting depth. I think it's a nice class of programs because
the bound on the worst-case execution of the program is very visible. It's just the structure of
the program, just the loop nesting, and that's it. So that's what we handle.
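As a formula, and assuming the word lost to the transcript is "depth": for a loop polynomial program whose maximum loop-nesting depth is $d$, run on an initial heap with $N$ objects, the running time is bounded by
\[ c \cdot N^{d} \]
for some constant $c$.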
>>: What is N?
>> Roman Manevich: Sorry?
>>: What is N?
>> Roman Manevich: N is the number of objects that exist when you enter the program. It's
the size of the initial heap.
>>: So you have N and it's a binary tree, and then [inaudible] is logarithmic?
>> Roman Manevich: So you have a-
>>: Yeah. So you have a binary sorted tree, you have a B-tree, and lookup is logarithmic.
>> Roman Manevich: So the lookup is logarithmic, but it's also linearly bounded. And that's all
we care about. We only want to prove that it terminates. We're not trying to infer tight
complexity bounds; we're not that ambitious. So for us this loop is linear and it's fine. Maybe
it's not what you want, but that's what we give you.
>>: The upper bound [inaudible]?
>> Roman Manevich: So we give you kind of a coarse upper bound. It's only polynomial.
>>: But [inaudible] these methods [inaudible] you're doing a transformation to a logarithmic
space, within your bound logarithmic space, so I'm wondering how that applies to shapes.
>> Roman Manevich: I honestly don't know. There's an interesting work by Miguel
Yuriek[phonetic]. He actually shows how to integrate different types of metrics into the
program and he can track them. So he can track the height of a tree, the size of a partition,
the length; that is somehow more involved, but we are trying to sidestep all that.
>>: Another is [inaudible].
>> Roman Manevich: So if we try to apply it to [indiscernible], we'll just end up telling you the
worst case is quadratic, which is-
>>: It wouldn't tell you anything about the [inaudible] complexity [inaudible]?
>> Roman Manevich: That wasn’t our intention.
>>: But the amortized is the sum of the one time over some number-
>>: Over some linear number.
>>: You'd take an arbitrary permutation [inaudible]. This is much less ambitious, right? This is
just saying: I know the heap was finite, and that's enough to show that the program
terminates.
>>: But the other run time methods, they work by translating into an arithmetical domain, like
difference logic, and then after the [inaudible] maximization problem you get your [inaudible].
>> Roman Manevich: So you can do much more.
>>: They can handle these logarithmic classes by transferring the count [inaudible].
>>: But he doesn't even have the ability to say as a precondition that a tree is a balanced
binary tree. There's no way in the shape domain to even say that it's bounded.
>> Roman Manevich: You could say it, but I won't use it anyway. You can have a shape analysis
that says the tree is balanced; you can say anything you want. You can always design any type
of shape analysis, but that won't really help you.
>>: [inaudible] tree does balance? [inaudible] that balances a binary tree or something like
that [inaudible] balance.
>> Roman Manevich: I think [indiscernible] actually had a paper in 2005 maybe for-
>>: I'm way out of date obviously.
>> Roman Manevich: I think he maintained predicates, basically, that told you: what is the
difference in heights between the left subtree and the right subtree? And he was able to
update them, so you can do it. So this is what I'm calling loop polynomial programs. That's the
class of programs that I want to handle. And you can slightly extend it to include recursive
procedures. I'm not saying that all programs fit in there. There are some fixed point
algorithms, and I guess most fixed point algorithms won't fit in there, but that's fine. I can still
handle some interesting programs.
So how do I actually handle nested loops? Remember, I'm assigning integers, actually natural
numbers, to variables. So if I want to say I'm touching an object at most a quadratic number of
times, I can't say that. So what I'm kind of forced to do, maybe I could do it a different way,
but what we wind up doing is using summarization. When we have a loop, suppose you have
an outer loop and an inner loop, we actually summarize the transition relations of the inner
loop, which are just finite graphs. You just take the transitive closure of both relations and
then treat it as an atomic step. So it's a fairly complex atomic step, and then you just analyze it
as an atomic step within the scope of the outer loop, and then you handle the outer loop on its
own, in some sense.
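In shorthand (my notation): if the inner loop has abstract transition relation $\hat{R}$ and region transition relation $\rho$, the summary is their transitive closures $\hat{R}^{+}$ and $\rho^{+}$, restricted to the loop's entry and exit states, with each summarized path then treated as a single atomic edge in the analysis of the outer loop.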
So what does that mean? It's good because once you've computed all the summaries of the
loops and the recursive procedures, you can just analyze all loops separately, for termination
that is, but it also means that you've lost some ability to reason about termination when the
interaction between the outer and inner loop is somehow intricate. Think of a program like
InsertionSort, and remember the outer loop that advances X over the input list. What you
could have done is actually push that pointer advancement into the inner loop under some
condition, and then, once you summarize that, we don't have any argument to show that the
outer loop actually terminates, because visibly you call an inner loop and we don't know what
changes. We don't know which object to increment the counter on, because we can't go into
the inner loop and say, in the context of the outer loop, increments sometimes happen in the
inner loop. We can't do that. So we lost that ability. We lost some precision.
So we really handle a subclass of loop polynomial programs, programs which are loop
polynomial but where, in some sense, the termination argument of each loop has to be visible
within its own scope. It can't hide inside some other scope. If you think about it, these are
really simple programs, but they're kind of natural, and that's how you write programs. If you
start squishing loops together, good luck. So that's what we handle.
How do we handle recursive procedures? You can think about it as a hack. It has some
advantages and disadvantages. We used an analysis by Noam Rinetzky and Mooly Sagiv. What
they did is basically they said, okay, what can we do? We can model the call stack as a list, and
the local variables are just kind of fields pointing from the activation record to the objects, and
then in some sense you can reduce inter-procedural analysis to intra-procedural analysis, and
you can add more predicates to make the analysis precise. And then the algorithm works
almost out of the box. If you have a recursive procedure, the stack frame is like a newly
allocated object, so we can't count on that. We can't count on stack frames of recursive
procedures, but you can count on other objects, and it's fine. So that's what we do. There are
some intricacies, but I won't talk about them.
>>: Can you not do this also with just procedure summarization? Because let's say you descend
on the left children and you descend on the right children, and you know that when you
descend on the left children you only increment the counters of the objects on the left, and
when you descend on the right children you only increment the counters of the objects on the
right. So then none of the counters should overflow by the time you terminate. [inaudible]
assuming your shape domain is strong enough to handle the trees, to talk about left subtree,
right subtree.
>> Roman Manevich: No, that's not the problem. I just don't know. I think the more popular
types of inter-procedural shape analysis are more local. This is a very global shape analysis.
It's very precise; you can check anything between different call stack instances and activation
instances, but then it's not that efficient. There are more local types of shape analysis, and we
would like to use them, but so far we haven't really figured out how to do that.
>>: It seems like a natural application for separation logic to do it that way because you
separate the left subtree and the right subtree and call on them. Just the normal framing
would guarantee that all the counters are incremented no more than once.
>> Roman Manevich: I'm not so much worried about the actual type of shape analysis. I'm
more worried about what the proof rule would be. Suppose you have an idea [indiscernible]
you can do anything you want. I'm just not sure what the proof would be with a local shape
analysis.
>>: The proof rule is basically that you're just counting work when you do this, and you just
have to make sure that whatever your program does, it increments some counter; that's
sufficient. So, for example, for a recursive procedure, if you increment at least one counter
every time you make a recursive call, that's enough, and that would be enough to guarantee
termination, so long as a counter gets incremented [inaudible]; then that's sufficient. So every
time you go around a loop or every time you call a procedure.
>> Roman Manevich: I have to see it happen. I don't know. I understand the idea on some
very high conceptual level but I have to be convinced you can actually summarize and you can
actually solve this constraint system. I don't know. I have to see an instance of that working.
So that's roughly what we did.
We actually implemented it and ran a set of experiments. The implementation actually does
something much simpler: we don't actually employ any solver. What we do instead is use a
simple heuristic: we just try to guess a satisfying assignment. We're just being optimistic and
saying the bound should be one, and usually you should just increment the object counter of
the object that you're currently touching. That usually just works, and it takes two minutes to
implement. So we did that, and it just worked for pretty much all the cases. I can very easily
think of programs where it wouldn't work and you'd actually need some constraint solving:
suppose you have a loop and you keep touching the same object over and over again and you
keep comparing; then it won't work.
All the programs we analyzed weren't anything like that, so it just worked. So that's what we
did. That's not the full theory, but if the heuristic works then obviously the theory would also
work. In terms of running time we are not sure; we haven't actually run the solver, so maybe it
would take a long time, maybe a short time, but I have a feeling that because it's all modular it
should be efficient. And then we tried different types of benchmarks.
So first we tried just the standard list benchmarks and everything worked fine. Then we took
all the Mutant benchmarks from the paper about proving shape termination. These are kind of
models of device drivers, and everything just worked. But in these programs there's no loop
nesting, and they're kind of simple.
Then we went to programs with nested loops. In the parentheses you see the nested loops,
and usually we didn't have to do anything; only in two cases did we have to refine the shape
analysis to get it to work. For Bubble-sort we had to keep track of the sorted suffix, but that
was really easy. And there was some intricate program from the [indiscernible] group, who
wrote a shape termination paper, a kind of intricate program where you have an inner loop
and an outer loop, and the outer loop is [indiscernible] at most two times, so we just needed to
remember which is the last element of the list, and everything just worked.
And we also tried some artificial programs with loop nesting [indiscernible], and you see the
running time increased, but that's because in our implementation we haven't actually
implemented the modular part, so it kind of grows. But I expect that if we actually used
per-loop modularity it would remain fairly small.
Then we tried some programs with acyclic lists and recursion from the original paper, and
everything just worked as is. And finally we tried some programs that handle sorted trees. We
handled Deutsch-Schorr-Waite, which is an algorithm that takes a binary tree and traverses it
by flipping the pointers around. That's kind of interesting. But still, if you think about it, it's
loop polynomial, because you're traversing the tree but you're touching each object at most a
constant number of times: going down once, left-hand side, back, right-hand side. So it's
naturally loop polynomial. And then we also handled a mark-and-sweep garbage collector: you
do a graph search, you find all the objects reachable from the roots, and then you kind of
sweep and remove the rest. That's obviously loop polynomial, but it's nice because it's a
different flavor of shape invariants. It has structured and unstructured parts. That was nice.
So I'm just going to conclude. We do have some capabilities, but we also have some
limitations. We support memory allocation and de-allocation. We do not support integer
variables and concurrency, but it would be interesting to try to handle those. If the loop is
super-linear, then we don't even have a hope of handling it right now. But, and here I kind of
lied, sometimes you can adapt the loop modularity to the structure of the abstract transition
relation, not to the CFG, and then you can actually do it, but we haven't tried it. If you're not
sure what I'm saying right now, don't bother. It's fine.
>>: Couldn't you do this lexicographically? In other words, you have two sets of counters,
higher and lower counters, and every time we increment a higher-order counter we set all the
lower counters-
>> Roman Manevich: That's exactly what the implementation does. That's the only way we
figured out how to do it.
>>: So there's this unofficial program that takes [inaudible] by moving the pointer once.
>> Roman Manevich: Virtual array? Which one?
>>: It moves the pointer one step and then two steps. It keeps two pointers into the list. One
is advanced by one and the other is advanced by two.
>> Roman Manevich: It’s kind of a fine type of pattern.
>>: And you've either reached null or-
>> Roman Manevich: You're trying to find the root of the [inaudible]?
>>: [inaudible].
>> Roman Manevich: Right, okay. That would work. You're just trying to find whether there is
a cycle? Okay. Would that work? Let me think. I think it should work. Sorry, I'm stupid. We
actually tried it, because that's the [indiscernible] example that keeps reappearing. So it didn't
work, but it didn't fail because of our reasoning. It failed because the underlying shape analysis
kind of lost precision at some point, and we didn't have time to fix it. But when we thought
about it on paper, it should work.
>>: But you increase [inaudible] constraints. [inaudible] domain that you map into is-
>> Roman Manevich: As long as your shape analysis knows that the pointer that [indiscernible]
cannot somehow bypass the slow-running pointer, you're fine.
>>: But it's also a property of arithmetic. If you advance by three and one and have it different
one time-
>>: That's a problem for the shape analysis. So the shape analysis has to also know about the
arithmetic.
>> Roman Manevich: I guess if you keep going-
>>: The shape analysis-
>> Roman Manevich: It wouldn't have to know about groups or whatever. So we kind of
delegate it to the shape analysis and we just hope that it does the right thing. So for that
one-and-two pointer thing, just give us a day, we'll fix it and make it run. But if you do
something more arbitrary, where eventually you'll stop, and the shape invariants really depend
on these numeric properties, I don't think we'll be able to prove termination without actually
reasoning about some of that. It's not magic.
>>: I was guessing one could make a statement about say sufficiency of the arithmetic domain
that you map into.
>> Roman Manevich: Sufficiency of the-
>>: [inaudible]. You say you map into Presburger Arithmetic. The termination argument can
be solved in this domain. Anyway, that goes to the back one for asking this kind of-
>> Roman Manevich: I'm not sure how to map it down. I'm not sure how to give you a good
answer. I'm not even sure what the question is now. You're trying to say that, given the
constraints on what Presburger Arithmetic can do, maybe you can say something about the
type of programs and-
>>: [inaudible] termination and about the run time properties.
>> Roman Manevich: I'm not sure. I'll have to think about it. That's the thing: these loop
polynomial programs are the closest characterization that I have in mind.
>>: [inaudible] and a loop that does a quadratic algorithm.
>> Roman Manevich: Yeah. So if your loop is super-linear-
>>: So one thing you could do is, instead of attaching counters to objects, [inaudible] you have
a finite number of regions, just more variables for the [inaudible] solving problem, but it stays
finite. So let's say that you had somehow one loop that goes [inaudible] and compares every
pair or something-
>> Roman Manevich: You have one loop that goes over all pairs?
>>: Just do InsertionSort with one loop instead of two.
>> Roman Manevich: So you start maybe with two loops and kind of squish them together?
>>: You can always make two loops into one.
>> Roman Manevich: That's what I meant.
>>: You could handle it easily by attaching the counters to pairs of objects instead of single
objects.
>>: So whenever you do a comparison of X.D to Y.D you increment the counter for the pair X,Y.
>> Roman Manevich: That's interesting. I never thought of that.
>>: So you get from the shape analysis, let's say, that the highest number of regions that you
have in any abstract state is five. So now for each transition you will have five integer or five
natural [inaudible]. So instead of five you have 25 pairs and you have exactly the same kind of
constraints.
>> Roman Manevich: It's an interesting idea. The way I currently thought about it is that
instead of trying to handle it on the syntactic level, handling each syntactic loop separately, you
could take the abstract transition relation and kind of reverse engineer the structure from that.
There are some known techniques to take irreducible control flow graphs and try to extract
some hierarchical structure. You can try to apply those to the abstract transition relation, and
then sometimes you actually find this natural structure, an outer cycle in the transition relation
and an inner cycle. So I think in many cases it would work, but maybe what you have is actually
a more general solution. I haven't thought about that.
Just to summarize: we have a termination analysis that kind of follows the natural way that a
lot of programs are written, and the reasoning is done in a non-pervasive way, in the sense that
you run a shape analysis and then you do a little bit more reasoning to reason about
termination. The analysis is actually modular, which is kind of nice; it's supposed to be more
efficient. And what we observe in practice is that usually the number of times each loop
touches an object is really small, usually no more than two. You've probably run out of
patience a long time ago, so I won't tell you about all the related work. I'll just tell you how we
distinguish ourselves from related work.
A lot of related work has to do with inductive shape invariants, and the reasoning is really tied
down to the fact that you can unfold the inductive definitions. So if you want to analyze
unstructured graphs you can't even start. Some of these are even more limited: first you have
to prove that the program is actually memory safe, and then you can apply the termination
reasoning. In our case, if you just want to prove termination, you don't have to prove that the
program is actually safe. It's fine, you can just apply it. We actually tried it: it's very simple to
take a program that does a null dereference, and we still prove it terminates. It can abort, but
it will terminate.
There are lots of other types of work. Some of them are better documented and some are less
documented. Some of them handle recursion in some way, but usually the reasoning is
somehow tied down to the type of shape invariants. There's a work by Sumit[phonetic] and
Talagumin Mully[phonetic] where the termination is actually proved not fully automatically
but by tracking partition sizes. It's actually quite powerful; I think all the programs that we've
seen so far can easily be proved with that approach. But that also requires you to track all
these numerical invariants throughout, whereas in our case it's kind of separated, which I think
is nice. So that's it. Thank you for listening. If you have any questions you can ask me now or
later.
>>: Not a question. On related work. So [inaudible] and I had a paper in 2006 where we did
something like this but for distributed protocols. We tried to show liveness by giving every
object in the protocol a counter, or maybe you have to give every state of every object in the
protocol a counter. It was the same idea of showing that one of those counters had to be
incremented, and very often, so if you terminate, or if the liveness condition is satisfied before
all the counters reach some value K, then the thing is [inaudible]. So it was a different
application domain but a similar idea.
>> Roman Manevich: I'm going to look into that. Thanks.