Sumit Gulwani: Okay. Good morning, everyone. It's my immense
pleasure to introduce Saurabh Srivastava, who is a Ph.D. candidate at
University of Maryland, College Park. So Saurabh's dissertation has
been about a new template-based technique to reason about programs that
leverages the engineering advances in SAT and SMT solvers. So Saurabh
has used this technique to verify programs that are 20 lines long, and you
might wonder why that is interesting. Well, it turns out that
these techniques can also be used to actually synthesize those 20-line
programs, and Saurabh synthesized the first program probably in
January last year, and since then he has been synthesizing programs
like crazy. So Saurabh is going to tell you more about this wonderful
area of work. So off to you, now, Saurabh.
Saurabh Srivastava: Okay. Thanks, everybody, for coming. Thanks,
Sumit, for the introduction.
So I'll be talking about work that I've been doing in my dissertation,
and it's titled Satisfiability-Based Program Reasoning and Synthesis.
Okay. So let me start by making almost a non-statement. Software is
everywhere. We employ teams of developers, Microsoft knows all about
that, who put in many man-years' worth of effort into building these
things. Then we send them off to teams of testers, who stack on
some more man-years' worth of effort, and then we put them on almost
everything imaginable.
But while software is everywhere, correct software, not so much. We
have all these instances where our software hits [inaudible] cases and
crashes or does something bad, and, well, all these software companies
invest a lot of effort trying to debug the software and, well,
probably write it in a font that is very visible and push it
[inaudible] or something. We have to spend a lot of effort dealing
with that.
So as we move forward, it won't just be sufficient to find these bugs
and tolerate them if one can. We need a process that does not
introduce these bugs in the first place.
And while we have this issue, what we also have is the resource that
we have at our disposal, that is, we have the computing power
available to actually get the job done. We have the computing power
to build better software. We just need to develop some theory,
techniques, and tools that will allow us to harness it.
So what do we want to do? What we want to do is, one, we want to
mechanically reason about software, and that will help -- that will
allow us to help the testers, because now what they can do is they can
take the computing power that they have at their disposal and use it to
automatically verify software.
Additionally, it will also alleviate some of the human effort there.
Additionally, it will also help the developers because if we can do
mechanical program reasoning, we can infer specifications. That might
be good for the developers to use as a summary of what they've already
built to build other software.
The second, probably the more interesting thing, is that we want to
automatically generate software. And this will certainly help the
developers because they can use their computing power to alleviate some
of the human effort there, and what we can do is possibly automatically
synthesize the software.
So towards this end, I've been pursuing this research for the last
two, three years, and what I claim is that we can build powerful
program reasoning and program synthesis techniques that leverage the
power of satisfiability solving. And let's dissect that.
So program reasoning, as I was talking about, what we want to do is we
want to do verification and we want to do property inference. So
verification is you have a program and you have a property and you
want to infer a certificate that matches the program to the property.
That is the [inaudible] proof that the system will generate.
Property inference is you're given a program and you want to infer the
properties of the program and you want a certificate about it.
Again, as I said, more interestingly, we probably want to look at
program synthesis. And as the boxes kind of indicate, we'll be kind
of linking this all together. I'll be building on program reasoning
to design program synthesis tools, and we won't have the time to go
into much detail about program reasoning, but let me just state that
that is the key technical bit that I've been working on. Program
synthesis just builds on top of that.
So we want to talk about verified synthesis where what we want to do
is we want to just take the property that we have and we want to infer
a program. But not only that, we want to infer a certificate about it
as well.
And a more pragmatic approach that I've been recently pursuing is this
approach where you're given just a property, you only infer the
program, but you don't really care about the certificate that much.
So what we have is a testing-inspired approach to synthesis where the
program that is generated is correct and matches up to the property up
to the guarantees provided by testing. And that is approximate
guarantees.
Okay. So the other thing that I want to mention is that we are going
to be using satisfiability solving as our core tool to do all of these
things. And why is that? Because that allows us to bring available
computing power that we have at our disposal to these tasks. And that
is facilitated and enabled by recent advances in the last decade, and we
now have fast solvers, essentially Z3 for my purposes, which can
really solve hard instances in the SAT and SMT domain.
And if you look at the SMT competition, which is the competition for
these solvers, some of the industrial benchmarks are incredibly big.
So we have this tool at our disposal, and we want to apply it to our
task.
So here is the outline for the rest of the talk. I'll be talking
about the foundations that we have built, the core technique that we
have built, which leverages satisfiability solving for verification,
property inference and synthesis, and then we will talk about -- in
this case we will mostly concentrate on the synthesis aspect, just
breeze through the first two, and then I'll discuss our
testing-inspired synthesis approach.
And while we're doing that, I'll also mention the practical
implications of this thing, how we've been able to infer some very
complicated invariants that you might need for verification, or
preconditions for functional correctness or termination for property
inference, and I'll discuss in detail our examples for synthesis and
also for testing-inspired synthesis.
Yes?
>>: So you talk a lot about synthesis. How do you weigh the sort of
the impact of the work you're doing in verification against the work
in synthesis? In other words, which do you think is more important,
and why?
Saurabh Srivastava: Well, synthesis has the potential to be really
important. Verification is sort of needed right now, right? We have
developers who write these programs, so in terms of inferring these
quantified invariants, which are really required for a lot of programs
to actually prove them correct, verification can be very useful. But
synthesis right now is at kind of like an experimental stage. We've
been able to synthesize a couple of programs, but once we figure out
what we can actually synthesize, then we can start talking about how
that might be useful in practice.
I don't think synthesis is talking about -- at least my work doesn't
really talk about [inaudible] applications right now. We're trying to
figure out the core techniques and maybe in a couple of years we'll
figure out that, oh, here's how they can help the developers, et
cetera.
Okay. So to talk of the first part, which is the core
satisfiability-based program analysis technique, I'll be talking about
verification and property inference, as I was talking about. Okay.
So let's start with some real basics. Reasoning about -- and this
will be very high level, especially for this audience, but just to put
everybody on the same page.
So reasoning about straight-line code is simple. Well, relatively,
if you don't talk of the heap, et cetera. And at least -- so what I
mean is that at least for simple domains, it is simple. So, for
instance, if you have a fact x is greater than 5 at the beginning, you
have an assignment, x is assigned x plus 1, intuitively we know that
we should be able to derive x is greater than 6 afterwards, right?
And this is easy to mechanize as well, and our way of doing that,
for the purposes of this talk, will be to say that, okay, you have
the fact x is greater than 5, and for that assignment we will put an
equality predicate, we'll conjunct an equality predicate which says
that the output value of x, which I indicate by the primed variable, x
prime, is equal to some expression over the inputs that we're
computing.
And once you conjunct that, then you have this -- you can prove
something about the resulting values. And this is very -- well, this
is essentially the way we do it in symbolic execution, or SSA-style
reasoning, and you can do it backwards using [inaudible]-style
reasoning as well, doing substitutions, but we'll stick to this because
it's, for one, intuitive, and it also facilitates other things.
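[A minimal illustration of this encoding, assuming the Z3 Python
bindings; the variable names are just for exposition:]

    from z3 import Int, And, Implies, prove

    x, x_prime = Int('x'), Int('x_prime')
    # the fact before the statement is x > 5; the assignment x := x + 1
    # becomes the equality predicate x' = x + 1 over the primed variable
    vc = Implies(And(x > 5, x_prime == x + 1), x_prime > 6)
    prove(vc)  # Z3 reports: proved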
Okay. So before we go further, let me just introduce some notation.
I will use the [inaudible] variable F to indicate facts in the
system, such as x is greater than 5 and x prime is greater than 6, and
because statements for us are these transitions between the inputs and
outputs, I will use a [inaudible] variable T to indicate that. And
because we're talking about reasoning and synthesis, these facts might
be unknown or these transitions might be unknown as well, so I'll use
this [inaudible] font for that.
Okay. So we have straight-line code, which is somewhat simple.
Acyclic code is also doable because that's just a combination of
various straight-line fragments, right? You just need to -- as a
[inaudible] operation, you just need to -- quote, unquote, just need
to define the join operator. And you essentially just need to
combine the facts that you get, F1, F2, F3, from straight-line
reasoning to get this new fact. So that's also kind of doable.
But almost everybody would agree that dealing with loops is the tricky
bit. So you have this straight-line fragment, acyclic fragment; you
have an F0 at the beginning and then you get an F1 at the end. And
then you have a loop. The problematic bit is this back edge that
bumps [inaudible] back into the beginning, so you have this F0 prime,
and then you might bump it back again and then you get this F0 double
prime. Essentially what you need is a fixed point, right? You need a
fixed point such that if you start with that, then you'll get the same
thing at the end.
And this is the key difficulty that we have been dealing with. You
need a loop summary, a loop invariant, and for automatic program
verification, this is the key difficulty. Inferring these fixed
points, loop invariants, is the big task.
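[For reference, and roughly: for a single loop with entry fact F0,
loop invariant I, guard g, and body transition T, the fixed point
being described is an I with F0 implies I, (I and g and T) implies I',
and (I and not g) implies the fact at the exit; the middle condition
is the one the back edge forces.]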
And over the past couple of decades we've been essentially automating
this process, trying to figure out mechanical ways of generating these
fixed points. And it has been limited to simple techniques.
So what we really need is a technique that will infer complicated, by
which I mean quantified invariants for the purposes of this talk, and
can do it robustly.
And for this we have our approach, which I call satisfiability-based
program analysis, and analyses written in this framework are always
composed of two parts. One part is the part that reduces the program
into a SAT formula, and then the other part is the one that actually
does the fixed-point computation using the SAT solver. It takes
the SAT formula that you have computed from the program and then gets a
solution out of it, which corresponds to invariants that I will talk
about in a bit.
So the first part is the important part, which is this part that takes
the program and generates a boolean formula out of it, right? So
let's talk about that.
And here what we have is that we make a key assumption. The key
facilitator that allows us to get boolean formulas out of programs so
that they respect this mandate is the use of invariant templates. And
this is very similar to the domains that you give in a [inaudible],
except that here we have these templates which essentially give you
more structure. We'd say that, okay, this is the kind of invariant
that I'm looking for.
And we've considered two domains. For instance, for linear
arithmetic what we say is that you can have invariants of a
particular form. Instead of just being some invariant, we say that,
okay, it is an invariant which has a particular form, let's say, F1
disjuncted with F2 disjuncted with F3, and these internally are
cubes over unknown linear relations that we want to infer.
So this is the kind of template that I'm talking about. We have
another reduction for predicate abstraction that essentially
takes -- can talk of quantifiers. So you can have some boolean
structure at the top and it could involve quantification. The linear
arithmetic reduction does not handle quantification. So this is
essentially why I put that down there.
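[As a hypothetical illustration of the shape being described, a
linear arithmetic template might look like
(c0 + c1*x + c2*y >= 0 and c3 + c4*x + c5*y >= 0) or (...), with the
c_i being the unknown constants to solve for, while a predicate
abstraction template might look like: for all k, (unknown conjunction
of given predicates) implies (unknown conjunction of given
predicates).]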
>>: So the general [inaudible] is that you're going to say rather
than searching for arbitrary invariants, we're going to assume the
invariants have a certain structure?
Saurabh Srivastava: Yeah.
>>: And then you're basically going to ask the theorem prover how
about this invariant or how about that -- or, no, you're actually
going to say within this structure, please search for invariants that
may --
Saurabh Srivastava: So, yeah, the intuitive idea is more similar to
the second one. We reduce it to a boolean formula, and then once you
solve for that formula, the model for that formula basically gives you
the invariants.
>>: Okay. Right. So the FI's are all unknowns, but you've
essentially limited the structure, the size.
Saurabh Srivastava: Yeah. But we do not enumerate -- so our use of
the SAT solver is a little different from the traditional approach --
whatever traditional means. So one approach could be that you
enumerate some candidate invariants using some technique, like some
process that you might have, and then you use the SAT solver to filter
them out, correct? But we don't do that.
>>: I see. The analyzer things are holes that [inaudible].
Saurabh Srivastava: Yeah.
>>: They're variables. They'll essentially become variables.
Saurabh Srivastava: Yeah, essentially become variables. So when you
have conjunctions of -- for instance, in this formulation it's easier
for linear arithmetic. These would be conjuncts of linear
relations, and those linear relations will be over program variables
and some unknown constants.
>>: Unknown constants, right.
Saurabh Srivastava: And those constants will essentially boil down to
booleans.
Okay. So, again, for the case of predicate abstraction, these
unknowns are conjunctions over some predicates. So we won't really
have the time to go into the details of how we do the reductions so
let me just give you an overview.
So what you have is you have these program verification conditions.
They're of the form of some unknown invariant followed by some
transition should imply some fact somewhere in the program. It could
be the unknown invariant if you're going around the loop or it could
be some other fact.
And by making this assumption about the structure of the invariant,
more so than what you do in abstract [inaudible] domains, what we're
able to do is we're able to apply a trick from the linear arithmetic
literature called Farkas' lemma to actually take this implication,
plug those templates into the formula and then, through a couple of
steps, eventually get a boolean formula.
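[For reference, the relevant affine form of Farkas' lemma says,
roughly: an implication (a1.x + b1 >= 0 and ... and an.x + bn >= 0)
implies (c.x + d >= 0) is valid over the rationals, assuming the
left-hand side is satisfiable, exactly when there are non-negative
multipliers l1, ..., ln with c = sum of li*ai and d >= sum of li*bi.
This is what turns a quantified implication over program variables
into constraints over the unknown template coefficients.]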
So I'll be happy to talk about these steps off line, but let me just
give you an overview of how the technique works, right?
So we have that for linear arithmetic. For predicate abstraction we,
again, have a couple of steps that take these verification conditions
and get you to a boolean formula. And we have an algorithm that we
call optimal solutions, but essentially it's a glorified predicate
cover operation that we know from the literature.
So that essentially takes you from the verification condition to the
boolean formula in this domain.
Once you have this boolean formula, you can use a SAT solver to solve
for it. Right? And now you get a SAT solution. And what does that
SAT solution correspond to? That SAT solution essentially gives you
values for what goes inside of those unknowns that you have in the
template. And the template has some top-level boolean structure,
and once you have those values, you can plug those back in, and now
you have the new invariant.
Okay. So that is sort of the high level of what the technique does.
Again, I'll be happy to talk about these two reductions off line.
So what have we used this for? We've used this for verification, that
is, for assertion checking -- generating invariants for assertion
checking -- and to illustrate that, let me talk of an example.
So imagine the background here to be a pixelated screen, okay? So
these are pixels. Each box indicates a pixel. So what we need -- what
we want is a program that draws the best-fit line corresponding
to the real line from (0, 0) to (X, Y), such that the output values
are no more than half a pixel away from the real value. So you want
to output these pixel values.
You can do that by computing the slope, Y divided by X, and
multiplying it by small x and then taking the [inaudible] as required.
But supposing you wanted to do that more efficiently, you wanted to do
that only using linear arithmetic operations, well, the graphics
community does know of a program that does this, and it's this one,
but looking at it, we have no idea why this should be computing the
best-fit line to a real line, right?
So we want to verify that. We want to have -- we have a precondition,
we have an output that we want to meet. We want to verify this
program. We want to infer invariants for the loop, then. So we can
infer invariants for this program that we have, Bresenham's line
drawing example. And the actual invariant that you need is this. And
using the linear arithmetic tool that we've developed, we can infer
invariants of that form.
The human programmer is probably not going to worry about that or even
give you that.
So that is the kind of automation that we can provide with that linear
arithmetic tool.
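[For concreteness, the textbook form of the program under discussion,
as a hedged Python sketch for the first octant, assuming 0 < Y <= X;
this is not necessarily the exact code the tool consumed:]

    def bresenham(X, Y):
        # draws the best-fit integer line from (0, 0) to (X, Y);
        # v tracks a scaled error against the real line
        v = 2 * Y - X
        y = 0
        points = []
        for x in range(X + 1):
            points.append((x, y))
            if v < 0:
                v = v + 2 * Y
            else:
                v = v + 2 * (Y - X)
                y = y + 1
        return points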
The other interesting sort of standard benchmark is sorting. Sorting
is interesting because you have these small complicated programs, you
have a couple of them that we're very familiar with, and these are
interesting because the reasoning involved is complicated because it
involves quantifiers. You have the fact that you want to prove at the
end. And for the loops you want to infer complicated quantified
invariants, and this style of reasoning is difficult to automate.
And what we can do using our technique is to give it a template or a
top-level structure that says, okay, you have some quantification,
this fact, this unknown, implies some other unknown, and then the
tool [inaudible] for this complicated invariant that you need.
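[A typical instance of such a quantified fact, for sortedness of a
prefix of array A, is something like: for all k1, k2, if
0 <= k1 < k2 < i then A[k1] <= A[k2]. The tool is given only the
top-level shape "for all k1, k2: unknown implies unknown" and fills
in the two unknowns.]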
Okay. So we can infer quantified invariants, and we can also do that
in very reasonable time, quote, unquote, reasonable time.
So here are the times in seconds on the log scale out here on the
y axis. And for the most part we see that we can compute the
invariants within 1 to 10 seconds, approximately 1 to 10 seconds, and
for one example, which is the line drawing example, we need more time,
about 100 seconds.
What is interesting is to know that the previous techniques were
somewhere there. Like we had one technique that came close, which was
for insertion sort, and then there was merge sort and
quick sort, and that's a log scale, so that's about an order of
magnitude. But that is essentially not the point I'm making.
That's not a fair comparison, because those techniques were a couple
of years back and stuff like that. The point that I'm making is that
we have a robust technique that can infer invariants given these
templates for all of the benchmarks that we have as opposed to other
techniques which were specialized for one or the other.
Yes.
>>: I'm not sure I understand what you're comparing against. So what
are the previous techniques? What did they do? You're comparing the
performance, presumably, but they verified that quick sort was
correct.
Saurabh Srivastava: They inferred the invariants for proving --
>>: [inaudible].
Saurabh Srivastava: Yes. Yeah. So the times here are for inferring
those invariants. Did I answer the question or --
>>: Yeah.
Saurabh Srivastava: Okay. Cool.
Okay. So we can infer invariants for verification. The other
interesting thing is that we can also do loop bound computation, which
is of interest to someone, for instance. But this is a heavyweight
technique, and in the SPEED project, I think he uses more lightweight
techniques. But we can infer complicated loop bounds if they are
needed.
More interesting, probably, is property inference. And what we can do
is we can infer preconditions. For instance, we can infer
preconditions for functional correctness. So, for instance, consider
a program that implements a binary search. What we can do is we can
run this program through a tool in this precondition mode, and what it
will tell us is that binary search is only correct if you give it a
sorted array.
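[A hedged Python sketch of the kind of program and inferred
precondition being described; the assertion is just an illustration
of "the array must be sorted," not the tool's actual output format:]

    def binary_search(a, key):
        # inferred precondition (illustrative): a is sorted in
        # non-decreasing order; otherwise the search can miss the key
        assert all(a[i] <= a[i + 1] for i in range(len(a) - 1))
        lo, hi = 0, len(a)
        while lo < hi:
            mid = (lo + hi) // 2
            if a[mid] < key:
                lo = mid + 1
            elif a[mid] > key:
                hi = mid
            else:
                return mid  # found: a[mid] == key
        return -1           # key not present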
And that can be kind of useful if a programmer is entering -- is writing
some fragments and then he wants to figure out other fragments that
are going to call into this, so what are the conditions that should hold.
We can also infer preconditions for bugs or worst case. And what do I
mean by that? So consider selection sort. If you run selection sort
through the tool in -- by saying that, okay, well, what is -- put in an
assertion saying that, okay, here we're swapping. Give me the worst
case number of swaps that it does. And the tool in this precondition
mode gives you a precondition that says the array should be sorted,
pretty much, except that the last element should be smaller than the
first element.
So this is kind of interesting. The precondition would say that, okay,
if you give it this particular input, then it's the worst case number
of swaps that it can possibly do.
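[For instance, on an input like 2, 3, 4, 5, 1 -- sorted except that
the last element is smaller than the first -- selection sort performs
a swap in every single iteration, which is the most it can ever do.]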
We can also infer preconditions for some kinds of termination.
Okay. So, again, we have a technique that can infer quantified
properties of programs, which can potentially be useful.
Okay. So this is what I've been talking about in this part. We have
this approach for doing fixed-point computation using the SAT and SMT
solver, and the key details that we need to add are these
abstraction-specific reductions. Unfortunately, we didn't go into the
details of how you actually do that, but, for instance, for linear
arithmetic and predicate abstraction, we had to design different
reductions to convert them to boolean formulas. Okay.
>>: I have a question about that.
Saurabh Srivastava: Yeah?
>>: So it sounds like you're not going to go into it in more depth. So
one question I have is what are the limitations of that strategy for
trying to find invariants? You have the two different kinds of
strategies, and in both cases you're making approximations or you're
creating this template, et cetera. Under what circumstance will that
not work? Have you thought about --
Saurabh Srivastava: So there is one thing that is -- one limitation
kind of in addition to previous -- kind of in addition to what
previous techniques had was this notion of template. Right? So if
you do not -- if your invariant does not happen to lie in a template
that says, okay, for all K1, K2, something implies something, right?
If it requires, like, something more complicated than that, then
your solver will come back and say that no such invariant exists. So
you have to [inaudible] go ahead and maybe, like, give it
more expressive templates or something like that.
So linear arithmetic does not handle quantification, so there the
template -- our template would be some number of conjuncts, some number
of disjuncts, and, again, if it doesn't fall into that template, then
the solver will come back and say no solution. Right?
>>: [inaudible] you were looking for 100 variables and thousands of
disjunctions and you couldn't find them.
>>: Is that the only case? I guess that's the thing. Is it just a
matter of numbers?
Saurabh Srivastava: Well, so there's a trade off there, right? So if
you give it a very, very expressive template -- I mean, you can just
give it a huge template, right? Any number of quantifiers, you can
enumerate tons. But then the solver will take its own sweet time to
come back, right, using SAT and SMT solvers at the core. So if the
instance is terribly hard, then the state-of-the-art solvers won't be
able to handle that.
>>: I mean, one thing that could be done would be to look at the
solving time versus the complexity of the template so you could
artificially create bigger templates. It's just kind of how long they
take to fail, essentially.
Saurabh Srivastava: Sure.
>>: Because it would give a better sense of what the boundaries are
between things that are solvable in a reasonable amount of time and
things that are just way out there.
Saurabh Srivastava: That might be instructive to see, but I don't
think that will be very -- very -- what is the word I'm looking
for -- SAT solvers always have this brittleness in terms of -- like it's
not easily quantifiable which instances are hard and which are not.
[inaudible] but it doesn't seem like you can just vary the templates
and get [inaudible] like this is a harder SAT instance than the
previous one. So, again, like we're -- so that, you could say, is
another limitation.
>>: I'm an experimentalist, and I believe doing an experiment would
probably give you some insight.
Saurabh Srivastava: Yes, certainly, certainly. It will certainly
give us insight. But --
>>: [inaudible].
>>: So I have a question. I mean, the template-based version, is
that really new to you? I mean, I thought that people had been
investigating this idea before using -- maybe not using SAT. So can
you really tell me much more specifically compared to the related work
what so far is a new contribution?
Saurabh Srivastava: These reductions -- so if you assume -- so we
need to assume a template to actually get a finite search space.
>>: No, I know that.
Saurabh Srivastava: So that's essentially where this template comes
in.
No, but the idea of the templates.
Saurabh Srivastava: The idea of the templates, I don't think that's,
like -- that's a [inaudible] idea. I don't think that's the new
contribution here.
>>: Okay.
Saurabh Srivastava: The new contributions are the reductions.
Essentially for linear arithmetic, again, we have to -- to be
completely honest, the reduction -- the application of Farkas' lemma
was used earlier by a student [inaudible].
>>: Yeah, yeah, yeah, right. That's what I --
Saurabh Srivastava: Yeah. So that was --
>>: So you're going to tell us more about the actual -- the detail of
your reduction?
Saurabh Srivastava: Actually, no, I'm going to jump into the
synthesis part now.
>>: Oh, okay.
Saurabh Srivastava: But, again, we can talk about that --
>>: Okay.
>>: [inaudible].
>>: No, no, no, no, no. You're not giving the talk. Let him give
the talk.
Saurabh Srivastava: But the idea of their use of templates [inaudible]
was based off that thing that, okay, you can reduce it to a linear
formula and then I'll apply mathematical solvers to solve it. But
then we took it one step further to get it to SAT, okay? So that for
linear arithmetic was the additional bit.
>>: Okay.
Saurabh Srivastava: They did not infer preconditions, which was an
additional thing that we did. For predicate abstraction, I don't
think there was, like, a reduction over predicates getting into a
boolean formula, et cetera. That's new.
>>: Okay.
>>: [inaudible]. That's part of this template of --
Saurabh Srivastava: Okay.
>>: Okay. [inaudible]. Thanks.
Saurabh Srivastava: Yeah.
>>: I'm going to make sure you got the first [inaudible].
Saurabh Srivastava: Okay. So now I'll talk about proof-theoretic
synthesis, which builds off of that program reasoning approach that
we've talked about. And then we'll talk about how we infer a program
and a certificate.
Okay. So program synthesis, stated broadly, is the idea of
automatically generating programs from given specifications. And our
approach to that is to connect it to program reasoning.
And let me just motivate that. So what we have is that in program
reasoning, what we had was we had unknown invariants and then we had
those transitions for the statements, and that allowed us to go to
other facts.
But if we write it out, some of the transitions look exactly like some
of the facts at the end, right? So in this particular case, there's
no reason why this transition is special in any way, right? It just
looks exactly like the fact that you have at the end.
So why should they be given special status? And essentially what we
decided was, let's try to see if we can just make that an unknown as
well. So what we have is we have our unknown invariants, as we did
earlier, and now we have these transitions which are unknown as well.
They probably should have a particular form, as we will see, but once
we make this unknown, our approach would be, well, let's see if we can
encode synthesis as generalized verification and hopefully infer the
transitions -- and the invariants as well.
So this is the certificate of correctness that we need.
So we have our example that we were talking about earlier. It has
input and output conditions. We want to generate a program now
with only linear operations. And our approach builds on the previous
work. So we want to encode synthesis as generalized verification and
we want to use existing verifiers. But right at the onset we have a
problem, because in verification what we had was we had a program, and
we were generating a proof for it.
Now we don't have a program, so what are we going to do with it? What
is the input of the synthesizer? And we need to talk about that.
And the input to us is this thing that we call a scaffold. And
what does the scaffold contain? The scaffold contains three parts.
The first is the functional specification. You need to know what
you're computing, right? So you need -- this is kind of reasonable.
You need to know what the pre and post conditions are. So the
functional specification is what we need first.
The second bit is the resource constraint. So this is a little more
non-trivial, if we may, because this allows the programmer to
constrain the space of programs that he's looking at or that he wants
and additionally allows us to build a synthesizer, essentially.
So the resource constraint consists of two parts. The first is a
looping structure. That is, does the program contain a nested loop,
does it contain two loops in a sequence, and so on. This in some
sense very vaguely says what the time taken by the program is. It's
not the asymptotic complexity, but in some sense it's like, you know,
some indicator of the time taken.
The other is the stack template, which says how many variables are
there that the program can manipulate -- it can manipulate one variable,
thousands of them, or something. So how many local variables are
there for the procedure. So these are the two bits of the resource
constraint.
The other thing that we need is a specification of the domain. So
over those variables inside that control flow structure, what
operations can you have. Can you have quadratic expressions, can you
have linear expressions or something like that. So we need a
specification of the domain.
So once we have those, we will actually be able to -- once we have
these three, which are part of a scaffold, we will actually be able to
set up a system so that we can look at it as verification. And our
approach to that uses what we call synthesis conditions, and those
I'll describe next.
So what you have is you have a scaffold which has three parts: The
function specification, the resource constraints, and the domains.
And from that what we can do is we can write out some basic formulas.
You have the control flow structure, so you can write down some
verification conditions.
But the problem there will be that everything now is unknown. The
only thing known are the pre and post conditions, right? But at least
when we write them out in this form, we have some tools that can
digest implications of this form, which are the standard verifiers that
we have constructed. They're not used to everything being unknown.
They're used to having the loop guards known, they're used to having
the transitions known and so on. But at least they can digest them,
right? So we can be optimistic, send them out to a verifier, these
are safety constraints that we have with a whole bunch of extra
unknowns, and send it over to the verifier and see what it gives us.
And as you might expect, because the system is so horribly
under-constrained, the synthesizer will probably just come back and
give you a trivial solution. So in this case it says if I assign
everything false, it just works, right? All the implications are
discharged because they have a false on the left-hand side, and that
is fine.
But that does not correspond to a real program. False does not
correspond to a real program. So what went wrong up there?
What went wrong is that we did not impose the semantics of the
unknowns that we were inserting. We did not say that the statements
were of a particular form, the guards were of a particular form. So
we need some well-formedness constraints. And we have found that we
can impose them in the same manner as safety constraints, so we get
these two. You get safety and well-formedness, and now you can ask
the solver for solutions again.
But there will still be a problem here. So let's play a game. So let's
say that you are the person who's generating the safety and
well-formedness constraints from the scaffold and I'm the synthesizer,
the core solver, that is solving these things. And now you've given
me these two, so I have to give you a program that meets the pre and
post condition and has to be a well-formed program. It has to be a
valid program.
But what I'll do is I'll give you back a program that always goes into
an infinite loop, right? A program that does not terminate meets
every pre and post condition. So it's a real program. It meets the
pre and post condition, but it's not interesting.
So what we need is we need that the output should be reasonable so
that it does some real computation. So once you add to those two the
last one, which is the termination constraint, now what we get are
these three parts to what we call the synthesis conditions, and
solving those actually gives us valid non-trivial programs.
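[Roughly, and only as a schematic of the three parts just named: the
safety constraints are the usual verification conditions, for example
(I and guard and T) implies I', except that the guards and the
transitions T are now unknowns alongside the invariants I; the
well-formedness constraints say each unknown must be translatable to
code, for example each T is a satisfiable conjunction of updates
x' = e over the allowed variables and cannot be false; and the
termination constraints ask for something like a ranking function
that decreases around every loop.]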
And this is the systematic approach to verification that we were kind
of looking at. So in verification -- did I say verification? The
systematic approach to synthesis that we want, in verification what
you had was you had a program, you generate [inaudible] verification
conditions from it, and we have engineered these verifiers that can
generate the proof.
Now what we want to do is we want to take the scaffold, generate
synthesis conditions from it, and use the exact same verifier that we
had from the verification domain and apply it to solve these synthesis
conditions to get the program plus the proof. And that is possible
because these are all encodable in the format that is digestible by
these verifiers.
Yeah?
>>: So the scaffold is imposing some -- like a template, like a
constraint on the size of the program you're going to?
Sumit Gulwani: If you don't have the scaffold, then there will be no
way to put invariants anywhere by having, let's say -- saying that,
okay, it has a nested loop. Now you have two points -- you don't know
anything inside those loops, but you have this, like, [inaudible]
structure where you can put the invariants and you can write out the
verification conditions.
>>: So I could also just give you the context [inaudible] of the
programming language and say don't build me a tree with more than N
nodes. You want a little more structure than that.
Saurabh Srivastava: Yeah. I mean, that's the system that we've
built. But we can discuss the version that you're talking about. I
don't know how [inaudible].
>>: But it's a finitization, right? It creates a -- I mean the
really key property of the scaffold is not just its structure that you
want -- that you say I want this followed by that but also the finite
size?
Saurabh Srivastava: Finite size in the sense that, well, we allow
loops and we allow a loop invariant.
>>: You can express unbounded computations.
Saurabh Srivastava: Yeah.
>>: You're not going to search over an unbounded space [inaudible].
Saurabh Srivastava: Yeah. Sure. Sure.
>>: [inaudible].
>>: No, no, but the -- I understand that. But that's what the
[inaudible]. I mean, but you could just say, well, because you know
perfectly well because of the proof of the halting problem that we can
encode anything that's constants, right? We can encode [inaudible]
expressions of up to some bound as a constant, so why not search for
those too?
Saurabh Srivastava: So it gets down to [inaudible].
>>: Yeah, I know, I know.
>>: In both this case and with the scaffold here and the template in
the verification case -- let's just take the template case. When you
specify a template with a given number of disjuncts, is there any
advantage to be gained in imposing a higher level of search where I'm
going to try all the templates up to the permitted complexity in some
increasing complexity order?
Saurabh Srivastava: That's certainly doable. The instance is
constructed out of those, but using [inaudible] higher things, right?
Yeah, that's essentially what our -- like, that was the interface to
me as the user of this tool, like I try this template, it doesn't
work, I try something more expressive. Like you don't want to go to
something that's, like, you know, 10,000 disjuncts because the
[inaudible] would be insanely difficult to solve. Right?
>>: What I'm asking is for -- if it turned out it was eight
disjuncts, would there be any advantage -- I'm sorry, let's say it was
five, and you gave it search up to ten. Would there be any practical
advantage to having it do try 1, 2, 3, 4, 5 in terms of more
efficiently finding it at five versus just try the 10 first?
Saurabh Srivastava: Yeah. Essentially I think the ideal scenario
would be to do a binary search, like go to some high-level thing.
Because it's not -- the SAT solver is not always entirely
deterministic, and, like, that 10 would be more difficult than 5 --
not necessarily. So you could probably just assume that all of them
will take some amount of time and you can try 10, 1, and then binary
search to find the right amount. But 10 would give you the solution,
right?
Yeah, [inaudible] increasing is just one approach. You can just give
it some higher bound as well. I don't see anybody [inaudible] to
that.
>>: The idea that the cost is not monotonic in the size of the
expression of the template, or in this case the scaffold, is a little
disturbing, because it means that -- I mean, if it was, then the
strategy that Dave outlined would be very logical, right? You'd start
at the smallest and you'd work your way up because it would take
less -- each would take successively more time, right? So you'd do
the fast one first.
Saurabh Srivastava: Yeah.
>>: Since it's not, I guess the question becomes how do you know as a
person writing one of these things that you're not going to pick some
template that's going to be very expensive and sort of fall into a
hole, essentially? Even if it's very simple, you're saying --
Saurabh Srivastava: This is probably just a usage concern, right?
Like so eventually you could have different versions of this, trying
out different things, right? The problem is that when the template is
not expressive enough, it has to say [inaudible], which takes a lot of
time, so that could be problematic.
If you go to something that is really high up, like, you know, 10,000
disjuncts, the SAT instance could be incredibly complicated. That's
problematic. So you want to be somewhere in the middle. You want to
try something. You can do it, like, sequentially as I did with my
human experiments. Like I [inaudible] tried more, but you can go the
other way around or you can do it in parallel if you had a system like
10,000 threads just running the different kind of template structures.
That is a space that I have not explored at all. Like the coding
[inaudible], well, we found that we could verify these programs and
all of that, but actually using it over real life [inaudible], that I
haven't really done. And that's probably part of the motivation for
why I want to come here.
>>: Okay.
Saurabh Srivastava: Okay. Going back to synthesis, this actually
works, so you can give it a scaffold for, for instance, a line-drawing
program and generate those conditions from it, send it out to the
satisfiability-based verifier that we built in the first part, and
then out comes a program and, in addition, comes invariants. So if
you wanted to formally check whether that program is actually correct,
you have the invariant there as well.
Question?
Okay.
Okay. So we tried it over some linear arithmetic programs. For
instance, we can automatically synthesize Strassen's matrix
multiplication, which is kind of interesting. Everybody has seen
Strassen's matrix multiplication in the undergrad, and everybody knows
it can be done. The key idea was instead of using eight
multiplications, you use seven multiplications, and therefore the
asymptotic complexity comes down from n cubed to about n to the 2.81,
but nobody remembers -- well, unless you have a photographic memory,
nobody remembers what the actual computation was, right?
Now you don't need to, because you can just give it a scaffold: you
have the output that it has to prove, you have some looping
structure which is some indication that it's acyclic, and then you
give it seven variables to work with, not eight, and then out comes
Strassen's matrix multiplication. Well, one of the solutions is that.
It generates others which are equivalent and different.
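[For reference, one well-known solution of exactly this shape -- the
seven products and their recombination -- written as a Python sketch;
the synthesizer's own output may be a different but equivalent
variant:]

    def strassen_core(a11, a12, a21, a22, b11, b12, b21, b22):
        # seven multiplications instead of eight
        m1 = (a11 + a22) * (b11 + b22)
        m2 = (a21 + a22) * b11
        m3 = a11 * (b12 - b22)
        m4 = a22 * (b21 - b11)
        m5 = (a11 + a12) * b22
        m6 = (a21 - a11) * (b11 + b12)
        m7 = (a12 - a22) * (b21 + b22)
        # recombination into the four blocks of the product
        c11 = m1 + m4 - m5 + m7
        c12 = m3 + m5
        c21 = m2 + m4
        c22 = m1 - m2 + m3 + m6
        return c11, c12, c21, c22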
We can also generate a program that has a loop -- this uses the
verifier that handles loops -- for instance, to compute the integral
square root of a given number, and you can give it a stack template
and it will automatically generate, for instance, the binary search
version, or if you give it a less expressive one, it can generate
linear search.
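[A hedged Python sketch of the binary-search variant being described;
the linear-search variant would simply count up until the square
exceeds the input:]

    def isqrt(n):
        # integral square root of n >= 0
        lo, hi = 0, n + 1
        while lo + 1 < hi:
            # informally, the loop invariant is lo*lo <= n < hi*hi
            mid = (lo + hi) // 2
            if mid * mid <= n:
                lo = mid
            else:
                hi = mid
        return lo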
We can do the line drawing example. We can automatically generate
Bresenham's line drawing, as opposed to just generating the
invariants for its correctness.
More interesting -- well, quote, unquote, more interesting would be
looking at sorting examples. So we have a bunch of sorting examples,
and it's interesting for synthesis because there's one specification:
you have [inaudible] and you want a sorted array as the output. So
where do all the different versions come from? Well, they come from
the resource constraint. So if you give it a nested loop and zero
variables, out comes bubble sort and insertion sort. If you give it a
nested loop and one variable, selection sort. A recursive template
with zero variables, merge sort. A recursive template with one
variable, quick sort.
And now we can generate the invariant for it and additionally generate
the statements such that all of them match up to the pre and post
condition. Okay.
>>: [inaudible]
Saurabh Srivastava: Nested loop with two variables -- well, all of
those would be in the template, and as I was pointing out, the first
one has two solutions and you iterate over those two solutions. So if
you have two variables, you'll just iterate and you can generate all
of them.
>>: [inaudible].
Saurabh Srivastava: Huh.
>>: [inaudible].
Saurabh Srivastava: Well, I wasn't really trying to get different
things, so -- yeah.
>>: On the previous one, with Strassen's algorithms, I remember it
was recursive.
Saurabh Srivastava: Oh, yeah. Sure, sure, sure. No, we don't
synthesize the top level structure, the recursive divide and conquer
structure. We're just synthesizing the core straight-line fragment which just
does the computation using seven multiplications as opposed to eight.
You have to wrap this around with a recursive divide and conquer
algorithm which divides the matrix into four and then you do the
multiplication.
>>: But given that you relied on being given pre and post conditions --
Saurabh Srivastava: Yeah, so these could be [inaudible] matrices,
right? I'm just saying that this multiplies them in some order and
adds them in some order. So these are not elements, essentially. So
the recursive step just takes these as some -- from n by n and
goes to -- n by two by n by two [inaudible] and so on. So I'm just
[inaudible] the core inside there.
>>: So what makes you think about this is that you are abstracting,
but in the real algorithm would be [inaudible] and you're abstracting
it with just a product of scalars.
>>:
[inaudible].
>>: Yeah. It still cares about whether or not there are other
undiscovered [inaudible]. Seems like that you're that close --
Saurabh Srivastava: Yeah, yeah. Well, okay, one thing that is kind
of hidden in here is that this is over predicate abstraction. So
that's what I meant by I didn't try other random, like -- so I didn't
try the space of all possible predicates and so on. Like linear
arithmetic generates any linear relation, so once we put
quantification over linear arithmetic, then I would be in a much
better position to say that, yeah, these algorithms are completely
different.
But the one thing is that it does generate -- well, the insertion
sort [inaudible] generated is not the standard insertion sort. The standard
insertion sort uses one extra variable. The version that is generated
here conceptually does the same thing. It is not exactly the version
that you know. So there will be tons of solutions that are generated
which do the same kind of operations as the standard ones, but, yeah,
there might be something in the space. I'm not willing to commit to
that there is nothing else right now.
Yeah?
>>: I don't understand why you cannot generate an algorithm with,
say, 00 or 11 or 22 because --
Saurabh Srivastava: Like an input/output table?
>>: No, no. Just because you don't need [inaudible].
Saurabh Srivastava: Oh, the permutation. Sure, sure, sure. Yeah.
Yeah. Okay. That's a --
>>: Because once you can generate a simple algorithm with just, say,
0, 1, 2, 3, 4, 5 --
Saurabh Srivastava: Certainly. Yeah, yeah, yeah. Certainly.
Certainly.
>>: And it's very fast.
Saurabh Srivastava: Yeah, yeah. So, yeah, this specification, as
you're pointing out, is missing the fact that the output -- the output
should be [inaudible] of the input, and that could be added, right?
And then the invariants would get correspondingly more complicated and
for the most part our technique would not be able to handle that.
The way I got around to doing that -- got around that problem -- was
by imposing restrictions on the operations that I was talking about,
like the operations that can be used. So I said that you can only
swap. I did not have [inaudible] copies in the array. So if you can
only swap, you don't lose -- so if you can only swap, then it can't
generate the program that you're talking about. It can't copy
elements into the array, right? So if you have A[0], something, it
can't copy it into A[0], A[0], A[0]. If it can only swap, then the
input/output permutation thing is taken care of, and then imposing
this gives you -- so you're right. If you did not have those
operation things, if you did not impose that resource constraint on
the operations, then you would have to impose additional things about
the permutation and then the invariants would be correspondingly
complicated and so on and so forth.
>>: [inaudible].
Saurabh Srivastava: Huh?
>>: It's also possible that [inaudible] constraints like that for
programs [inaudible].
Saurabh Srivastava: So that constraint does not talk about, like,
the --
>>: No, I'm saying that [inaudible].
Saurabh Srivastava: Oh, sure, sure, sure. You're saying that as
opposed to having unbounded quantification, you have, like, bounded,
and then it actually will find kind of like the right program for the
most part. Yes. And I'm sure you guys are pursuing that approach in
practice. No? Okay.
Okay. Going ahead, so here are the times taken for synthesis. The
y axis is the time in seconds on the log scale, and we have two sets
of benchmarks. The reason why they're similar to the verification
ones is because we had a verifier for this and we had a verifier for
that when we were building a synthesizer out of it. So
correspondingly, it
would be interesting to see how it compares against verification,
right?
So if we overlay the chart of verification on top of this, you see
that for the most part, within an order of magnitude or two orders of
magnitude for the case of this and that, you can synthesize the
program in addition to verifying it, right? So you generate
invariants plus the program, which is interesting because now we can
focus our energies on developing better verifiers and,
correspondingly, we might have better synthesizers for that domain.
Okay. So we were talking about this approach for proof-theoretic
synthesis which essentially is leveraging the idea that synthesis is
just verification with additional unknowns. And what we realized is
that safety by itself is not sufficient. You need to talk about
well-formedness and termination to actually get good programs out.
Yeah?
>>: Can you give me a couple examples of well-formedness? Like when
you're swapping, is that part of the well-formedness?
Saurabh Srivastava: No, no, no. That is part of the scaffold. That
is the input. The well-formedness constraints are essentially, for
instance -- in the example we had the solution setting the statement
to false, right? So if you have statements as transitions, where
output variables equal some function over the inputs, that can never
be false, right? A conjunction of those, where the left-hand side is
primed and the right-hand side is some expression -- you conjunct all
of them together, it can never equal false, right?
So that was something that we needed to preclude. And for that we
have a constraint that says statements can -- the conjunction that you
put in the statements cannot be false. If you put that constraint in,
then the statements would be of the right form. Then you need to put
something about the guards as well, because for it to be a real
program, the guards have to be translatable, they have to be of a
particular form, there's some constraint there, and so on. So those
are the kind of constraints that I'm talking about. They're not very
complicated. They're just about guards and statements.
Yeah?
>>: So without these constraints, for example, could you get the
synthesizer generating, say, a program that generates values out of
thin air non-deterministically to make things work correctly? Is that
one of the constraints that you need to impose or is that --
Saurabh Srivastava: So well-formedness does not talk about that.
Well-formedness -- if you do not impose well-formedness, the solution
that you get from the solver can't even be translated to any program.
>>: Right.
Saurabh Srivastava: For the most part. So are you going to look at
that as a program generating values out of thin air or --
>>: So it's the kind of thing that you expect a real program doesn't
just generate -- magically generate the values that will happen to
work correctly. So the values have to come from somewhere. Either
it's a constant [inaudible] or it reads it from somewhere else.
Saurabh Srivastava: Uh-huh.
>>: So it just seems that that would be one thing that this
constraint is probably ruling out.
Saurabh Srivastava: That might be one way of looking at that. Yeah,
I should talk with you more to figure out what exactly that means.
>>: [inaudible].
Saurabh Srivastava: Okay. So now I'm going to talk a little bit
about some ongoing work where we're trying to do synthesis using our
testing-inspired approach. And here we don't care about the
certificate that much, so that's the key goal.
And the motivation for this came from program inversion. And what do
we mean by that? Well, program inversion is this problem where what
you want to do is you have a program, let's say a compressor for a
certain input, and then you want to synthesize the inverse for it,
let's say the decompressor. So program inversion was something we
wanted to look at, and we made two observations there.
The first one was, well, if you look at a compressor -- for instance,
we looked at the core algorithm that goes inside the [inaudible]
format, right? If you look at the compressor for that, it generates
an online dictionary. While it's parsing the input, it computes a
dictionary, and it doesn't output the dictionary, but the compressed
output has enough information that the decompressor can generate
the same dictionary.
So if you were to look at these two separately, they have very
complicated input/output specifications, right? But if you put them
together -- that was a key idea -- that is, you take the original
program, that is, the compressor, you concatenate it with the inverse,
the decompressor, then this combined program has a specification of
identity. It might have very complicated things in the middle, but if
you put them together and only work with the concatenation, then you
essentially get a specification of identity. So that was one key
idea.
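[The identity specification can be stated very compactly; here is an
illustrative Python sketch of the testing-style check it enables --
the function names are placeholders, not the actual system's API:]

    def is_inverse_on(compress, decompress_candidate, sample_inputs):
        # the concatenation decompress(compress(x)) should behave as
        # the identity on every sampled input
        return all(decompress_candidate(compress(x)) == x
                   for x in sample_inputs)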
The other was looking at the structure of the inverse. So it turns
out that for almost everything that we tried, typically the inverse,
well, somewhat looks like what the original program is, and so much so
that Dijkstra proposed that we can just read the program
backwards and get the inverse. It doesn't really work all the time,
it works for two examples that he talked about, but essentially that
can be a good starting point, right?
So what we do is we automatically mine the template using this
observation. So what we're going to do here is we're going to take a
given program, we're going to mine the template, concatenate these
two, and that should be the identity. Right? This should be the
identity program.
The other thing that we wondered was, well, we're not talking -- the
specifications are not very simple. So we probably shouldn't talk of
the invariants either. Like if [inaudible] we just apply
proof-theoretic [inaudible], which is possible to do somewhat, at least
the core of it, that would involve invariants. So if the
specification is complicated, the invariants are going to be very
complicated too. So we don't want to talk about the invariants, so
can we get a technique that does not use invariants and essentially
has approximate guarantees.
And for this we have this approach to -- the approach using the
testing-type thing. And this is what we call Path-based Inductive
Synthesis. The first two will be inspired and sort of related to what
we've been talking about. We're using a satisfiability-based
approach. And the last part will be this interesting thing that we'll
have to add because of this bit about testing.
So let's talk about the first two. So what we have, we have a
template program, and for inversion, this will be partially known and
partially unknown. So you have the known fragment and then you have
the mined unknown fragment, and let me say that this template is both
of those together. Okay? And we want that this program should have
the specification identity, right?
So what we do now is that instead of generating formal verification
conditions, we use symbolic execution. We say that we'll run through
some path in the program. So, for instance, in this case the path
that goes to the beginning of the loop and then just exits.
And because this template program has this specification identity for
everything, then it should be identity on that path too. Right? So we
go through one path, we can go through another path, one that
goes inside the loop once and then comes out, or we can go around
twice, doing the first iteration going on the right-hand side and then
the second on the left-hand side. And we know that for all of these
traces, it should meet identity, right?
So we can -- instead of the verification conditions, we can use these
as kind of proxies, do the same reductions that we were doing earlier,
generate a SAT instance from it, dump it out to a SAT solver and get a
solution for it.
So what is happening here? What is happening is that we're generating
these traces, and as you generate more and more traces, you're
constraining the space of programs that can possibly be there. Right?
So there's some kind of pigeonholing principle going on, and if you
generate enough traces, if you explore enough traces, then essentially
the remaining programs will be the ones that are correct.
And here -- okay, that is the core approach, but the problem is that
because we're using these paths, there's an infinite number of paths
that could potentially be there, right? So that's what we need to
figure out. How do we deal with -- how do we explore the good paths,
if you will.
And for that we have this third bit about directed path exploration.
I'll talk about the high-level way of doing that, what it means.
So what you have is you've generated some traces, okay? So you have a
template program, you've generated, like, one or two traces, you get a
SAT instance out of it, and then you send it over to the SAT solver,
and that solves for the things that go inside those templates. Right?
But it might not be just one solution. It might be a ton of
solutions. These are candidates that work for these traces: if you
were to plug these solutions into the template, they would work for
these traces.
And essentially, if you look at just these two traces, then nothing
has been explored on this side, right? So the solution can assign
anything to the left-hand side; it can assign any equality predicate.
It can assign x equals x plus one for all you care, even if it doesn't
work, right? Because that will still work for those two traces.
>>: Did you mention -- is this the [inaudible]?
Saurabh Srivastava: No. That's not my table.
>>: It sounds very -- I'm sorry, because it's ringing all sorts of
bells. It sounds very similar to what [inaudible] is doing on these
examples using -- generalizing from examples on that table.
Saurabh Srivastava: So I have not read that paper in its entirety so
I'm not the best person to comment on that. But this is over paths.
>>: Okay. You get a path from giving an input and running your
program. So the path is just a scenario you get from an example.
Saurabh Srivastava: Sure. But like the input -- in our case the
input is not constrained. It's whatever inputs take that particular
path, right? So if you just give it one input, then that will take
the path, but another input could take the exact same path. We're
generating constraints for all those possible things that go with
that. So in some sense it's a broader generalization.
And it is related. I'll talk about the spectrum of things. But
Sumit's technique is on one extreme of it, and this is a
generalization. Every input that goes through that path has to have
[inaudible].
>>: Okay. But you still have some issue of how you're going to sort
of stop.
Saurabh Srivastava: Yeah. Yeah. That's coming up in a second.
>>: Okay.
Saurabh Srivastava: So we have these different candidates which work
for the traces that we have. And essentially what we really want to
do is prune the candidate set. We're not interested in generating,
like, 10,000 traces, as many as possible; we're just interested in
pruning this candidate set down to the one candidate which actually
works. Right?
So what we do is we take one of those candidate solutions and we
instantiate this template with that candidate solution. Okay? So
that candidate solution might say that, okay, on this part you should
have x is equal to x plus 1 or whatever. Now you instantiate this
template by putting that x is equal to x plus 1 there and you have a
real program.
And now what we do is a kind of parallel symbolic execution over the
instantiated program and the template: essentially we do symbolic
execution over this real program, but generate a trace over the
unknown. Okay?
So we make our branching decisions on the instantiated program, but
generate the trace over the unknown, and what we get is a new trace
which is relevant to this instantiation. We get a trace that is
relevant to this candidate, and if the candidate solution is not a
correct inverse, then that trace will add constraints to the system
which eliminate that candidate. And we keep doing this: we take each
solution that remains in the space, instantiate the template with it,
and keep iterating until we get only one candidate.
At that point we can terminate, but we can also keep going, and
essentially that boils down to a sort of testing-based verifier where
you keep generating new paths which either refute or reinforce the
program that you have left.
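[A minimal sketch of this pruning loop, with invented ingredients: the
candidate space is a small brute-force grid, and concrete inputs stand
in for the symbolic traces that the real technique explores. Each
newly explored trace, chosen to be relevant to a surviving candidate,
eliminates the candidates that disagree with it.]

    # Illustrative sketch: concrete inputs stand in for symbolic traces.
    def forward(x):
        return 3 * x + 7                       # known forward program

    def inverse_candidate(y, c1, c2):
        return (y - c2) / c1                   # template instantiated with a candidate

    def consistent(cand, traces):
        c1, c2 = cand
        return all(inverse_candidate(forward(x), c1, c2) == x for x in traces)

    candidates = {(c1, c2) for c1 in range(1, 6) for c2 in range(10)}
    traces = [0]                               # start from a single explored trace
    candidates = {c for c in candidates if consistent(c, traces)}

    while len(candidates) > 1:
        pruned = False
        for cand in sorted(candidates):
            # Directed exploration: look for a trace that is relevant to this
            # candidate, i.e. one on which its instantiation is not the identity.
            witness = next((x for x in range(-50, 50)
                            if not consistent(cand, [x])), None)
            if witness is not None:
                traces.append(witness)         # the new trace adds constraints ...
                candidates = {c for c in candidates
                              if consistent(c, traces)}  # ... and prunes candidates
                pruned = True
                break
        if not pruned:                         # nothing we tried distinguishes the survivors
            break

    print(candidates)                          # {(3, 7)}: the one candidate left standing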
Okay. So we have applied this to program inversion. We've been able
to synthesize inverses for a bunch of compression benchmarks. So, for
instance, run-length encoding is the trivially simple example: if you
have an input that is 200 zeros, instead of writing 0, 0, 0, 200
times, it writes 200, comma, zero. The inverse for that is fairly
straightforward, but it involves a slightly different structure, and
therefore it was interesting to use as a starting point.
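[The benchmark code itself isn't shown in the talk; the following is a
plain Python rendering of the run-length idea just described, with a
hand-written decompressor to show that the inverse, while it "somewhat
looks like" the original, needs a slightly different loop structure.]

    # Run-length encoding and a hand-written inverse (illustrative only).
    def rle_compress(data):
        out = []
        i = 0
        while i < len(data):
            run = 1
            while i + run < len(data) and data[i + run] == data[i]:
                run += 1
            out.append((run, data[i]))         # e.g. 200 zeros become (200, 0)
            i += run
        return out

    def rle_decompress(pairs):
        out = []
        for run, value in pairs:               # the inverse iterates over pairs, not symbols
            out.extend([value] * run)
        return out

    data = [0] * 200 + [1, 1, 7]
    assert rle_decompress(rle_compress(data)) == data   # concatenation is the identity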
LZ77 and LZW are the core algorithms that go inside the [inaudible]
format, and we were able to synthesize decompressors for those. We
were able to synthesize inverses for formatting benchmarks and also
for some linear arithmetic benchmarks: these are image manipulation
benchmarks which rotate, scale, or shift an image, which is a matrix
computation and therefore a linear arithmetic template, and we can
synthesize the inverse for that.
Yeah?
>>: Does the performance of the synthesized version -- is it
comparable to the original, or is it different?
Saurabh Srivastava: So I took the core of the algorithm that goes
inside. So the performance -- yeah, the actual algorithm is 10, 20
lines. The thing that actually goes inside, if you look at the
library or something, is probably 100 to 200 lines. I mean, that has
been optimized way beyond what we synthesized. So, yeah, we're not
doing any [inaudible] rolling, we're not doing anything for
performance right now.
>>: [inaudible].
Saurabh Srivastava: Huh?
>>: [inaudible].
Saurabh Srivastava: I manually check that it's the correct one. I
did not actually run it.
>>: [inaudible].
Saurabh Srivastava: Oh, you mean run it to check performance?
>>: Yeah.
Saurabh Srivastava: Oh, no, I didn't do that.
>>: [inaudible].
Saurabh Srivastava: Possibly.
Okay. So the other thing that we tried was -- so program inversion
was this setting where I was talking about taking a known program,
having a template, and sequentially concatenating them.
What we also wondered was, can we do that for parallel composition?
So, client-server synthesis. It has somewhat the same flavor: you
have a client, you have a server, they run in parallel, and they
communicate in between. If you look at them separately, they have
complicated specifications because of this communication, but if you
combine them, then the messages communicated are all internal, and you
can use a testing-based approach to say that the combined
specification is very simple. That is, you know, some value from the
client goes to the server, or some array which represents the disk
contents goes from the server to the client, or someplace [inaudible].
And we ran it over the components of an FTP client, and we've been
able to synthesize the client from the server.
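[A toy sketch of why the parallel composition helps; this is an
invented two-message exchange, not the actual FTP benchmark. Seen
separately, each side has a message-level specification, but once they
are composed the messages become internal and the end-to-end
specification is simply that the requested disk contents reach the
client.]

    # Toy sketch: channels are plain lists, and the interleaving is done by hand.
    def run_composition(disk, name):
        to_server, to_client = [], []          # message channels, internal to the composition
        to_server.append(name)                 # client: request a file
        to_client.append(disk[to_server.pop(0)])   # server: look it up and reply
        return to_client.pop(0)                # client: receive the contents

    disk = {"readme.txt": b"hello"}
    assert run_composition(disk, "readme.txt") == b"hello"   # simple end-to-end spec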
So this is a testing-inspired approach to synthesis which uses
symbolic testing as a proxy for formal constraints, and the key detail
is the path exploration: you need to make sure that the paths you
explore are the ones relevant to pruning the space.
So in the remaining time I just want to go over some future
directions, for the short term, that I'm interested in.
So one thing that I've been thinking about is using synthesis as a
sort of, quote, unquote, auditing compilation, where you have a
program that communicates with the environment, and there is a
specification for what the interface should look like. And what we
might be able to do is say: if a communication does not meet the
interface specification, can we synthesize a wrapper such that, if you
wrap the program inside it, the result will meet the specification?
And what I'm thinking about here is probably the context of security.
For instance, information flow: if the program, let's say, leaks some
information, the interface specification could say that high-security
variables should never flow to the output.
So can we synthesize a wrapper that sanitizes the output? Or
obfuscation or tamper protection: can we synthesize a wrapper that
obfuscates the execution such that the environment cannot infer
certain properties about it? So that is something I might be
interested in exploring.
For instance, for the case of information flow, Merlin, which is Ben,
Aditya, Sriram, and Anindya's work, tells us paths in the execution
that do not have a sanitizer on them. So instead of just outputting
that this program is unsafe because that path does not have a
sanitizer, can we synthesize a sanitizer that we can insert? So
that's one thing.
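[A hypothetical sketch of the wrapper idea for the information-flow
case; the program, the channel names, and the sanitizer are all
invented. Given a program that leaks a high-security value on some
output channel, the synthesized wrapper would sanitize exactly the
channels that a tool like Merlin flags as missing a sanitizer.]

    # Hypothetical sketch; everything here is invented for illustration.
    def leaky_program(secret, public):
        return {"debug": secret, "result": public + 1}    # leaks `secret` on the debug channel

    def with_sanitizer(program, flagged_channels):
        def wrapped(secret, public):
            out = program(secret, public)
            for channel in flagged_channels:              # wrapper: redact the channels
                out[channel] = "<redacted>"               # flagged as unsanitized
            return out
        return wrapped

    safe = with_sanitizer(leaky_program, flagged_channels=["debug"])
    print(safe(secret=42, public=7))          # {'debug': '<redacted>', 'result': 8}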
The other interesting bit might be: can we use synthesis to take a
program that is fairly generic, add some architectural constraints or
limitations, and parameterize the synthesizer with them to get an
architecture-specific program?
And what am I thinking about here? Well, for instance, these
constraints could be over memory: you have embedded systems which are
memory-constrained. Can we synthesize, just from the C program, a
program that runs on the memory-constrained system? Or some processor
configurations: for the Cell processor, can we take a C program and
generate a program that has two components, one for the PP and one for
the SP? Can we add some constraints about that? Or network
communication: the client-server synthesis is some preliminary work
we've done in that context. Can we do it better, or synthesize from a
sequential program a distributed version of it with the same
semantics?
The first three, well, they look like they might be doable. We can
keep going: can we add constraints about operations? Map-reduce
computation is a domain where everything has to be key-value pairs;
well, can we impose that? Can we take an algorithm from a sequential
C program to a map-reduce program? Energy efficiency -- now it's
getting really vague -- can we add constraints for that? Or
performance; actually, I'm trying to do that as part of a course
project, so let's see where that goes. Or, well, we can keep adding.
So essentially the big question here is: can we have formal
constraints that encode those things? Well, memory, we kind of did
that for the stack. Operations, they're kind of like -- they're not
exactly processor configurations, but they're telling you something
about the computation that's happening. It might not be possible to
do that for everything, right? And taking [inaudible] work, if we
cannot do formal constraint encoding, we can possibly do
post-generation filtering. So we compute some candidate programs, and
then, based on whatever criteria we have, we filter them.
Okay. So that is one thing. The other is domains: I'm most
interested in applying this to other domains. So, for instance,
functional programming: Andrew Appel says that SSA style is functional
programming. So does that mean that synthesis over SSA style implies
that we have synthesis for functional programs? Well, I don't know.
The heap is certainly something that I have not addressed, either for
reasoning or for synthesis, so I want to address the heap. Because I
use SMT solvers, it might be reasonable to use the approaches that
have come out of MSR for reasoning about the heap.
Concurrency, again, is an unexplored domain, and I'm talking to Viktor
about using a satisfiability-based approach to infer the assumes and
guarantees that he has in his deny-guarantee work.
Certainly interesting is probably modular synthesis. Here what we
want to do is take a top-level specification, synthesize some
functions against an interface provided by a lower level, and then
keep doing this until we get to the bottom. Now, that in general will
be difficult because you do not know what the interface will be.
So can we talk about co-synthesis of hardware and software? Because
there the interface is very well specified. The hardware has a
particular interface that it can give you. Can we synthesize on top
of that?
And also, can we mix these verified and testing-based approaches? So
let me just draw a diagram of what the spectrum looks like. On one
end you have inductive synthesis, which generalizes from instances, as
you were pointing out. On the other you have deductive synthesis,
which refines from specifications. Deductive synthesis essentially
all stems from Manna and Waldinger's work from the '70s and '80s, and
essentially that is pretty much manual. Proof-theoretic synthesis
probably lies there; it is an automated version of that.
On the other end of the spectrum you certainly have sketching,
[inaudible] work, and much of the work, I think, falls there.
Path-based inductive synthesis is somewhere in the middle. And what
I'm wondering is whether we can explore this space, you know, combine
this and this, or that and that.
Okay. So just to conclude, I think a satisfiability-based approach
has a lot of potential for building expressive tools for program
verification and synthesis. All of what I talked about here is
available on my web page, and, yeah, with that I'd like to conclude.
Thank you.
[applause].
Saurabh Srivastava: Yeah?
>>: So you didn't mention in the future work sort of any focus on
trying to understand the scale of the existing stuff you've done. In
other words, sort of if you took what you're doing now, say inverse,
and, you know, you wanted to make it useful, so what would it take,
are you interested in doing that?
Saurabh Srivastava: Yeah, yeah, certainly. I did not have a lot of
concrete things to say about that, so that's why I removed it; it was
in an earlier version of the talk. So one thing that I sort of
glossed over is that this approach for reasoning that we have is
massively parallelizable. So one approach to scalability could be:
can we do the reductions I'm talking about, each one of those
verification conditions reducing to a boolean formula, in parallel?
So that could be one approach to addressing scalability. But
certainly that's very much on my list; I'm interested in doing that.
We will have to explore ideas about how this should go forward.
Sumit Gulwani: Any other questions? Okay. Let's thank our speaker.
[applause]