>> Leonardo de Moura: Hi. It's my pleasure to introduce Marijn Heule. Marijn is from the University of Texas at Austin. He's very influential in the SAT community. He is one of the editors of the Handbook of Satisfiability and the author of a very efficient SAT solver called march. He was also one of the organizers of the last SAT competition, and today he is going to tell us a little bit about his latest work. Thank you. >> Marijn Heule: Thanks for the introduction. My talk today, if anything is not clear, ask questions. My talk today is about unsatisfiability proofs. Before I started working on this, it was either easy to make proofs but they were very inefficient to check, or they were efficient to check but hard to create. I present work here that bridges the gap between both favorable properties. This is joint work with Warren Hunt and Nathan Wetzler at UT Austin. A short outline: I'll start with some motivation and contributions, then talk about the different kinds of proof styles that there are, resolution and clausal proofs, and go into detail about them. The big disadvantage of clausal proofs is that the existing methods were very inefficient for checking them, so I'll explain techniques for checking them efficiently. I also will talk about expressive proofs, that is, how we can express all the techniques that are used in state-of-the-art solvers. At the end, just before the conclusions, I will mention the future direction: we actually want to have a mechanically verified checker that does all of the things that I will talk about. Satisfiability solvers, as you know, are used here at Microsoft in many tools [indiscernible]. They are used to find counterexamples: if you have a satisfiable benchmark, you get a solution for it. They are used for many equivalence checking problems via miters, but there are also richer answers: when they are used for diagnosis, you ask what is a small explanation for why it's unsatisfiable. And what we are also interested in in Austin is to have a small proof for benchmarks which are unsatisfiable, such that we can check it with a mechanically verified checker, so we can trust the output of the SAT solver. Although there is lots of positive news and they're used in so many applications, there is also some negative news. One of them is that there are bugs in SAT solvers, and not only in SAT solvers, but in SMT solvers and especially QBF solvers. Even winners of the competition have bugs. Last week I was at FMCAD and there was the [indiscernible] model checking competition, and there were seven rounds of bug fixing allowed so that solvers kind of were [indiscernible] on… >>: How many solvers did not have bugs? [laughter] >> Marijn Heule: The problem is that you cannot really trust the solver. The whole idea is that we want to trust the output of the solver: we cannot be sure that the solver always gives correct results, but we want it to emit something so that we can say, okay, this result I can trust because of this and this and this. And you don't have to trust the solver; you only have to trust the results for the specific things that you are interested in. Does that make sense? >>: Well it seems like if you can always check the output then in some sense you could defend that as fewer bugs than you care about.
>> Marijn Heule: Yeah, but you will check the output for each instance. You have a satisfiability problem -- I will show you in a slide soon. You solve it with a solver and it gives you some kind of proof; then you have a checker, and you give the same problem and the proof to the checker and check whether the proof is correct. So you don't have to trust the solver. If the proof is incorrect, then you don't know: the solver might have a bug. But if the proof is correct, then you have a proof for the problem, so you know the answer for the problem. Does that make sense? >>: So you are going to talk about how the checker [indiscernible] >> Marijn Heule: Yeah. It's the final step, but it's slide 35. It's about having a formal proof for the checker, and for that you need a formal format for the proofs emitted, a certified proof, but yeah, there are several steps in between. The main contribution of this work that I will talk about is a tool that can efficiently validate results. I didn't mention this yet, but all the existing tools that provide these two things, unsatisfiable cores or small proofs, use a lot of memory: for benchmarks used in the competition, if you turn on this emission, memory consumption increases up to a factor of a hundred. Due to this memory increase, solvers slow down and eventually go out of memory in a relatively short time. Our tool can efficiently check results for a given proof, but it also can produce these results without requiring much memory. That's what I will talk about. I guess everybody here knows about satisfiability problems, so I don't have to go too much into detail: given a Boolean formula, is there an assignment that satisfies it? Throughout this presentation clauses will be denoted by blue boxes, and there will be three Boolean variables a, b and c. Negative literals are denoted by negation, so the first clause is not b or c. It is easy to check whether an assignment satisfies this formula; for this assignment, not a and not b and not c, we just scan over the formula to see that each clause has at least one of its literals in the assignment. I guess this is clear for everybody? Unsatisfiability is much harder, and it is shown by checking clauses that will be added to the formula. Most of the techniques can be expressed using resolution steps. Resolution works as follows. You have a pair of clauses that contain a complementary literal, for instance b and not b, and the resolution rule takes the union of the two clauses and removes the complementary pair. Resolving not a or b with not b or c gives us not a or c. We can do the same thing with another pair, and the purple circles denote lemmas, which means they are not original clauses but can be added by resolution steps. So on the left there's a resolution derivation, but I will use this notation throughout the presentation, which means that we can construct lemma c using resolution from these clauses, but we don't specify how. >>: I guess we do because [indiscernible] >> Marijn Heule: Yeah. So you can reconstruct it, but this notation does not specify exactly how you do it. That notation specifies exactly what the resolution steps are and in which order; here you can give an order, but if you look at this you don't know exactly how to do it.
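As a minimal illustration of the resolution rule just described (an editor's sketch, not code from the talk), with clauses represented as frozensets of DIMACS-style integer literals:

```python
def resolve(c1, c2, lit):
    """Resolve c1 (containing lit) with c2 (containing -lit):
    take the union of the clauses and drop the complementary pair."""
    assert lit in c1 and -lit in c2, "need a complementary literal pair"
    return (c1 - {lit}) | (c2 - {-lit})

# The example from the talk, with a=1, b=2, c=3:
# resolving (not a or b) with (not b or c) on b gives (not a or c).
assert resolve(frozenset({-1, 2}), frozenset({-2, 3}), 2) == frozenset({-1, 3})
```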
Of course you can reconstruct it, you can reconstruct it in linear time, but it's just not in the notation. >>: But an arbitrary resolution [indiscernible] reconstruct one more time. >> Marijn Heule: No, not arbitrary, no. >>: So there's some, if I recall correctly you have a restriction, which is that the steps have to be reconstructable by unit propagation. >> Marijn Heule: Excellent. That's also the restriction I will use. I will come back to that later. >>: Maybe I'm anticipating too much. >> Marijn Heule: In this case I just want to say that we don't specify it: all the drawings will be like this and not like that. This is what we just talked about earlier, the tool chain, although it gets a bit more complicated. You have the Boolean formula; you give it to a SAT solver. If it's unsatisfiable, you also require that it emits some proof of unsatisfiability, and there can be several kinds of proofs; the proof and the original formula are given to the checker. You don't have to trust the solver: as soon as this is correct and this is correct, or this is verified, then you are done. Is that clear? I will also talk about redundant clauses a lot. What is a redundant clause? If you add it to the formula, the formula remains satisfiability equivalent: if it has solutions, then after adding the clause there will still be solutions, and also for removal, if it's unsatisfiable and you remove the clause then it remains unsatisfiable. It's a more general notion than logical equivalence, and that's necessary for some of the properties that I'll discuss later. If it has solutions and you add the clause there still are solutions, though there might be fewer; and if it's satisfiable and you remove a clause, you might introduce solutions. What is a proof trace? You have a formula and a proof; the clauses of the formula can be in arbitrary order, but the proof is ordered, a sequence of lemmas. How does it work? We have a formula, and for each clause we lift from the proof we have to check that adding it is satisfiability equivalent with the original formula, and we add clauses until we add the last clause in the proof. We call it a refutation proof if the last clause in the proof is the empty clause, which is denoted by this symbol. The empty clause cannot be satisfied, so if the empty clause is redundant, satisfiability equivalent, then we know the original formula is unsatisfiable. There are two kinds of proofs, which I will detail on the next slide: one is resolution proofs and the other is clausal proofs, and there are four properties you really would like to have in a proof format. What are the properties? You want a proof to be easy to emit, so you don't need lots of changes to the solver to obtain a proof; for resolution it is quite an obstacle to implement it in state-of-the-art SAT solvers, while for clausal proofs it's very easy, as I will show later. You want to have them [indiscernible] and compact; compact for competition results means a few gigabytes, which is what clausal proofs can do, but resolution proofs can quickly grow to hundreds of gigabytes. You want things to be checked efficiently; resolution proofs are big, but they can be checked efficiently.
For clausal proofs, the existing methods are expensive to check, so our contributions mostly focus on this: by adding deletion information, which mirrors what the solvers do, you can make checking much faster, and we present more techniques to really speed up the checking, so then we combine easy emission and compactness with efficient checking. What you also want is for things to be expressive, which means you want to cover all existing techniques in SAT solvers in a proof format. Resolution and clausal proofs have the same kind of expressivity and they can cover lots of techniques; however, there are several techniques that are not covered. At CADE I presented this year another format which is very expressive, and this is kind of the state of the art, so in the future I want to combine everything so that I have it all: the expressivity as well, but it's more tricky to get efficiency and expressivity together. >>: [indiscernible] proofs are very [indiscernible] >> Marijn Heule: Yeah. They cover extended resolution. >>: And so what about QBF and proofs? >> Marijn Heule: I'm now talking about SAT solvers. I think I can… >>: So there are certain extensions of resolution and QBF is [indiscernible] >> Marijn Heule: Yeah. >>: Does that fit in these frameworks or [indiscernible] extension? >> Marijn Heule: I think it all fits in. These RAT proofs I think can also be used for QBF, but the procedure of checking them becomes more complicated because you also have to take into account universal reduction and the order of the quantifiers, so the check becomes much more complicated. I'm just working out that everything works and that you can do it. But I think for QBF it would be a very nice way of having certificates, because validating solutions there is very expensive; having a clausal proof would be much more compact. So these RAT proofs are resolution and unit propagation combined, and then you also have to take into account the quantifiers and universal reduction, because that's also a well-known technique used in QBF solvers. Did that answer your question? >>: I think so. I haven't [indiscernible] believe it was about [indiscernible] Q-resolution and then QBF would be an extension, but you say it somehow fits in. >> Marijn Heule: It somehow fits in, yeah. I just submitted a paper about these procedures for clause elimination and clause addition, which describes how things should be changed in order to let everything work for QBF. >>: Okay so [indiscernible] >> Marijn Heule: But this was more about how it's used, for instance, in Bloqqer. It's the main preprocessing tool for QBF and they also use these kinds of procedures; I explain how they work and that they are sound, and I still have to work out how this actually works for proofs and how to make a proof checker for QBF, because it's much more complicated. >>: Do you consider [indiscernible] proofs [indiscernible] >> Marijn Heule: What exactly do you mean? What kind of format do you… >>: [indiscernible] proofs for [indiscernible] >> Marijn Heule: Ideally, I would like to have as universal a proof construction as possible, but I think the way to go is from SAT to QBF and then to go up instead of going down: first solve the QBF version and then work up to more complicated logics.
>>: [indiscernible]. >> Marijn Heule: I am not aware of all of the techniques used in all of these kinds of solvers. [indiscernible]. Now the discussion about resolution versus clausal proofs. I will talk about the resolution graph; the resolution graph is on the bottom. It has all of the clauses, and it has a sequence of clauses; the sequence is based on when the solver learns each clause. Each lemma has incoming arcs from the clauses or lemmas that are required to derive it by resolution. And it's a refutation if the top clause is the empty clause. A resolution proof is just a description of this graph: each line has an identifier for one of the clauses or lemmas and additionally describes all of its incoming arcs. The size of a resolution proof is the number of vertices plus the number of arcs, which is big: this here is a small proof, but typically there are millions of clauses, and as I will show on the next slide there are on average 400 incoming arcs per lemma in a proof, so the size is pretty large. I also use the word core: core clauses are clauses that can reach the empty clause in the resolution graph, and core lemmas are lemmas that can reach the empty clause. For this resolution graph all of the clauses are in the core, and not a is not in the core because you cannot reach the empty clause starting from not a. One of the main problems is that these proofs are huge, and the other one is that they are hard to emit. To give you some details: here is the number of clauses and here the number of literals. The number of literals in learned clauses averages about 40, and here is the average number of incoming arcs for lemmas in the proofs. What you can see is that there clearly are many more incoming arcs than literals in the learned clauses. In order to have a much more compact proof, you actually only want to store the red instead of storing the green. And that's the idea of clausal proofs: you only store the literals in the clause and not the incoming arcs. Based on this data you can see that resolution proofs are on average a factor of ten larger, but if you look at the memory consumption it's going to be a factor of a hundred larger, and that's because clausal proofs are immediately dumped to disk while for efficiency resolution proofs are kept in memory, so you have these big things in memory. The idea is to only store the red dots and not the green ones. To come to your point: if you only have the literals in the clause but you don't have the incoming arcs, how do we reconstruct the incoming arcs? You do unit propagation: a clause is unit if all its literals are falsified except for one, and then we can assign the remaining literal to true. And we do that until fixpoint. For example, we have five clauses here and the simple assignment not c; then there are two unit clauses, not b and a, and we take, for instance, one clause to extend the assignment. Say we assign not b to true, which creates another unit clause, not a. We now take another unit clause, for instance a, and assign it to true. Extend the assignment and now we're done: there are no unit clauses anymore, but notice that there is one falsified clause, not a or b. This procedure can be used to reconstruct the arcs, and it works as follows.
This procedure is called reverse unit propagation: the main idea is that we assign all of the literals in the lemma to false and apply unit propagation. If unit propagation causes one of the existing clauses to become falsified as well, then we can reconstruct the arcs from the unit propagation steps. If we assign not c to false, then we have these two units, which falsify this clause. And the reason why we are now able to add lemmas is that if this clause is falsified, then one of the antecedents is falsified, so the only way to satisfy the formula is: if all of these clauses are satisfied, this must be satisfied as well. That is the reason why this clause is redundant. Is that clear? Now we have a clausal proof. >>: [indiscernible] >> Marijn Heule: This procedure, no. The one that I use for [indiscernible] will come later in the presentation and will change the set of [indiscernible]. But this one keeps logical equivalence [indiscernible]. This is a clausal proof, so we have here the order of the clauses, the order in which they were learned, and we are going to check them by the procedure I just explained. We assign all the literals in the lemma to false, do unit propagation, and check whether there is a falsified clause; in that case we reconstruct the arcs. The same thing we do for not a: again, unit propagation falsifies one of the other lemmas. We do the same for not c, assigning c to false, which creates this, and in the end we apply unit propagation to all of the clauses in the proof, which should result in a falsified clause. This way, by applying unit propagation, we were able to reconstruct the arcs without actually storing them, and as soon as you check them you just throw them away, or you can store them [indiscernible], but you don't have to keep anything in memory. The biggest disadvantage of this technique is that the existing methods to validate it, to reconstruct the arcs, are expensive. Now three improvements on how to do it more efficiently. The first one was already proposed in the original paper: the procedure I just showed you was described in a paper by Eugene Goldberg and Yakov Novikov in 2003, so it's a decade old. They also proposed to do it the other way around: start by validating the empty clause and go backwards, and if you use conflict analysis after you have reconstructed the arcs, then you can mark the clauses that got an arc, and you only have to check the ones that have been marked during the process. The advantage is that you have to validate far fewer lemmas. The disadvantage, and why I initially was against it, is that it is much more complex, because you have this conflict analysis procedure and the checker becomes twice as large. In the end the gain is too big to ignore. Although the idea is a decade old, there was no fast checker for this procedure available, so the first contribution we have is a fast open source implementation of this. But even with backward checking it's 10 to 20 times slower than actually solving the benchmark; the solving time is much less than the checking time. The main reason why checking is much more expensive than solving is that besides learning lots of clauses, solvers also aggressively delete clauses.
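To make the reverse unit propagation check concrete, here is a small editor's sketch (not the tool from the talk), reusing the integer-literal clause representation from the earlier snippet:

```python
def unit_propagate(clauses, assignment):
    """Assign the remaining literal of unit clauses until fixpoint;
    report a conflict if some clause has all its literals falsified."""
    assignment = set(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(l in assignment for l in clause):
                continue                       # clause already satisfied
            unassigned = [l for l in clause if -l not in assignment]
            if not unassigned:
                return "CONFLICT", assignment  # falsified clause found
            if len(unassigned) == 1:
                assignment.add(unassigned[0])  # unit clause: extend assignment
                changed = True
    return "FIXPOINT", assignment

def has_rup(clauses, lemma):
    """Reverse unit propagation: falsify all literals of the lemma;
    the lemma is redundant if propagation then derives a conflict."""
    result, _ = unit_propagate(clauses, {-l for l in lemma})
    return result == "CONFLICT"

# Hypothetical mini-example: lemma (a) has RUP w.r.t. (a or b), (a or not b).
assert has_rup([frozenset({1, 2}), frozenset({1, -2})], [1])
```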
By aggressively deleting clauses, the unit propagation goes much faster, and as a consequence that's what you actually want to have in the proof as well. So you want to exploit not only the learned information but also the deletion information, and this is shown by introducing gray boxes. Gray means the clause is deleted. After you have learned not b as a lemma and there is not b or c in the formula, you know that you can delete not b or c because it is subsumed: each assignment that satisfies this also satisfies this. In the FMCAD paper we show that you can combine this with the backwards checking, ignoring certain clauses, and a nice thing we show is that you can optimize clause deletion information during checking: the tool that we provide doesn't only heavily shrink the formula and the proof, frequently a 90 percent reduction, but also optimizes the deletion information, so that if you want to check with a verified checker, it's much faster. And the third optimization is so-called core first unit propagation. While checking proofs with this backwards checking, only 10 to 20 percent of the lemmas and clauses will actually be in the final core. Doing all of this unit propagation over the other 80 to 90 percent of lemmas and clauses is completely useless because you will never use them, so you want to avoid doing unit propagation on them, because it's very expensive. The idea is to do unit propagation first only on clauses that are already in the core, to fixpoint, and if there are no more clauses to propagate in the core, you stop, or you try to find a single one which is not marked. You propagate that one and see if there are now again core clauses that can be used. Yeah? >>: So when you start with the [indiscernible] you don't know what the core is. So in the first round you might have to [indiscernible] and then you proceed to say we use whatever was scanned before and then you scan more. >> Marijn Heule: So you have the empty clause, which is always marked at the start; then you do unit propagation and only step three will be done, because there are no clauses in the core. So for the first step it's not a very good procedure, because it's more expensive: you constantly first check whether any core clauses have become unit, but there are no core clauses yet. >>: Right, but I was wondering how you make step two additions in the bootstrapping case. It seems that in step two you would have to examine the entire trail. Do you do that by traversing the trail in reverse order, do you take the given clauses as [indiscernible] >> Marijn Heule: So what I do is, there are two pointers. As soon as a variable is propagated I mark it as to-examine, and there are lists of clauses that are in the core and that are not in the core; I first check the core one, and as soon as it reaches the end and all of the variables have been checked against the core clauses, then it starts doing the same for all the variables, checking and examining the non-core clauses. >>: I guess the question is, in the first case step two has no core clauses so [indiscernible] >>: So [indiscernible] watch clauses that all [indiscernible] >> Marijn Heule: So there are two pointers.
There is this trail, where we are counting all the variables that have been assigned a certain value, and for each variable, in the order they have been assigned, I check whether there are any core clauses becoming unit at this point. For instance, first for variable a, and if there are no core clauses then I go to variable b and then variable c. And you cannot get… >>: [indiscernible] >> Marijn Heule: In some cases you can even get a factor of five speedup, because it really ignores all the clauses that you can avoid ever seeing: having this first loop check only clauses in the core already gives you a conflict, so you never actually reach point two. So you propagate, propagate, propagate, hit the conflict at step one, and then you stop. Another nice feature of doing it like this, because you can really try to postpone pulling clauses and lemmas into the core since you propagate first on all of these, is that the actual cores, both the core lemmas and the core clauses, will be smaller. You really want to avoid pulling things in, because as soon as you randomly do unit propagation you easily pull in clauses which are not in the proof yet. Now I will explain the same proof again, but with the three optimizations. On the left we have the same proof, but with these two clause deletions inserted, and we will check the proof backwards with the core first unit propagation. The first step will be: we mark the empty clause as in the core, and all the clauses that will at some point be deleted in the proof are removed from consideration. That's why they are now gone. Now we go backwards: we check the empty clause with unit propagation, which, for example, uses these four clauses to find a conflict. Then all of the clauses which are involved are marked. Next we continue with c, and we reuse as much as possible, which is the not a, and we end up pulling a or c into the proof. So when you use clause deletion information and go backwards, if you find a deleted clause then you bring it back. Now we check not a and we bring those two clauses into the core. Now the deleted clause not b or c is brought back, and in the end we have not b, but not b is not marked, so we don't have to do any check; we can ignore it. Then we are done. Notice it is kind of the same proof we had for the other example, but the other example had not b, c and the empty clause in the core, and now we have not a, so the proofs really change; also notice that in the other example we had all clauses in the core, but now not b or c is not in the core. So you do not only check, you also reconstruct the arcs, and by reconstructing the arcs you can actually get a smaller proof and a smaller core. You might think that if you do everything afterwards instead of during solving you lose information; actually, it's better to do it afterwards, because you can really exploit it by doing this alternative unit propagation and other things. >>: [indiscernible] C and [indiscernible] check that C is correct. What's [indiscernible] uses B [indiscernible] >> Marijn Heule: Yeah. So actually this [indiscernible]. So as soon as you start with not a, everything above it is deleted.
Maybe it's just to keep the picture, but you are not allowed to use any clauses that are higher in the proof for unit propagation; as soon as you go down, once you have checked a clause or lemma, it's out. You verify it and it's out. You can only reconstruct lemmas from what is below them in the proof. So I implemented emitting this deletion and clausal proof information in the Glucose solver, which won the 2012 SAT Challenge, and the nice thing is that you can have everything, including the deletion information, output to disk with say 2 or 3 percent overhead in running costs by adding only 40 lines of code, with all techniques in Glucose still enabled, so you have all the preprocessing and everything, and the lines are actually very simple. You have a clause database management part in the solver, and the only thing you do is: if a clause is added to the database you print it, and if a clause is deleted from the database you print that you delete it, and that's it. You don't have to do any more tricks. The only reason it's 40 lines instead of 20 is that the procedures are slightly different for the preprocessing and the solving; if they had been the same it would have been 20 lines, which is okay. I compared it also with PicoSAT. PicoSAT is kind of the state of the art for resolution proofs; it is the fastest solver which has all techniques for resolution proofs enabled. If you run them both with proof logging, PicoSAT with resolution proof logging and Glucose with DRUP logging, then what you can see on the 2009 benchmarks from the SAT competition is Glucose ahead. This is a cactus plot, which means that for each line the benchmarks are sorted based on the y-axis, which is the time on a log scale. What you see is that PicoSAT is much slower, and one of the major impacts here is that memory consumption increases a lot because of enabling the resolution proofs, while Glucose's memory usage is exactly the same, because the only thing you do is dump a clause to disk when you learn it or when you remove it; there's no additional memory consumption here. And the third line is our tool that checks the DRUP proofs, which does the checking and also computes the smaller cores and the smaller proofs. This is the time to check the proof; it does not include the solving time, so what you see is that the cost of checking the proofs and the cost of solving and generating the proofs are similar. >>: This is 100 times faster the first time [indiscernible] benchmarks? And then you say it solves twice as many? >> Marijn Heule: For a given time, yes. [indiscernible] so if you take a line for a given time, Glucose is almost twice as fast. But as you can see, the running time for PicoSAT is very high, and there is a point, around 900 seconds, where for lots of these benchmarks, I think a third of the instances, it cannot solve them anymore because of the memory consumption. >>: And you will [indiscernible] the checking which is ten times slower? >> Marijn Heule: Yeah, yeah. There are a few exceptions, but for most of the benchmarks, up to here, it's kind of close. >>: Can you explain why it has this jump? What's the bottleneck? Why does it become suddenly ten times slower?
>> Marijn Heule: The bottom line is, I looked into it, and for most benchmarks this core first propagation is really helpful, but for other benchmarks the two loops constantly interact with each other, which makes things much more expensive. I think most of these benchmarks here are enormous and are only solved with preprocessing, and preprocessing, variable elimination, is already resolution; there might be a million variables in it, and the checker does unit propagation on 100,000 of them only to check that [indiscernible] be assigned to really get the conflict. So the most important exceptions are benchmarks with millions of clauses that can be solved with almost only preprocessing, with variable elimination, but the proofs are big: with variable elimination each resolvent is just a single resolution step, but you add all the resolvents and then you remove all of the original clauses. So frequently you add a thousand clauses and remove a thousand and one, just to have one single step. That loop is kind of slow, but I guess it can be fixed; with the current three ideas that I showed it's already a factor of 20 faster than the original procedure that was proposed before. But I guess this line can go further down. >>: So would you then [indiscernible] in a given proof set to avoid the [indiscernible] >> Marijn Heule: No. I think there should just be a more sophisticated unit propagation routine; you should somehow be able to easily detect that there is a falsified clause. With this variable elimination you assign everything to false, and if you assign one variable more, the variable on which the resolution is done, then another clause is falsified. So with a single additional unit propagation step some clause is falsified, and maybe by changing data structures or sorting things you can find this immediately, but now sometimes instead of assigning a single variable you assign 100,000 of them before actually finding it. So I think there is something to gain there. Exactly how to do it I am not sure, but I guess making the unit propagation better would fix this. Already you see that with core first you can gain a lot, and I think there are other things. And the interesting thing is that if you were to implement this core first propagation as I have it now in a SAT solver, it would be about twice as slow. Doing the checking efficiently in the checker is a completely different way of thinking than tuning the solver, because in a checker you know that as soon as there's no conflict the proof fails, so you really know that at some point there must be a falsified clause, while in SAT solving you expect that unit propagation reaches a fixpoint without a conflict in most of the cases. >>: [indiscernible] what the graph doesn't show is the different proofs that [indiscernible] find [indiscernible] will find [indiscernible] learned clauses.
In your example, proving doesn't see [indiscernible] >> Marijn Heule: This is not the only slide with results, so let me [indiscernible]; this is also one of the slides. This is for the original formula, this is the number of clauses, and these are all the benchmarks they could both solve, Glucose and PicoSAT, to have the comparison, although as you can see the running times were again very different. You can see the size reduction after a single pass: you solve the benchmark and check it, and there's a big difference between the original size of the formula and the size after one step; for instance, tools that compute minimal unsatisfiable cores can [indiscernible] this procedure. This is the PicoSAT proof, which is in memory and uses the data that's still available in the solver at the point of constructing it. If you run Glucose with the backwards checking but without core first, the size of the proof is as in the red line, and the green line is what you get when you apply core first. This is to show you that the proofs get smaller when you use this core first, the alternative unit propagation. Is this part clear? So for tools that want to compute minimal unsatisfiable cores, you really want to take this step. The first step is very easy, but getting proofs smaller after finding a single proof can get much harder; this can be a big step in actually computing minimal unsatisfiable cores, because at some point you need to delete clauses one by one, each with a single SAT check. So I was happy that I was able to push this through, with some resistance. In the SAT Competition 2013 the organization required that solvers participating in the unsatisfiability tracks emit some proof. There were two options: either TraceCheck, which is the most widely used resolution format, or this DRUP, Delete Reverse Unit Propagation, where we have the clauses with deletion information. The competition allowed 5,000 seconds for each of the benchmarks to solve and allowed the checker 20,000 seconds. There were three categories; in the application category there were nine solvers emitting DRUP and two emitting RUP, so ignoring the deletion information, and there were nine solvers in hard combinatorial. The nice thing was that the top tier of solvers, like Glucose and Lingeling, all submitted proof-emitting versions to the competition. Some statistics: the highest ranked solvers implemented the deletion information efficiently, and 98 percent of the DRUP proofs could be checked within the 20,000 second timeout. One or two instances per solver could not be checked because they reached the timeout, but practically all of the proofs were checked, or refuted, by the checker within the time limit. There were some solvers that only used RUP, so they didn't use the deletion information, and there only 40 percent of the proofs could be checked. Even with core first and everything else and backwards checking, which was done here, not having deletion information brings you down from checking 98 percent of the proofs to checking only 40 percent of the proofs within the time limit. >>: Did any of the solvers report any incorrect results for the proofs they checked? >> Marijn Heule: There are two issues here.
There were a few proofs, even from the winners; for each solver there was at least one benchmark, I think MiniSat was maybe the only one that had proofs that all went through the checker, and the other ones had maybe one or so where the checker couldn't reconstruct it. But that might just be some ordering difference; especially, sometimes you delete a clause earlier than you are allowed to delete it and then the proof might fail, so it's an implementation issue. It doesn't mean that there's a bug in the solver if the proof fails, but the winner and the number two in the competition each had at least one benchmark where the checker said: I'm not able to check it. Of course, it could also be the checker. >>: Was there a penalty? >> Marijn Heule: No. There was no penalty. One of the rules was: if the checker says I am not able to verify it, then the benchmark counts the same as not solving it. And that's also because it was the first year, to let people get used to it; otherwise, if there was only one solver for which all the checks went through, that solver would automatically be the winner. I think it's a tough penalty, because if you say there is a solution and you give a wrong solution, you are disqualified; for UNSAT the competition is still at too early a stage to have that kind of tough rule. The other thing is, what? >>: Want to give the [indiscernible] [laughter] >> Marijn Heule: Something like that. I also discussed this with Leo before. There was also a [indiscernible] track where solvers were running all benchmarks and there was no certification for those benchmarks. There were 30 instances where the fastest solver in that track was at least ten times faster than the fastest solver in the certified track, where you think: okay, what's going on here? And for 21 instances I was able to detect that the winning solver actually had a bug on those instances. The easiest way to check it, which frequently helps, is to remove the top 10 percent of the lines, except for the first line, and give it to the solvers again; frequently the fastest one still says unsatisfiable while another solver says it's satisfiable. >>: [indiscernible] example where the [indiscernible] 0 missing or there is a line with a 0. >> Marijn Heule: Yeah. There was also one [indiscernible] solver where, if there was an empty line in the input file, it was considered an empty clause and the formula immediately unsatisfiable, but an empty line should just be ignored. Therefore, if you remove some clauses and the empty line is still in the file, that solver keeps saying unsatisfiable as long as the empty line is there, while other solvers at some point say: we found a solution, because of the missing clauses. But there are also things -- okay, I probably don't want to go too much into detail there, but it kind of shows: some solver authors [indiscernible] say that sometimes they had to turn off some features because it was not easy to emit the DRUP information for them, but maybe it was because those features were buggy, and then it's hard to implement, because the checker would always say it's incorrect.
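For reference, the DRUP format being discussed is plain text: each learned lemma is a line of DIMACS literals terminated by 0, a deletion is the same line prefixed with d, and the empty clause is just the line 0 (so a stray empty line must be ignored, not read as an empty clause). A minimal editor's sketch of the print-on-add, print-on-delete scheme mentioned earlier:

```python
import sys

def emit_learn(clause, out=sys.stdout):
    # an added lemma: its literals terminated by 0
    out.write(" ".join(map(str, clause + [0])) + "\n")

def emit_delete(clause, out=sys.stdout):
    # a deletion: the same line prefixed with "d"
    out.write("d " + " ".join(map(str, clause + [0])) + "\n")

emit_learn([-2])      # learn (not b)
emit_delete([-2, 3])  # delete (not b or c), now subsumed
emit_learn([])        # the empty clause: "0" ends a refutation
```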
Yeah, I'm really in favor of checking things, because you really see that there are solvers with bugs that actually come up much higher in the rankings because of these bugs. Those instances were not found buggy in this track, so everything looked correct; only after the competition did I use this trick of removing clauses, and then you see that they are buggy, while if you look at the competition results everything is fine. >>: [indiscernible] competition have [indiscernible] benchmarks or are they all [indiscernible] >> Marijn Heule: There are unknown benchmarks in all categories, and especially in the random category we generate everything at the phase transition, so half of them are expected to be satisfiable and half unsatisfiable, but there's no way you can ever prove unsatisfiability on these benchmarks; so if 50 percent are proven satisfiable you might assume something about the others, but you never know for sure. So in the random category there is lots of [indiscernible]. >>: If you train your solver on known benchmarks [indiscernible] most of the benchmarks are known and then there's some chance… >> Marijn Heule: What do you mean, known? If you say known, do you mean known solved or known unsolved? >>: [indiscernible] benchmarks >> Marijn Heule: No. No. No. >>: Suppose you train your solver on [indiscernible] benchmarks and it has this [indiscernible] for something, you are not going to detect it by rerunning [indiscernible] in the competition. >> Marijn Heule: Sorry, I had the wrong impression; I thought your question [indiscernible] unknown meaning unsolved by solvers. >>: [indiscernible] mistakes >> Marijn Heule: For the competition we did it 50-50: fifty percent of the benchmarks were published somewhere on the internet and 50 percent were new contributions to the competition. That's kind of the best we can do, because there's a limited number of people submitting. I think we had one participant submit over 1,000 instances, but if you selected many of them you would really favor this one, because he also submitted a solver; you could win by submitting your own benchmarks and a solver optimized for your own benchmarks while nobody else knows they were there. So you are really limited in how many new benchmarks you can use, and 50-50 is, I think, a good compromise. Yeah, a little graph about the competition; I already talked about the two different tracks. The same color means the same solver, but in different tracks: either the category where there is no checking and logging, or the category with checking and logging. What you see for Glucose and for Riss is that their performance is pretty much the same in both tracks, all features enabled, with or without logging and checks. But if you look at Lingeling, which did not emit deletion information, you really see that if you don't have deletion information hardly anything can be checked anymore and it all breaks down. >>: [indiscernible] >> Marijn Heule: For Lingeling it is much more than 40 lines, because it uses so many techniques and so many data structures, and he has not implemented it yet; I tried to convince him to implement it for the next one. But it's another animal, a huge solver with lots of things.
But you can really see that if everything had been implemented he would definitely also have won the certified track. But you can see that now Glucose won that track; it was the fastest one on the certified application category. >>: [indiscernible] small addition to [indiscernible] >> Marijn Heule: [indiscernible] what do you want to say with that? >>: You get lots of bang for the buck. So the difference is that you can solve ten more benchmarks? >> Marijn Heule: No. That's not much when you consider how many lines of code have been added. >>: Yeah. That's one reason [indiscernible] how many lines of code do you [indiscernible] >> Marijn Heule: I think Lingeling is probably about 15,000 lines of code and Glucose maybe a thousand. But yeah, for the SAT competition this is considered a big difference. >>: [indiscernible] much faster [indiscernible] instance [indiscernible] right? >> Marijn Heule: Yeah, but that's because lots of preprocessing is done, and as I said before, lots of submitted benchmarks really have a [indiscernible] encoding and Lingeling in the early stages tries to fix that. For easy benchmarks it doesn't pay off, but for the harder ones it does, so at some point Lingeling takes over, because then all of that stuff is useful. But as you can see, for the [indiscernible] problems, which are typically very small, there are not too many re-encoding optimizations possible, and you see that Glucose is faster. And Riss is the same story, also just a few more lines, [indiscernible] similar in size, similar architecture. So it's ten to twelve. I also have some slides about expressive proofs; shall I continue with this and… >>: [indiscernible] to me, I think you should continue. >> Marijn Heule: Okay. I don't know if there is someone else coming into the room at 12:00. >>: [indiscernible] not until 1:30 but you might be done by then. >> Marijn Heule: Okay. We might. [laughter]. So far I've been talking about RUP and DRUP, and it covers resolution, so the most important learning paradigm, CDCL clause learning, Boolean constraint propagation, subsumption, everything like that is covered in this DRUP format. But there are some techniques used in solvers, for instance in Lingeling, all these techniques over here, that are not covered. There's another polynomial time checking procedure, called RAT, which I will explain a little later and which covers this: instead of doing the DRUP check, if you do the RAT check you cover everything that is used, including the blocked clauses. >>: [indiscernible] >> Marijn Heule: What do you mean, do you add -- in Lingeling you can add blocked clauses, yeah. By the way, it is one of the things that was buggy and actually motivated the whole inprocessing rules work, and actually RAT, because [indiscernible] I could make up a small example: there were only a few out of millions of instances where there was a bug, and a small example of only seven clauses was buggy in Lingeling, and it's really tricky to see how to implement this in such a way that it's sound. Here, roughly, you run Lingeling with blocked clause addition. >>: I just thought [indiscernible] >> Marijn Heule: Yeah.
I think the nice thing you get from blocked clause addition is this: blocked clause elimination can frequently shrink the instance, and then variable elimination is empowered because there are fewer clauses, but sometimes it's useful to add blocked clauses for the remaining variables; you add propagation steps for the remaining variables. >>: Oh. Okay. I see what you're doing. >> Marijn Heule: So you first bring it down and then you bring clauses back to life, or new ones, for the things that are still remaining in the formula, to get new propagation steps. >>: [indiscernible] >> Marijn Heule: Yeah. I will talk about RAT later, how to [indiscernible]. >>: Oh, RAT is the [indiscernible]? >> Marijn Heule: Yeah. RAT is the [indiscernible]. One of the problems on which typical SAT solvers like Glucose are extremely slow is the pigeonhole problems, and it's a very easy problem from a high level: given n-1 holes and n pigeons, can we put the pigeons in the holes such that no two pigeons are in one hole? Even a little kid can say, okay, that's not possible, but for SAT solvers, if you encode it, the proofs will be exponential if you don't use special techniques like the ones you saw in the bigger circle. One way to do it is with extended resolution, which gives a proof of cubic size in the number of pigeons: you translate the problem from n pigeons and n-1 holes to n-1 pigeons and n-2 holes, you do that n-1 times, and then the problem becomes trivial. But the proofs that you can do over here cannot be expressed; DRUP says: I cannot explain this. Another technique, which is actually pretty useful for some benchmarks, is called bounded variable addition. Given a CNF, we try to find a set of clauses which we can replace by a smaller set of clauses by introducing a new Boolean variable x, where x does not occur in the formula. The smallest example for which this works: we have these clauses and we want to replace them by these lemmas. If we add a single one of them to the formula and assign it to false, since x does not occur elsewhere, you can see there is no conflict. How do you deal with such a technique? You might think, okay, because of the pigeonholes, let's use extended resolution, but even for this kind of technique it's not easy to make an extended resolution proof which can express it. >>: [indiscernible] >> Marijn Heule: What? >>: [indiscernible] this is the case where [indiscernible] equivalent but [indiscernible] >> Marijn Heule: Yeah, although it's not obvious. In this case it's somewhere in between, because with blocked clause addition you really can throw out solutions, while if you do this trick the number of solutions stays the same: you keep the same set of solutions over the common variables. It's in between satisfiability equivalence and logical equivalence. >>: [indiscernible] >>: So I was under the impression that you would always preserve [indiscernible] under [indiscernible], but what you are saying now is [indiscernible] clause addition [indiscernible] >> Marijn Heule: Let me just use this example: we have a or b, and we have not a or not b. If, say, this is the formula, we can add this clause, which is blocked with respect to this formula; blocked clause addition allows you to add this clause, and it clearly removes one of the solutions.
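The blocked clause example just given can be checked mechanically; a small editor's sketch of one consistent reading of it (formula just the clause a or b, with a=1, b=2):

```python
def is_tautology(clause):
    return any(-l in clause for l in clause)

def is_blocked(clause, lit, formula):
    """A clause is blocked on lit if every resolvent with a formula
    clause containing -lit is a tautology."""
    return all(is_tautology((clause - {lit}) | (d - {-lit}))
               for d in formula if -lit in d)

# Formula: (a or b). The clause (not a or not b) is blocked on "not a":
# the only resolvent, (b or not b), is a tautology. Adding it removes
# the solution a=b=true but keeps the formula satisfiable.
assert is_blocked(frozenset({-1, -2}), -1, [frozenset({1, 2})])
```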
There are different techniques, and extended resolution in general has this in-between property: the same set of solutions over the common variables. >>: [indiscernible] >> Marijn Heule: But to express this [indiscernible] you cannot really solve this; you can do it, but it's tricky. You actually want to have a technique, the RAT I will explain on the next slide, such that all these clauses have RAT, so you don't have to make lots of [indiscernible] resolution steps to explain why you can add them. You can just do exactly what the SAT solver is doing: if you add the clause to the database you just say you added it, and if you delete it you say you delete it, without needing a whole set of additions and a whole set of deletions in order to convince the checker that it's actually sound. First this slide, as motivation why you want to use this bounded variable addition: these are three pigeonhole problems and these are three bioinformatics problems. If you run Glucose without BVA preprocessing, these are the running times; if you first do BVA preprocessing, these are the running times, and you see that there is a large speedup. For certain problems you really like to have this kind of preprocessing step, because it can save you a lot of runtime. You want to ask something? This we've seen. This is the RAT procedure, and there's a larger example on the next slide. A lemma has RAT, resolution asymmetric tautology, if it has RUP, or if there is a literal L in the lemma such that all resolvents on L, namely with all clauses containing the complement of L, are either tautologies or have RUP. You have a clause and a certain literal, and it must be that if you take all possible resolvents on that literal, so if there are say ten clauses containing not L then you have ten resolvents, all these resolvents are either tautologies or have RUP. That's the property. Should I go through the example? We have a formula; this is sort of the smallest formula for which a RAT proof is smaller than a RUP proof, although it can be exponentially smaller, but you need some clauses in there to show it off. So these eight clauses are unsatisfiable, and this is a RAT proof; the triangle means that the clause has RAT and no RUP. It's easy to see that this clause has no RUP because all clauses have length three: we assign not a to false, which means we assign a to true; then some literals get falsified, but since every clause has length three there will be no unit propagation. So it's easy to see that RUP is not going to work. So now we are going to check it, the forward checking. We detect: okay, there is no RUP, so now we check whether it has RAT. What are we going to do? We do all resolutions with clauses containing the complement of, this is a: we take all clauses containing a. We do resolution, and this gives the clause b or c. We do the same here, and we have the clause c or d, and we do the resolution and we have the clause not b or not d. Resolution gives us these three clauses, and the lemma has RAT if and only if all of these have RUP. So that's the definition.
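A compact editor's sketch of this RAT check (an illustration of the definition as stated, not the actual checker), again with integer literals:

```python
def has_rup(clauses, lemma):
    """Falsify the lemma's literals; unit propagation must find a conflict."""
    assignment = {-l for l in lemma}
    while True:
        unit = None
        for c in clauses:
            if any(l in assignment for l in c):
                continue                      # satisfied clause
            free = [l for l in c if -l not in assignment]
            if not free:
                return True                   # falsified clause: conflict
            if len(free) == 1:
                unit = free[0]                # remember a unit to propagate
        if unit is None:
            return False                      # fixpoint, no conflict
        assignment.add(unit)

def has_rat(clauses, lemma, lit):
    """RAT on lit: the lemma has RUP, or every resolvent with a clause
    containing -lit is a tautology or has RUP."""
    if has_rup(clauses, lemma):
        return True
    for d in clauses:
        if -lit not in d:
            continue
        resolvent = (lemma - {lit}) | (d - {-lit})
        if any(-l in resolvent for l in resolvent):
            continue                          # tautology: nothing to check
        if not has_rup(clauses, resolvent):
            return False
    return True

# The blocked clause from the earlier sketch is the special case where
# every resolvent is a tautology: (not a or not b) has RAT on "not a"
# with respect to the formula {(a or b)}.
assert has_rat([frozenset({1, 2})], frozenset({-1, -2}), -1)
```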
>>: [indiscernible] >> Marijn Heule: Because none of them are tautologies, et cetera. If there's a tautology we can get rid of it, and then we have to check for all of the remaining ones whether they have RUP, yeah. So first we create those three using resolution: we take this clause and compute all of the resolvents. That's the first step. Now we check and we see: okay, we can reconstruct it using RUP, like we did before; we can do the same thing with c or d and with not b or not d. Now we have checked them all, we know it is RAT, and we can add it to the formula and remove all of the resolvents. And now we can use this clause too as an antecedent for b, and then we check the empty clause and we are done. So what you can see here is that we have only three clauses in the proof. >>: So the [indiscernible] so you have an example with the, where x goes into [indiscernible] and the intuition then is that you select x when you solve and then, when you solve [indiscernible] >> Marijn Heule: Yeah. So what you have is, for instance, we first add all these three clauses with x, and then we check for all resolvents on x whether they are tautologies or have RUP; but since not x is not there yet, all of them have RAT trivially, because there are zero resolvents. So first we can add these for free, because x is new and these are not in yet, and then, for instance, we add this clause. Then we do all resolutions and we end up with these three clauses, which have RUP, and we do the same thing there: we add all resolvents, which brings us these clauses. So you can see why it works for this technique, but the nice thing is that it works for all non-[indiscernible] techniques. It is, I think, a very elegant way of checking things. And the same for extended resolution proofs. For extended resolution everything is tautologies. For instance, we add a new variable x which is the AND of a and b, so we can do the first trick here: we add the clause with x because there's nothing there yet, and now if we do resolution on these two we get a tautology, and if we do resolution on these two we get a tautology. So for extended resolution RAT works because of tautologies, for this technique it works because of RUP, and there are techniques where you need the mixture. Yeah? >>: But your definition is [indiscernible] is in the previous slides… >> Marijn Heule: Yeah. The definition, I wanted to keep it short and I have to [indiscernible] so… >>: Yeah. But [indiscernible] is the lemma [indiscernible] because there is no recursion to that. >> Marijn Heule: No. There is no recursion. We have been thinking about the [indiscernible] technique which requires recursion, because everything becomes [indiscernible]. And in the proof format we explicitly demand that the check is done on the first literal, so if you add a clause to the proof then it has to have RAT on its first literal, to really reduce the checking costs. Because the solver [indiscernible] must know which literal it is, which of course for this kind of example is easy. The definition in the paper is much more formal, especially [indiscernible] much more formal [indiscernible], but this is kind of informal here.
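For the extended resolution case just described, the Tseitin-style definition clauses for a fresh x equivalent to the AND of a and b make every resolvent on x a tautology; a tiny editor's sketch (a=1, b=2, x=3):

```python
def resolve(c1, c2, lit):
    return (c1 - {lit}) | (c2 - {-lit})

def is_tautology(clause):
    return any(-l in clause for l in clause)

# Definition clauses for x = AND(a, b):
# (x or not a or not b), (not x or a), (not x or b)
definition = [frozenset({3, -1, -2}), frozenset({-3, 1}), frozenset({-3, 2})]

# Resolving the x-clause with each not-x clause on x yields a tautology,
# so each definition clause has RAT and the extension step is covered.
assert is_tautology(resolve(definition[0], definition[1], 3))
assert is_tautology(resolve(definition[0], definition[2], 3))
```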
>>: Is it sound to [indiscernible] RAT or is it [indiscernible] >> Marijn Heule: I think it's sound, but it's not obvious, I think. But if you, for instance, have a formula with n variables where all clauses have length n, so you have all 2 to the power n clauses, then any resolution proof will be exponential. But the RAT proof just has all the unit clauses, so let's say you have x1, x2, …, xn, and that's the proof, and then you end with the empty clause. So for those kinds of formulas, just to give you an example, you have only the units and you are done. >>: That was the example with A, B, C. >> Marijn Heule: No, this one has four literals, not three. But it is kind of similar, and you can generalize it: if all clauses have length equal to the number of variables, then all the unit clauses you add have RAT. So there are all kinds of nice things that you get; you have small RAT proofs. I think it's really nice that you first compute all of these resolvents and all these edges, but then you can discard everything and keep this compact information. Yeah. So this was the tool chain that we had: we have these [indiscernible] and deleted [indiscernible] clauses and then we put it through the checker. But now the question is, do we trust the checker? Because things have become much more complicated, especially if you want to have all of the optimizations in the checker; it's now about 400 lines of code, the checker. So are we done? No. I think it would be really cool to have a verified checker, and so one focus of future work will be to use the tool that we have now, which also for RAT gives us these reduced cores, reduced proofs, and optimized clause deletion information. So we come here: we get this big proof (N is its length, and the lemmas include RAT lemmas) and we give it to the proof trimmer, which is actually the same checker as before, so it's the same tool. This is the DRAT-trim tool we have now, which significantly shrinks the proof and optimizes the clause deletion information, and then the proof is small enough that we can check it with a verified checker. So lots of optimizations can go in, and we can then check it with something that we trust even more. For instance, if you see this, then you can implement this in, say, 100 lines of code. This is one of the things I think would be very nice to get done. And Nate, whom I showed you as a coauthor of this work, is doing his PhD on this. >>: [indiscernible] seems like you had [indiscernible] >> Marijn Heule: Yeah. [indiscernible] is his proposal. >>: Oh, proposal, okay. [indiscernible] >> Marijn Heule: Yeah. So [indiscernible] >>: It said something about a verified [indiscernible] >> Marijn Heule: Yeah. So he's working on a verified, mechanically verified checker that can add and delete [indiscernible] clauses. And he makes it in ACL2. The nice thing about ACL2 is that you can get to, say, 60 or 70 percent of the speed of C if you… >>: [indiscernible] >> Marijn Heule: There is lots [indiscernible] subject. We have this [indiscernible], I don't know if you are familiar with it, but there are all kinds of low-level techniques which are now supported, but that of course makes the proof much harder.
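[Editor's note: the 2^n example is easy to reproduce with the same hypothetical helpers. Here n = 3, so the formula consists of all eight length-3 clauses, and the RAT proof is just the three unit clauses followed by the empty clause.]

```python
from itertools import product

# All 2^n clauses of length n over variables 1..n: unsatisfiable, and any
# resolution refutation is exponential in n.
n = 3
formula = {frozenset(sign * var for var, sign in zip(range(1, n + 1), signs))
           for signs in product((1, -1), repeat=n)}

# The RAT proof: unit clauses x1, ..., xn, then the empty clause by RUP.
for var in range(1, n + 1):
    assert has_rat(formula, frozenset({var}), var)
    formula.add(frozenset({var}))
assert has_rup(formula, frozenset())  # propagation alone reaches a conflict
```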
So on the high level he has a proof, and now we have a low-level implementation, and it's about bringing them together. That's probably a year of work to get it all together, because these proofs can take a long time. Just to give you a bit of an impression of the proof reduction we have now, although I think the more important thing is almost that you can optimize the deletion information, which can really help to guide the unit propagation, because when you first get the proof everything is not so clear. This is the size of the input proof from Glucose and this is the size of the proof the trimmer gives. You can see that you can reduce the proof by slightly more than a factor of two, but by optimizing the deletion information, and also, as in the plot I showed earlier, you can easily have a 90 percent reduction in the original clauses. So you have a much smaller number of original clauses, a smaller proof, and optimal deletion information, which makes it possible that even a mechanically verified checker should be able to deal with these in a reasonable amount of time. So yeah, this is the contribution slide again, and what is actually [indiscernible]. >>: So the RAT proof is for [indiscernible] it's not for [indiscernible] >> Marijn Heule: You can do both, but wait. The checker, as things look here, will go forward, because if you have optimal clause deletion information there is really no use in going backwards. It doesn't make sense, because everything will be in the core, so you don't gain much by going backwards, and the cost of going backwards is that you have conflict analysis and all these things you really don't want to verify, so you really want to have things as clean as possible. So the verifier will go forward with the optimal clause deletion information, a smaller proof, and fewer clauses; but the C version, the proof trimmer, will also go backwards for RAT. You can do the same trick backwards; there is not much difference between going forwards or backwards regarding RAT. When you check a lemma, you take all the resolvents with everything above it in the proof: you just don't do resolution with anything that is below; you only do resolution with everything that is above. >>: So checking RAT works both forward and backward? But the version presented here was a forward version? >> Marijn Heule: Yeah. To keep it clearer, yeah. >>: And your trimmer goes backwards? >> Marijn Heule: Yeah. The trimmer goes backwards. The one that I have in mind, which I hope to finish maybe in a month or so, combines this and this, which requires going backwards and using deletion information. >>: I think I was asked to shut up so you can finish. >> Marijn Heule: Yeah, so was I. [laughter] As you see, I am practically finished. The work over here does forward RAT, and by combining the deletion information and the procedures I have here, I think you can have everything. And then, combining that with sending the reduced output to a verified checker, I think you have everything you could want for checking. Yeah. So this kind of sums it up.
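[Editor's note: putting those pieces together, a forward checker over a DRAT-style proof file could look roughly like the sketch below. It reuses the hypothetical helpers from earlier, assumes the standard DRAT text format (clause lines ending in 0, deletions prefixed with "d"), and checks RAT on the first literal as the format demands; a real tool like DRAT-trim is, of course, far more optimized.]

```python
def check_drat_forward(formula, proof_lines):
    """Forward-check a DRAT proof: apply deletions, validate each added lemma."""
    clauses = set(formula)
    for line in proof_lines:
        tokens = line.split()
        if not tokens or tokens[0] == 'c':
            continue  # skip blank and comment lines
        if tokens[0] == 'd':                       # deletion line, e.g. "d 1 -2 0"
            clauses.discard(frozenset(map(int, tokens[1:-1])))
            continue
        lits = [int(t) for t in tokens[:-1]]       # addition line ends with 0
        if not lits:                               # empty clause: refutation claimed
            return has_rup(clauses, frozenset())
        if not has_rat(clauses, frozenset(lits), lits[0]):
            return False                           # lemma fails the RAT check
        clauses.add(frozenset(lits))
    return False  # proof ended without deriving the empty clause
```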
So here, I think, is maybe the more important point for discussion later: should this kind of proof logging be mandatory for competitions, and maybe also for the SMT competition? I noticed, at least when I used this line at FMCAD, that the only people who discussed these things with me during the coffee break were saying please don't, please don't do it. [laughter] This is too hard and you should really back off. So in the SAT community people are more positive about this, but in the [indiscernible] competition they really fear that it might be extremely hard, combining [indiscernible] and all kinds of stuff into this. But I think it would be good to have more faith in it, because the tools here definitely do a lot of the work as well. That's it. There are four publications this year on this proof checking, and I have some more plans, so we will follow up with more. Yeah. Thanks for your attention. [applause]