>> Francesco Logozzo: So, the last one. Good afternoon everyone. It does not happen every day that you get to introduce the author of a paper that changed your life, but that is what is happening to me today. I am glad to introduce Patrick Cousot, the author of the paper on abstract interpretation at POPL in 1977 that changed my life, really changed my life, because I was an undergraduate at Pisa and I read his paper and said: really cool, that's what I want to do in my life. Then I moved to Paris, did my PhD, found my wife there, [laughter] then moved to Redmond doing the same thing. My kids were born there. I work for Microsoft so I can afford a lot of bikes [laughter] so you know--so yep, it is a really big honor and--yeah, one is broken; you know the sad story, let's forget about it and let's talk about abstract interpretation and SMT solvers today. Thank you.
>> Patrick Cousot: Hello. Thank you. So this paper has three authors; the others are Radhia Cousot and Laurent Mauborgne, [inaudible] in Madrid. And none of us knows about SMT solvers, so we try. We have two references that we re-factored into... I promised to say "refactoring", so I had to find a way of saying re-factoring: we re-factored these two papers into a journal paper which is submitted, and I present a small part of it.
So on one side you have algebraic abstractions, which are used in abstract interpretation for the analysis or verification of systems. The idea is that when you have properties and specifications, you abstract them into an algebraic lattice, and in practice you encode these lattices into data structures on which you have [inaudible] algorithms. So the analysis is fully automatic: the system properties are computed by approximating fixpoints, and they are made out of algebraic transformers which are built upon primitives that are operations on these lattices. And in general you use several abstractions, which you combine with a reduced product; we will see what that is.
Then you have another bunch of abstractions which are used in deductive methods, and I call it proof-theoretical or logical abstraction. There, the system properties and specifications are expressed as first-order formulas using theories, so you have a universal encoding of properties, and you have one algorithm, or a few algorithms, that manipulate this universal representation, so most of the work is done once and for all. That is a great advantage. The problem is that it is only partly automatic, because most of the system properties are either provided manually by the end-user and then automatically checked by some automatic system, or you cannot prove them: very complex problems require difficult invariants to find. For example, when we analyze [inaudible] programs, the invariant written out in a single block is 1 GB, so no one can write it, no one can even read it. And what is interesting is that you can have several theories and you can combine them with Nelson-Oppen, which was born at about the same time as abstract interpretation. One objection is that we only know the abstract interpretation part, so we can only explain everything in terms of abstract interpretation, not the contrary. That is why we will show that proof-theoretic and logical abstraction are just a particular case of algebraic abstraction; and in fact, this is done by mathematicians, they do that.
Then we will show that the Nelson-Oppen procedure is a particular case of reduced product, and when you understand that, it becomes very easy to bridge the two worlds. You can have analyses that work simultaneously, on one side with algebraic abstractions, on the other side with logical abstractions, and you use the Nelson-Oppen procedure and the reduced product as the tie between the two. So that is one form of convergence between the two approaches.
So now we go to the technical details. When you have a language, you have the semantics and the syntax. My syntax is very simple: you have variables, you have constants, you have function symbols of some arity, you have terms, and you have predicates; I distinguish the atomic formulae, which have no quantifiers. Then you have program expressions, which are either a term or a predicate, and you also have clauses in a simple conjunctive normal form, as they are used in some tools. And programs, I don't really care; I just assume there are assignments and some form of tests or guards or something like that.
Then you go to the classical notion of interpretation. It is classical; I just use different notation. An interpretation is a pair of a set of values, to assign values to variables, and something which provides the semantics of the function symbols: each function symbol becomes a function that I can use, and predicates are interpreted as maps from values to Booleans. Then you introduce the notion of environment, which maps variables to their values, and I assume that you have a mechanism that can evaluate an atomic formula when you give it an interpretation and an environment, and also evaluate a term in some interpretation and some environment. So, the classical definitions that you find in any book on logic.
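A minimal sketch of these classical definitions (not from the talk; the encoding is illustrative): terms built from variables and applied function symbols, an interpretation giving meaning to the symbols, and an environment giving values to the variables.

```python
from dataclasses import dataclass

@dataclass
class Var:
    name: str

@dataclass
class App:            # a function symbol applied to argument terms;
    symbol: str       # constants are 0-ary applications
    args: tuple

def eval_term(t, interp, env):
    """Evaluate a term in an interpretation and an environment."""
    if isinstance(t, Var):
        return env[t.name]
    fn = interp[t.symbol]          # the meaning of the function symbol
    return fn(*(eval_term(a, interp, env) for a in t.args))

# the standard interpretation of + and 1 over the integers:
interp = {"+": lambda u, v: u + v, "1": lambda: 1}
env = {"x": 41}
print(eval_term(App("+", (Var("x"), App("1", ()))), interp, env))  # 42
```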
>>: [inaudible] fixed point, right? [inaudible] this big union?
>> Patrick Cousot: This is an interesting question, so we will come back to it [laughter] later. When you consider the semantics of programs, you have the same notions as in logic. You have a standard interpretation, as described previously, and here is the answer to your question: I will give the semantics in the form of post-fixpoints, because it may happen, when you go to logic, that there is no infinite union or infinite intersection, and so the fixpoint may not exist. In that case you cannot define the semantics as a logical formula. But you may find infinitely many formulas that are post-fixpoints, and you can use them. Think of the example of Hoare logic: if your logic is not expressive enough, you may not be able to express the strongest invariant of a loop, but you may be able to make a proof of correctness by using an invariant that is not the strongest. So if your semantics gives you all of the possible invariants that you can use for that loop, the strongest may not be there, but by knowing all of the invariants, you know whether you can do the proof or not. That is why I do that. So I assume that I have observables that depend on the interpretation; an observable can be a set of traces, execution trees, states, sets of states, and things like that. Then I define properties to be sets of observables: a property is the set of objects that have the property, so the property "even" is the set {0, 2, 4, 6, ...}.
And I associate to the program a transformer; this is a syntactic process: I explain, by induction on the syntax of the program, how I build the transformer. In Hoare logic it would be the verification condition. Then I define the semantics as the set of post-fixpoints, that is, in the case of logic, the set of invariants that satisfy the verification condition; and in case you have a least fixpoint, it will be one of them. So here I have a little example: a program where x starts at one and is incremented. I interpret predicates on the integers, and the least fixpoint will be "x is strictly positive". But if I don't have the symbol "greater than" in my logic, it will be x is one, or x is two, or x is three, or x is four, and the infinite formula is not part of my logic, so without the symbol I cannot express the least fixpoint.
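A sketch of that example (not from the slides): the concrete transformer of the loop, whose least fixpoint {1, 2, 3, ...} is expressible as "x >= 1" but not as any finite disjunction of equalities.

```python
def F(S):
    # transformer of "x := 1; loop { x := x + 1 }"
    return {1} | {x + 1 for x in S}

S = set()
for _ in range(4):
    S = F(S)
    print(sorted(S))   # {1}, {1,2}, {1,2,3}, {1,2,3,4}: never stabilizes
```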
No problem: I can use post-fixpoints. Or people add a new symbol. But by adding new symbols you make your logic more complicated, and in the end, if you insist on least fixpoints, there is no [inaudible] and nothing is decidable, so all the benefit of having solvers is lost, at least complete solvers. So that's it. When you have done that, you see that properties have a complete lattice structure for inclusion, which is logical implication; you have false, true, conjunction and disjunction; and at the level of the interpretations I am in [inaudible] theory.
Wow. Someone is willing to do something on this computer, or what? So I will give you my password. Now you know some information: you know the number of characters [laughter]. That is a leak of information, which is very important, in fact. So it should fill up to the end with [inaudible] [laughter]; I don't understand it, don't do it.
So, I think there is a repetition here. You have the structure; post-fixpoints exist because I assume the transformer is increasing, that is, [inaudible]. More important, the transformer is built out of primitives. Here I have the assignment, for example: it says that after the assignment, the environments that are possible are the same as you had before the assignment, except that you assign a new value to x, which is obtained by evaluating the expression in the interpretation. And for a test it is the same: you check whether the test is true or false, and you keep the environment if it is true.
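A sketch of those two primitives, with a property represented as a set of environments (the encoding as sorted tuples is illustrative only, to keep the sets hashable):

```python
def post_assign(S, x, e, interp):
    """After x := e, keep each environment but rebind x to the value of e."""
    out = set()
    for env in S:
        d = dict(env)
        d[x] = e(interp, d)            # evaluate the expression
        out.add(tuple(sorted(d.items())))
    return out

def post_test(S, p, interp):
    """After a test/guard p, keep only the environments where p holds."""
    return {env for env in S if p(interp, dict(env))}

S = {(("x", 1),), (("x", -1),)}
S = post_test(S, lambda i, d: d["x"] > 0, None)        # drops x = -1
S = post_assign(S, "x", lambda i, d: d["x"] + 1, None)
print(S)                                               # {(('x', 2),)}
```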
And what is it to satisfy a property? It is that among the invariants that I have in my semantics, there is one that implies the property that I want to prove. So by going to post-fixpoints you have a semantics which looks a bit more complicated, but which is more generic. So that is what I wanted to explain. Now, this is not the end of the story. The problem is that programs do not have one semantics; they have many, many semantics. And usually, when people do verification, they consider only the mathematical semantics, and they don't consider the implementation semantics, so it is an interesting problem to compare them. For example, you might say: I will do an analysis which compares the execution to the mathematical definition and says where they differ. Or you have an interesting analysis saying: if I interpret my program in the reals and then in the floats, there is a difference between the values that I can obtain [inaudible].
And so we showed this program, which is very interesting because if you run it you will get 100, but if you compute with the reals, you will get six; and if you use any approximation of the reals, you will get 100. That is because six is a fixpoint which is repulsive, and 100 is a fixpoint which is attractive, so anything goes to 100 except one thing, which is the exact real computation. So if you have a program with [inaudible] and we say to a prover: please prove that it is 100--it is not, because it is six in the reals. And if I say to the prover: prove that it is six, the prover will make it, and then you run it and it says 100. But six and 100 are almost the same, depending on the scale you are considering [laughter].
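The program itself is not reproduced in the transcript, so here is a hypothetical iteration with the same qualitative behavior: 6 is a repulsive fixpoint (the derivative there exceeds 1) and 100 an attractive one, so exact arithmetic stays at 6 while any rounding error drifts to 100.

```python
def f(x):
    # fixpoints at 6 (repulsive, f'(6) = 1.94) and 100 (attractive, f'(100) = 0.06)
    return x - 0.01 * (x - 6) * (x - 100)

x = 6.0
print(f(x))              # 6.0: the exact fixpoint is preserved
x = 6.0 + 1e-9           # a tiny perturbation, as rounding would introduce
for _ in range(100):
    x = f(x)
print(x)                 # ~100.0: the error is attracted to the other fixpoint
```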
So, another example: if you take floats, you have four possible interpretations depending on the rounding mode, and these give completely different results. So if you give a semantics, you must give the four semantics and then say which one you use. In some cases you know which case you are in, because the machine can be set to one of these rounding modes; but in other cases you don't know, so, for example, you may do an analysis that is valid for all four semantics, that is, whose conclusions hold whichever of the semantics is chosen.
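Python's decimal module makes the effect visible, with rounding modes corresponding roughly to the IEEE ones (to nearest, toward zero, toward plus infinity, toward minus infinity); the same divisions give different results under each.

```python
from decimal import Decimal, getcontext, ROUND_HALF_EVEN, ROUND_DOWN, \
    ROUND_CEILING, ROUND_FLOOR

getcontext().prec = 3            # low precision, so rounding is visible
for mode in (ROUND_HALF_EVEN, ROUND_DOWN, ROUND_CEILING, ROUND_FLOOR):
    getcontext().rounding = mode
    # the same two divisions, four different semantics
    print(mode, Decimal(1) / Decimal(3), Decimal(-1) / Decimal(3))
```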
And the fact that you have many semantics and that you ignore some of them is itself an abstract interpretation. So in fact, when you say "I do a mathematical proof of my program", you do an abstract interpretation which consists of ignoring some semantics, and for these semantics you know nothing. You can formalize it by a Galois connection, but it does not solve any problem. So we keep these machine interpretations in mind; we will have that everywhere in what follows.
So now we go to the details of abstract interpretation. Everything is built on an abstract domain, which is the set of properties you are interested in. You have an order, which represents the logical implication; you have false, true, [inaudible]; you have widening and narrowing; and you have transformers that abstract the concrete transformers, forward and backward, for assignments and for tests.
Then, to make an analysis, you use this abstract domain: you define your transformer by induction on the syntax of the program, and this transformer uses the primitives that you have included in your abstract domain. The abstract semantics will be either the least fixpoint, or the set of post-fixpoints in case the least fixpoint does not exist. So you see that in the abstract, things are structured exactly as in the concrete. Then you need a correspondence between the abstract and the concrete, and we do that with a concretization function. It explains that this is the abstract version of implication, this is the abstract version of union, and essentially, when you apply the concretization to all abstract values, you get a subset of the concrete properties. So abstraction consists just in considering the concrete properties, selecting a subset, and saying: I will stay, in all of my computations, within this subset. When I have an operation that goes from γ(A) to γ(A), I lose no information; but when I have an operation that goes from γ(A) to properties outside γ(A), I have to over-approximate the result by an element of γ(A), and that is where I lose information.
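A minimal sketch of such an abstract domain: intervals over the integers, with bottom, an order (the abstract implication), a join (the abstract disjunction), and a concretization gamma. None stands for a missing, that is, infinite, bound.

```python
BOT = None

def leq(a, b):                       # abstract implication
    if a is BOT: return True
    if b is BOT: return False
    (al, ah), (bl, bh) = a, b
    lo = bl is None or (al is not None and bl <= al)
    hi = bh is None or (ah is not None and ah <= bh)
    return lo and hi

def join(a, b):                      # abstract disjunction
    if a is BOT: return b
    if b is BOT: return a
    (al, ah), (bl, bh) = a, b
    lo = None if al is None or bl is None else min(al, bl)
    hi = None if ah is None or bh is None else max(ah, bh)
    return (lo, hi)

def gamma(a):                        # concretization, as a predicate
    if a is BOT: return lambda x: False
    lo, hi = a
    return lambda x: (lo is None or lo <= x) and (hi is None or x <= hi)

print(leq((1, 2), (0, None)))        # True: [1,2] implies x >= 0
print(join((0, 1), (5, 9)))          # (0, 9): information is lost here
```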
And then we have to adapt the notion of soundness, because we have post-fixpoints, but it is a bit trivial. It is sound: if I have done a proof in the abstract, that is, if I have some abstract semantics that implies what I want to prove, then I will be able to redo it in the concrete, using the concretizations of the abstract values. And it is complete in the other direction: if I have a proof in the concrete, I can find some way to do it in the abstract. So this generalizes the classical notion with least fixpoints.
And then you have sufficient conditions for completeness--for soundness, excuse me--and for completeness also. Here is one: it says, essentially, that when I make a transformation in the abstract and go to the concrete, it is an over-approximation of the concrete transformation that I would have done. And this condition, which works for fixpoints, will also work for the post-fixpoint semantics; the generalization changes nothing. To prove it, you just prove it for the primitives that you have in your abstract domain, and then you prove it by induction on the structure of the program, because the transformer is defined by induction, and so the proof is really trivial to do. I also mention that in case you have a best abstraction, you have Galois connections; but in many implementations you don't have one, and when you go to logic you don't have one.
Another point in abstract interpretation is widening, so I have to explain that. A widening is an operation that takes two elements and gives something that is an over-approximation of both elements. You see, if you take two elements in the lattice, there is a best approximation, which is the join; but sometimes we have no join, or no efficient way to compute the join exactly, so we use a widening just to approximate the join. We call that an over-approximating widening; that is one use: you can use widenings to replace joins when you cannot compute them. And the other use of widening is to enforce convergence in fixpoint computations. When you have a fixpoint computation you make iterations, and the suggestion here is that from time to time you do a widening, a mysterious operation which goes faster than the iterates: it over-approximates the iterates, with the essential property that the iteration ultimately terminates. So in fact the definition is just what you want: you want an operation which enforces over-approximation, and here it says that the result is an over-approximation of both arguments. In practice, most widenings are used both for over-approximation and for termination, but sometimes this is not the case. So here are the iterates: you see, I start from bottom; I stop when I have a post-fixpoint; and otherwise I take the previous iterate and the next iterate and I do a widening. You can also do widenings with respect to all previous iterates; there are many variants. The theorem says that when you do that, you will reach a post-fixpoint in finitely many iterations, if the widening is both over-approximating and terminating.
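A sketch of these iterates on intervals, with the standard widening that jumps an unstable bound to infinity (None); the abstraction of the earlier loop stabilizes at "x >= 1" in a few steps.

```python
def widen(a, b):
    if a is None: return b                 # None alone is bottom
    if b is None: return a
    (al, ah), (bl, bh) = a, b
    lo = al if al is not None and bl is not None and al <= bl else None
    hi = ah if ah is not None and bh is not None and bh <= ah else None
    return (lo, hi)

def leq(a, b):
    if a is None: return True
    if b is None: return False
    (al, ah), (bl, bh) = a, b
    return (bl is None or (al is not None and bl <= al)) and \
           (bh is None or (ah is not None and ah <= bh))

def lfp_with_widening(f):
    x = None                               # start from bottom
    while True:
        fx = f(x)
        if leq(fx, x):                     # post-fixpoint reached: f(x) <= x
            return x
        x = widen(x, fx)                   # widen previous with next iterate

def f(x):                                  # abstracts "x := 1; loop { x := x + 1 }"
    if x is None: return (1, 1)
    lo, hi = x
    return (1, None if hi is None else hi + 1)

print(lfp_with_widening(f))                # (1, None), i.e. x >= 1
```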
So now, when you implement, you essentially implement this: you find a representation, that is, a data structure for representing properties, and then you implement programs that compute the implication, the join, [inaudible].
If you have different abstract domains, they have different representations, and so you need some form of communication between the two. The advantage of this approach is that it is very efficient: you can use algorithms that are specific to the kind of properties you are manipulating. If you have an interval, it is a pair of integers; there is nothing more [inaudible] machine. But it requires expertise; I have seen not so many people able to do it very well. So--yes?
>>: What is the reduced product?
>> Patrick Cousot: I will answer your question [laughter]. I have a bunch of slides to explain that, yes. It is trivial: it is a conjunction, a conjunction of properties, but you have to do it in the abstract, and efficiently, and we will see.
So now I come back to logic, and I introduce quantified formulas; this is classical. I just add equality, and I write the free variables of ψ in this way, which is not conventional. Then you have a satisfaction relation, which says ψ is true for the assignment of variables given by this environment in this interpretation, and the interpretation of equality is equality, in the end. That's it. And because we have seen previously that a program may have multiple interpretations, I will work with respect to many interpretations. So I take a set of interpretations, and I have to extend the meaning of a formula to a set of interpretations; it is very, very simple. I say: you give me an interpretation; I give you the environments that make the formula true for this interpretation. So it is a map from interpretations to the environments that satisfy the formula. Now we have a problem: how are we going to represent a set of interpretations? In fact, you can represent it by a theory, because a theory is a closed set of sentences, formulas with no free variables. The models of the theory are interpretations, so they form a set of interpretations; in general, a theory has many models. And because the sentences of the theory have no free variables, I can say "there exists an environment" or "for all environments"--it does not matter, because I don't use it; these two formulas are exactly equal. So when you refer to a theory, you have a semantics which has many interpretations, and these interpretations are the models of the theory.
So this slide is for me, because I always forget; it is my slide to remember. A decidable theory: you can decide whether or not a formula belongs to the theory by an algorithm which is effective and terminates. A deductively closed theory is such that if you have a formula in the theory and it implies another one, this other one is also in the theory. A theory is satisfiable if it has at least one model. And it is complete if, when you take any sentence, either it or its negation is in the theory. I hope that this is correct, but I never remember, so I revised before going on [laughter]. So: validity means that for all models the formula is true. Satisfiability modulo theory, SMT, that is: for all models--no, no: there exists a model of the theory such that the formula can be shown true for this particular interpretation. And now you see the secret of the SMT solvers: when the theory is decidable, you can check satisfiability with the decision algorithm, and if the theory is decidable and complete, then you can also use this algorithm to check validity.
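For instance, with the z3-solver Python package (an assumption; any SMT-LIB solver would do), checking satisfiability modulo linear integer arithmetic looks like this:

```python
from z3 import Ints, Solver

x, y = Ints("x y")
s = Solver()
s.add(x > 0, y > x, x + y == 10)
print(s.check())       # sat: there is a model of the theory...
print(s.model())       # ...e.g. x = 1, y = 9, making the formula true
```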
So the only problem I see is that many SMT solvers put restrictions on the quantifiers in the formulas that they can handle. If I understand well, if the quantifiers are on the outside it is okay; if you have quantifiers nested under quantifiers under quantifiers, it is not so okay. We have to live with this. So, my claim is that when you use theories and so on, you do abstraction, and I have to show why, by making it fit into the framework of abstract interpretation. That is it. So you take a set of formulas of first-order logic as your abstract domain, you take a theory, and then you define an abstract domain with these formulas.
The implication is given by universally quantifying all of the free variables of the implication, so it is essentially logical implication, except that you quantify here. Then false and true are formulas, and disjunction and conjunction are part of the definition. You don't have widening and narrowing, but you may be able to define some. And the transformers, you will see, are also trivial. So you see that it completely fits in the previous framework, except that the lattice you get is not complete, because you are missing limits of iterates, limits of chains: you cannot make infinite unions and things like that. And so you have no Galois connection, and this means that there are choices you have to make when you abstract, and there is in general no best choice. By changing the way you make your abstraction of the formulas, you will change the result, and you have no way to get the best possible result, because you cannot express it in your logic. The concretization is just this: you take a formula, and for all models of the theory, you collect the environments in which the formula is satisfied. So you understand perfectly what semantics you give to your formula with respect to the theory.
So the abstract semantics is very simple: you define it in terms of post-fixpoints, with a transformer based on the concrete transformer. I hope I have given them--yeah, I have given them here. You abstract by formulas of first-order logic. For example, for the assignment, you say: there exists an old value; you introduce this existential quantifier, you substitute a fresh variable for the assigned variable in the predicate, so that the predicate speaks of the value before; you evaluate the term on the value before, and you equate the result with the variable to which you made the assignment.
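A sketch of that transformer with the z3-solver package (an assumption), for the assignment x := x + 1:

```python
from z3 import Int, Exists, And, substitute

x, x0 = Int("x"), Int("x0")
psi = x > 0                               # the property before x := x + 1

# forward: post(psi) = exists x0. psi[x0/x] /\ x == x0 + 1
post = Exists([x0], And(substitute(psi, (x, x0)), x == x0 + 1))
print(post)

# backward there is no quantifier: wp(x := x + 1, psi) = psi[x+1/x]
print(substitute(psi, (x, x + 1)))        # x + 1 > 0
```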
Backwards, it is interesting to see that you don't have the introduction of an existential quantifier. That is why many people go backward when they do deductive methods: there is no quantifier to eliminate, whereas here you get one. So I thought I had found the solution, but no, because as soon as you compute a fixpoint, you get the quantifier back: the fixpoint says either I am at the first iteration, or at the second, or at the third, so you quantify existentially over the iteration where I am, and the quantifier that you had here will reappear there, and you have gained nothing--unless the user gives you the invariant, in which case one iteration suffices and you have no quantifier. And the test is just a conjunction: you add the formula of the test, and if that does not work, you can over-approximate, for example [inaudible].
So the implementation is very nice. There is a universal representation of abstract properties, so you have no algorithms to design; all of the operations of the lattice already exist on formulas, so you have nothing to do; for the implication you use an SMT solver, so you have nothing to do; and the abstract transformers are purely syntactic, so you have nothing to do. The implementation is really trivial, which is really nice. Moreover, you can prove once and for all that the syntactic transformers are correct. The only problem is: when you are given a concrete property, how are you going to abstract it? For example, I give you the property "x is a prime number", and you have to write a formula that says it is a prime number. That is not a trivial task: you can say it is two or three or five or seven, and so on forever [laughter]. So the facility that you had before, you pay for it when you have to do this translation. And the other thing is that there is no widening, so you have to define one.
So, I made this one for Ken, but he is not here. It is a pity, because I wanted to provoke him on this [laughter]...
>>: It is recorded.
>> Patrick Cousot: What?
>>: It is recorded, so if you want to... [inaudible] and we will remember [laughter]
>> Patrick Cousot: And so, here is the slide for Ken.
So, what you can do is choose a subset of your domain; you can choose a finite sublattice, for example. If you choose a finite sublattice, then when the iterates do not go fast enough, you jump into the finite sublattice somewhere; and you cannot do that forever, because the lattice is finite, or it satisfies the ascending chain condition, and that is a very simple way of defining a widening. Using intervals, you can jump to the powers of 2, for example, or the powers of 10, because engineers, when they write bounds, use powers of 2 plus or minus 1; so, if we are lucky, by jumping to those there is some chance that the iterates will stabilize. So we can use that. Also, I think Craig interpolation is some kind of widening, of the kind we call a bounded widening, where you know a bound beyond which you are sure that things are wrong [inaudible] specialized algorithms. Here, if you know the specification, you have a bound, and so you can find something in between, which you get syntactically by Craig interpolation. The problem is that it does not enforce convergence, because when you go over it, it can go farther; but you can go back to the previous solution to have a finite, terminating widening.
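A sketch of that idea on one interval bound: an unstable bound jumps to the next threshold in a finite ladder of powers of two, so the widening can only climb finitely often.

```python
THRESHOLDS = [2 ** k for k in range(1, 11)]    # 2, 4, ..., 1024

def widen_upper(old_hi, new_hi):
    if new_hi <= old_hi:                       # stable bound: keep it
        return old_hi
    for t in THRESHOLDS:                       # otherwise jump to a threshold
        if new_hi <= t:
            return t
    return float("inf")                        # top of the finite ladder

print(widen_upper(5, 6))        # 8
print(widen_upper(8, 900))      # 1024
print(widen_upper(1024, 2000))  # inf
```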
The first day I saw Craig interpolation, I said to Ken: but isn't that a widening? And he had the transparencies ready: he had an additional transparency saying it is not. So [laughter] that is why I wanted to show that.
So, the reduced product. At least one person is interested; it is not time to sleep [laughter]. Well, you can sleep on this one, because it is not yet the reduced product; you see [laughter], we start with a version that does not work. You take many abstract domains, but finitely many, because with infinitely many it is [inaudible] difficult to implement. For each of them you have a concretization, and then you take just the product: the Cartesian product, with the implication and the operations componentwise. And the meaning, the concretization, is the conjunction of the information given by the components. So, for example, you can have one analysis that uses one domain and another that uses another domain; you run the two analyses, and they do not interact.
Now, the reduced product is almost the same. You have a finite number of abstract domains and a concretization for each of them; the thing you consider is the Cartesian product, but you make a reduction: you say two elements are equivalent when they have the same meaning, and then you quotient the Cartesian product by this equivalence. So you have changed the abstract domain by putting together all properties that are equivalent. Mathematically it is a triviality; the problem is that algorithmically it is really difficult to compute, because the equivalence classes here can be infinite, and there is no algorithm that implements this definition in general. But it is a good definition. So what we do in practice is approximate it by a pairwise reduction that we iterate. So we will go to this.
So the Cartesian product is useful as the basic implementation; the reduction is useful because it is the best that we can do mathematically; and this is a compromise, where we make a reduction, but not the most precise one. So let's look at an example of reduction. You see, I have one analysis which is intervals, and another one which is simple congruences. Because x is 2 modulo 4 and between these two bounds, it can only be 2: it cannot be 6 and it cannot be less than 1, so I know now that it is 2; and because it is 2, it is 2 modulo 0.
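A sketch of this reduction (naively, by enumeration; a real analyzer does it symbolically), with a congruence written as a pair (a, m), where m = 0 means the single value a:

```python
def reduce_pair(interval, congruence):
    (lo, hi), (a, m) = interval, congruence
    if m == 0:
        vals = [a] if lo <= a <= hi else []
    else:
        vals = [x for x in range(lo, hi + 1) if x % m == a % m]
    if not vals:
        return None                       # false: no concrete value left
    lo2, hi2 = min(vals), max(vals)
    if lo2 == hi2:
        return (lo2, hi2), (lo2, 0)       # a singleton: exact congruence
    return (lo2, hi2), (a, m)

print(reduce_pair((1, 5), (2, 4)))        # ((2, 2), (2, 0))
```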
So you see that each of the two abstract domains has been reduced using the information of the other one, because I have made the intersection between the two. Now, a reduction is something that goes to something smaller, and we could always go to false--there is nothing smaller than false--but we don't want that: we want something that preserves the meaning. So if I have a concretization, some function that defines the meaning of an abstract element in the concrete, and I have a reduction, I say that it is meaning-preserving when applying the reduction to an abstract property leaves its meaning in the concrete the same: if I concretize the one or the other, I get exactly the same thing. So meaning preservation just means they have the same meaning in the concrete, although in the abstract they have different representations. And it is a reduction if it is smaller, that is, I improve for the order on the intervals and I improve for the order on the congruences here. So why should I do that?
I have a little example. Here, one analysis says x is positive or zero, and the other says x is odd. If I take the test "x is negative or zero", then without reduction I get here: x is zero, and odd; knowing that x is negative or zero does not prevent the number from being odd, so the parity analysis gets nothing. Now, if I do the reduction first, you see: because x is odd it cannot be zero, so it is greater than or equal to one. Then, after this reduction, I analyze the test and I get false, because x cannot be less than or equal to zero; it is strictly positive. And here, I got nothing before; but I make the reduction, and now I get false on both sides--I should have written false here. And so I have proved that this code is unreachable, whereas before I could not prove it with either analyzer. So, although the meaning is the same--these two expressions in the concrete mean exactly the same thing--because of the way information is passed to the transformers, the result is not the same with this one and with that one in the abstract. So I have an interest in always being as precise as possible in the abstract.
So when I have a reduction--I assume I have a poset, a concretization, and a reduction in the abstract--I can iterate it. To do one more iteration, I just apply the reduction to the previous iterate; and I may have to do that forever, so when I pass to the limit, I take the intersection of all the reduced iterates. Because you may, for example, have a case where you get x greater than zero, then greater than one, then greater than two, then greater than three, and you go on forever, and the limit is false. So you have to take the infinite intersection at some point, and the iterates can be unbounded.
It may happen that this is not well defined, because the infinite intersection may not exist in the abstract; but if you stop at any point in the iteration, you get something better than previously--each step improves. So if you go toward the limit and there is no intersection, you stop before. Another problem is that if you do finitely many iterations of meaning-preserving reductions, the result is meaning-preserving and more precise; but if you do an infinite iteration of meaning-preserving reductions, it may not preserve the meaning, which is a bit strange. Here is a stupid example: my concrete has two elements, and my abstract is completely stupid, because I have infinitely many elements in the abstract to [inaudible] this concrete. At each step of my reduction, you see, I improve in the abstract, I get smaller and smaller; but if I take the intersection, I go there, and there I no longer have the same meaning. So this shows that you have to take care when you pass to the limit.
So, wow, there are only two Greek letters here, ρ (rho) and γ (gamma), so it is not so difficult; I will explain [laughter]. It is a big definition for something trivial. It is very difficult to make a reduction on many abstract domains at the same time, so the idea is to do it two by two. I have finitely many abstract domains; for each of them I have a concretization into the concrete domain; and I have pairwise reductions: the reduction ρij takes an element in domain i and an element in domain j and reduces them, [inaudible], to an element in each of the two domains. I assume that each pairwise reduction returns something smaller than the original, and I also assume that it preserves the meaning.
So, up to now, it is many symbols to say something simple. Now I extend the pairwise reduction to vectors, and it is very trivial to reduce a vector: I take the two elements, I pick them out, I reduce them, and I put them back in the vector, not changing the other elements. So the pairwise reduction takes two components and leaves the others unchanged. Then I combine them: I take the pairwise reductions for all possible pairs in my vector of abstract domains, I get one operator per pairwise reduction, and I just compose them. And then I iterate: I take the limit of this operator, which does the pairwise reductions, considering all possibilities, and going on forever.
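A sketch of this iterated pairwise reduction, demonstrated with a toy sign-and-parity reduction as in the earlier unreachability example (the string encodings are illustrative only):

```python
def iterate_reductions(vec, rhos, max_steps=100):
    vec = list(vec)
    for _ in range(max_steps):               # convergence may need a limit,
        changed = False                       # so a bound stands in for it
        for (i, j), rho in rhos.items():
            ai, aj = rho(vec[i], vec[j])      # reduce components i and j,
            if (ai, aj) != (vec[i], vec[j]):  # leaving the others unchanged
                vec[i], vec[j] = ai, aj
                changed = True
        if not changed:                       # a fixpoint: stop early
            break
    return vec

def rho(sign, parity):
    if sign == "x>=0" and parity == "odd":
        return "x>=1", "odd"                  # odd excludes zero
    return sign, parity

print(iterate_reductions(["x>=0", "odd"], {(0, 1): rho}))  # ['x>=1', 'odd']
```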
So let's go to the result. It says that when you iterate and you pass to the limit, the limit is more precise than any iterate, any iterate is more precise than any pairwise reduction, and it is more precise than the original; that's what we want. And we also have that it is meaning-preserving: all of these stay in the class of equivalent representations of the property. The problem is that the result may not be the smallest one in this class. I have proved that my reductions are correct, but I have not proved that they are the best possible, and in fact they are not. So in general, the pairwise reduction is not as precise as the reduced product, but we have sufficient conditions for having the best; these are many Greek letters, so you can read the paper to see them.
And here is a counterexample. This is the concrete: my properties are the subsets of {A, B, C}. My first abstract domain is: the empty set, that is false; "it is A"; or "I don't know". The second: false; "it is A or B"; or "I don't know". And the third: false; "it is A or C"; or "I don't know". Now I take a property: in the first abstract domain I say I don't know; in the second, it is A or B; in the third, it is A or C. If it is A-or-B and A-or-C, it is A. So with the full reduction, here I should have A, and I can express it, because I have A in my first domain; so I have improved this one. Here I cannot express A, I can only say A or B, so I say A or B; and the same for the last one: the intersection would be A, and the best over-approximation of A here is A or C, so I say A or C. Now take the pairwise reductions: if you reduce these two, you get nothing; if you reduce these two, you get nothing; if you reduce these two, you get A, but you have to re-approximate the result in the domains, so you get back A-or-B here and A-or-C there. So you see that the pairwise reductions reduce nothing, and if I iterate them, nothing changes, so I will not reach the best solution, which would have been this one. To get it, you have to consider all three at the same time; and if there are n domains, you have to consider all n at the same time. So the reduction, to be optimal, must take all abstract domains together; if you go pairwise, you are less precise in general, but always correct, and it is much more efficient to implement, because whenever you introduce a new abstract domain, it is easy to write a reduction with some of the previous ones for which it is useful.
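This counterexample is small enough to check by brute force (a sketch; each domain is encoded as the list of sets it can express):

```python
TOP = frozenset("ABC")
D1 = [frozenset(), frozenset("A"), TOP]
D2 = [frozenset(), frozenset("AB"), TOP]
D3 = [frozenset(), frozenset("AC"), TOP]

def alpha(dom, s):                        # best over-approximation in dom
    return min((d for d in dom if s <= d), key=len)

v = [TOP, frozenset("AB"), frozenset("AC")]     # the abstract vector

# global (reduced-product) reduction: meet everything, re-approximate each
meet = v[0] & v[1] & v[2]                       # = {A}
print([alpha(d, meet) for d in (D1, D2, D3)])   # [{A}, {A,B}, {A,C}]: improved

# pairwise reduction: meet two components, re-approximate in those two
for i, j, di, dj in [(0, 1, D1, D2), (0, 2, D1, D3), (1, 2, D2, D3)]:
    m = v[i] & v[j]
    print(alpha(di, m), alpha(dj, m))           # no component ever improves
```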
So now we have to show that Nelson-Oppen is doing exactly that. I will start with an example, and stay at the level of the example. I take this formula: ψ is (x = a ∨ x = b) ∧ f(x) ≠ f(a) ∧ f(x) ≠ f(b), and I have two theories: one where I have a and b, and the other where I have f. What Nelson-Oppen does first is purification, as they call it: they transform my formula into a conjunction of two formulas, each in a single theory. This one is in the theory with only a and b, and this one is in the theory with only f. To do that, as I told you, they introduce auxiliary variables--here y and z--that are shared between the two formulas ψ1 and ψ2. And now I can ask the solver: do you have a solution for this one? Then: do you have a solution for this one? And from these two questions I must decide whether the conjunction is satisfiable or not. So the first phase is purification; we have not seen that in abstract interpretation yet, so I don't speak about it now, I speak about it later.
The second phase is: from each formula, I infer equalities or disequalities between the variables. For the equalities that I get from the first: if x is a, then since y is a, I have x = y; and if x is b, then since z is b, I have x = z. For the second: if x were equal to y, then f(x) would be equal to f(y), so it is not possible to have x = y; and the same here, I cannot have x = z, because this would be false. Then you push these back into the two formulas: now I have x = y or x = z here, plus x different from y and x different from z there, so I get false. In general you will not get that in one shot; you have to iterate: after adding the equalities or disequalities you get more, which you push back into the other, and that gives more, which you push back again, and each time you ask the solver whether it can conclude. I have learned that this is not at all how the solvers do it, but [laughter] when you read the papers, that is what you understand.
So the hypotheses of Nelson-Oppen are as follows. The theories must have disjoint signatures: that means the theory of a and b does not share symbols with the theory of f. This one is a condition saying that things pass to the limit correctly [stable infiniteness]. And this one says that equalities and disequalities alone are not enough in general--you would need disjunctions of equalities and things like that--but if the theory has nice properties [convexity], you just have to propagate equalities and disequalities. And if you have all of these hypotheses, then the procedure terminates, because there are finitely many variables, so there are finitely many equalities and disequalities that you can propagate, so it must terminate. It is sound: when you propagate information, the concretization stays the same, so it is meaning-preserving, as I was saying. And it is complete: if the formula is satisfiable, it will always succeed in saying so.
I have also seen in some papers that theorem provers use the same technique, although they are not complete. So why do you have these hypotheses? Essentially, to get completeness. Because if you don't have this one, you would not propagate enough information, though the information you do propagate would still be correct. This one is essentially for passing to the limit correctly. And the disjointness of the theory signatures is there to say that you cannot propagate anything other than equalities and disequalities. Notice that in my interval and congruence example I was propagating more than equalities and disequalities--I was propagating bounds and values, in fact--and this condition ensures that propagating equalities and disequalities is sufficient for concluding. So you see that if I eliminate all of these restrictions, I keep termination, I keep soundness, and I abandon completeness. But in static analysis nobody cares about completeness, because we solve an undecidable problem, so you have no complete solution anyway. So that is what I explained. And I forgot: yesterday evening I added this, because I looked at the [inaudible] and I know you have won the competition--not in all categories, but they are really the best; congratulations. The point is that, to win the competition, the solver has to be complete, otherwise it will not win. So I understand that if you want to win the competition, when the answer is yes, you must say yes. But otherwise, we don't care, because since everything is undecidable, you can abandon completeness; and what is nice is that, if I understand well, you have nothing to change in your SMT solver: you just use it for theories that do not satisfy these hypotheses at all, and it will give something which is correct, if I understand well.
>>: [inaudible]
>> Patrick Cousot: What?
>>: For [inaudible] arithmetic it is incomplete.
>> Patrick Cousot: Ahh, you are already incomplete, so I was worried for nothing [laughter]. [inaudible] you said it was very important and [inaudible] people want to know why it does not work. I am happy to see that you're on the right side [laughter] [inaudible] of completeness [inaudible]. So now I have to show that this really is a reduced product.
And the problem is that you have more than values: in the previous setting I was assigning values to variables, but when you go to static analysis, analyzers do more. Here is an example with quaternions, [inaudible], a generalization of complex numbers used to locate a position in space. And you have a normalization, dividing by the norm of the quaternion, which is this expression; if you make an analysis, this norm must be one most often, except when it's not, and then we normalize.
And if you analyze the normalization directly on a, b, c, d, you have no chance to prove that the result is going to be one. So what we do is add something that names this expression: we give this value to a variable, say den equals that, and whenever we modify a, b, c or d, we see the influence on the denominator; and when we need this value, we know it is the value of the denominator. So it is like the auxiliary variables I told you about, but done inside the analyzer. In fact, we just decompose the fraction by putting the denominator in a variable; and since the variable is always assigned the same formula, we can make a special analysis for it. So it is nothing but an auxiliary variable to which we assign a subexpression that appears somewhere in the program and that we want to observe. And this first phase of purification is just this: name subterms and keep track of their values.
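A sketch of that naming step (all names hypothetical): the norm subexpression gets its own variable, so an analysis can track it separately and justify the divisions.

```python
import math

def normalize(a, b, c, d):
    # purification: name the subterm so its value can be observed/tracked;
    # an analysis of n alone (it should stay close to 1) justifies the
    # four divisions below
    n = math.sqrt(a * a + b * b + c * c + d * d)   # the auxiliary variable
    return a / n, b / n, c / n, d / n

print(normalize(0.5, 0.5, 0.5, 0.5))   # (0.5, 0.5, 0.5, 0.5): norm was 1
```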
So, this is purification; you know that perfectly well. Then, in Nelson-Oppen, you take the formulas and theories pairwise, and you propagate the equalities or disequalities of one into the other. And the trick is that this is the only reduction you can do, because of the disjointness: the theories are disjoint, so they cannot share values, and information on values that you have in one, like "positive", cannot influence any other one, because no other one can speak of "positive", the theories being disjoint. But if you have, for example, parity and intervals, or something like that, then the signatures are not disjoint, because they share plus, and so it is forbidden; and indeed you would have to propagate more than just equalities and disequalities. Manuel?
>>: So is the disjointness required for soundness, or…
>> Patrick Cousot: No. In my opinion--I have reread the proofs--I really have the impression that it is for completeness.
>>: [inaudible] also [inaudible].
>> Patrick Cousot: Yes. But you see, in practice, if you don't make the reductions they have shown, you will get very poor results. So what they would have to do, if the theories share information, is look at what common information can be expressed in each theory and propagate it to the other, which makes things more complicated. But here we are. So now we can use this to combine the two kinds of analyzers. From my discussion with Nikolaj a few days ago, I understand that it is a bit different from what you are doing; but I don't know exactly what you are doing, so maybe it is the same. My idea is the following: on one side you have an abstract interpreter, which has traditional algebraic domains, for example Booleans, [inaudible], what you want; and on the other side you have a logical abstract domain with several theories.
And on the logical side you already have this reduction, because Nelson-Oppen will do it for you; and on the algebraic side you already have reductions, because analyzers usually have some form of pairwise reduction. So what you have to do now is to write the reductions between the two sides. For example, if I stick to equalities: on the algebraic side, you collect all of the equalities that you can find in these domains, and just add them as a conjunction to the formula here. So you inject all of the equalities that you have learned on that side, and then Nelson-Oppen will take them into account. And in the other direction, if you have only equalities, you can almost always propagate them, since in general the algebraic domains can express equalities--it is very rare that they cannot. So you can propagate them back from the logical side too. And with just this, I think you have the minimal implementation: you just transfer equalities, and it fits perfectly the classical scheme. You still have to find a widening, but if you want to be extremely simple, you stabilize on the algebraic side and then you forget about the logical one [laughter]. Or, a less stupid way: you do one more iteration, and if the formula changed, you put it to true. So that's it. Up to completeness, the actual reduction is really simple, and the end-user will not need to provide inductive invariants if you do the widening on the algebraic side; otherwise it will be trivial.
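A sketch of this minimal combination with the z3-solver package (an assumption; all names hypothetical): singleton intervals are exported to the formula as equalities, and entailed bounds are imported back by asking the solver which refinements are forced.

```python
from z3 import Int, Solver

x, y = Int("x"), Int("y")
intervals = {"x": (2, 2), "y": (0, 10)}     # the algebraic side
phi = y == 3 * x                            # the logical side

s = Solver()
s.add(phi)
for v, (lo, hi) in intervals.items():       # algebraic -> logical:
    s.add(Int(v) >= lo, Int(v) <= hi)       # an equality when lo == hi

# logical -> algebraic: y = 6 is entailed, so [0, 10] can shrink to [6, 6]
for refuted in (y < 6, y > 6):
    s.push(); s.add(refuted)
    print(refuted, "->", s.check())         # unsat both times
    s.pop()
```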
Then, one thing which is nice is that the interaction with [inaudible] can be formalized rather easily: when you have abstract domains, sometimes it is used to communicate [inaudible] facts, and that is not easy, so I think this would solve one problem we have. Also, if you want to introduce a new abstraction: if you introduce it on the logical side, it costs less--it costs you nothing, because everything is already implemented; you pay Nikolaj for improving his solver [laughter], for introducing the new theory, but it is not your work: you are doing your static analyzer, so it's not your business, and you can complain to him if it does not work. Whereas on the algebraic side, you have to do it yourself, and if it does not work, it is your fault. I have no implementation yet, but I have a student who, I hope, will be able to do something and experiment. So that is the good news; it is great. And the bad news is that I am not sure we solve all of the problems in this way: we get more expressivity, but I am not sure we get more efficiency, and I am even less sure that we get real productivity--reproducibility, that is a bit long to write there; I looked in the dictionary for the spelling, but maybe it was not the proper word, it is some other one. Because, if I understand, when you run the SMT solver two times on the same formula, you don't always get the same answer in the same time.
>>: [inaudible] [laughter].
>> Patrick Cousot: [inaudible]. So you see, with the [inaudible] abstract analyzer, the analysis of the code of [inaudible] takes [inaudible] hours; it goes to 36 on two or three machines. If you tell them that sometimes it will be 36 and some other time it will be 200, they will not like it. So for static analysis we need a system that behaves the same way in all circumstances.
>>: [inaudible] the size of the [inaudible] [laughter].
>>: [inaudible] take exponential time in the size of the [inaudible] every single time.
[laughter].
>> Patrick Cousot: So maybe it should be almost [inaudible], but it is here. And I have an announcement of a past event before the conclusion. We had a seminar at ENS by Leopold Haller, who is a young student, and he explained that DPLL is an abstract interpretation; and I was convinced, because that was the first time I probably understood DPLL [laughter]. So it may be that there is another connection between abstract interpretation and SMT solvers, because SAT solvers are different, but not so different, from SMT solvers. So there might be other connections that might be interesting to explore. Thank you.
[applause]
>>: I have a question, I guess. So I understand the main issue is finding abstract domains whose signatures are non-disjoint, since in the decision procedure integration [inaudible] you cannot combine theories of non-disjoint signatures…
>> Patrick Cousot: Yes?
>>: And are there other cases where you can't do the complete integration, such as the [inaudible] concrete [inaudible], where even though the signatures are disjoint, they are not guaranteed?
>> Patrick Cousot: You know, I am thinking that you can have a theory of intervals [inaudible], which is not good for you, but for us it is nice, and the theory of modular congruences that I showed. Because they share plus, presently it is forbidden--they share a symbol in the signature--but when you make the analysis it is very easy to transfer an expression to one side or the other. There is no problem: it is the same expression in both cases.
>>: [inaudible] converging and not complete.
>> Patrick Cousot: Yes. And it will not be complete, but…
>>: It will not be complete?
>> Patrick Cousot: No it will not be--with the present state of the art, I think it is not
complete because you have to transfer more than equalities. I don't believe the
[inaudible] is…
>>: [inaudible] integration [inaudible] and intervals [inaudible].
>> Patrick Cousot: The complete integration would transfer…
>>: [inaudible] reduce [inaudible] the bases…
>> Patrick Cousot: Yes, we have one. The one we have is complete: we have an algorithm, which was done by Granger years ago, and you get exactly the right reduction, because it is simple and finite. If you try to reduce, for example, [inaudible] with--say you have some kind of [inaudible] plus linear congruences, a linear expression equal to a constant modulo a constant--to intersect that with the Boolean [inaudible] would be really difficult. And if you have a domain with [inaudible], making the reduction with [inaudible] is not so easy either. So you have difficult [inaudible] which may have a complete reduction, but they are so [inaudible] that you don't want to do that. So in the simple cases we have completeness, and in the complicated cases we do something approximate. For example, in [inaudible], we don't do all of the pairwise reductions: there is an order among the abstract domains, and we do the reductions in that order, because we know which reductions are useful. [inaudible] pointer and [inaudible], it will just transfer zero, null; we don't care, so we don't do the reduction.
>> Francesco Logozzo: Any other questions? Okay. I think we are done so, thank you
again. [applause].