>> Rustan Leino: Good afternoon, everyone. I'm Rustan Leino and it's my
pleasure to introduce Nadia Polikarpova who has a long history of doing
interesting things in verification. She was a Ph.D. student in Bertrand
Meyer's group at ETH and wrote a thesis on semantic collaboration, which
extended the sort of thing that we had tried to do in the Spec# project
and the VCC project here in the RiSE group, and extended it in ways that we
could not have imagined at the time to let you write specifications of
programs. And then she has looked at specifications in different languages and
written a fully verified collections library. She did an internship here with
Michal Moskal working on security properties of the TPM, which she did in the
context of the VCC verifier. She's participated in several verification
competitions, and there was one where I thought that she was the clear winner
and somehow no winner was ever announced. I don't know what happened to the
contest. It was very strange, but in my eyes, she was the winner.
>> Nadia Polikarpova:
Thank you.
>> Rustan Leino: Yes. And she's worked on a tool called Boogaloo, which
executes Boogie programs. And if you know what Boogie programs are, and the
partial commands, you'll be wondering how this is done, and you're welcome to
ask her afterwards. And many of you who are, for example, in the Ironclad
project or have used Dafny will be familiar with and will love the calc
statement that is in there to write proofs, and that's also Nadia's work
here. So today, though, she's not going to talk about any of that. She's
going to talk about synthesis. And let me end by announcing that if you're
watching this talk online, I'll be monitoring questions; if you have questions
you can type those in as well. So with no further ado, welcome, Nadia.
>> Nadia Polikarpova: Thank you very much, Rustan. So I'm currently doing a
postdoc at MIT with Armando Solar-Lezama, so I'm working on synthesis because
verification was too easy for me. And I'm going to talk today about
something that we've been doing with Armando over the past half a year or so.
And it's program synthesis from refinement types. It's very much work in
progress still, so there are some things that maybe are not perfectly
working yet, but I will be really curious to hear what you have to say about
this and I'll be happy about any feedback. As we all know, developing
programs is hard. And developing correct programs that always do what
they're intended to do is even harder. And the goal of program synthesis is
to help us with that by providing us with a way to describe programs that is
more high level, more concise, or more intuitive than what mainstream
programming languages can offer us today. And then the program synthesizer
will transform this description into something that is still
efficiently executable. But since the difference between those two
descriptions is quite large, there is no algorithm that can just simply do
this compilation, and usually some kind of search is involved in synthesis.
So at a high level, most program synthesizers look like this: there will be
some component that can explore the space of candidate programs, and there
will be another component that can check that a candidate actually matches
the description that the user provided and then give some kind of feedback to
the explorer, and this goes on and on until we find something that actually
matches. All right. And one important point is that if we want the
synthesis process to be completely automatic, then of course
verification has to be completely automatic as well. The whole field of
program synthesis is quite large, and I'm going to focus on one specific
area, which is automatic synthesis of recursive functional programs, an area
that has been very popular in recent years. And in that area, people are
using mostly these three different kinds of input specifications and
corresponding verification procedures, because of course the verification
procedure and the input language have to match on some level. So one kind of
specification language that people use is simply input/output examples. And
how can we verify a program against a set of input/output examples? We
can just execute them. So we use good old testing, and some tools that
were successful with this kind of specification are Escher by Sumit Gulwani
and Aws Albarghouthi, Myth from Steve Zdancewic's group, and lambda squared
from Rice and UT Austin. Then another kind of synthesis tool for functional
programs uses bounded checking as the way to verify candidate programs, and
what I mean by bounded checking is you have some kind of executable assertion,
or perhaps you have an unoptimized program that you use as the specification
for generating an optimized program, and so the techniques that are used here
are usually SAT-based bounded checking. And so Sketch is a popular
tool that works on a similar principle, and there is an extension of Sketch
called SynthRec that actually does automatic synthesis of functional programs.
And finally this goes all the way to the other end of the spectrum where, as a
specification, we use fully formal specifications, perhaps written in some
kind of rich logic with quantifiers or recursive predicates, and as a way to
verify programs against those specifications we can use deductive
verification, and Leon from Viktor Kuncak's group is one example of such a
tool. So all those techniques have of course their advantages,
disadvantages, and tradeoffs, so here they are pretty much ordered from the
least formal, which requires the least expertise from the user, to the
most formal. Of course they have their disadvantages as well: if you
look at both techniques on the left, they only provide guarantees of
correctness for a finite number of inputs, and as a result, they might not
work as well for more complex programs. So for input/output examples,
the result is that for a more complex program, the user might have to provide a
lot of inputs and outputs and think about a lot of corner cases. For example,
I know that for Myth, even a program as simple as dropping a certain number
of elements from a list requires 13 input/output pairs. And for those kinds
of tools that do sort of exhaustive bounded checking, well, if a certain
bound is not enough to guarantee that the program is really correct, the
checking has to go up to a higher bound, and then it gets slow, so it doesn't
scale to checking the programs on many, many inputs. On the other hand,
deductive verification doesn't have this problem of scaling to many inputs,
but it does have the problem that it is rarely fully automatic, because for
deductive verification you know that sometimes you need to give some hints as
to how to instantiate quantifiers or, even worse, you have to provide
invariants. So wouldn't it be just great if we had something that gives us
unbounded verification but fully automatically? Of course. That would be
great. And if it works for a big enough class of programs, that would be
what we want. So in this work, we decided to try a different kind of input
language for synthesis. And a different kind of verification procedure which
is based on types and type checking. So it's not a new thing in the
functional programming community to use types to specify computations. For
example, if you are a Haskell programmer, you're probably familiar with this
tool called Hoogle. And what Hoogle does, I can show you right now: it's a
website where you can search for functions from the standard Haskell library
using their types. For example, suppose I'm thinking, what was that function
called that takes an integer and a value of any type A and produces a list
of As of the length of the first argument? And I don't remember its name.
And Hoogle will tell me: oh, the first result that Hoogle returns is
replicate, and this is exactly what I wanted. I wanted some number of copies
of the value.
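(For reference, the Hoogle query and the result described here would look roughly like the following in plain Haskell; the exact query text is not shown in the transcript:)

    -- Query typed into Hoogle: a type, not a name.
    --   Int -> a -> [a]
    -- First result, from the Prelude:
    replicate :: Int -> a -> [a]
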
But of course, the Haskell type system is good enough to do a search in the
standard library, but it's not rich enough to describe a goal for synthesis.
So we need a more expressive type system for that, and we decided to use
refinement types. Now, this term refinement types is used a lot in different
contexts, so I'm going to quickly explain what we mean in this work by
refinement types. If you're familiar with the work of Ranjit Jhala and his
group on liquid types, then this is exactly what we mean. And if you're not
familiar, I will just go through the basics of this kind of refinement
types. So we call them general decidable refinement types. And
basically, what it is is a conventional type, think ML or Haskell type, that
is decorated with a predicate that restricts the range of values that this
type has. So for example, this one here describes the type of natural numbers.
And the difference between these refinement types, and why we call them
decidable refinement types, the difference between those and general dependent
types, is that those predicates are drawn from some kind of decidable logic,
efficiently decidable by SMT solvers. And this is an important fact because
this makes our verification decidable. So this judgment over here would say
that variable N has this type of natural numbers. And this particular type
we're going to abbreviate as Nat later. Okay. So what we can express with
those -- yes?
>>:
So can you explain [indiscernible] is that kind of a computation class?
>> Nadia Polikarpova: So basically the whole approach is parameterized
by what kind of logic you want to use in your refinements. It's not fixed,
but the logic that people usually use would be linear integer
arithmetic with uninterpreted functions and arrays, so this is a class
that is well explored and can express a lot of stuff. But you can plug in
other logics there as long as you can decide them efficiently. Thank you.
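(As a concrete rendering of the natural-number type just described, in a liquid-types-style notation with a plain Haskell check for comparison; the syntax on the slides may differ:)

    -- The refinement type of natural numbers: an Int whose value v satisfies v >= 0.
    --   Nat = {Int | v >= 0}
    -- The judgment "n :: Nat" says that n is a non-negative integer.
    isNat :: Int -> Bool
    isNat v = v >= 0
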
Okay. What can we express with those? So not only can we talk about
restricted base types like integers, but we can also give refinement
types to functions. For example, this type over here describes the function
max, the maximum of two integers, as a function that takes two unrestricted
integers, X and Y, and returns a value that is greater than or equal to both
of its arguments. And as you can see, in function types we can give names to
the arguments so that we can use them later in the type of the result. This
is what makes them dependent function types, and this is what lets us express
pre- and postconditions. And finally, we also have algebraic data types,
which are polymorphic. For example, this one here says that Xs is a list and
each element of this list is a natural number. So this Nat again is just an
abbreviation for this. The definition of list is just what you would
expect with a regular type system, but as you can see, we could instantiate
this type parameter here with a refined type to express the non-trivial
property that every element of the list is greater than or equal to zero.
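(Rendered in the same approximate notation, with the executable part in plain Haskell, these two examples might look like this; the names here are illustrative, not necessarily the ones on the slides:)

    -- Refinement type of max: the result is at least as large as both arguments.
    --   max' :: x:Int -> y:Int -> {Int | v >= x && v >= y}
    max' :: Int -> Int -> Int
    max' x y = if x >= y then x else y

    -- A list whose elements are all natural numbers:
    --   xs :: List {Int | v >= 0}
    xs :: [Int]
    xs = [0, 1, 2]
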
And not only can we express such universal properties of data structures; we
can also talk, for example, about the length of the list, using a construct
called a measure. You can think of this measure length as just a function
defined inductively on lists. But it is syntactically restricted in
such a way that you have exactly one definition of length for every list
constructor, and in terms of verification, what happens with this definition
is basically there's a syntactic transformation that takes those
equations for length and just appends them as refinements to the
type of each constructor of list, and at this point, you can treat
length as a completely uninterpreted function. So you can forget about this
definition, and everything we know about length is just that the length of nil
is zero and the length of cons can be calculated from the arguments in this
way.
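(A sketch of such a measure, again in an approximate notation, with the executable part in plain Haskell:)

    -- Measure "length", one equation per constructor; the type checker attaches
    -- these equations to the types of the constructors and otherwise treats
    -- length as an uninterpreted function:
    --   length Nil         = 0
    --   length (Cons x xs) = 1 + length xs
    data List a = Nil | Cons a (List a)

    length' :: List a -> Int
    length' Nil         = 0
    length' (Cons _ xs) = 1 + length' xs
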
Okay. And using measures, we can express recursively defined properties
about algebraic data types. And the cool thing about this is that the type
system actually works for us to instantiate and generalize
those properties completely automatically. So whenever we construct
a list, the type system, just by using the type of, for example,
the cons constructor, will generalize the property that all the elements are
natural numbers. And whenever we are matching on a list, so we're
deconstructing a list, we will get those properties back completely
automatically, without needing any kind of heuristics to instantiate
quantifiers. And this is why refinement types have been so successful in
verification of non-trivial properties with very little to no manual input,
doing these things completely automatically. So they've been used in
verification. How can we use them in synthesis?
Well, let's try to use a refinement type to specify a synthesis goal. So
remember the function replicate that I showed you in Hoogle. How could we
specify such a function? Well, I want to say: give me a function that takes a
natural number and a value of any type beta and returns a list of values that
are equal to this second argument, of a length that is equal to the
first argument. Okay. This is basically the complete specification of
replicate. And in order for the synthesis to work, we also have to provide
some kind of components that can be used as computation primitives. So in
this case, we provide it with, obviously, the list data type, and also we
give our synthesis procedure the increment and decrement functions over
integers, for which we also have to provide their refinement types. But, by
the way, we don't even need their implementations. And the goal is now to
find a function that has this type and is allowed to use those components.
Okay? So let me show you how this works in our
prototype implementation real quick. Oh, no, first there's something
I forgot to mention: look at this type of the list elements here.
Surprisingly, if we replace it with just beta, this specification is as good
as the previous one. And this could be surprising at first, but really, if
you think about it, since this type parameter can be instantiated with any
refined type, what the specification is really saying is that whatever
property X happens to have, every element of the list must have the same
property, including the property of being equal to a particular value. So
this actually shows us how expressive polymorphic refinement types really are,
because they let us abstract over, or quantify over, refinements.
So let me show you what our prototype implementation would do given this as
input. I have prepared it here for replicate. It takes a split second to
generate this implementation here, which basically says it will be a recursive
function, which the algorithm decided to name F2, that takes N and Y as
arguments, and it synthesized this branching here: if the length parameter is
less than or equal to zero, which basically means zero because its type is
natural, then it will return nil, and otherwise it will cons a Y onto a
recursive call of the same function F2 with the same Y argument but with the
first argument decremented. Which is what you would expect. But note that
the algorithm was able to infer this condition here, which is pretty nice,
and I'll tell you later how it's done.
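(In plain Haskell, the goal type and the synthesized program described here correspond roughly to the following; the refinement annotation is shown as a comment and the name replicate' is illustrative:)

    -- Goal type (approximate notation):
    --   replicate :: n:Nat -> x:b -> {List b | length v == n}
    -- Components: the list constructors and inc, dec on integers.
    replicate' :: Int -> a -> [a]
    replicate' n x
      | n <= 0    = []                        -- synthesized branch condition
      | otherwise = x : replicate' (n - 1) x  -- cons onto a recursive call with a smaller first argument
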
Okay. Let me show you another example that is a little bit more involved.
This is insertion into a sorted list. So we want to synthesize a function
that takes an X of any type beta, and it takes an increasing list of betas,
and it produces an increasing list whose set of elements is the union of the
set of elements of Xs and the singleton set {X}. What is an increasing list?
How can we define a sorted list using refinement types?
Well, it's actually very easy. We say that an increasing list of alphas is
either an empty list, or, to make an increasing list of alphas, we cons
some alpha onto an increasing list of elements that are greater than or equal
to the head. That's very easy. Here we assume that the comparisons are
generic, so they're defined on any alpha whatsoever. All right.
And on top of that, just as we added the length measure in the
previous example, we can add a different measure that returns the set
of elements of the list, in exactly the same way. And with this
definition in hand, we can define this insert function. Okay.
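(A sketch of the increasing-list encoding in the same approximate notation, with a plain Haskell predicate that states the same property dynamically; the refinements, of course, enforce it statically:)

    -- IncList a  =  Nil
    --             | Cons (x :: a) (xs :: IncList {a | x <= v})
    -- measure elems: elems Nil = {}, elems (Cons x xs) = {x} + elems xs
    isIncreasing :: Ord a => [a] -> Bool
    isIncreasing (x : y : rest) = x <= y && isIncreasing (y : rest)
    isIncreasing _              = True
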
So for this example, it would take our tool slightly longer, because it's a
more complex example, but still, in just over one second, we can synthesize
this implementation, which is again a recursive function with two arguments.
It matches on the list and says: if the list is empty, just return the
singleton list of X; otherwise, compare X to the head of the list, and if X
is less than or equal to the head of the list, just cons X onto that whole
list again; otherwise, cons that Y onto the recursive call of the same insert
function. Which is, again, the implementation that you would expect.
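(Again in plain Haskell, assuming an Ord constraint in place of the generic comparisons; the refinement goal is shown as a comment and the name insert' is illustrative:)

    -- Goal type (approximate notation):
    --   insert :: x:b -> xs:IncList b -> {IncList b | elems v == elems xs + {x}}
    insert' :: Ord a => a -> [a] -> [a]
    insert' x []         = [x]                   -- empty list: singleton of x
    insert' x xs@(y:ys)
      | x <= y           = x : xs                -- x goes in front of the whole list
      | otherwise        = y : insert' x ys      -- keep y, insert into the tail recursively
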
It doesn't seem like much at first, but actually, to verify such an
implementation, you need some non-trivial reasoning, if you think about it.
Because to verify that this branch, this cons of Y onto the recursive call
insert X Ys, actually produces a sorted list, what you need to know is
that all the elements of the list that is returned by the recursive call are
greater than or equal to Y. And basically, this means that you need to know
that if you insert something that is greater than or equal to Y into a
list whose elements are greater than or equal to Y, you get a list where
everything is greater than or equal to Y, which basically means that you have
to strengthen the specification of the insert function itself for it to
provide you with this property, which requires some kind of specification
discovery. In the language of refinement types, what it really means is that
we have to figure out a refined instantiation for this beta type here, to say
that in this particular call to insert, its type won't be just an IncList of
betas, but it will be an IncList of something -- oh, this should be beta,
sorry, not int -- of something that is greater than or equal to Y. So how do
we do this? How do we automatically discover predicates like this, and in
general, how do we do this type checking of refinement types completely
automatically, which is what we need for synthesis? Yeah, by the way, since
this non-trivial reasoning is involved, this is actually the first example of
an automatically generated implementation of insert into IncList that is
fully verified, unboundedly verified, because Leon can generate this as well
but it cannot verify it, because it cannot discover these kinds of
properties. All right.
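(The property that has to be discovered can be stated concretely as a plain Haskell check; insertLocal below just repeats the insert sketched earlier so this snippet is self-contained:)

    -- If x >= y and every element of xs is >= y, then every element of
    -- (insert x xs) is >= y. This is what instantiating beta with
    -- {b | y <= v} expresses in the refinement types.
    insertLocal :: Ord a => a -> [a] -> [a]
    insertLocal x []         = [x]
    insertLocal x xs@(z:zs)
      | x <= z               = x : xs
      | otherwise            = z : insertLocal x zs

    lowerBoundPreserved :: Int -> Int -> [Int] -> Bool
    lowerBoundPreserved y x xs =
      not (x >= y && all (>= y) xs) || all (>= y) (insertLocal x xs)
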
So how do we do this type checking? Well, the thing is, there is a technique
that can infer these kinds of refinements completely automatically and do
refinement type checking completely automatically, and this is liquid type
inference, again from Ranjit Jhala's group at UCSD. This technique
relies on a combination of conventional Hindley-Milner style type inference
to infer the shapes of the refinement types, which means the conventional
type that underlies the refinement type, and then it uses predicate
abstraction to infer the refinements. So can we just use this as the
verification for a synthesis procedure and just be done with it? Well, that
would be nice, but unfortunately this method is fundamentally a whole-program
analysis. What do I mean by that?
Well, let's see how liquid type inference would do on this example, where we
want to check that this expression -- if X is less than zero, then a singleton
list containing minus one, otherwise a singleton list containing one -- has
type list of naturals. Obviously it does not have this type; there is a type
error here. But how would liquid type inference do here? Well, the thing is,
liquid type inference is really meant for type inference in a context where
no user type annotations are provided. So it doesn't even assume that the
type of the top-level expression is known. It just tries to discover the
type of every expression from the types of its sub-expressions, completely
from scratch. So say it doesn't know this type. So let's look at all the
sub-expressions of this expression: we know what the types of one and minus
one are, and we know that those list expressions are all of type list, but we
don't know the instantiation for the generic parameter. So what liquid type
inference would do is it will first invoke Hindley-Milner, which would infer
the shapes of the types, and so it will know that it's a list of integers,
but we don't know what the refinements are yet. Then it would insert
predicate unknowns in all the places in those inferred shapes where the
refinement is missing. And then it will use predicate abstraction to
reconstruct those refinements in a completely bottom-up style, constructing
the strongest refinement that is allowed by the sub-expressions. For example,
here, since the nil is not restricted by anything, its strongest type is list
of false. Then, basically, to discover the type of this cons, we would have
to take some kind of least upper bound of those two, and we would get a list
of minus ones. And here, in the same way, we will get a list of ones, and
then at the top level, we will take the least upper bound of those, and let's
say in our language we can only express this type as list of true. And at
this point, we see that list of true is not a subtype of list of nats, and we
discover that there is a type error. But you can see that we had to analyze
the whole program before we could discover this type error.
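(The offending expression, in plain Haskell; it compiles as ordinary Haskell, but against the refined goal type it should be rejected, because the then-branch produces a negative element:)

    -- Intended refined type: a list of naturals, i.e. List {Int | v >= 0}.
    ex :: Int -> [Int]
    ex x = if x < 0 then [-1] else [1]
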
And there are really two problems here. The first problem is that type
information is not propagated top down, because we don't even assume that
there is any type information at the top. But the second problem is that
there are really these two stages: there's the Hindley-Milner shape
inference, which is known to be a global approach, because it generates all
the unification constraints and then solves all of them for the whole
program, and only after that first phase is done for the whole program can we
start inferring the refinements. And this kind of whole-program type
inference might work completely fine in the setting of verification, but it's
really a terrible idea for synthesis, and let me give you a little analogy to
show you why. Let's say you have a combination lock. Verification is like
when you're pretty sure that you know the combination and you just want to
double-check that you're not wrong; in that situation, you're not really
hindered by the fact that the lock will only tell you if the combination is
correct once you get all the numbers right, which is like a global
verification. But synthesis is like lock-picking. If you really don't know
the combination and you want to determine the combination, then it would be
really great if that lock could tell you, for every digit, whether that digit
is correct or not. You would be able to pick that lock much faster. So we
need this kind of magic-lock technology, a modular verification technology,
to enable scalable synthesis. Okay.
How can we modify this global, bottom-up liquid type inference to make it
modular and enable scalable synthesis? Well, first of all, we have to make
use of the fact that we actually have this top-level type available and try
to propagate this type information top down. So in this case, let's say we
know this must be a list of nats; we have those sub-expressions, so we easily
propagate this information down to both branches of the if and basically say,
well, if the whole thing is a list of nats, then the then-branch must be a
list of nats under the assumption of the if condition, and the other one also
must be a list of nats under the assumption of the negation of the condition.
Something like this. Unfortunately, we cannot propagate type information all
the way down to the leaves, because it's not possible to propagate it through
function applications: the type of a function application doesn't uniquely
determine the type of the function and the type of the argument. So at this
point we sort of have to switch direction and go bottom up for a while until
those directions meet. And this is really the idea behind bidirectional type
checking, which was described by Pierce and Turner in the year 2000, and we
will be using this idea here. So let's say we got down to here, going top
down, and then we start going bottom up, and at this point those two
directions meet and this is where we can do our type check, and this will be
much more local. So at this point, we do some shape inference and then we
discover that there is a type error before even looking at the second branch
of the if. So we really made this type checking much more local and much more
modular. Okay.
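(A minimal sketch of the bidirectional idea for a toy expression language, in Haskell; this is an illustration of check-versus-infer modes, not Synquid's actual algorithm:)

    data Ty = TInt | TBool | TFun Ty Ty deriving (Eq, Show)

    data Expr
      = Var String
      | IntLit Int
      | If Expr Expr Expr
      | App Expr Expr

    type Env = [(String, Ty)]

    -- Checking mode: the goal type is known and pushed top down.
    check :: Env -> Expr -> Ty -> Bool
    check env (If c t e) ty =
      check env c TBool && check env t ty && check env e ty
    check env e ty =
      -- At applications and leaves, switch direction: infer bottom up,
      -- then compare against the goal.
      infer env e == Just ty

    -- Inference mode: the type is reconstructed bottom up.
    infer :: Env -> Expr -> Maybe Ty
    infer env (Var x)    = lookup x env
    infer _   (IntLit _) = Just TInt
    infer env (App f a)  = case infer env f of
      Just (TFun argTy resTy) | check env a argTy -> Just resTy
      _                                           -> Nothing
    infer _   _          = Nothing  -- conditionals are handled in checking mode only
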
So basically, this is our proposal for synthesis from refinement types. It's
just like before, except instead of this whole-program liquid type inference
we use a new technique, which we call modular refinement type
reconstruction, and it combines the ideas from bidirectional type checking as
I just told you. It still uses the same kind of techniques, based on
predicate abstraction, to discover the predicates, and one technical
challenge that it really has to address is that now we cannot do a phase of
shape inference for the whole program before we start inferring refinements,
so we needed to find a way to interleave shape inference and refinement
inference. And this turned out to be possible. One other thing that we do
differently from liquid type inference is, since we're doing things mostly
top down, we actually infer the weakest refinements instead of the strongest
refinements. And that allows us to use exactly the same mechanism that we
use for inferring types to infer branch conditions in conditionals, which is
what you saw in the first example. Because, if you think about it, we have a
mechanism for predicate discovery, so why not use it for branch conditions?
Okay. So at this point, just putting it all together, the whole enumeration
and verification parts of our approach, I can just show you the first example
again, the replicate example, but really step by step, how the whole search
works. So on this slide, what we have is the current goal type, which is what
we want to synthesize, the currently available components, which is the
environment that we can use, and the current program that is the output of
the synthesis. The first thing that our tool will do is look at this goal
type and see, well, it's a function type. So we know that the output will be
a function, and it's really easy to deal with that: basically, what it will do
is say, well, to synthesize a function is really just to synthesize its body
given its arguments. So it will add the arguments of the function, like N
and X, into the environment, into the set of available components, and it
will also give this function a name, because it wants to make this function
recursive in its first argument, but not the second, because this first
argument is of type Nat, which has a predefined well-founded order. So the
tool is allowed to recurse on this argument but not on the second one, which
has a type we don't know anything about. And to enable recursion, what the
tool does is basically add, as another component in the environment, the same
function, which would basically be used as a recursive call. But note that
its type is slightly different: instead of the first argument just being of
type Nat, it's actually of type something between zero and strictly less than
N. So basically our tool weakens the type of this function in such a way
that it can only be called on arguments that are strictly smaller than the
one that we were originally called with, which will guarantee that all the
recursive calls terminate. And by the way, if you're in verification, you
are probably used to being able to ignore termination arguments for a while
and say you're only verifying partial correctness. But in synthesis, you
really cannot ignore termination at all, because non-terminating programs are
always shorter than terminating ones. So you will always get garbage if you
don't take care of termination. Okay.
At this point, our target type is this: we just need a list of betas of
length N. The tool will first try a bunch of simpler expressions that are
just function applications, but it will not succeed. So at some point, it
will decide to introduce a conditional, but the condition is still unknown,
and it's represented by this predicate unknown U1. Then, at this point, the
tool will focus on synthesizing the first branch. For the first branch, it
will start enumerating function applications from simplest to more complex.
And the simplest expression that can go into this first branch and has the
right type shape, so a list of betas, would be nil, the empty list. So it
tries this value nil for the first branch, and then it uses predicate
abstraction to infer the weakest condition under which this would be an
appropriate implementation of the function. So at this point, as you can
see, it wants to use this as an assumption here, as sort of a path condition.
And at this point it will infer that under the condition that N is less than
or equal to zero, the type of nil is actually a subtype of what we want.
>>: [Indiscernible] why doesn't it say N equal zero there?
>> Nadia Polikarpova: Oh, that's -- so this is a very good question. I kind
of avoided the question of how we actually infer those predicates, but what
liquid types do, and what we do as well, is we are given a set of atomic
predicates, or rather atomic predicate templates, and all the predicates that
we infer are just conjunctions of those atomic predicates. So here, I assume
that the atomic predicates that we're given are variable less than or equal
to zero, variable greater than or equal to zero, and variable not equal to
zero. From those, we can make various kinds of inequalities and equalities,
but we always infer the weakest one that fits. So here, since this one is
weaker than equality, this is what we'll get. But if you add equality as an
atomic predicate, then you might as well get equality; that's just a matter
of luck then, because those two will be incomparable syntactically. But
semantically -- actually, we do semantic checks on them as well to cut the
search space. So you will actually get this one anyway. All right.
So now we are done with the first branch, and of course now the task is to
synthesize the second branch under the assumption of the negated condition.
So now we add the negation of N less than or equal to zero to the
assumptions, and now we have to synthesize again an expression that has this
type. So again, the tool will start trying function applications, starting
from the simplest ones. Now nil cannot be made to satisfy this restriction
on the length in this case, because we know that N is greater than zero and
the length of nil is zero, and we know that from this. So nil doesn't fit.
So we try something a bit more complex. Maybe there will be a cons, and for
a cons, we have to synthesize the arguments now. And again, for each
argument, we will start trying simpler expressions first. At some point,
we'll arrive at this cons of X, and then as the second argument to cons we
decide to use the recursive call. And at this point, we have to synthesize
the arguments for this call. What I want to draw your attention to is that
when we are synthesizing the first argument of F, we are actually really
lucky here, because the precondition on this first argument is really strong.
So this precondition will be used to filter the candidates for this first
argument very locally. For example, if we were trying N here, then even
before synthesizing the second argument and going all the way up to the type
of the whole else branch, we would know that N is not a suitable candidate
here, because we know that a suitable argument has to be less than N. So N
is not suitable, inc N is definitely not suitable, and at this point we know
that we have to choose dec N, locally. At some point this gives us the
desired result, we don't have any holes in our program anymore, and this is
done.
So as you can see, the enumeration part of our synthesis procedure is at this
point really basic: it does explicit enumeration from simpler expressions to
more complex. What we really put some thought into is the verification part:
we tried to make it as modular and automatic as possible, and already this
combination lets us synthesize some interesting programs. But we're hoping
that if we also make the enumeration part smarter at some point, then we will
get even better results. So this prototype --
>>: Can you go back one slide? So in this specification of F,
[indiscernible], what is M there? It says [indiscernible].
>> Nadia Polikarpova: Yeah. So we renamed the arguments here because M and
X are already taken. We just picked fresh names for the arguments because we
don't want to repeat them.
>>:
So M is a given constant.
>> Nadia Polikarpova: So basically, M was initially the argument that was
given here. So this M would be added to the environment; we take the M from
that type and add it to the environment. And that M is also used in the type
of --
>>: So the system automatically inferred that it should be less than M?
>> Nadia Polikarpova: This is just because M is the name of the first
argument of the function in the outermost call. So we are inferring the body
of F called with M, and we know that if we want to make a recursive call from
there, then the first argument has to be less than this M. Yeah, I agree,
it's not very clear here.
>>: So is the assumption that every time the function makes a recursive
call it will decrease somehow?
>> Nadia Polikarpova: So basically, this method is also parametric with
respect to what particular order you choose to make your recursive calls
terminate. What our tool uses at the moment is: it chooses the first
argument that it can recurse on and just uses that one, but it would also be
possible -- it actually has a switch -- to use, for example, the
lexicographic order on the tuple of all recursable arguments as --
>>: [Indiscernible]? For example, if I want to recurse on lists, what kind
of thing would come up there?
>> Nadia Polikarpova: So for data types, what we do at the moment is
basically you're allowed to specify the measure that would be used to compare
those lists. So for example, if you define a length of lists, and length maps
lists to integers, and integers already have a predefined order in our
system, you can just say: compare lists by length, or in another instance you
can say: compare them by elements. This is one of the choices; of course
structural recursion would also be possible, but we thought this would be
more flexible. More questions? Okay.
So yeah. The tool that I showed you is called Synquid, from synthesis and
liquid. And it's available on Bitbucket. It is, as I said, still a work in
progress; it hasn't really been released yet, but hopefully it will be soon.
And you're welcome to try it. And on my last slide, I present my kind of
vision for where this project might be going. So I showed you Hoogle in the
beginning, but wouldn't it be cool if we had something like Hoogle that uses
refinement types and can do more precise search in documentation, but also,
if the function that you're looking for does not exist, it could synthesize
the function using all of those functions from the base library as
components? So I call this Hoogle plus. For example, you give Hoogle plus
something like: well, I want a function that takes some value X and a list of
Xs, and produces an integer value that is equal to the number of occurrences
of X in Xs. So here, I'm basically just using another measure on lists,
which returns a multi-set of elements, that I call bag, and then I say, well,
it's the multiplicity of X in that multi-set that I want. Because there's no
primitive function in Haskell that returns the number of occurrences of an
element in a list, this query would require it to do a little synthesis, and
then maybe what it returned would be something like this.
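(A plain Haskell sketch of the kind of function such a query might return; the spec is shown as a comment and the name countOcc is illustrative:)

    -- Goal type (approximate notation):
    --   count :: x:a -> xs:List a -> {Int | v == multiplicity x (bag xs)}
    countOcc :: Eq a => a -> [a] -> Int
    countOcc _ []     = 0
    countOcc x (y:ys) = (if x == y then 1 else 0) + countOcc x ys
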
Yeah. But that --
>>: [Indiscernible] lock key?
[Laughter]
>> Nadia Polikarpova: Maybe [indiscernible] friends on Hoogle, or circles.
Yeah. Questions?
>>: So I'm trying to think how the technique would work if the specification
was in the form of examples. So you mentioned that one problem with examples
is that you might require too many of them. But one way to avoid this is to
say that you're looking for a small program, or the smallest program that you
can find, which matches the examples. And then I think [indiscernible] would
do the trick. So if I'm synthesizing a case of copying the number of
[indiscernible], my feeling is that the program [indiscernible] synthesizes
is in fact the shortest program that is consistent with that example.
>> Nadia Polikarpova: Right. But for example, one of the tools that I
mentioned, Myth, is using exactly this heuristic: they are looking for the
shortest program. But as I said, they still need 13 examples to specify
drop, and their paper even says that it was not trivial sometimes to come up
with those examples, and it's really an interactive process where you think
you have specified everything but then the tool always comes up with some
corner case. And one of the reasons why they need so many examples is that
they have this property of trace completeness: when they're synthesizing a
recursive function, because you don't have any specification for the
function, whenever the synthesized implementation uses a recursive call, you
need another example that would specify this recursive call. So basically,
if you're specifying the length of a list and you specify it for a list of
length four, you have to specify it also for lengths 2, and 1, and zero. And
this is how I think those sets of examples get larger.
>>: [Indiscernible] limited by the technique they're using? I can assume
that if I want to specify the drop function, I give a long list and I would
specify one drop input and output, and that won't be present. And the
simplest function would be [indiscernible]. [Indiscernible], we only needed
two examples.
>> Nadia Polikarpova: Mm-hmm. Okay. Yeah. Maybe that's a limitation of
their technique, but I think what would be really cool is to combine those.
And I think it's not even that difficult, because examples are refinement
types in some way. So bringing examples into this framework would be great,
because of course the disadvantage of refinement types is that you can't
express everything you want, because it's still a decidable logic, and the
combination of those things and examples, that would be a really great idea.
>>: Even your technique will probably have this issue if the overall
specification for that function is not strong enough or inductive enough to
prove its correctness, and you need to actually refine it and strengthen it
as well.
>> Nadia Polikarpova: Yeah. This cannot help if, of course, your
specification is not strong enough, but we think that it's even an advantage
in some cases to be able to provide a partial specification, because one
other problem with examples is that you basically have to know what the
output of the program is, and that might not be trivial all the time. For
example, you want to specify insertion into a red-black tree. You basically
have to know how red-black trees work to be able to specify the output,
whereas with specifications, it's much easier to say: here's the invariant of
a red-black tree, and here's what I want in terms of the set of elements, and
then you go figure it out. So this really -- yeah, it's right over here.
>>: So there are tools like ACL2 and Isabelle that construct proofs of
inductive sorts of things. You mentioned that the insertion into sorted
lists was the first that was actually verified as well. What would something
like ACL2 do? Can it construct terms that are executable programs?
>> Nadia Polikarpova: This is a very good question. Probably, I mean, I
don't really know the answer to this question. It's probably possible. And
of course there's a lot of research in the area of proof synthesis that is
kind of separated from program synthesis, even though we know theoretically
that it's the same thing. So I think there's much more potential in bringing
that work, that old-school work, more into program synthesis and seeing what
those things can do. Yeah, when I said this is the first verified,
synthesized implementation, I was really comparing with those recent program
synthesis tools that I was considering. But it would be a good comparison to
look at these other tools.
>>: [Indiscernible].
>> Nadia Polikarpova: So yeah, I mean, with any query, of course there are
limitations, but I think what I learned from, let's say, Ranjit Jhala and his
group is that people are finding more and more creative ways of arranging
those types to express properties that you wouldn't have thought would be
expressible. And the type system that we use here only has the features I
showed, but their research on type checking actually went further than that,
and they have more features that we hope we can add later. They have things
like abstract refinements, for example, where you can parameterize your type
not just by a type, as in polymorphic types, but also by a predicate. So you
can easily specify things like, let's say, filter using those abstract
predicates. Yeah. And so I think this kind of refinement types is really a
surprising combination: they're surprisingly expressive and still decidable,
and I thought it was really worth exploring this for synthesis, but of course
there are limitations.
>>: One great thing about refinement types is that you're able to locally
[indiscernible] the space. So what is the comparison if these were not
there? What would the time be? So let's say you're just trying end to end
and you don't have these perfect types for each variable.
>> Nadia Polikarpova: Right. So --
>>: How much is the gain, you said?
>> Nadia Polikarpova: I mean, I cannot really compare with a different kind
of specification, but what we have data on in the paper and in our
preliminary experiments is basically this: we did the synthesis in this way,
with local checking, and then we disabled local checking and just did all the
checking at the top level. And we saw, well, what you would expect: for very
small examples there's no difference, but for the bigger examples, there was
a big difference. So maybe I can even bring it up. Oh, yeah. Right here.
So for example, examples like append, deletion from a list, and both of the
functions on sorted lists that we tried to synthesize timed out -- I think the
timeout was like two or three minutes -- so it could not synthesize them with
the whole-program analysis, but with the modular analysis, it would take a
few seconds. And yeah, it's what you would expect: this kind of whole-program
analysis doesn't really scale as we go to more complex programs.
>> Rustan Leino: So thank you all for coming and for your questions. So
Nadia is going to be here all week. If you would like to chat with her
one-on-one, please let me know. So thank you, Nadia.
[Applause]