>> Daan Leijen: It's my pleasure to introduce Ross Tate, who is interviewing today. Ross is a
student of Sorin Lerner at the University of California San Diego, where they do have nice
weather, and he is well known here at Microsoft already because he did two internships here
and he won the Microsoft Fellowship one year. He did a lot of work on category theory of
effects and on compiler optimizations, and he worked with Red Hat on a new language called
Ceylon, and today he's going to talk to us about usability through optimization.
>> Ross Tate: So as Daan said, I am from UCSD and I research programming languages. One
thing I've been working on is making programming languages more usable, and the way I've been
going about doing that is by improving the technology for program optimization. Before I really
get into all of that, I would like to give my overall perspective on programming languages. The
way I see it, people have this amazing ability for intuition and creativity, and they've applied
those abilities in order to build computers, which complement them with the ability to process
large amounts of data and calculate with high precision, and since their invention they have
become ubiquitous in our society. Yet these computers aren't really useful to us unless we can
get them to do what we want them to do, and so I view programming languages as the means of
communication between these two worlds that enables people to enable computers to enable
society. Now, having this role means that programming
languages suffer from all of the problems that people have, all the problems that computers
have and all the problems that communication has and so there are a lot of ways we can work
towards improving programming languages. One of these problems is that programming
languages always have to balance between human usability and computational efficiency. This
is where my work comes in. I'm working on improving technology for program optimizations
and that way we can take the emphasis off computational efficiency and focus more on human
usability. And the reason I feel that this is important is that in my experience I have seen that a
lot of people always keep efficiency on their mind as they program. I had a recent example of
this pop up at UCSD in our grad student lounge, where some student had written this program
here and the details of this program aren't very important, what is important is that another
random person came along about a week later and said oh, by the way, your program sucks. It
could run faster by taking this out of the loop. And then another week later another random
person came along and said, well, this is a case where, unless the optimizer sucks, a typical
optimizer will take care of this for you. And then another person says yes, let's rely on optimization
technology, but then another person came along and said well, how do you know this string is
immutable? If the string is being changed, then the value of strlen can change as well. But
then another person came along another week later and said actually it doesn't matter,
because presumably you can look inside this loop and inside strlen and see that
neither of them modify the string and so it's effectively immutable through this block of code
and so the value won't change. And so there's this month-long discussion among a bunch of
random grad students about this…
>>: How did this not run in parallel?
>> Ross Tate: Yes, yes [laughter]. I usually say that, but I forgot.
>>: [inaudible] numbers and things.
>> Ross Tate: Yeah, yeah [laughter]. Way more detailed than we were thinking at this point
[laughter]. So the interesting thing is that there was this month-long discussion among these
random grad students about a random program, and in this whole month not a single person
noticed that there was an off-by-one error here. That bugs me, because it means we are focusing
on very detailed things like efficiency, and whether something gets optimized by the compiler or
not, when we don't even have a correct program yet. When this actually happened I took a
picture, and you can see I removed some of the inappropriate comments [laughter]. The goal of
my research is to make it so that you can write the code the way you'd like to and not worry
about performance efficiency; let the compiler take care of that for you, and that way you can
focus on the more important things like correctness.
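[Editor's note: the whiteboard program itself does not appear in the transcript. The following C
fragment is only a plausible reconstruction of the kind of loop being discussed, with strlen in the
loop condition; the function name, loop body, and the exact form of the off-by-one error are
assumptions. Here i is initialized in the loop so the sketch is self-contained; in the photo its
initialization was apparently off-screen, as the next question notes.]

    #include <string.h>

    /* Hypothetical reconstruction of the lounge whiteboard code, not the original. */
    int count_a(const char *s) {
        int count = 0;
        int i;
        /* strlen(s) is re-evaluated on every iteration, which is what the first
           commenter wanted hoisted out of the loop; the <= (instead of <) is one
           plausible form of the off-by-one error nobody noticed for a month.     */
        for (i = 0; i <= strlen(s); i++) {
            if (s[i] == 'a')
                count++;
        }
        return count;
    }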
>>: So you are also assuming that i is automatically initialized for you?
>> Ross Tate: Yes, I think it's going to be somewhere up there, and there are many things wrong
with this code; again, it was just someone TA-ing.
>>: It probably wasn't an undergraduate [laughter].
>>: [inaudible].
>> Ross Tate: So there is already a lot of work on program optimization out there, right? The
issue, though, is that a lot of this technology relies on a lot of repeated manual implementation
effort, which makes it sort of a black box that the typical programmer can't do anything with.
It's just something that works by magic and they can't affect it. So what I've been doing is going
through these technologies and replacing them with more reusable axiomatic systems that are
more automated and have a more open interface that programmers can interact with. Some areas
I have looked into so far are translation validation, which makes sure that the optimizer does
what it's supposed to do and doesn't actually change the semantics of the program; extensible
compilers, making it so that we can easily add new optimizations to a compiler; and even
inferring optimizations entirely automatically, so that optimizations will be available for new
languages as they're being designed. These three applications are all unified by one technology
I came up with called equality saturation, which is sort of an axiomatic way to reason about
programs. This is the overall layout of my talk, so feel free to ask questions at any point.
Right now I'm going to start off with translation validation, and I'm going to do so by asking
you a question. How many of you have had the compiler actually inject a bug into your code?
That is, you went through all this effort to make your code nice and angelic, completely bug
free, only to hand it off to the optimizer, which through some fault turned it into
demonic code that does something that you didn't tell it to do?
>>: I wrote that demonic optimizer [laughter].
>> Ross Tate: For those of you who have been so fortunate as to avoid this situation, not cause
the situation, let me enlighten you as to how this goes. If you run into one of these bugs it is
absolutely a nightmare because you can look at your code for hours and say it should be doing
this. It should be doing this. I don't get why it's not doing this. The fact is it should be doing
that because your code is fine; it's the compiler that's wrong and you just don't realize it.
Furthermore, whenever you try to observe the bugs, say by inserting print lines or running a
debugger, it shuts off the optimizer so all of a sudden the bug goes away so it's like there's
quantum physics happening inside your code which like quantum physics can be very confusing
for the programmer. Furthermore, once you've finally figured out that the compiler is at fault,
then you have to figure out how to rewrite your code in some weird way so that it stops
introducing this bug. All your coworkers get confused as to why your code is so
ugly. As you can see this is frustrating for a programmer but it's not only frustrating for
programmers; it's also frustrating for companies, and many companies have a policy of not
using these optimizers because they can't afford these kinds of bugs. They do so at rather high
cost. After all this means that they pay all their programmers to do these optimizations by
hand and hand optimized code can be more difficult to maintain over time and so these costs
accumulate over time as well. So these companies would really like to be able to use one of these
optimizers even though it's a little iffy because typically they do work. So how can we go about
doing that? For a single run of the optimizer, we have the original program and the optimized
program. We can incorporate a technology called translation validation, and what this does is
look at these two pieces of code and try to figure out whether they are equivalent. If it
succeeds, that means you can use the optimized code safely, because you know
it's the same as the original code; you haven't introduced any bugs. And if you can't figure this
out then you just default back to the original code just to be safe. Now there are many ways to
build translation validators. The most common one is to use bisimulation relations to sort of
try to figure out how these two programs walk together step-by-step. But these things have
some difficulty with bigger rearrangements of code and so we have looked into another way of
doing translation validation using equality saturation. So to illustrate my technique, I'm going
to start off with a very basic kind of program, just simple expressions, and we'll move on to
more complex programs as we go. Consider i * 2 + j * 2. We hand it off to the optimizer and it
turns into (i + j) << 1, and we want to figure out whether or not these two programs are
equivalent. The idea I had was: let's take these programs and turn them into nice mathematical
expressions, and once we're in this nice mathematical world we can start applying nice
mathematical axioms to reason about them. We know, for example, that shifting anything left
by 1 is the same thing as multiplying by 2, so that tells us that the optimized program is
equivalent to this intermediate program here. Similarly, we know that multiplication
distributes over addition, and that tells us that the intermediate program is equivalent to the
original program. So just by using these very basic language axioms, we can figure out that
these two programs are actually equivalent to each other. Now this works very well for these
nice, clean mathematical expressions, but to make this approach work for more realistic
programming languages like C and Java, we had to figure out how to accommodate challenges
like loops, which I'll go into later on, and effects. The issue is that
typical imperative languages have things like statements, which can not only do mathematical
expressions but also can read from the heap and modify the heap, so we want to figure out
how to represent these statements as mathematical expressions. So to do this I came up with
this concept of an effect witness. What it does is take the dereference of r here and represent
it as an expression: a load from location r in the current state of the heap, Sigma, so that all
uses of the heap are explicit in our representation. Similarly, when we modify the contents of r
here, we map this to a store, which not only takes the value being stored and the location being
stored to, but also takes the state of the heap it is modifying and returns the new state of the
heap after the modification is done, so all modifications of the heap are also explicit in our
representation. Now if we consider this program in more detail, what we are doing is we
are taking the contents of some location and then putting those contents back into the same
location, and so assuming a reasonable memory model this program really doesn't do anything
whatsoever. And so we would like to be able to reason about that with our mathematical
expressions, and the way we do so is we say whenever you have an expression that looks like
this one, well then this store is actually going to be equivalent to the original incoming heap
Sigma. With this kind of reasoning we not only can reason about the values of programs, we
can reason about the effects of programs. So with this approach I designed a translation
validator, which we have implemented so far for C and Java. It takes these two programs, the
original and the optimized program, and these programs come in the form of a control flow
graph, the standard representation of imperative programs; but this is not very good for
algebraic reasoning, as we found out, and so we came up with a new representation, one that I
call a program expression graph, and we convert to that. The reason we call it a program
expression graph is that it represents the entire program, or really the entire method, as a
single expression that forms a graph rather than a tree, because it has
some recursive loops in it. And so once we have this nice mathematical representation then we
can move on to equality saturation, and what this does is apply these algebraic axioms in order
to infer equivalences and hopefully you can figure out that these two programs are equivalent.
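[Editor's note: to make the effect-witness idea above concrete, here is a minimal, self-contained
C sketch written for this transcript, not taken from the actual implementation. It encodes loads
and stores as expression nodes over an explicit heap value sigma, and applies the single axiom
described earlier: storing back what was just loaded from the same location in the same heap
yields the original heap.]

    #include <stdio.h>

    /* Toy expression nodes: the heap sigma, a location, a load, and a store. */
    typedef enum { SIGMA, LOC, LOAD, STORE } Kind;

    typedef struct Node {
        Kind kind;
        const struct Node *heap;  /* incoming heap state (LOAD, STORE)     */
        const struct Node *addr;  /* location being accessed (LOAD, STORE) */
        const struct Node *val;   /* value being stored (STORE)            */
    } Node;

    /* Axiom sketch: store(sigma, r, load(sigma, r)) is equivalent to sigma itself. */
    static const Node *simplify(const Node *n) {
        if (n->kind == STORE && n->val && n->val->kind == LOAD &&
            n->val->heap == n->heap && n->val->addr == n->addr)
            return n->heap;
        return n;
    }

    int main(void) {
        Node sigma = { SIGMA, NULL, NULL, NULL };
        Node r     = { LOC,   NULL, NULL, NULL };
        Node ld    = { LOAD,  &sigma, &r, NULL };   /* the expression for *r      */
        Node st    = { STORE, &sigma, &r, &ld };    /* the expression for *r = *r */
        printf("*r = *r leaves the heap unchanged: %s\n",
               simplify(&st) == &sigma ? "yes" : "no");
        return 0;
    }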
Now to get a more detailed picture of this let's consider this program here. Don't worry about
the details of this program; it is just contrived to illustrate how my technology works. Let's
suppose we hand this off to an optimizer and it spits out this optimized program here. We can
all look at this and say, well, what it did is it took these two dereferences of p here and stored
them into a temporary local variable that it used twice. Similarly, we had these two
multiplications by b here, and it simplified that to a single multiplication by b. So what the
translation validator has to do is figure that out entirely automatically, and the way ours goes
about doing that is by first taking this original version of f and converting it to a program
expression graph, our own representation. To do that, we take this dereference of p here and
translate it to a load that takes location p and the incoming state of the heap Sigma. Then
this call to strchr here gets translated to this call, whose two parameters are s and the
result of that load operating on the current state of the heap Sigma, and it has these two
outputs. What these outputs correspond to is: this pi-v node is the return value of the function
call, that is, the value being extracted into this x here, while this pi-sigma is the resulting
state of the heap after the function call completes. In particular, when we have this next
dereference of p here, we use that new state of the heap rather than the original state of the
heap, and so we distinguish these two loads in our representation. Then we go on to these
additions and multiplications here, which we store into the heap in order to get the new state
of the heap for the function. Lastly, when we return x, we mark this pi-v as the returned value
of the function and that store as the resulting state of the heap after the function call completes.
Once we're done with the original program, then we move on to the optimized program and do
the same process which I will skip over, but we're going to reuse nodes as much as possible in
our [inaudible] and speed up the process. At this point once we have this nice mathematical
representation, we make sure it's complete so that we can actually throw away the original
control flow graphs and just work on this mathematical system. So we can start applying
axioms; one is the fact that multiplying by two is the same thing as shifting left by 1, and knowing
that we can add this equivalence here. Note that we are adding an equivalence; we are not
throwing away the original program. This is how we differ from a lot of other techniques which
are more destructive and this is how we make our system able to adapt to a variety of
compilers and a variety of optimizations. In this particular situation, this shift by 1 isn't
actually useful to validate this translation; rather, we want to use the addition and
multiplications and apply distributivity. Once we are done with the math up there we can
move down to the function call, where we can incorporate some knowledge that LLVM, the
optimization framework that we targeted, is using; in particular, LLVM knows that
this call to strchr doesn't modify the heap and so the state of the heap after the function call is
going to be the same as the state of the heap before the function call, and so we can add this
equivalence to the representation. Once we've done that, well we have these two loads from
the same location and now we know that they are operating on the same state of the heap and
so we know that the result of the loads is also going to be the same and consequently the
results of these additions are going to be the same and these multiplications are going to be the
same, and transitivity tells us this addition is equivalent to that multiplication. So this
leads us to these two stores and we can see that they are operating on the same heap at the
same location and now we know they are storing the same value. So they are going to result in
the same state of the heap after the storing as well. What we have just proven is that the
original f and the optimized f have the same overall effect. And so all we have left to prove is
that they return the same explicit value. This is really easy in this case because they
actually start off with identical values in our representation. And the reason why that happens
is that our representation being mathematical gets rid of all of the intermediate variables and
so [inaudible] differences between control flow graphs sort of go away and in fact many
optimizations are validated just by translating to our representation, without even having to apply
these kinds of equivalences.
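[Editor's note: the function f on the slide is not reproduced in the transcript. The following
pair of C functions is only a hypothetical example in the same spirit: two reads of *p that the
optimizer combines into one temporary, and two multiplications by b that it combines into one via
distributivity, which is valid here because strchr does not modify the heap.]

    #include <string.h>

    /* Hypothetical "original f": two loads of *p, two multiplications by b. */
    char *f_original(const char *s, int *p, int *q, int a, int b) {
        char *x = strchr(s, *p);   /* first read of *p */
        *q = *p * b + a * b;       /* second read of *p; two multiplications by b */
        return x;
    }

    /* Hypothetical "optimized f": one load of *p, one multiplication by b. */
    char *f_optimized(const char *s, int *p, int *q, int a, int b) {
        int t = *p;                /* the two reads of *p become one temporary;
                                      legal because strchr leaves the heap alone */
        char *x = strchr(s, t);
        *q = (t + a) * b;          /* distributivity merges the multiplications */
        return x;
    }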
>>: Where does the [inaudible] analysis come from?
>> Ross Tate: So we use alias analysis to know whether a load and a store, or two stores, can
commute with each other. If you have two stores to different locations, they can commute past
each other; if you have a load and a store at the same location, then we just use the same
value; and if we can prove they are at different locations, then we reason that the load is going
to be the same before the store as after the store.
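[Editor's note: a small C illustration, written for this transcript, of the kind of fact just
described: given the alias information that p and q point to different locations, the load of *p
commutes past the store to *q, and the validator can infer that the two functions below compute
the same result.]

    /* Load after the store... */
    int load_after_store(int *p, int *q, int v) {
        *q = v;
        return *p;
    }

    /* ...and load hoisted above the store; equivalent whenever p != q,
       which is exactly the fact an alias analysis would have to supply. */
    int load_before_store(int *p, int *q, int v) {
        int t = *p;
        *q = v;
        return t;
    }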
>>: But if there's been [inaudible] how does that work [inaudible]?
>> Ross Tate: So you're talking about making that like the alias analysis out?
>>: Well right. I mean there's the alias [inaudible] on each program but there's also [inaudible].
>> Ross Tate: Oh. I see
>>: [inaudible] the when you are removing [inaudible].
>> Ross Tate: So the fact that there are two programs doesn't really create too big of a problem
for this analysis, because these are still values within a program; it's usually within one
program that you figure out that operators commute, and once you figure that out, you figure
out that the two resulting values are the same across the programs. So the reasoning tends to
stay within the two programs, or within each program itself, and so it's not a big problem
[inaudible].
>>: [inaudible] program point specific information that you used to sort of bind your
conversion to the [inaudible].
>> Ross Tate: What I am showing here is actually very simplified. We have things like ordering
analyses, so knowing one integer is bigger than another, and we have alias analyses as well;
these can all work on top of this kind of basic system. With alias analysis, because we are doing
this entirely automatically, we often don't have alias information. Actually, this was a problem
we had with LLVM: even though we told it not to, it would do an interprocedural analysis to
figure out alias information and then do optimizations that weren't valid from what we could
know. But if you actually gave us that information, then we could start applying it.
>>: I see. So you have a way to incorporate [inaudible]? So you are trying to prove [inaudible]
constrained what if there are preconditions that enable the optimizations [inaudible]?
>> Ross Tate: So this is basically the same thing as the alias [inaudible]. There is some
precondition, some fact about the input, that we don't know about, and if you give us that
information, and we have the logical reasoning for it like we do for alias analyses and a few
other things, then we can prove stuff about it; but we can't infer the context. We are an
entirely intra-procedural system.
>>: So basically what you're saying is give me any kind of proofs of the program and if I can use
it to add more equivalence [inaudible]…
>> Ross Tate: They don't even have to be equivalences. You can put arbitrary logics on top of
this, well not arbitrary, but many logics on top of it.
>>: You can only use that if you can improve your graphs somehow; like here with strchr, you
improved it because you added this [inaudible] equivalence saying it doesn't modify the heap,
and with an alias you would do the same. What other things would you do besides adding…
>> Ross Tate: Oh, with aliases, what the alias analysis does is actually tell us these things are
distinct, and then we will infer the equivalences from that. You don't have to give us the
equivalences.
>>: Oh, okay.
>> Ross Tate: Same as with inequalities and stuff like that. Sometimes we will take advantage
of inequalities and use those to infer things or particularly bounds [inaudible] that deal with
overflows and stuff like that. So going back, we proved that the return values are identical and
the overall effect is identical and so we basically validated this optimization, right? And we are
able to validate a wide variety of optimizations. To test this out we ran LLVM on the SPEC
2006 C benchmark suite, and what this last column indicates is that if LLVM made some sort of
change to the program, then three out of four times we were able to validate that
change. What's cool is that another team of researchers actually came along and implemented
their own translation validator, specialized towards LLVM, even specialized towards this exact
configuration of LLVM, and while this did get them significant performance improvement, they
actually were only able to match our validation rate. That is they weren't able to improve their
success rate by the specialization process, and so this is important. Do you have a question? So
this is important to recognize, because if you consider these two approaches side by side,
specialization requires a lot of knowledge. You have to know how the language works, but you
also have to know all of the optimizations that the compiler could be applying, and you even
have to know what order those optimizations are going to be applied in. So the specialization
approach requires a lot of repeated manual implementation, because it means you make a
different translation validator not only for each compiler that you are validating, but also for
each configuration of the compiler that you are validating, whereas with equality saturation all
we had to know were the basic axioms of the language, and we were able to take care of the
rest entirely on our own. Why this is particularly important is that equality saturation can even
validate optimizations that it's never seen before and this will be important for this next
application that I'll be going into.
>>: I have a question on the previous slide. For each one of those percentages, is that referring
to a series of optimizations all done together and then validated, or individual optimizations
that are validated?
>> Ross Tate: This is method in, method out, you know, how many methods we were able to
do, so there could have been many optimizations. Usually where it gets messed up is when too
many optimizations happen and some important intermediate state gets lost; that sort of thing
tends to be what causes the failures. So if it's a chain of optimizations that doesn't erase path
information, then it tends to be fine, but if it erases path information, we and basically all the
other validators that we know about get stuck at some point.
>>: How did you select which optimizations were validated?
>> Ross Tate: We turned on everything [inaudible], so as I said, some of these failures are
because LLVM actually used inter-procedural information as well, and that's just something that
we can't do anything about without somehow changing LLVM. So this was run
inter-procedurally, and then we validated what we could.
>>: Question. [inaudible] translation validation [inaudible] the way it is [inaudible] very simple
once you [inaudible] but in your case you are extracting this [inaudible] producing [inaudible] so
is there a danger of sort of getting into a loop [inaudible] will you keep adding most of the
graph?
>> Ross Tate: The rewriting doesn't necessarily always terminate. In practice, if there are no
loops in the thing then we will terminate, but if there are loops then one of our basic axioms,
which I will be getting into later on, guarantees that it won't terminate, because there are just
an infinite number of options at that point. What we found is that a breadth-first search
through the expressions works better than a depth-first search, because it prevents you from
going down a rat hole. It makes sure that you explore options broadly rather than, okay, go
through this loop, okay, keep going through the loop, and we don't get stuck in that kind of
trap.
>>: [inaudible] you add particular expressions [inaudible] infinite [inaudible] expressions, right?
>> Ross Tate: One of the big things is that our system reuses nodes as much as possible, and
I'll get into that more later on, but even if you put in the axioms for associativity and
commutativity, which cause a huge blow-up in the number of expressions that are equivalent,
our representation actually represents all of those equivalent expressions quite compactly. Yes,
there is a lot of variety, but because we have this additive approach, and because we reuse
things, saying here's an equivalence and here's an equivalence, and we have this nice locality
aspect where you can reuse subexpressions a lot, it tends not to blow up in our representation
until we start adding loops.
>>: Can you explain the difference between all and [inaudible]?
>> Ross Tate: This one concerns all methods, and that one only the methods where the
optimizer actually made a change; sometimes the optimizer just says input, output, can't do
anything, and so that one only concerns the optimized ones.
>>: And what was the proportion of the say 1864 functions that were actually changed
[inaudible] from this table?
>> Ross Tate: I don't know. Looking at the numbers I would say it's not huge but yeah,
unfortunately I don't know it. Any more questions before I go on? Yes?
>>: [inaudible] all of the failures are because of [inaudible] procedures [inaudible]?
>> Ross Tate: No, not all of the failures. Talking with other people who have worked in this
area, the things that come up are basically that too many optimizations have been applied; I
mean, it's a good thing that a lot of optimizations have been applied, but they were done in a
way that somehow some important intermediate state got lost, and we can't figure out the
intermediate state that connects the two sides together. As you saw, you sort of drift from one
side toward the other side and converge in the middle, but if something has been done that
makes that middle state just not there anymore, we can't figure it out, and basically everybody
seems to get stuck in that situation. Good to go? So with this approach to translation
validation, we are making it so that companies can use an optimizer, even though it's
unreliable, and still do so safely through translation validation. Next, we are making it possible
to actually extend optimizers with new optimizations and to make this accessible to typical
programmers, and the reason why I thought this was important is that back in my
industry days what I remember having to do is a lot of optimization by hand. I would write
some program like this image processing program here, and I'd realize that this i times 50 plus j
isn't really the best way to access this image; rather, we should be doing image++ in order to
get rid of that multiplication inside the loop. Once I recognize this, I have a choice to make
between keeping this program, which is easier to understand, not only for me but more
importantly for my coworkers, and this other program, which is more efficient. This choice
comes up a lot, because many optimizers, including these [inaudible] three, won't actually
perform this optimization. If you are working in image processing, you have probably run into
this situation, and you may have thought, okay, why don't I extend the compiler to take care of
this optimization for me, so that I can keep around this intuitive code but execute this more
efficient code. Let's consider how much work is involved in that. Well, if you are so fortunate
that the optimizer is open source, then all you have to do is check out a copy of the source
code, learn the architecture of the optimizer, and then implement your optimization, even
though you are doing image processing and so may not be familiar with programming language
techniques. Then you have to integrate that into the optimization pipeline in your compiler, and
then distribute your compiler to your coworkers. Before you do that, you should make sure that
you debug your implementation, since after all we just talked about how compiler bugs can be
quite annoying and you don't want to be guilty of those. Then you have to deal with the fact
that this compiler you just checked out is being upgraded by the community as well, so you have
to merge all of those upgrades with your changes in order to ensure your team has an updated
compiler. You can see this is quite an intimidating amount of work, and it has scared off a lot
of programmers, including myself. So in light of this, why don't we just make a trainable
optimizer? For this one, in order to extend it with that optimization I showed you, all you have
to do is give a single example of the optimization.
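[Editor's note: the actual example from the slides is not in the transcript. A before/after pair
in the same spirit, written in C, might look like the following; the function name, the 50 x 50
image size mentioned earlier, and the loop body are assumptions.]

    /* Before: the intuitive version, indexing with i * 50 + j inside the loop. */
    void brighten(unsigned char *image) {
        for (int i = 0; i < 50; i++)
            for (int j = 0; j < 50; j++)
                image[i * 50 + j] += 10;
    }

    /* After: the hand-optimized version the trainable optimizer should learn,
       walking a pointer instead of recomputing i * 50 + j on every iteration. */
    void brighten_opt(unsigned char *image) {
        unsigned char *p = image;
        for (int i = 0; i < 50; i++)
            for (int j = 0; j < 50; j++)
                *p++ += 10;
    }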
In fact, that example I just showed you works just fine, and from that alone we are able to take
care of the rest for you. There's a little catch to this, though: sure, I could learn this exact
optimization here, but how often are you going to be writing programs that work on 50 x 50
images? Not that often. So really what you intended me to learn is some more general version
of the optimization, one that would work for any w, any h, any use of image at i times h plus j,
and change those into image++. Furthermore, you want me to learn a valid optimization, one
that doesn't introduce any bugs, and so you hope I learn side conditions like: this h has to be
loop invariant, and this use can't modify i, j or image. Now recognizing that
this is really your intent, yeah?
>>: [inaudible] once.
>> Ross Tate: What?
>>: If image plus plus [inaudible].
>> Ross Tate: Yeah, yeah, but there are many more side conditions, which is why we don't
want people to have to deal with this. Recognizing this intent, the idea is to make a system that
will take a concrete example and automatically figure out how to generalize it into this more
broadly applicable form. The insight I had for doing that was that this optimization given to us
here needs to be correct, otherwise we don't want to learn it, and so in particular these two
programs need to be equivalent. If we can understand why these two programs are equivalent,
we can understand why this optimization works and how we can go about generalizing it. So
with that we made an architecture for a trainable optimizer, and we assume that the
programmer gives us these two snapshots of the program, before and after, from which they
want us to infer the optimization that they applied. What we do is hand these off to a
translation validator that not only makes sure that the programmer didn't make any mistakes in
the optimization, but also gives us a proof that these two programs are actually equivalent.
This proof is important, because from it we can determine which details of the program are
important and which aren't important and can be generalized. So by generalizing this proof we
can actually get generalized versions of the input and output programs, giving us a generalized
optimization of which the original one given to us by the programmer is a concrete instance.
And so with this architecture what we enable is a way for programmers to teach compilers
provably correct optimizations, by writing just one example of the optimization being applied,
written in a language that they already know rather than some compiler-specific language. And
I believe this makes extensibility very accessible to the typical programmer. Now, to get a more
detailed understanding of how this works, let's say this programmer knows that I am giving a
presentation and says, Ross, please learn that 8 plus 8 minus 8 can be transformed to 8. Okay,
so we hand these off to a translation validator that's going to prove that 8 plus 8 minus 8
actually equals 8, and then we want to generalize this proof; but before we do that, we have to
look inside the proof, so let's take a look at what this translation validator actually does. The
translation validator starts off knowing absolutely nothing about the equivalence of these two
programs. Really all it knows is properties of the overall language it's working with; it knows,
for example, that anything minus itself equals zero, and it can use this fact to infer that 8
minus 8 equals zero. As it runs, it is going to construct a proof in this database. In particular,
it is going to note that by applying this axiom we could add fact one to the database. We also
know that if something is equal to zero, then anything plus that thing equals itself, and so using
this axiom and fact one we can infer that 8 plus 8 minus 8 has to equal 8. Again we make a note
that by using fact one and applying this axiom we were able to add fact two to the database.
Now this fact two is important, because it actually proves that the transformation given to us
by the programmer is in fact valid.
>>: [inaudible]?
>> Ross Tate: Huh?
>>: Can you use associativity?
>> Ross Tate: Oh yes, I am sort of assuming that there are parentheses there. If the
parentheses are there then we are not using associativity, but yes. That is another axiom we
can add on. So we just proved that this translation given to us by the programmer is correct,
and so we can move on to proof generalization. Now, the thing I found out is that proof
generalization actually works best by going backwards through the proof, so I'm going to be
going from right to left. To see how this works, I'm going to move these axioms out of my way,
and I am going to start off with some general program A transforming into some general
program B. I want this to be valid, and so I need to prove that A equals B. The way I'm going
to go about doing that is to look at this concrete proof here to figure out how I can refine A and
B so that they actually are equivalent. As I do so, I am going to maintain the invariant that this
is the most general program transformation for which the portion of the proof that I have
processed so far applies. Since I haven't processed any of the proof so far, I start off with the
most general program transformation. Now to start refining things, as I said it works best by going
general program transformation. Now to start refining things, as I said it works best by going
backwards, so we are going to look at the last axiom that we applied, this one here. In order to
have applied this, that means that there had to be some c and d such that we knew c equals
zero and 10 we use that to infer that d plus c has to equal d. And so we can look at our
concrete proof to see how we use this. In particular, our proof tells us that we if we apply this
axiom using fact one and so c equals zero has to be fact one and so we can add that to our
database as a future goal to prove. Similarly the proof tells us that implying this axiom we
added fact two so d plus c equals d has to be fact two. But this time we already have a fact
two, namely A equals B, so to reconcile these differences what we do is we can unify these two
effects. We can take all references of A and replace them with d plus c and we can take all of
these cases of B and replace them with d. after doing so it actually makes sense, as the
transcriber notes from below to above, from below to above. After all if you use this fact one
you can get this fact two. Note that when I was making the substitution I also made the
substitution within the generalize transformation and so I restored the invariant. This is the
most general transformation for which the portion of proof that I processed so far applies. In
particular, if I can figure out how to refine c so that it will actually equal zero, then that
transformation will be correct. To go about doing that, again, we are going backwards, so let's
look at the previous axiom I applied, this one here. In order to do this, there had to be some e
such that we inferred e minus e equals zero. Now, this axiom makes no assumptions; we can
start off with an empty database, which is what you would expect from a good proof. Then we
can look at our notes and see that in applying this axiom we added fact one, so e minus e equals
zero has to be fact one. But once again we already have a fact one, namely c equals zero, and
so once again we reconcile these differences by unifying the two facts; in particular, we take all
references of c and replace them with e minus e. Once we've done this, we can transcribe our
notes again and say that you can apply this axiom to add fact one to the database. In so doing,
what I've just built is a generalized proof that the generalized transformation over there, d plus
e minus e transforms to d, is in fact valid, and our original concrete proof and concrete
transformation are just an instance of those generalizations, obtained by taking all the instances
of d and e and replacing them with 8s. So by understanding why this transformation given to us
by the programmer is in fact valid, we are able to learn a more broadly applicable optimization;
in particular, by examining this proof of equivalence [inaudible] we were able to learn an
optimization from the programmer. To recap what I did at a higher level: I hand these two
programs off to a translation validator, which gives us the proof that they are equivalent, and
this proof works because these 8s are the same and because these 8s are the same; so when we
hand this proof off to the proof generalizer, it maintains those equivalences in the generalized
transformation. However, the proof didn't need all four 8s to be the same, and so we could use
two different symbols, d and e, in the generalized transformation. And what I proved is that
this process will always learn the most general optimization for which there is a proof of
validity. So with this strong guarantee we were able to learn a large variety of optimizations
from just single examples of them being applied, and this inter-loop strength and bound
reduction is in fact that image processing optimization that I showed you earlier.
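[Editor's note: tying the 8 + 8 - 8 walkthrough above together, the generalized optimization the
proof yields is d + (e - e) to d for any d and e; the concrete example is the instance d = e = 8.
The toy C check below is written for this transcript and simply exercises that generalized rule
on a few inputs; it is not part of the actual system.]

    #include <assert.h>
    #include <stdio.h>

    static int original (int d, int e) { return d + (e - e); }  /* before */
    static int optimized(int d, int e) { return d; }            /* after  */

    int main(void) {
        assert(original(8, 8) == optimized(8, 8));   /* the example as given    */
        for (int d = -3; d <= 3; d++)                /* the generalization also */
            for (int e = -3; e <= 3; e++)            /* holds for any d and e   */
                assert(original(d, e) == optimized(d, e));
        puts("d + (e - e) ==> d holds on the sampled inputs");
        return 0;
    }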
>>: [inaudible] use [inaudible] try to prove this equivalence you used this technique you just
described like x again?
>> Ross Tate: So these all have loops and stuff and so if we want to do loops then we have to
use pegs and if there is anything with side effects you have to use pegs, so yeah, what I showed
you is a…
>>: [inaudible] it's all cool [inaudible].
>>: Yeah, so if you had load and source, you use the same techniques…
>> Ross Tate: Yeah, so you're building on the equality saturation kind of approach underneath;
to make the technique I showed you work for realistic programs we use pegs, and then equality
saturation on top of everything. So the cool thing with this approach, and the fact that our
technique can actually generalize to other optimizations as well, is that since we build on
translation validation, as the technology for translation validation improves, so will our ability
to learn new optimizations from programmers.
>>: [audio begins] distribution?
>> Ross Tate: It's things like putting the multiplication inside the loop and to get rid of--and
sometimes you can put a multiplication inside the loop so you can actually get rid of it by
distributing it through everything and other times it's better to factor it out and put it at the
end of the loop, so that's just moving operations into and out of the loop.
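[Editor's note: a small C illustration, written for this transcript, of the second case just
mentioned, factoring a loop-invariant multiplication out of a loop and applying it once at the
end; the two functions agree up to the usual caveats about signed overflow.]

    /* Multiplication by the loop-invariant c inside the loop... */
    int dot_before(const int *a, int n, int c) {
        int s = 0;
        for (int i = 0; i < n; i++)
            s += a[i] * c;
        return s;
    }

    /* ...distributed out of the loop and applied once at the end. */
    int dot_after(const int *a, int n, int c) {
        int s = 0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s * c;
    }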
>>: So can you or how hard would it be to do something like loop interchange or [inaudible]
unrolling should work right?
>> Ross Tate: So by unrolling do you mean the one where you take the loop and then make it
sort of double copies of it or do you mean the one where you pull out one iteration of the loop?
>>: The latter one was called peeling [laughter].
>> Ross Tate: Okay. Some people use unrolling for the latter one too, so that is why I asked.
>>: Other one.
>> Ross Tate: So the other one we actually haven't gotten to, because our representation, as I
was showing you and will show you later on, has iterations kind of tied into the semantics of it
all, so we have looked into ways to get around that. We figured out that you can have sort of
meta-operators that allow loop unrolling, but we haven't actually tried putting that into
practice. Loop peeling is easy and…
>>: Something like the loop [inaudible] dependence testing to prove it correct would probably
go beyond the proof system at this point?
>> Ross Tate: So loop interchange depends on exactly how the loops are bundled. If we do alias
analysis ahead of time in order to figure out how to turn the program into a peg better, then
these two loops will actually be completely separate expressions, and loop interchange is
extremely easy to do in that situation because they are already separated. In other situations
it's much more difficult, because you have to figure out after the fact that they are separable,
and that's a little harder to do; sometimes we can do it and sometimes we can't. Compared with
bisimulation relations, bisimulation relations can do things like loop unrolling better, but they
have a terrible time with loop interchange. So there are some pros and cons to the two
different approaches, and what's interesting is that the two approaches seem to hit the same
walls that I was talking about earlier with translation validation. So again, because this works
through translation validation, if we used a bisimulation-relation translation validator, then
things like loop unrolling would be just fine, but here, since we are using our equality saturation
approach, things like unrolling are difficult but loop interchange is better.
>>: So how does this work practically? I mean if you can do this why have all of this code in a
normal compiler hand written that [inaudible]?
>> Ross Tate: There is sort of a detail here, which is that, like I said, this learns the most
general optimization for which the proof works, but what does generality mean? In our situation
we formalize what generality means, and it depends on the logic that you're working with for
that proof. Generally, for this process you need sort of a first-order logic to do all of this stuff,
whereas when you implement an optimization by hand you can do things that require
higher-order logics, so there are some optimizations that are better done by hand because they
actually will work in broader situations. But the reason you don't want to do that all the time is
that there are a lot of optimizations that are only going to be useful for certain domains, and so
you really want someone in that domain to be able to say, okay, learn this optimization, and
have it there. So I wouldn't say that you should go all the way this way, and I wouldn't say you
should go all the way the other way; it really is a matter of balancing the two.
>>: You have performance results that you're going to show us?
>> Ross Tate: I have performance results for the optimization work later on. The issue that
comes with the performance part of it is that, because we chose LLVM and bytecode, which are
a little bit too high level for a lot of things, you only get good performance results if you can
find programs that have some key bottleneck and you actually put these optimizations in. The
issue is: is there a bottleneck in the code that is actually something you can optimize? If there
isn't, which is oftentimes the case, then you are not going to get good performance results, so
there is a big issue with evaluating optimizations in general. I can go into more detail about
this optimization evaluation problem, which I found out about when I started doing this
research, but later on I will actually show you some success we had with a ray tracer where
there was a big bottleneck and it had to do with these kinds of optimizations. So now that
means that we can extend the compiler with optimizations by just giving an example. Another
thing I looked into is making it so we can infer optimizations entirely automatically
given the properties of the language. The reason I found this to be important is that when you
make a new language, optimization tends to be a big hurdle in order to get that language
adopted. You might wonder, well, there aren't that many languages being made every year,
but in fact many companies, like the video game companies that I worked with, actually had
their own in-house language that's maintained by a single person who would really like to have
an optimizer but doesn't have time to implement one, so this technology would have benefited
them. Also, domain specific languages are becoming more and more common, and so by
incorporating my technology you could learn domain specific optimizations
for those domain specific languages. To see how this works I'm going to focus on one very
classic optimization known as loop induction variable strength reduction. What this does is take
this program here and translate it into this program over here, and in particular get rid of that
multiplication inside the loop. The way it goes about doing that is it says, okay, there is this 4
times i; it is going to turn that into a variable j. To accommodate that change, it's going to
change the increment by one to an increment by 4, and it's going to change the bound of 10 to
a bound of 40. The reason you might want to think about doing this is that it is useful for
things such as array optimizations, where typically this four is the size of the array elements,
and so you want to get rid of that multiplication inside the loop. Now, if you were to go
through actually implementing this optimization for your language, there is still one more
subtle issue that you have to deal with, which is called phase ordering. The issue is that you
can start off with this program here, and sure, you can apply loop induction variable strength
reduction in order to get this ideal version here, but you are also going to be writing a number
of other optimizations, such as replacing four times something with a shift left by 2, and if you
apply those optimizations first, then they will block the loop induction variable strength
reduction optimization, in particular because they get rid of this multiplication inside the loop.
This issue of optimizations conflicting with each other is what's known as the phase ordering
problem. Recognizing this, let's consider
how much work is involved in implementing an optimization for your language. With
traditional techniques, what you do is first identify all of the multiplications inside the loop and
then filter out all of the non-inductive cases, that is, variables being modified in some way
besides incrementing. Then you decide which of the remaining cases you actually want to
optimize, because if you do too many of them you overload your registers. Then you add new
loop variables for each of the remaining cases, insert the appropriate increments for each of
the remaining cases, and replace the appropriate multiplications with the appropriate loop
variables. Once you've done all this, you have to integrate your optimization into your pipeline
and address that phase ordering issue, and then you have to make sure you debug everything,
because there are a lot of little things that can go wrong here, and again, we don't want these
compiler bugs because they are very painful. You have to do all of this basically for each
optimization that you do, and so this approach requires a lot of repeated manual
implementation. So I figured out a way to apply equality saturation to this problem. In
particular, with my approach, all you have to do to get loop induction variable strength
reduction is add just three basic axioms of your language [inaudible]. From that alone you will
get loop induction variable strength reduction automatically.
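[Editor's note: a minimal C rendering, written for this transcript, of the loop induction variable
strength reduction example described above: the 4 * i inside the loop becomes a new variable j,
the increment becomes 4, and the bound 10 becomes 40.]

    #include <stdio.h>

    int main(void) {
        int a = 0, b = 0;

        /* Before: a multiplication on every iteration of the loop. */
        for (int i = 0; i < 10; i++)
            a += 4 * i;

        /* After strength reduction: j plays the role of 4 * i. */
        for (int j = 0; j < 40; j += 4)
            b += j;

        printf("%d %d\n", a, b);   /* both loops accumulate 180 */
        return 0;
    }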
More generally, it helps if you also give us estimates of what your operators cost, so that we
know how to prioritize them, and once you've done that we can automatically provide a large
variety of optimizations, all of which are guaranteed to be correct, because we will actually be
able to offer a proof that the transformed program is equivalent to the original program. So
you don't have to worry about the debugging problem here. To see how our approach works:
we start off with a control flow graph, because that is the standard representation for
imperative programs, and again this isn't very good for axiomatic reasoning, so we translate it
to our own representation, the program expression graph that I talked about earlier.
Once we have this nice mathematical representation then we move on to equality saturation
and infer a bunch of equivalent ways to represent the same program by applying those
algebraic identities. Once we have all of these equivalent representations, then we can
incorporate what we call a global profitability heuristic, which analyzes all of these equivalent
representations and picks out the optimal one according to the cost model that you gave us.
Once we have this optimal choice, we can convert it back to a control flow graph in order to get
the standard representation, so we can move on to other stages of the compiler, like lowering
down to the assembly level. Now, to see how this works in more detail, let's consider
this loop induction variable strength reduction example I showed earlier. There is a new
challenge here which is that this comes in the form of a control flow graph, but again I said
control flow graphs aren't very good for this kind of axiomatic reasoning, and so we want to
figure out how to represent them as an expression. The issue is that this program has a loop in
it and this loop really represents an infinite number of values, so how do we represent an
infinite number of values as a finite expression? So to solve this problem, the idea I came up
with is to use expressions with loops in them, essentially recursive expressions. This 4 times i
loop value here is represented by this expression over here. In particular we have
this theta node that says that loop variable i starts off at zero and is incremented by one in each
iteration of the loop. And so once we have this nice loop, or a nice mathematical
representation of this loop, we can move on to the familiar equality saturation process that I
talked about earlier. We can apply an axiom that says that shifting left by 2 is the same as
multiplying by 4, and so we can add this equivalent representation here. Note that we are once
again adding information; we are not throwing away the original program, and so this is very
different from the prior approaches. In particular, the approach is additive, and this is how we
deal with that phase ordering problem, because we are still free to explore in another direction.
In particular, we can apply another axiom that says that any operator distributes through our
theta nodes, and this results in this multiplication of additions here, which then allows us to
apply distributivity, and that results in this four times theta node here. There is already a four
times theta node over here, and so we can reuse that same node in order to keep our
representation compact. After we do that, we can apply a zero [inaudible] in order to
simplify those expressions and what we have just built here is what we call an EPEG or an
equivalence program expression graph. What this has is all of these equivalence classes of
values here, and it says here are a bunch of equivalent ways to represent the various
subcomponents of this program. Once we have all of these equivalent options, we can
incorporate the global profitability heuristic to tell us which of these options is in fact the best
one; it will analyze this EPEG in order to figure out one representation for each equivalence
class that optimizes the cost model that you gave us. Once we have this, we can distill it down
to a program expression graph, and this gives us our optimized result. Lastly, we translate it
back into a control flow graph, and so this corresponds to this graph here. In particular, it says
that there is a loop variable j that starts off at zero and is incremented by four in each
iteration of the loop.
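[Editor's note: a toy C check, written for this transcript, of the equivalence the EPEG expresses
at the PEG level: the value 4 * theta(0, self + 1) and the rewritten theta(0, self + 4) denote the
same sequence of per-iteration values. Here the two theta nodes are just simulated for a few
iterations; this is only an illustration, not the actual representation.]

    #include <stdio.h>

    int main(void) {
        int i = 0;   /* theta(0, i + 1): the original loop variable    */
        int j = 0;   /* theta(0, j + 4): the strength-reduced variable */
        for (int n = 0; n < 5; n++) {
            printf("iteration %d: 4*i = %d, j = %d\n", n, 4 * i, j);
            i = i + 1;   /* second argument of the first theta node  */
            j = j + 4;   /* second argument of the second theta node */
        }
        return 0;
    }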
>>: I didn't see the 10 and 40 coming out for the [inaudible].
>> Ross Tate: Yeah, sorry, I was just focused on this part here; once you start adding the
less-than-40 bound [inaudible] it starts going into a much bigger picture.
>>: [inaudible] distribute [inaudible] a turning point saying that [inaudible].
>> Ross Tate: Yeah. We have an axiom that says…
>>: [inaudible] 10 would be smaller than 40 and then four times [inaudible].
>> Ross Tate: Yeah. This is where you also have to know that the upper bound is small enough
to make sure there is no overflow and stuff like that, so all of the axioms I've shown you so far
hold for modular arithmetic so they are not an issue, but things like inequality axioms, the
standard ones don't hold for modular arithmetic so you have to make sure that the bounds are
appropriate.
>>: So this representation just [inaudible] or is it [inaudible]?
>> Ross Tate: It's a more complete form of it. It's actually one where [inaudible] you can't
throw away the CFG; there is not enough information, and values that are different will actually
get conflated. You can say this is very similar to a conversion [inaudible] where all of the loop
indices here are explicit. There are versions that don't have explicit loop indices, but then you
can't throw away the CFG, because they actually merge values that are not the same. So this is
one that's been fleshed out all the way to make it completely independent from the control flow
graph, and we figured out a way to convert from this representation back to the control flow
graph, and also gave it [inaudible] semantics and showed that moving between the two
representations actually preserves semantics. So it's sort of very thoroughly done [inaudible],
but along the same lines.
>>: But you do have this challenge of [inaudible] loops in a way [inaudible] loops in the control
program, is that right or no? And another example you had optimizations [inaudible] earlier, if
it's in this form then how do you reason about those [inaudible]?
>> Ross Tate: So if you had another loop down here, say, and they are effect-free loops, this
becomes easier, because in the system we built these would be one loop expression and another
loop expression; they won't even be sequenced on top of each other, and so that case gets much
easier. When there is an effectful loop, [inaudible] forces them to be sequentialized, and that
gets things a little messier.
>>: Can you still reason about them and figure out what kinds of optimizations you want to do?
>> Ross Tate: We've had [inaudible] on that one, so it depends on just how complicated the story is. If the story is simple enough it works out. If it's too complicated, then at least automatically we won't be able to do it; we can still do it by hand, but that's another matter. More questions?
>>: [inaudible] have you said anything about inlining? Is it just totally trivial, or are there some subtleties about it that [inaudible]?
>> Ross Tate: So I've been talking about axioms here, but really our engine allows arbitrary things -- what we call equality analyses -- to come in and do fancier things. One of them is an inliner, and it will say, okay, here are calls that have been approved for inlining because they [inaudible] inlining here, and so we inline them. How it works is fairly standard: you just add an equivalence [inaudible] between the function call and the body of the function with the parameters replaced by the argument expressions. Something we found is that it's very important to get rid of intermediate variables. We found that things like trying to do lambdas -- it's possible to do, but in practice, because of the sort of exponential exploration in this additive approach, it just becomes very, very messy when you actually try to use lambdas and substitute inside the lambda; it doesn't work very well. The lack of intermediate variables is actually a big part of getting this approach to work. Yeah?
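A minimal sketch of inlining expressed as an equality, with the function and variable names invented here purely for illustration (the engine's actual interface isn't shown in the talk):

    static int square(int x) { return x * x; }

    static int caller(int a) {
        return square(a + 1);   // the call node that ends up in the E-PEG
    }

    // The inlining analysis simply asserts, within the same equivalence class:
    //
    //     square(a + 1)  ==  (a + 1) * (a + 1)
    //
    // that is, the call is equated with the callee's body with the parameter x
    // replaced by the argument expression. The saturation engine and the global
    // profitability heuristic then decide whether the inlined form is ever chosen.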
>>: [inaudible] curious whether you considered transformations like skipping iteration [inaudible] zero, or going backwards [inaudible] up [inaudible] or replacing [inaudible].
>> Ross Tate: The first one is easy; that one will actually happen automatically -- that's a loop peeling thing. The going-backwards one is not easy; that one requires you to know... Basically, this is where the iteration thing comes in: this data node says at iteration zero it's zero, and at iteration one it's [inaudible] iteration zero. So the semantics of the representation incorporates iteration counts. Those little shifts are fine, but things like reversal -- you can't even reverse an infinite sequence, right? You really have to know the reversal with respect to some maximal point. Similarly, taking every other element you can do; we have sort of an even-and-odd thing, and that's how you get loop peeling, or the loop unrolling thing, but we haven't tried putting that into practice. So some loop things are fine and other loop things are difficult. I could spin those up first and…
>>: So this is a totally off-the-wall question, so you have [inaudible] graph…
>> Ross Tate: Yeah, yeah, yeah.
>>: Okay so [inaudible] how [inaudible] is it?
>> Ross Tate: They are basically very similar strategies, this equality saturation thing. Denali works within a single block, and having just a single block very much changes the picture, so you can use very different techniques for this process. They basically work on about six instructions at a time, so the scale of things is completely different -- we are doing whole Java methods or whole C functions at a time -- and they don't have to worry about loops and things like that. So I really view ours as meant to be a general-purpose compiler, or a smart general-purpose compiler, whereas Denali is: here is a small chunk of six [inaudible] instructions that need to be optimized to all hell, and it goes through them in as much detail as possible.
>>: [inaudible] how much of the same [inaudible] in terms of possible.
>> Ross Tate: So they had finite [inaudible].
>>: Right.
>> Ross Tate: So they actually explore the entire [inaudible] space; that's why they can use a SAT solver and it will actually tell you whether this is true or not true. We can't, because our state space is infinite, and that's why the breadth-first search is important rather than a depth-first search -- we can't explore the whole space.
>>: So this is a very structured [inaudible] so you are saying that [inaudible] does that work for
any [inaudible]?
>> Ross Tate: It's the, it's the -- I've forgotten the term. [laughter].
>>: Reducible.
>> Ross Tate: Yes, reducible, thank you [laughter]. Reducible CFGs we can do, and there is a way to translate irreducible CFGs with some duplication, so any reducible one we can handle. In fact, something that took a lot of struggling and suffering was figuring out how to take loops that came from non-structured loops and still convert them back into non-structured loops -- so if they had breaks or continues, we actually restore them into a loop that still has breaks and continues rather than having a bunch of duplication in another one of our branches. That was messy, but it's been figured out.
>>: In terms of reduction [inaudible] do you have to figure out [inaudible] or do they just fall out [inaudible]?
>> Ross Tate: This just happened -- I mean, we didn't... A lot of optimizations were "let's just see what happens," so we threw them in and they worked just from the axioms. Some of them didn't work, and it was either because of something big -- like a reversal in the loop just doesn't work for this kind of representation -- or because we were missing an axiom, and then we just add that axiom when we catch it. So, for example, the loop [inaudible] bound axiom actually requires something -- like you mentioned, at the end of this loop we know that j is actually 40, right? Knowing that fact requires a sort of higher-order axiom, so we added the higher-order axiom into the system and then we could do some fancier stuff that way. Something I've learned in general -- and this has come up a few times -- is that the representation you choose for optimization is a big factor in which optimizations are easy to do and which are not. I chose this one here, but there are many other representations that could be valid for different kinds of programs and different kinds of things, and as part of that proof [inaudible] process I made sure that the algorithm I came up with could actually be generalized to other kinds of representations as well, so it's not just stuck with PEGs; it can work with other representations too.
>>: So this is obviously fairly akin to program verification in the techniques it uses. Usually in program verification, when you have loops you summarize them via loop invariants, and there are a variety of techniques for [inaudible] invariants. I was wondering whether you think you have all the power you need, or, if you were to apply loop invariant inference techniques and incorporate them into your techniques, would you get more?
>> Ross Tate: What we found is that we actually already have, in our anti-aliasing, a few things that are best done before you start doing optimizations, in order to basically learn loop invariants. The reason is that it's hard -- sometimes it's possible, but it's harder -- to do loop invariants dynamically, because you have all of these equivalent representations and you are basically trying to do induction over equivalent representations, and we found that in practice that doesn't really work very well. So we compute loop invariants beforehand and then use those, and once we have a few beforehand, we can figure out how to augment them dynamically as well, but we found we still need some starting point to go from in order to get those. All good?
We've probably already seen this, but looking back at where it came from: we started off with four times i, which got changed to j; the increment by 1 got changed to an increment by 4; and the bound of 10 got changed to a bound of 40. This looks familiar, because it is in fact loop induction variable strength reduction, and what's cool is that I didn't program this optimization explicitly. It just sort of happened -- we call this emergent optimization -- and we've found that there are many optimizations that will emerge just from these basic axioms automatically. So this is how we were able to get this sort of language-level optimizer, with its optimizations arising automatically.
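As a concrete illustration -- my own reconstruction from the numbers mentioned (4, 10, 40), not code taken from the slides -- the before and after of that emergent strength reduction might look like this:

    // Before: the multiplication 4 * i is recomputed in every iteration.
    static int before() {
        int sum = 0;
        for (int i = 0; i < 10; i++) {
            sum += 4 * i;
        }
        return sum;
    }

    // After: the equivalent loop the engine can select instead -- the induction
    // variable itself is strengthened, so the multiplication disappears.
    static int after() {
        int sum = 0;
        for (int j = 0; j < 40; j += 4) {
            sum += j;
        }
        return sum;
    }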
Now, I've been talking about language optimizations here, but we also found that we can apply this to libraries. The reason this struck me as important goes back to my undergraduate days, when I had to write a ray tracer and had a choice between mutable vectors and immutable vectors. Mutable vectors meant I had to write a big sequence of calls like this in order to implement what would be a very basic expression with immutable vectors. Furthermore, choosing mutable vectors meant my code would be error-prone, because I would have to track things like ownership and make sure the wrong person doesn't modify the wrong vector at the wrong time; immutable vectors don't have those kinds of problems. In light of that you might have expected me to choose immutable vectors; however, I was worried that immutable vectors would be inefficient. In particular, even this basic expression here allocates a number of intermediate objects, and those objects are made and thrown away almost immediately. Because of this I decided to go with mutable vectors in order to get better performance. I remembered that decision years later when I started working on optimization, and so what I did was see whether I could apply my optimization techniques to this library-design problem. I went back to that ray tracer and re-implemented it to use immutable vectors, the way I would have liked to implement it in the first place, and I found that it actually did run 7% slower, so I was justified in my performance concerns. What I want to do is apply my techniques so that we can replace these very manually intensive library modules with these nicer ones and still have the same performance guarantees as with the manual ones. The idea I had was to use these techniques to enable library-use optimizations. In particular, the idea is that we can express the various guarantees about using your library as axioms. For example, if I have a vector library and I add these two vectors together and then get the first component, that first component is going to be the same thing as getting the two vectors' first components and adding those together. Once we incorporate all these axioms into this equality saturation process, it will automatically infer optimizations for using my library -- optimizations specialized to my library. I applied this to that ray tracer and was actually able to get the immutable vectors to run faster than the mutable vectors, because they were more axiom-friendly. In particular, the optimizations we learned were able to reduce the number of allocations by 40% and get rid of all those intermediate objects. So by using this technology we can make it so that you can design your libraries the way you would like to and still get the performance you need, by taking advantage of these axioms.
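A minimal sketch of the kind of axiom being described, with the class and method names invented here for illustration (the actual vector library isn't shown in the talk):

    // Hypothetical immutable 3D vector.
    final class Vec {
        final double x, y, z;
        Vec(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
        Vec add(Vec o) { return new Vec(x + o.x, y + o.y, z + o.z); }
    }

    // An axiom the library author might hand to the optimizer (written as a
    // comment, since the concrete axiom syntax isn't given in the talk):
    //
    //     a.add(b).x  ==  a.x + b.x        (and likewise for y and z)
    //
    // With axioms like this, an expression such as a.add(b).add(c).x can be
    // rewritten to a.x + b.x + c.x, eliminating the intermediate Vec allocations.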
>>: This fellow [inaudible] uses axioms in his writing of DSLs for compiling high-level specifications of things like [inaudible] transforms down to [inaudible] chips and existing libraries, so I take it rewriting is strictly more powerful, right?
>> Ross Tate: This is in the same language, so we take Java code and rewrite it to Java code, and so this is performance on top, without even having to worry about lowering down. All of the metrics I am showing you are on top of the JVM's optimizations as well, so this is optimization the JVM couldn't do -- and the JVM optimizer is actually fairly advanced, we found out.
>>: Sure, but I'm thinking in terms of comparison to a rewriting system, so what is the expressiveness of the things that you [inaudible]?
>> Ross Tate: You can think of this essentially as a rewriting system, but with that profitability heuristic that allows you to explore many rewritings simultaneously, whereas a typical rewriting system basically has the phase ordering problem: if you apply the rewrites in the wrong order you get problems. We addressed that issue by using equality saturation -- keeping this additive approach -- and we also figured out how to extend it through loops and so on. Sound good? All right.
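A tiny illustration of the phase-ordering problem being referred to -- this is my own example, assuming the multiplication never overflows:

    // Two plausible rewrite rules:
    //   R1:  x * 2        ->  x << 1    (strength reduction)
    //   R2:  (x * 2) / 2  ->  x         (cancellation; assumes no overflow)
    static int f(int x) {
        return (x * 2) / 2;
    }
    // With destructive rewriting, applying R1 first turns the body into
    // (x << 1) / 2, and R2 no longer matches -- the better result is lost.
    // Equality saturation instead keeps (x * 2) / 2, (x << 1) / 2, and x all in
    // the same equivalence class, and the profitability heuristic picks the best
    // representative at the end, so the order in which axioms fire doesn't matter.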
>>: Sorry, the whole idea here of treating [inaudible] stuff, abstract data types, and doing higher-level transforms -- it seems quite related to…
>> Ross Tate: I'm not familiar with the word so it could be…
>>: Right. Essentially the idea is that if you have some knowledge of higher-level [inaudible] in your data structures, you can do commutativity and things like these, you know, optimizations are better [inaudible] heuristics [inaudible].
>> Ross Tate: In general, the cool thing is that we figured out -- or the experiments showed -- that this actually makes a difference; we can do this stuff automatically. All we have to know is those axioms, as you're saying. My intent with all of this was to make it so people could program differently -- not just take existing programs and make them run faster, but actually get to where people could write their libraries in a way that's nicer -- and this at least substantiates that that would work.
>>: You're trusting essentially the pragmas or whatever the inability…
>>: We’re trusting the axioms are correct.
>>: That's right, but you're inferring them from the type signatures?
>> Ross Tate: No. I'd written the library, so as I went through it I said, okay, here are some axioms that I imagine would be useful, and then I okayed them for optimization. The library writer would provide the axioms; we can't infer them automatically. In fact, ideally…
>>: Actually it wasn't so much about the axioms; it was really about the purity, like knowing
that the function was pure.
>> Ross Tate: And again, that would be an axiom. The reason I like the axiom approach is that it means you don't have to have the library code, which matters practically with OO systems where you are dealing with interfaces…
>>: Okay. But you are not addressing the problem of checking the [inaudible] of the axiom.
>> Ross Tate: No. That is a whole other story. Yeah?
>>: Does the library writer also have to write cost functions?
>> Ross Tate: We didn't have them do that, no. Our [inaudible] were very naïve, but we found that naïve cost models actually work pretty well. The one place where it would be better in practice is knowing that some methods are more expensive than others -- right now it treats calls uniformly, and it would really be better if it could say, no, no, calling that function is nothing like calling this function, please do more of these ones and fewer of those ones -- but we didn't actually do that in our system. The techniques we used can't accommodate that, but for the amount of optimization we were going for, it wasn't something we needed. All right. So we can talk about performance, but everyone also wants to talk about the performance of my own tool, right? Sorry?
>>: [inaudible] aliasing, the last approach where you [inaudible] axiom [inaudible] new object
[inaudible] aliases are there, you may want to [inaudible], I don't know.
>> Ross Tate: Oh, so…
>>: So do you syntactically [inaudible] objects or do you…
>> Ross Tate: We are not rewriting objects or anything; we are just rewriting code. So here, I mean, u and b would still stay around; there would just be this thing. You don't have to worry about aliasing or anything like that for the optimizations we are dealing with here. The only place aliasing would be useful is in knowing whether a state operation is going to commute -- basically, most of the time alias information is useful for knowing when a state operation is going to commute -- so aliasing isn't really too big a deal for these kinds of library axioms. Going on to performance: this is that tool chain I showed you, with the various stages of the compiler, and to evaluate how efficient this tool chain was, we ran it on the SpecJVM 2006 benchmark suite and found that the equality saturation actually runs quite quickly. The slow part is the global profitability heuristic, and to figure out why, we did some investigation and found that the equality saturation was doing such a good job that, even though we stopped it early, by the time we stopped it we could find a trillion trillion trillion ways to represent a single method, on average. [laughter]. So you can imagine it takes a while to figure out the best option out of a trillion trillion trillion options, and that's why the global profitability heuristic takes a while. Even though there is good reason that it takes a while, we still wanted to figure out a way to get over this hurdle in order to get this technology adopted.
So we had the idea of combining this technology with the previous technology I showed you earlier in order to speed up optimization in general. The technique I'm going to show you works with really any advanced optimizer, not just our own -- there are a lot of these things out there, such as Denali -- and these things are smart, but as a consequence they are slow, some of them really slow, and you have a lot of code that you want to compile but you don't want to wait around forever. The other things out there are rewriters, which are quick and efficient but also rather naïve, and so what we would like is the ability to combine the intelligence of these advanced optimizers with the speed of these efficient rewriters. The idea I had was that I could take just a piece of your code base and ship it off to the advanced optimizer in order to get a very well optimized version of that piece. We then ship that through the optimization generalizer, in order to learn optimizations from the advanced optimizer, and then tack on what we call a decomposer, in order to break that proof of equivalence up into a bunch of little lemmas. From each of these lemmas we learn a mini optimization specialized to your code base, which we then incorporate into our efficient rewriter, and we pass the rest of your code base through this rewriter using the lessons we learned from the advanced optimizer. To see whether this was effective, we ran it on that ray tracer I talked about earlier, and we found that the rewriter was actually able to produce the same high-quality code as the advanced optimizer, just using what we learned from that one part of the code base, and furthermore it was able to do so 18 times faster than the advanced optimizer, so it significantly addressed that performance problem I talked about earlier.
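For the shape of that pipeline, here is a rough sketch; every type and method name below is invented for illustration, since the talk doesn't show the actual tool chain's interfaces:

    import java.util.ArrayList;
    import java.util.List;

    // Learn from the slow, smart optimizer once; replay cheaply everywhere else.
    interface Advanced  { String optimize(String code); }                     // slow, smart
    interface Explainer { List<String> lemmas(String before, String after); } // generalize + decompose the proof
    interface Rewriter  { String apply(List<String> rules, String code); }    // quick, naïve

    class LearnedOptimizer {
        // Run the advanced optimizer on a representative sample and turn each
        // lemma of the equivalence proof into a specialized mini optimization.
        static List<String> learnRules(String sample, Advanced slow, Explainer gen) {
            String best = slow.optimize(sample);
            return new ArrayList<>(gen.lemmas(sample, best));
        }

        // Apply those learned rules to the rest of the code base with the fast rewriter.
        static String optimizeRest(String rest, List<String> rules, Rewriter fast) {
            return fast.apply(rules, rest);
        }
    }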
To recap, this made it so we can infer optimizations automatically and efficiently; overall it made it so we can actually use optimizers and take advantage of them even though they are occasionally broken; it means we can extend optimizers to address our more domain-specific needs; and it made optimizers available for new languages and for library writers designing their libraries. By continuing this line of research, what I'm hoping to do is make it so that discussions like that one are no longer necessary and we can focus on the more important things, like correctness. Something else I wanted to talk about is that this is only one line of my research. Another line of research, which I've done mostly here, is type systems.
In particular, thanks to Juan and Chris I learned all about existential types for dealing with typed assembly languages for C#. What Chris is working on is this operating system [inaudible] that's guaranteed to be memory safe, and the big issue is that they want to be able to take C# code -- either for things like a scheduler or for user programs -- and run it in their operating system. C# code is memory safe, but compilers are broken, so you want to infer types at the assembly level to make sure that even the assembly code is still memory safe, and that's what I did here. This got me introduced to existential types, and I made this big category-theoretic framework, which turned out to be very useful for dealing with Java. My students found this out for me: Java has all sorts of problems with wildcards. They had a piece of code they were writing for my class project, and they were very frustrated because the code wasn't compiling and they had no idea why. I looked at the code and found that the code was actually correct; the type checker was broken. I did some further investigation and found that wildcards in particular are something Java just does not do a good job with, so I applied this existential framework -- because wildcards are actually existential types -- in order to prove the algorithms I was using for type checking, and also to figure out how to refine the type system a bit, and showed that it's practical to make these refinements so that Java's type checking is actually decidable. Type argument inference we can't solve, but at least the rest of it -- subtyping and the basic things -- we can; subtyping wasn't known to be decidable before, so now it is decidable.
>>: What happened there? I thought [inaudible] was on top of all that stuff…
>> Ross Tate: Wildcards are subtle.
>>: [inaudible] after, where did the wildcards come in?
>> Ross Tate: That was Java 5 -- the generics and wildcards all arrived together.
>>: [inaudible] the same time?
>> Ross Tate: Yeah.
>>: No, no. Generic [inaudible] wildcards.
>> Ross Tate: [audio begins] For Java, I'm saying, they came at the same time.
>>: The generic Java prototype.
>> Ross Tate: Yeah, yeah.
>>: It had the wildcards [inaudible].
>>: Right. [multiple speakers]. [inaudible]. [laughter].
>>: Yeah, that's what I thought. Okay good.
>> Ross Tate: There is a good reason for wildcards and what they address. There is also some messiness -- like with the Ceylon stuff, so I guess I will go on to Ceylon.
>>: Okay.
>> Ross Tate: Ceylon -- there is this team at Red Hat making a new language called Ceylon; in particular, it includes people who worked on the Hibernate project, for those who are familiar with that. They've had many years of experience dealing with Java code and have become very frustrated with some of Java's problems, like wildcards, and so what they want to do is sort of clean up Java and general OO enterprise programming, learning old lessons from these languages. They are aiming for things like decidability, and that's how my work got involved. We figured out that wildcards have some strong aspects and some weak aspects, so we are trying to throw away the weak aspects while incorporating the strong ones, things like that. I have been working with them on making sure the type system is nice and clean and has all of the properties they told me they want: things like principal types, decidable typing, and decidable inference where they allow it. So I have been helping them design their language, make everything look nice and pretty, and meet the guidelines they have given me. That's another line of research that I'm doing. Then there's a longer-term track,
very far down the line: one problem I have seen is that if you have some tool for C#, you don't really know whether that tool transfers over to Java. Similarly, if you have some cool proof for Java, you don't really know whether that proof carries over to C#. This is frustrating, because these languages, while they have some key differences, also have a lot of similarities, and you would expect that a lot of tools and proofs could be transferred across them, but we don't really have a way of formalizing that. What I want to do is make a sort of meta-language for programming languages in which we can formalize the requirements of proofs and tools and formalize properties of languages, in order to make it so one tool can transfer across many languages simultaneously -- and similarly, when you make a new language, you know what kinds of properties to aim for so that you have access to the existing tools and proofs that were made for all the other languages already out there. I've done some work on this with effects here, and I'm also looking into foundations of computation and mathematics and into resource-constrained computation -- just scoping out the landscape right now -- but I'd be happy to talk to any of you about that today or tomorrow. At this point I want to thank all of the people I have had a chance to work with at UCSD and here, thank all of you for the Fellowship that let me do all of this research, and then open up to any questions that you may have. [applause]. I talk too fast.
>>: [inaudible] translation validation, why didn't you use [inaudible] verification on those?
>> Ross Tate: What do you mean by that? Sorry.
>>: Translate your -- it seems to me like you are building your own E-graphs and everything, so instead, say, translate [inaudible] all of the transformations and formulas, you could [inaudible] graphs from [inaudible].
>> Ross Tate: Oh, okay. We tried that. It didn't go very well. This was also years ago, when these technologies were younger, so it may actually work now. The reason we think it didn't work well was in particular that we had recursive expressions, and the E-graph solvers in Z3 and in Simplify and so on don't seem to really like recursive expressions. They just run forever, go down a rabbit hole, and you never hear back from them.
>>: Do you think it's because you say something [inaudible] modifier somewhere [inaudible]
extension [inaudible]?
>> Ross Tate: It would be -- we had that operation that distributes through the theta node, which can go on forever and ever and ever. We think that if we'd had programs that didn't use that specific axiom, or that didn't have any loops in them, then it would be fine.
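To make that concrete, here is the kind of axiom being described, sketched as comments in the same informal theta notation as before (my own rendering, not the tool's exact syntax):

    // Distributing a pure operation through a theta (loop) node:
    //
    //     c * theta(a, b)   ==   theta(c * a, c * b)
    //
    // Because theta nodes are recursive -- b typically refers back to the theta
    // itself, as in j = theta(0, j + 4) -- this axiom can keep producing new
    // instances every time it fires. A saturating E-graph engine just records
    // the new equalities and stops when its budget runs out, but a prover's
    // quantifier-instantiation loop can chase these instances forever.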
>>: [inaudible] have to be careful.
>> Ross Tate: So it didn't like those kinds of things, and those were specifically the kinds of
things that we wanted to have work.
>>: You wrote your own qualifier extension?
>> Ross Tate: Yeah. [inaudible] undergrad. So yeah, that's how we got that to work.
>>: I have one more question. Those are amazing results with the optimizer and stuff, and you actually ran it on real Java byte code, so it could be a realistic tool, but do you feel like [inaudible] you could really incorporate it into the Java compiler, for example? Is it practical enough [inaudible]?
>> Ross Tate: I wouldn't say it is ready for JIT compiling.
>>: No, I was just saying: would I be able to build a compiler with no optimizer, just give it axioms, and integrate your framework…
>> Ross Tate: There are certain optimizations that I wouldn't put in that class, particularly since this is rather high-level, so things like register allocation and all of those low-level things -- once you are at this level of abstraction, you can't see the differences between those, so I wouldn't try doing those kinds of things in this system. Also, as you saw, there are performance problems -- although they are probably better now, because it's years later and the solvers we used have probably gotten better. What I think would be better is a sort of compromise between systems: take a traditional compiler that has many of the core things in there, add the extensibility aspect to it, and then, in particular with the strategy I showed you of learning optimizations from the super-optimizer, I think what would work best is running the super-optimizer on your code base once a month, learning all of these axioms once a month, and applying them for the rest of the month. That way you would have -- I mean, you've seen all of these things -- a much smaller set of axioms that run much more efficiently, and then a month later, after your code has changed enough, you learn again and go again. I think that's a better approach, because it accommodates the strengths and weaknesses of the systems.
>>: So I see that you have something with LLVM up there, and I was wondering if you could compare a little bit with the work on [inaudible], which formalizes the LLVM [inaudible] representation [inaudible] and so on. What kind of semantics do you use for the LLVM?
>> Ross Tate: We didn't use a formal semantics for the LLVM. Mike Stepp is the one who took care of all of the LLVM-specific stuff; I did the general-purpose kind of stuff. He dealt with byte codes and bit code, and he would talk to me when he ran into problems -- okay, I can't figure out how to represent this, how do I represent this? -- and then we would talk about better ways to represent those kinds of concepts, but overall he was the more detail-oriented part of that part of the project. Unfortunately I can't go into the details of the LLVM.
>>: Do you have a translation of the LLVM into your representation [inaudible] LLVM semantics [inaudible]?
>> Ross Tate: Yeah, and then we convert back to LLVM.
>>: [inaudible].
>> Ross Tate: Yeah. Our translation from LLVM into our representation was very, very basic -- this LLVM expression maps to this -- very simple ones. It was really the axioms that incorporated the semantics of the LLVM, and they were mostly things like integer axioms and so on. There was a bit of a mess with the fact that there are many different sizes of integers and all of the operations in LLVM are typed, so getting all of those to interact well with each other could have been messy, but we figured out how to deal with it. Yeah?
>>: Is there a class of loop optimizations -- optimizations related to cache [inaudible]? Have you had any thoughts on expressing that sort of thing as a cost function?
>> Ross Tate: Cache locality, and parallelism, come up a lot when I talk about this stuff, and they basically have the same issue behind them, which is that they are generally interprocedural. For cache locality, you typically have to know how memory is laid out, and that's typically not available in the function you are working on, so it's hard to make a cost model when you don't even know how the memory is laid out. At least, when we did some investigation into this, that is the problem we always ran into: you didn't know how the memory was laid out, at least not automatically. If another tool came along and said, here is some information, then we might try incorporating that, but this was one where we didn't have any luck, because we didn't have such a tool.
>> Daan Leijen: Okay. Let's thank the speaker. [applause].