>> Daan Leijen: It's my pleasure to introduce Ross Tate who is interviewing today. Ross is a student of Sorin Lerner at the University of California San Diego, where they do have nice weather, and he is well known here at Microsoft already because he did two internships here and he won the Microsoft Fellowship one year. He did a lot of work on category theory of effects and compiler optimizations, and he worked with Red Hat on a new language called Ceylon, and today he's going to talk to us about usability through optimization. >> Ross Tate: So as Daan said, I am from UCSD and I research programming languages. One thing I've been working on is making programming languages more usable, and the way I've been going about doing that is by improving the technology for program optimization. Before I really get into all of that, I would like to give my overall perspective on programming languages. The way I see it, people have this amazing ability for intuition and creativity, and they've applied those abilities in order to build computers, which complement them with the ability to process large amounts of data and calculate with high precision, and since their invention they have become ubiquitous in our society. Yet these computers aren't really useful to us unless we can get them to do what we want them to do, and so I view programming languages as the means of communication between these two worlds that enables people to enable computers to enable society. Now, having this role means that programming languages suffer from all of the problems that people have, all the problems that computers have, and all the problems that communication has, and so there are a lot of ways we can work towards improving programming languages. One of these problems is that programming languages always have to balance between human usability and computational efficiency. This is where my work comes in. I'm working on improving technology for program optimization so that we can take the emphasis off computational efficiency and focus more on human usability. And the reason I feel that this is important is that in my experience I have seen that a lot of people always keep efficiency on their mind as they program. I had a recent example of this pop up at UCSD in our grad student lounge, where some student had written this program here. The details of this program aren't very important; what is important is that another random person came along about a week later and said, oh, by the way, your program sucks. It could run faster by taking this out of the loop. And then another week later another random person came along and said, well, this is a case for the optimizer; a typical optimizer will take care of this for you. And then another person says, yes, let's rely on optimization technology, but then another person came along and said, well, how do you know this string is immutable? If the string is being changed then this value of strlen can change as well. But then another person came along another week later and said, actually it doesn't matter, because presumably you can look inside this loop and inside strlen and see that neither of them modifies the string, and so it's effectively immutable through this block of code and so the value won't change. And so there's this month-long discussion among a bunch of random grad students about this… >>: How did this not run in parallel? >> Ross Tate: Yes, yes [laughter]. I usually say that, but I forgot.
>>: [inaudible] numbers and things. >> Ross Tate: Yeah, yeah [laughter]. Way more detailed than we were thinking at this point [laughter]. So the interesting thing is that there was this month-long discussion among these random grad students about a random program, and in this whole month not a single person noticed there was an off-by-one error here, and this bugs me because it means that we are focusing on very detailed things like efficiency and whether it gets optimized or not when we don't even have a correct program yet. When this actually happened I took a picture, and you can see I removed some of the inappropriate comments [laughter]. The goal of my research is to make it so that you can write the code the way you'd like to and not worry about performance efficiency; let the compiler take care of that for you, and that way you can focus on the more important things like correctness. >>: So you are also assuming that i is automatically initialized for you? >> Ross Tate: Yes, I think it's going to be somewhere up there, and there are many things wrong with this code; again, someone just, someone just jotted this down. >>: It probably wasn't an undergraduate [laughter]. >>: [inaudible]. >> Ross Tate: There is already a lot of stuff out there on program optimization, right? The issue, though, is that a lot of this technology relies on a lot of repeated manual implementation effort; that makes it sort of a black box that the typical programmer can't do anything with. It's just something that works by magic and they can't affect it. So what I've been doing is going through these technologies and replacing them with more reusable axiomatic systems that are more automated and have a more open interface, so that programmers can interact with them. Some areas I have looked into so far are things like translation validation, which makes sure that the optimizer does what it's supposed to do and doesn't actually change the semantics of the program; extensible compilers, making it so that we can easily add new optimizations to a compiler; and even inferring optimizations entirely automatically, so that optimizations will be available for new languages as they're being designed. These three applications here are all unified by this one technology I came up with, called equality saturation, and this is sort of an axiomatic way to reason about programs. Now this is the overall layout of my talk, so feel free to ask questions at any point. Right now I'm going to start off with translation validation, and I'm going to do so by asking you a question. How many of you have had the compiler actually inject a bug into your code? That is, you went through all this effort to make your code nice and angelic, completely bug free, only to hand it off to the optimizer which through some fault turned it into demonic code that does something that you didn't tell it to do? >>: I wrote that demonic optimizer [laughter]. >> Ross Tate: For those of you who have been so fortunate as to avoid this situation, or not cause this situation, let me enlighten you as to how this goes. If you run into one of these bugs it is absolutely a nightmare, because you can look at your code for hours and say, it should be doing this. It should be doing this. I don't get why it's not doing this. The fact is it should be doing that, because your code is fine; it's the compiler that's wrong and you just don't realize it.
Furthermore, whenever you try to observe the bug, say by inserting print lines or running a debugger, it shuts off the optimizer, so all of a sudden the bug goes away; so it's like there's quantum physics happening inside your code, which, like quantum physics, can be very confusing for the programmer. Furthermore, once you've finally figured out that the compiler is at fault, then you have to figure out how to rewrite your code in some weird way so that the compiler stops introducing this bug, and all your coworkers get confused as to why your code is so ugly. As you can see this is frustrating for a programmer, but it's not only frustrating for programmers; it's also frustrating for companies, and many companies have a policy of not using these optimizers because they can't afford these kinds of bugs. They do so at rather high cost. After all, this means that they pay all their programmers to do these optimizations by hand, and hand-optimized code can be more difficult to maintain over time, and so these costs accumulate over time as well. So these companies would really like to be able to use one of these optimizers, even though it's a little iffy, because typically they do work. So how can we go about doing that? Consider a single run of the optimizer: we have the original program and the optimized program. We can incorporate a technology called translation validation, and what this does is it looks at these two pieces of code and tries to figure out whether they are equivalent. If it succeeds, that means you can use the optimized code safely because you know it's the same as the original code; you haven't introduced any bugs. And if it can't figure this out, then you just default back to the original code just to be safe. Now there are many ways to build translation validators. The most common one is to use bisimulation relations to sort of figure out how these two programs walk together step by step. But these have some difficulty with bigger rearrangements of code, and so we have looked into another way of doing translation validation, using equality saturation. To illustrate my technique, I'm going to start off with a very basic kind of program, just simple expressions here, and we'll elaborate to some more complex programs as we go. So consider this: i * 2 + j * 2. We hand it off to the optimizer and it turns into (i + j) << 1. And we want to figure out whether or not these two programs are equivalent. The idea I had was, let's take these programs and turn them into nice mathematical expressions, and once we're in this nice mathematical world we can start applying nice mathematical axioms to reason about them. We know, for example, that shifting anything left by 1 is the same thing as multiplying by 2, and so that tells us that the optimized program is equivalent to this intermediate program here. Similarly, we know that multiplication distributes over addition, and so that tells us that the intermediate program is equivalent to the original program. So just by using these very basic language axioms, we can figure out that these two programs are actually equivalent to each other. Now this works very well for these nice clean mathematical expressions in programs, but we wanted to make this approach work for more realistic programming languages like C and Java, so we had to figure out how to accommodate challenges like loops, which I'll go into later on, and effects.
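To make the example concrete, here is a minimal sketch of the two forms being compared; the C scaffolding and function names are assumptions added for illustration, and the comments spell out the axiom chain described above.

```c
#include <stdio.h>

/* Original form handed to the optimizer: two multiplications. */
int original(int i, int j) {
    return i * 2 + j * 2;
}

/* Optimized form: one addition and a left shift. */
int optimized(int i, int j) {
    return (i + j) << 1;
}

/* Axiom chain relating the two, as described in the talk:
 *   (i + j) << 1  ==  (i + j) * 2       [shifting left by 1 is multiplying by 2]
 *   (i + j) * 2   ==  i * 2 + j * 2     [multiplication distributes over addition]
 */
int main(void) {
    printf("%d %d\n", original(3, 4), optimized(3, 4));   /* both print 14 */
    return 0;
}
```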
The issue is that typical imperative languages have things like statements, which not only compute mathematical expressions but can also read from the heap and modify the heap, so we want to figure out how to represent these statements as mathematical expressions. To do this I came up with this concept of effect witnesses. What it does is it takes the dereference of r here and represents it as an expression: a load from location r in the current state of the heap sigma, and so all uses of the heap are explicit in our representation. Similarly, when we modify the contents of r here, we map this to a store, which takes not only the value to store and the location to store to, but also the state of the heap it is modifying, and then returns the new state of the heap after the modification is done, and so all modifications of the heap are also explicit in our representation. Now if we consider this program in more detail, what we are doing is taking the contents of some location and then putting those contents back into the same location, and so, assuming a reasonable memory model, this program really doesn't do anything whatsoever. We would like to be able to reason about that with our mathematical expressions, and the way we do so is we say, whenever you have an expression that looks like this one, then this store is actually going to be equivalent to the original incoming heap sigma. With this kind of reasoning we not only can reason about the values of programs, we can reason about the effects of programs. So with this approach I designed a translation validator, which we have implemented so far for C and Java. It takes these two programs, the original and the optimized program, and these programs come in the form of a control flow graph, the standard representation of imperative programs, but this is not very good for algebraic reasoning, as we found out, and so what we did is we came up with a new representation, one that I call a program expression graph, and we convert to that. The reason we call it a program expression graph is that it represents the entire program, or really the entire method, as a single expression that forms a graph rather than a tree, because it has some recursive loops in it. And so once we have this nice mathematical representation, then we can move on to equality saturation, and what this does is apply these algebraic axioms in order to infer equivalences and hopefully figure out that these two programs are equivalent. Now to get a more detailed picture of this, let's consider this program here. Don't worry about the details of this program; it is just contrived to illustrate how my technology works. Let's suppose we hand this off to an optimizer and it spits out this optimized program here. We can all look at this and say, well, what it did is it took these two different dereferences of p here and stored the value into a temporary local variable that it used twice. Similarly, we had these two multiplications by b here, and it simplified that to a single multiplication by b. So what the translation validator has to do is figure that out entirely automatically, and the way ours goes about doing that is by first taking this original version of f and converting it to a program expression graph, our own representation. To do that we take this dereference of p here and translate that to a load that takes location p and the incoming state of the heap sigma.
And then this call to strchr here gets translated to this call, which takes these two parameters, s and the result of that load operating on the current state of the heap sigma, and has these two outputs. What these outputs correspond to is: this pi-v node is the return value of the function call, that is, the value being assigned to this x here, while this pi-sigma is the resulting state of the heap after the function call completes. In particular, when we have this next dereference of p here, we use that new state of the heap rather than the original state of the heap, and so we distinguish these two loads in our representation. Then we go on to this addition and these multiplications here, which we store into the heap in order to get a new state of the heap for the function. And lastly, when we return x, we mark that this pi-v is the returned value of the function and that this store is the resulting state of the heap after the function completes. Once we're done with the original program, then we move on to the optimized program and do the same process, which I will skip over, but we're going to reuse nodes as much as possible in our [inaudible] and speed up the process. At this point, once we have this nice mathematical representation, we make sure it's complete so that we can actually throw away the original control flow graphs and just work in this mathematical system. So we can start applying axioms. One is the fact that multiplying by two is the same thing as shifting left by 1, and so knowing that we can add this equivalence here. Note that we are adding an equivalence; we are not throwing away the original program. This is how we differ from a lot of other techniques which are more destructive, and this is how we make our system able to adapt to a variety of compilers and a variety of optimizations. In this particular situation, this shift by 1 isn't actually useful for validating this translation; rather, we want to use the addition of multiplications and apply distributivity. Once we are done with the math up there, we can move down to the function call, and we can incorporate some knowledge that we know LLVM, the optimization framework that we targeted, is using; in particular, LLVM knows that this call to strchr doesn't modify the heap, and so the state of the heap after the function call is going to be the same as the state of the heap before the function call, and so we can add this equivalence to the representation. Once we've done that, well, we have these two loads from the same location, and now we know that they are operating on the same state of the heap, and so we know that the results of the loads are also going to be the same, and consequently the results of these additions are going to be the same and these multiplications are going to be the same, and transitivity tells us this addition is equivalent to that multiplication. This leads us to these two stores, and we can see that they are operating on the same heap at the same location, and now we know they are storing the same value. So they are going to result in the same state of the heap after the store as well. What we have just proven is that the original f and the optimized f have the same overall effect. And so all we have left to prove is that they return the same explicit value. This is really easy in this case because they actually start off with identical values in our representation.
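For reference, the store and load reasoning used in this walkthrough can be written out as equations. This is a sketch in the talk's notation (sigma for a heap state, pi-sigma for the heap output of a call); the exact axiom set used in the implementation may differ.

```latex
\begin{align*}
\mathit{store}\big(r,\ \mathit{load}(r,\sigma),\ \sigma\big) &= \sigma
  && \text{writing back what was just read is a no-op}\\
\pi_{\sigma}\big(\mathit{call}(\mathtt{strchr},\ s,\ c,\ \sigma)\big) &= \sigma
  && \text{supplied fact: the call does not modify the heap}\\
\sigma_1 = \sigma_2 \ \Longrightarrow\ \mathit{load}(r,\sigma_1) &= \mathit{load}(r,\sigma_2)
  && \text{congruence: equal heaps give equal loads}
\end{align*}
```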
And the reason why that happens, why the return values already coincide, is that our representation, being mathematical, gets rid of all of the intermediate variables, and so [inaudible] differences between control flow graphs sort of go away, and in fact many optimizations are validated just by translating to our representation without even having to apply these kinds of equivalences. >>: Where does the [inaudible] analysis come from? >> Ross Tate: So we use alias analysis to know, if we have a load and a store, whether the resulting effects can commute with each other; so if you have two stores, then they can commute past each other, or if you have a load and a store and they are at the same location, then we just use the same value. If we can prove they are at different locations, then we just reason that the load is going to be the same before the store as after the store. >>: But if there's been [inaudible] how does that work [inaudible]? >> Ross Tate: So you're talking about making that, like, the alias analysis output? >>: Well right. I mean there's the alias [inaudible] on each program but there's also [inaudible]. >> Ross Tate: Oh, I see. >>: [inaudible] when you are removing [inaudible]. >> Ross Tate: So the fact that there are two programs doesn't really create too big of a problem for this analysis, because there are still values within each program, so it's usually within a program that you reason about whether operators commute, and once you figure that out, then you figure out that these two values are the same across the programs. So the reasoning tends to be within the two programs, or within a program itself, and so it's not a big problem [inaudible]. >>: [inaudible] program point specific information that you used to sort of bind your conversion to the [inaudible]. >> Ross Tate: What I am showing here is actually very simplified. We have things like ordering analyses and things like that, so knowing one integer is bigger than the other and such, and we have alias analyses as well. These can all work on top of this kind of basic system. Usually, yeah, with alias analysis, because we are doing this entirely automatically, we often don't have alias information; actually, this was a problem we had with LLVM: even though we told it not to, it would do an interprocedural analysis to figure out alias information and then do optimizations that weren't valid from what we could know. But if you had a setup where you actually gave us that information, then we could start applying it. >>: I see. So you have a way to incorporate [inaudible]? So you are trying to prove [inaudible] constrained what if there are preconditions that enable the optimizations [inaudible]? >> Ross Tate: So this is basically the same thing as the alias [inaudible]. There is some precondition, some fact about the input that we don't know about, and if you give us that information, and we have the logical reasoning for it like we do for alias analyses and a few other things, then we can prove stuff about it, but we can't infer the context. We are an entirely intra-procedural system. >>: So basically what you're saying is give me any kind of proofs of the program and if I can use it to add more equivalence [inaudible]… >> Ross Tate: They don't even have to be equivalences. You can put arbitrary logics on top of this, well, not arbitrary, but many logics on top of it.
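Since the slides are not reproduced in the transcript, here is a hypothetical reconstruction of the contrived f example walked through above. The types, names, and exact arithmetic are assumptions; the shape follows the description: two dereferences of p in the original collapse into a temporary used twice, and the two multiplications by b become one.

```c
#include <string.h>

/* Hypothetical original version of f. The comments note the PEG encoding:
   the first *p is a load in the incoming heap state sigma; the second *p is a
   load in the heap state produced by the strchr call (its pi-sigma output). */
char *f_original(int *p, const char *s, int a, int b) {
    char *x = strchr(s, *p);     /* pi-v output of the call becomes x */
    *p = *p * b + a * b;         /* two multiplications by b, then a store */
    return x;
}

/* Hypothetical optimized version: one temporary used twice and, by
   distributivity, a single multiplication by b. The two versions agree only
   because strchr does not modify the heap, which is exactly the fact the
   validator has to use. */
char *f_optimized(int *p, const char *s, int a, int b) {
    int t = *p;
    char *x = strchr(s, t);
    *p = (t + a) * b;
    return x;
}
```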
>>: You can only use that if you can improve your graphs somehow; like here with strchr you improved it because you added on this [inaudible] equivalence that said it doesn't modify the heap, and with an alias you would do the same. What other things would you do besides adding… >> Ross Tate: Oh, you mean with aliases; with the alias analysis, what you do is you actually tell us these things are distinct and then we will infer equivalences from that, so we will infer the equivalences. You don't have to give us the equivalences. >>: Oh, okay. >> Ross Tate: Same with inequalities and stuff like that. Sometimes we will take advantage of inequalities and use those to infer things, particularly bounds [inaudible] that deal with overflows and stuff like that. So going back, we proved that the return values are identical and the overall effect is identical, and so we basically validated this optimization, right? And we are able to validate a wide variety of optimizations. To test this out we ran LLVM on the SPEC 2006 C benchmark suite, and what this last column indicates is that if LLVM made some sort of change to the program, then three out of four times we were able to validate that change. What's cool is that another team of researchers actually came along and implemented their own translation validator, specialized towards LLVM, even specialized towards this exact configuration of LLVM, and while this did get them significant performance improvement, they were only able to match our validation rate. That is, they weren't able to improve their success rate by the specialization process, and so this is important. Do you have a question? So this is important to recognize because if you consider these two approaches side by side, specialization requires a lot of knowledge. You have to know how the language works, but you also have to know all of the optimizations that the compiler could be applying, and you even have to know what order those optimizations are going to be applied in, and so using the specialization approach requires a lot of repeated manual implementation, because it means that you make a different translation validator not only for each compiler that you are validating, but also for each configuration of the compiler that you are validating, whereas with equality saturation all we had to know were the basic axioms about the language and we were able to take care of the rest entirely on our own. Why this is particularly important is that equality saturation can even validate optimizations that it's never seen before, and this will be important for the next application that I'll be going into. >>: I have a question on the previous slide. For each one of those percentages, is that referring to a series of optimizations all done together and then validated, or individual optimizations that are validated?
>> Ross Tate: We turned on everything [inaudible], so as I said, some of these failures are because it actually used interprocedural information as well, and that's just something that we can't do anything about without somehow changing LLVM; but so, this was run interprocedurally and then we validated what we could. >>: Question. [inaudible] translation validation [inaudible] the way it is [inaudible] very simple once you [inaudible] but in your case you are extracting this [inaudible] producing [inaudible] so is there a danger of sort of getting into a loop [inaudible] will you keep adding to the graph? >> Ross Tate: The rewriting doesn't necessarily terminate, always. In practice, if there are no loops in the thing then we will terminate, but if there are loops, then one of our basic axioms, which I will be getting into later on, will guarantee that it won't terminate, because there are just an infinite number of options at that point. And so what we found is that a breadth-first search through expressions works better than a depth-first search, because it prevents you from going down a rat hole. It makes sure that you fire options broadly rather than, okay, go through this loop, okay, keep going through the loop, so we don't get stuck in that kind of trap. >>: [inaudible] you add particular expressions [inaudible] infinite [inaudible] expressions, right? >> Ross Tate: One of the big things is that our stuff will reuse nodes as much as possible, and I'll get into that more later on, but even if you just put in the axioms for, like, associativity and commutativity, that's a huge blowup in the number of expressions that are equivalent; in our representation, all of those equivalent expressions are actually quite compactly represented. Yes, there is a lot of variety, but because we have this sort of additive approach, and because we reuse things and say here's an equivalence and here's an equivalence, and we have this nice locality aspect where you can reuse subexpressions a lot, it tends not to blow up in our representation until we start adding loops. >>: Can you explain the difference between all and [inaudible]? >> Ross Tate: This is concerning all methods, and these are only the methods where the optimizer actually made a change; so if the optimizer doesn't do anything and just says input, output, we can't do anything, and so this is only concerning the optimized ones. >>: And what was the proportion of the, say, 1864 functions that were actually changed [inaudible] from this table? >> Ross Tate: I don't know. Looking at the numbers I would say it's not huge, but yeah, unfortunately I don't know it. Any more questions before I go on? Yes? >>: [inaudible] all of the failures are because of [inaudible] procedures [inaudible]? >> Ross Tate: No, not all of the failures. Talking with other people who have worked in this area and such, the things that come up are basically that too many optimizations have been applied, or, I mean, it's a good thing that a lot of optimizations have been applied, but they were done in a way where somehow some important intermediate state got lost and we can't figure out the intermediate state that connects the two sides together. So as you saw, you sort of drift from one side into the other side and you converge in the middle, but if something has been done that makes that middle state just not be there anymore, we can't figure it out, and basically everybody seems to get stuck in that situation. Good to go?
With this approach to translation validation, we are making it so that companies can use an optimizer even though it's unreliable, and still do so safely, through translation validation. After doing that, I looked into making it possible to actually extend optimizers with new optimizations and to make this accessible to typical programmers, and the reason why I thought this was important is that back in my industry days what I remember having to do is a lot of optimization by hand. I would write some program like this image-processing program here, and I'd realize that this i times 50 plus j isn't really the best way to access this image; rather, we need to be doing image plus plus in order to get rid of that multiplication from inside the loop. Once I recognized this, I had this choice to make between keeping this program, which is easier to understand, not only for me but more importantly for my coworkers, and this other program, which I would like to keep because it is more efficient. This choice comes up a lot because many optimizers, including these [inaudible] three, won't actually perform this optimization. If you are working in image processing, you have probably run into this situation, and you may have thought, okay, why don't I extend the compiler to take care of this optimization for me, so that I can keep around this intuitive code but execute this more efficient code. Let's consider how much work is involved in that. Well, if you are so fortunate that the optimizer is open source, then all you have to do is check out a copy of the source code, learn the architecture of the optimizer, and then implement your optimization, even though you're doing image processing and so may not be familiar with programming language techniques. Then you have to integrate that into the pipeline in your compiler and then distribute your compiler to your coworkers. Before you do that, you should make sure that you debug your implementation, since after all we just talked about how compiler bugs can be quite annoying and you don't want to be guilty of those. Then you have to deal with the fact that this compiler you just checked out is being upgraded by the community as well, so you have to merge all of those upgrades with your changes in order to ensure your team has an updated compiler. You can see this is quite an intimidating amount of work, and it has scared off a lot of programmers, including myself. And so in light of this, why don't we just make a trainable optimizer? With this, in order to extend it with that optimization I showed you, all you have to do is give a single example of the optimization. In fact, the example I just showed you works just fine, and from that alone we are able to take care of the rest for you. There's a little catch to this, though, which is: sure, I could learn this exact optimization here, but how often are you going to be writing programs that work on 50 x 50 images? Not that often. So really what you intended me to learn is some more general version of the optimization, one that would work for any w, any h, any use of image at i times h plus j, and change those into an image plus plus. Furthermore, you want me to learn a valid optimization, one that doesn't introduce any bugs, and so you hope I learn side conditions like: this h has to be loop invariant, and this use can't modify i, j or image. Now recognizing that this is really your intent, yeah? >>: [inaudible] once. >> Ross Tate: What? >>: If image plus plus [inaudible].
>> Ross Tate: Yeah, yeah, but there are many more side conditions, which is why we don't want people to have to deal with this. Recognizing this intent, the idea is to make a system that will take a concrete example and automatically figure out how to generalize it into this more broadly applicable form. And the insight I had for doing that was that this optimization you gave us here needs to be correct, otherwise we don't want to learn it, and so in particular these two programs need to be equivalent. If we can understand why these two programs are equivalent, we can understand why this optimization works and how we can go about generalizing it. So with that we made an architecture for a trainable optimizer, and we assume that the programmer gives us these two snapshots of the program, before and after, from which they want us to infer the optimization that they applied. What we do is we hand these off to a translation validator that not only makes sure that the programmer didn't make any mistakes in their optimization, but also gives us a proof that these two programs are actually equivalent, and this proof is important because from it we can determine which details of the program are important and which aren't important and can be generalized. So by generalizing this proof we can actually get generalized versions of the input and output programs, giving us a generalized optimization of which the original one given to us by the programmer is a concrete instance. And so with this architecture, what we enable is a way for programmers to teach compilers provably correct optimizations, optimizations that are guaranteed to be correct, by writing just one example of an optimization being applied, written in a language that they already know rather than some compiler-specific language. And I believe this makes extensibility very accessible to the typical programmer. Now to get a more detailed understanding of how this works, let's say this programmer knows that I am giving a presentation and says, Ross, please learn that 8 plus 8 minus 8 can be transformed to 8. So, okay, we are going to hand these off to a translation validator that's going to prove that 8 plus 8 minus 8 actually equals 8, and then we want to generalize this proof; but before we do that we have to look inside the proof, so let's take a look at what this translation validator actually does. The translation validator starts off knowing absolutely nothing about the equivalence of these two programs. Really all it knows is properties of the overall language it's working with, so it knows, for example, that anything minus itself equals zero, and it can use this fact to infer that 8 minus 8 equals zero. And so as it runs it's going to construct a proof in the form of this database. In particular, it's going to note that by applying this axiom we could add fact one to the database. We also know that if something is equal to zero, then anything plus that equals itself, and so using this axiom and this fact we can infer that 8 plus 8 minus 8 has to equal 8. Again we make a note that by using fact one and applying this axiom we were able to add fact two to the database. Now this fact two is important because it actually proves that this transformation given to us by the programmer is in fact valid. >>: [inaudible]? >> Ross Tate: Huh? >>: Can you use associativity? >> Ross Tate: Oh yes, I am sort of assuming that there are parentheses there. If the parentheses are there then we are not using associativity, but yes.
That is another axiom we can add on. So we just proved that this transformation given to us by the programmer is correct, and so we can move on to proof generalization. Now the thing I found out is that proof generalization actually works best by going backwards through the proof, so I'm going to be going from right to left, and to see how this works, I'm going to move these axioms out of my way and I am going to start off with some general program A transforming into some general program B, and I want this to be valid, and so I need to prove that A equals B. The way I'm going to go about doing that is to look at this concrete proof here to figure out how I can refine A and B so that they actually are equivalent. As I do so I am going to maintain the invariant that this is the most general program transformation for which the portion of the proof that I have processed so far applies. Since I haven't processed any of the proof so far, I start off with the most general program transformation. Now, to start refining things, as I said it works best by going backwards, so we are going to look at the last axiom that we applied, this one here. In order to have applied this, there had to be some c and d such that we knew c equals zero, and then we used that to infer that d plus c has to equal d. And so we can look at our concrete proof to see how we used this. In particular, our proof tells us that we applied this axiom using fact one, and so c equals zero has to be fact one, and so we can add that to our database as a future goal to prove. Similarly, the proof tells us that by applying this axiom we added fact two, so d plus c equals d has to be fact two. But this time we already have a fact two, namely A equals B, so to reconcile these differences what we do is unify these two facts. We take all references to A and replace them with d plus c, and we take all references to B and replace them with d. After doing so, it actually makes sense to transcribe our notes from below to above. After all, if you use this fact one you can get this fact two. Note that when I was making the substitution I also made the substitution within the generalized transformation, and so I restored the invariant: this is the most general transformation for which the portion of the proof that I have processed so far applies. In particular, if I can figure out how to refine c so it will actually equal zero, then that transformation will be correct. So to go about doing that, again, as I said, we are going backwards, so let's look at the previous axiom I applied, this one here. In order to apply this there had to be some e, since we inferred e minus e equals zero. Now this axiom makes no assumptions, so we can start off with an empty database, which is what you would expect from a good proof, and then we can look at our notes and see that in applying this axiom we added fact one, so e minus e equals zero has to be fact one. But once again we already have a fact one, namely c equals zero, and so once again we reconcile these differences by unifying these two facts. In particular, we take all references to c and replace them with e minus e.
Once we've done this, we can transcribe our notes again and say, well, you can apply this axiom to add fact one to the database. And in so doing, what I've just built is a generalized proof that this generalized transformation over there, d plus e minus e transforms to d, is in fact valid, and our original concrete proof and concrete transformation are just an instance of those generalizations, obtained by taking all instances of d and e and replacing them with 8s. So by understanding why this transformation given to us by the programmer is in fact valid, we are able to learn a more broadly applicable optimization; in particular, by examining the proof of equivalence, the system was able to learn a generalized optimization from the programmer. To sort of recap what I did at a higher level: I handed these two programs off to a translation validator which gave us the proof that they were equivalent, and this proof said, the proof works because these 8s are the same and because these 8s are the same, and so when we hand this proof off to the proof generalizer it maintains those equivalences in the generalized transformation. However, the proof didn't need all four 8s to be the same, and so we could use two different symbols, d and e, in the generalized transformation. And what I proved is that using this process will always learn the most general optimization for which there is a proof of validity. And so with this strong guarantee we are able to learn a large variety of optimizations from just single examples of them being applied, and this inter-loop strength and bound reduction is in fact that image-processing optimization that I showed you earlier. >>: [inaudible] use [inaudible] try to prove this equivalence you used this technique you just described like x again? >> Ross Tate: So these all have loops and stuff, and so if we want to do loops then we have to use PEGs, and if there is anything with side effects you have to use PEGs, so yeah, what I showed you is a… >>: [inaudible] it's all cool [inaudible]. >>: Yeah, so if you had loads and stores, you use the same techniques… >> Ross Tate: Yeah, so you're building off the equality saturation kind of approach. Underneath, to make that technique I showed you work for realistic programs, we use PEGs and then equality saturation on top of everything. So here, the cool thing with this approach, and the fact that our technique actually generalizes to other optimizations as well, is that it tells us that since we build on translation validation, as the technology for translation validation improves, so will our ability to learn new optimizations from programmers. >>: [audio begins] distribution? >> Ross Tate: It's things like moving the multiplication inside the loop; sometimes you can put a multiplication inside the loop so you can actually get rid of it by distributing it through everything, and other times it's better to factor it out and put it at the end of the loop, so that's just moving operations into and out of the loop. >>: So can you, or how hard would it be to, do something like loop interchange, or [inaudible] unrolling should work, right? >> Ross Tate: So by unrolling do you mean the one where you take the loop and then make sort of double copies of it, or do you mean the one where you pull out one iteration of the loop? >>: The latter one was called peeling [laughter]. >> Ross Tate: Okay.
Some people use "unrolling" for the latter one too, so that is why I asked. >>: The other one. >> Ross Tate: So the other one we actually haven't gotten to, because basically our representation, as I was showing you and I'll show you later on, has iterations kind of tied into the semantics of it all, so we have looked into ways to get around that. We figured out that you can have sort of meta-operators that can allow loop unrolling, but we haven't actually tried putting that into practice. Loop peeling is easy and… >>: Something like the loop [inaudible] dependence testing to prove it correct would probably go beyond the proof system at this point? >> Ross Tate: So loop interchange depends on exactly how they are bundled. If we do alias analysis ahead of time in order to figure out how to turn it into a PEG better, then these two loops will actually be completely separate expressions, and so loop interchange is extremely easy to do in that situation because they actually are already separated. In other situations it's much more difficult, because you have to sort of figure out after the fact that they are separable, and that's a little harder to do. Sometimes we can do it and sometimes we can't. And so compared with bisimulation relations: bisimulation relations can do things like loop unrolling better, but loop interchange they have a terrible time with. So there are some pros and cons to the two different approaches, and what's interesting is that the two different approaches seem to hit the same walls that I was talking about earlier with the translation validation. So again, because this works off translation validation, if we used a bisimulation-relation translation validator, then things like loop unrolling would be just fine, but here, since we are using our equality saturation approach, things like unrolling are difficult but loop interchange is better. >>: So how does this work practically? I mean if you can do this, why have all of this code in a normal compiler hand written that [inaudible]? >> Ross Tate: There is sort of a detail here, which is that, like I said, this learns the most general optimization for which this works, but what does generality mean? In our situation we formalize what generality means, and it depends on the logic that you're working with for that proof, and generally for this process you need sort of a first-order logic to do all of this stuff, whereas when you implement an optimization by hand you can do things that require higher-order logics, so there are some optimizations that are better done by hand because they actually will work in broader situations. But the reason you don't want to do that all the time is that there are a lot of optimizations that are only going to be useful for certain domains, and so you really want someone in that domain to be able to say, okay, learn this optimization, and have it there. So I wouldn't say that you should go all the way this way, and I wouldn't say you should go all the way the other way; it really is a matter of balancing the two. >>: You have performance results that you're going to show us? >> Ross Tate: I have performance results for optimization later on.
The issue that comes with the performance evaluation part of it is that, because we chose LLVM and bytecode, they are a little bit too high level for a lot of things; so if you can find programs that have some key bottlenecks and you actually put these optimizations in, then you will get good performance results. The issue is whether there is a bottleneck in the code that is actually something you can optimize. If there isn't, which is oftentimes the case, then you are not going to get good performance results, so there is a big issue with evaluating optimizations in general. I can go into more detail about this optimization-evaluation problem that I found when I started doing this research, but later on I will actually show you some success we had with a ray tracer where there was a big bottleneck and it had to do with these kinds of optimizations. So now that means that we can extend the compiler with optimizations by just giving examples. Another thing I looked into is making it so we can infer optimizations entirely automatically given the properties of the language. The reason I found this to be important is that when you make a new language, optimization tends to be a big hurdle in getting that language adopted. You might wonder, well, there aren't that many languages being made every year, but in fact many companies, like the video game companies that I worked with, actually have their own in-house language that's maintained by a single person who would really like to have an optimizer but doesn't have time to implement one, so this technology would have benefited them; and also, domain-specific languages are becoming more and more common, and so by incorporating my technology you could learn domain-specific optimizations for those domain-specific languages. To see how this works I'm going to focus on one very classic optimization known as loop induction variable strength reduction, and what this does is it takes this program here and translates it into this program over here, and in particular gets rid of that multiplication from inside the loop. The way it goes about doing that is it says, okay, this 4 times i is going to be turned into the variable j. And to accommodate that change, it's going to change the increment by 1 to an increment by 4, and it's going to change the bound of 10 to a bound of 40. The reason why you might want to think about doing this is that it is useful for things such as array optimizations, where typically this 4 times is the size of the array elements, and so you want to get rid of that multiplication from inside the loop. And so if you were to go through actually implementing this optimization for your language, then there is still one more subtle issue that you have to deal with, which is called phase ordering. The issue is that you can start off with this program here, and sure, you can apply loop induction variable strength reduction in order to get this ideal version here, but you are also going to be writing a number of other optimizations, such as the fact that 4 times can be replaced with a shift left by 2, and if you apply those optimizations first, well then they will block the loop induction variable strength reduction optimization, in particular because they get rid of this multiplication from inside the loop. And so this issue of optimizations sort of conflicting with each other is what's known as the phase ordering problem.
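As a concrete sketch of the transformation just described: the loop body below (summing the values) is an assumption added to make the code self-contained, but the rewrite itself, 4 times i becoming j with the increment and bound scaled, is the one from the example.

```c
/* Before: a multiplication by 4 inside the loop (i runs from 0 while i < 10).
   Note that an eager rewrite of 4 * i into i << 2 here would hide the
   induction variable and block this optimization: the phase-ordering problem. */
int before(void) {
    int sum = 0;
    for (int i = 0; i < 10; i++) {
        sum += 4 * i;
    }
    return sum;
}

/* After loop induction variable strength reduction: j tracks 4 * i, the
   increment by 1 becomes an increment by 4, the bound 10 becomes 40, and the
   multiplication inside the loop is gone. */
int after(void) {
    int sum = 0;
    for (int j = 0; j < 40; j += 4) {
        sum += j;
    }
    return sum;    /* same result as before(): 180 */
}
```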
Recognizing this, let's consider how much work is involved in implementing an optimization for your language. With traditional techniques, what you do is first identify all of the multiplications inside the loop and then filter out all of the non-inductive cases, that is, variables being modified in some way besides incrementing. Then you decide which of the remaining cases you want to actually optimize, because if you do too many of them you overload your registers. Then you add new loop variables for each of the remaining cases, insert the appropriate increments for each of the remaining cases, and replace the appropriate multiplications with the appropriate loop variables. Once you've done all this, you have to integrate all of your optimizations into your pipeline and address that phase ordering issue, and then you have to make sure you debug everything, because there are a lot of little things that can go wrong here, and again, we don't want these compiler bugs because they are very painful. You have to go through this process basically for each optimization that you do, and so this approach requires a lot of repeated manual implementation. So I have figured out a way to apply equality saturation to this problem. In particular, with my approach, all you have to do to get loop induction variable strength reduction is add just three basic axioms of your language [inaudible]. From that alone we will be able to get loop induction variable strength reduction automatically. More generally, it helps us if you also give us estimates of what your operators cost, so that we know how to prioritize them, and once you've done that we can automatically provide a large variety of optimizations, all of which are guaranteed to be correct, because we will actually be able to offer a proof that the transformed program is equivalent to the original program. So you don't have to worry about this debugging problem here. To see how our approach works, we start off with some control flow graph, the standard representation for imperative programs, and again this isn't very good for axiomatic reasoning, so we're going to translate it to our own representation, the program expression graph that I talked about earlier. Once we have this nice mathematical representation, then we move on to equality saturation and infer a bunch of equivalent ways to represent the same program by applying those algebraic identities. Once we have all of these equivalent representations, then we can incorporate what we call a global profitability heuristic, which analyzes all of these equivalent programs and picks out the optimal one according to the cost model that you gave us. Once we have this optimal choice, well then we can bring it back to a control flow graph in order to get the standard representation, so we can move on to other stages of the compiler, like lowering down to the assembly level. Now to see how this works in more detail, let's consider this loop induction variable strength reduction example I showed earlier. There is a new challenge here, which is that this comes in the form of a control flow graph, but again, I said control flow graphs aren't very good for this kind of axiomatic reasoning, and so we want to figure out how to represent them as an expression. The issue is that this program has a loop in it, and this loop really represents an infinite number of values, so how do we represent an infinite number of values as a finite expression?
So to solve this problem, the idea I came up with is to use expressions with loops in them, essentially recursive expressions. This 4 times i loop value here is represented by this expression over here. In particular, we have this theta node that says that loop variable i starts off at zero and is incremented by one in each iteration of the loop. And so once we have this nice mathematical representation of the loop, we can move on to the familiar equality saturation process that I talked about earlier. We can apply an axiom that says that shifting left by 2 is the same as multiplying by 4, and so we can add this equivalent representation here. And note that we are once again adding information. We are not throwing away the original representation, and so this is very different from the prior approaches; in particular it is additive, and this is how we deal with that phase ordering problem, because we are still free to explore in another direction. In particular, we can apply another axiom that says that any operator distributes through our theta nodes, and this results in this multiplication of additions here, which then allows us to apply distributivity, and that results in this 4 times theta node here; and there is already a 4 times theta node over here, and so we can reuse that same node in order to keep our representation compact. After we do that, we can apply a zero [inaudible] in order to simplify those expressions, and what we have just built here is what we call an EPEG, or an equivalence program expression graph. What this does is it has all these equivalence classes of values here, and it says, here are a bunch of equivalent ways to represent these various subcomponents of this program. Once we have all of these equivalent options, then we can incorporate a global profitability heuristic to tell us which of these options is in fact the best one, and so it will analyze this EPEG in order to figure out the one representation for each equivalence class that optimizes the cost model that you gave us. Once we have this, we can distill it down to this program expression graph, and this gives us our optimized result, and lastly we translate it back into a control flow graph, and so this corresponds to this graph here. In particular, it says that there is a loop variable j that starts off at zero and is incremented by four in each iteration of the loop. >>: I didn't see the 10 and 40 coming out for the [inaudible]. >> Ross Tate: Yeah, sorry, I was just focused on this thing here; once you start adding the less-than 40 [inaudible] it starts going into a much bigger picture. >>: [inaudible] distribute [inaudible] a turning point saying that [inaudible]. >> Ross Tate: Yeah. We have an axiom that says… >>: [inaudible] 10 would be smaller than 40 and then four times [inaudible]. >> Ross Tate: Yeah. This is where you also have to know that the upper bound is small enough to make sure there is no overflow and stuff like that; all of the axioms I've shown you so far hold for modular arithmetic, so they are not an issue, but things like inequality axioms, the standard ones, don't hold for modular arithmetic, so you have to make sure that the bounds are appropriate. >>: So this representation just [inaudible] or is it [inaudible]? >> Ross Tate: It's a more complete form of it; it's actually one where [inaudible] you can't throw away the CFG. It's not enough information. Values that are different will actually get conflated.
And [inaudible] you can say this is sort of very similar to a conversion [inaudible] where all of the loop indices here are explicit. There are versions that don't have explicit loop indices, but there you can't throw away the CFG because they actually merge values that are not the same, so this is sort of one that's been fleshed out all the way to make it completely independent from the control flow graph. And so we figured out a way to convert from this representation back to the control flow graph, and we also gave it [inaudible] semantics and showed that the transformation we have, moving between the two representations, actually preserves semantics. So it's sort of very thoroughly done as [inaudible], but along the same lines. >>: But you do have this challenge of [inaudible] loops in a way [inaudible] loops in the control program, is that right or no? And another example you had optimizations [inaudible] earlier, if it's in this form then how do you reason about those [inaudible]? >> Ross Tate: So if you had another loop down here, say, and they are effect-free loops, this becomes easier, because actually in the thing we built this would be one loop expression and another loop expression. They won't even be in any sequence on top of each other, and so that's where things get much easier. When there is an effectful loop, [inaudible] forces them to be sequentialized, and that gets things a little messier. >>: Can you still reason about them and figure out what kind of optimizations you want to do? >> Ross Tate: We've had [inaudible] on that one, so it depends on just how complicated the story is. If the story is simple enough it works out. If it's too complicated, then at least automatically we won't be able to do it. Actually we can do it by hand, and that's another thing. More questions? >>: [inaudible] have you ever said anything about inlining? So is it just totally trivial or are there some subtleties about it that [inaudible]? >> Ross Tate: So I've been talking about axioms here, but really our engine allows arbitrary, what we call, equality analyses to come in, and they'll do fancier things. So one of them is an inliner, and it will say, okay, here are things that have been approved for inlining because they [inaudible] inlining here, and so we inline them, and it's fairly standard how that works: we just add the equivalence [inaudible] between the function call and the function body with all the expressions [inaudible] with all the expressions replaced. Some things we found: it's very important to get rid of intermediate variables. We found that things like trying to do lambdas, well, that's possible to do, but in practice, because of the sort of exponential kind of exploration in this additive approach, it just becomes very, very messy when you actually try to use lambdas and try to substitute inside the lambda; it doesn't work very well. The lack of intermediate variables is actually a big thing towards getting this approach to work. Yeah? >>: [inaudible] curious, have you considered formalizations like skipping iteration [inaudible] zero, or backwards [inaudible] up [inaudible] are replacing [inaudible]. >> Ross Tate: The first one is easy. The first one we can do; actually that one will happen automatically. That's a loop-peeling thing. The going-backwards one is not easy; that one requires you to know, so basically the way we represent these, this theta node is, this is where the iteration thing comes in.
This says at iteration zero it's zero, at iteration one it's [inaudible] iteration zero. So as far as the semantics goes, the representation incorporates iteration counts into it. Those little shifts are fine, but things like reversal--you can't even reverse an infinite sequence, right, so you really have to define reversal with respect to some maximal point. Similarly, taking every other element is something you can do; we have sort of an even-and-odd thing, and that's how you get loop peeling, or loop unrolling, but we haven't tried putting that into practice. So some loop things are a problem and other loop things are just difficult. I could go into those further and… >>: So this is a totally off-the-wall question, so you have [inaudible] graph… >> Ross Tate: Yeah, yeah, yeah. >>: Okay, so [inaudible] how [inaudible] is it? >> Ross Tate: So they are basically very similar strategies, this equality saturation thing. Denali works within a block, and having just a single block very much changes the picture, so you can use very different techniques for doing this process; they basically work on something like six instructions at a time, so the scale is completely different. We are doing whole Java methods or whole C methods at a time, and they don't have to worry about loops and things like that. So I really view ours as meant to be a smart general-purpose compiler, whereas Denali is: here is a small chunk of six [inaudible] instructions that need to be optimized to all hell, so go through it in as much detail as possible. >>: [inaudible] how much of the same [inaudible] in terms of possible. >> Ross Tate: So they had finite [inaudible]. >>: Right. >> Ross Tate: So they actually explore the entire [inaudible] space, and that's why they can use a SAT solver and it will actually tell you whether this is true or not true, whereas we can't, because our state space is infinite. That's why the breadth-first search is important, rather than a depth-first search: we can't explore the whole space. >>: So this is a very structured [inaudible] so you are saying that [inaudible] does that work for any [inaudible]? >> Ross Tate: It's the--I've forgotten the term [laughter]. >>: Reducible. >> Ross Tate: Yes, reducible, thank you [laughter]. Reducible CFGs we can do, and there is a way to translate irreducible CFGs with some duplication. Any reducible one we can handle, and in fact, something that took a lot of struggling and suffering was figuring out how to take loops that came from non-structured loops and still revert them back into non-structured loops--so if they had breaks or continues, we actually restore them into a loop that still has breaks and continues, rather than having a bunch of duplication in another one of our branches. That was messy, but it's been figured out. >>: In terms of reduction [inaudible] do you have to figure out [inaudible] or do they just fall out [inaudible]? >> Ross Tate: These just happened. I mean, we didn't--a lot of optimizations were "let's just see what happens," so we threw the axioms in and the optimizations worked just from the axioms. Some of them didn't work, and sometimes it was because there is something big, like a reversal in the loop, that doesn't really work for this kind of representation, and other times it was because we were missing an axiom, and then we just add that axiom when we find it.
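For readers following without the slides, here is a hypothetical reconstruction of the running example before and after optimization (the exact slide code is not in the transcript, and use() is a made-up stand-in for whatever the loop body does with the 4-times-i value):

    // A sketch of the loop before and after; not the actual slide code.
    class StrengthReduction {
        static void use(int v) { System.out.println(v); }

        static void before() {
            // i starts at 0 and is incremented by 1; i << 2 is the "4 times i" value,
            // and the axiom equates i << 2 with 4 * i under wrapping int arithmetic.
            for (int i = 0; i < 10; i++) {
                use(i << 2);
            }
        }

        static void after() {
            // The representation picked by the profitability heuristic corresponds to a
            // loop variable j that starts at 0 and is incremented by 4, with the bound
            // scaled from 10 to 40: loop induction variable strength reduction.
            for (int j = 0; j < 40; j += 4) {
                use(j);
            }
        }

        public static void main(String[] args) { before(); after(); }
    }

Both versions print 0, 4, ..., 36; the point of the discussion that follows is that the second one emerged from the axioms rather than from a hand-written optimization pass.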
>> Ross Tate: So, for example, the loop bound axiom [inaudible] actually requires something--[inaudible] something like you mentioned: at the end of this loop we know that j is actually 40, right? Knowing that fact requires sort of a higher-order axiom, so we added that higher-order axiom into the system, and then we could do some fancier stuff that way. Something I've learned in general--this has come up a few times--is that the representation you choose for optimization is a big factor in which optimizations are easy to do and which are not, and so I chose this one here, but there are many other ones that could be valid for different kinds of programs and different kinds of things. As part of that proof [inaudible] process, I actually made sure that the algorithm I came up with can be generalized to other kinds of representations as well, so it's not just stuck with PEGs; it can actually work with other kinds of representations. >>: So this is obviously fairly akin to the techniques program verification uses. Usually in program verification, when you have loops, you summarize them via loop invariants, and there are a variety of techniques [inaudible] invariants. I was wondering whether you think you have all the power you need, or, if you were to apply loop invariant inference techniques and incorporate them into your techniques, would you get more? >> Ross Tate: What we found is that we actually already have--in our anti-aliasing, for instance--a few things that are best done before you start doing optimizations, in order to basically learn loop invariants. The reason is that it's hard--sometimes it's possible, but it's harder--to infer loop invariants dynamically, because you have all of these equivalent representations and you are basically trying to do induction over equivalent representations, and we found in practice that doesn't work very well. So we infer loop invariants beforehand and then use those, and once we have a few beforehand, we can figure out how to augment them dynamically as well, but we found we still need some starting point to go from in order to get those. All good? We have probably already seen this, but looking back at where it came from: we start off with four times i, which got changed to j; we have this increment by 1, which we change to an increment by 4; and there's a bound of 10, which changes to a bound of 40. This looks familiar, because this is in fact loop induction variable strength reduction, and what's cool is that I didn't program this optimization explicitly. It just sort of happened; we call this an emergent optimization, and we've found that there are many optimizations that will emerge just from these basic axioms automatically. So this is how we were able to get this sort of language optimizer and its optimizations automatically. Now, I've been talking about language optimizations here, but we found that we can also apply this to libraries. The reason this came to me as something important is that back in my undergraduate days I had to write a ray tracer, and I had a choice between mutable vectors and immutable vectors. Mutable vectors meant that I had to write a big sequence of operations like this in order to implement what, with immutable vectors, is a very basic expression.
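As a concrete illustration--the talk's actual ray-tracer code is not in the transcript--here is a minimal sketch with made-up names, using the reflection formula r = d - 2(d.n)n as a plausible stand-in for "this very basic expression":

    // A sketch of the two API styles, not the actual ray-tracer library.
    final class Vec {
        double x, y, z;
        Vec(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
        Vec(Vec v) { this(v.x, v.y, v.z); }

        // Immutable style: every operation allocates a fresh result object.
        Vec subtract(Vec o) { return new Vec(x - o.x, y - o.y, z - o.z); }
        Vec scale(double k) { return new Vec(k * x, k * y, k * z); }
        double dot(Vec o)   { return x * o.x + y * o.y + z * o.z; }

        // Mutable style: operations overwrite the receiver to avoid allocation.
        void subtractInPlace(Vec o) { x -= o.x; y -= o.y; z -= o.z; }
        void scaleInPlace(double k) { x *= k; y *= k; z *= k; }

        public static void main(String[] args) {
            Vec d = new Vec(1, -1, 0), n = new Vec(0, 1, 0);

            // Immutable vectors: a single expression, but two short-lived intermediates.
            Vec r = d.subtract(n.scale(2 * d.dot(n)));

            // Mutable vectors: fewer allocations, but a statement sequence and the burden
            // of knowing which objects may safely be overwritten.
            Vec tmp = new Vec(n);
            tmp.scaleInPlace(2 * d.dot(n));
            Vec r2 = new Vec(d);
            r2.subtractInPlace(tmp);

            System.out.println(r.x + "," + r.y + "," + r.z + " vs " + r2.x + "," + r2.y + "," + r2.z);
        }
    }

The immutable version reads as one expression but allocates two intermediate vectors that are thrown away almost immediately; the mutable version avoids those allocations at the price of a statement sequence and ownership bookkeeping, which is exactly the trade-off being described.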
>> Ross Tate: Furthermore, choosing mutable vectors meant my code would be error-prone, because I would have to track things like ownership and make sure that the wrong person doesn't modify the wrong vector at the wrong time; immutable vectors don't have those kinds of problems. In light of that you might have expected me to choose immutable vectors; however, I was worried that immutable vectors would be inefficient. In particular, even this basic expression here allocates a number of intermediate objects, and those objects are made and thrown away almost immediately. Because of this I decided to go with mutable vectors in order to get better performance. I remembered that decision years later when I started working on optimization, and so what I did was see whether I could apply my optimization techniques to this library design problem. I went back to that ray tracer and re-implemented it to use immutable vectors, the way I would have liked to implement it, and I found that it actually ran 7% slower, so I was justified in my performance concerns. What I want to do is apply my techniques so that we can replace these very manually intensive library designs with these nicer ones and still have the same performance guarantees as with the manual ones. So the idea I had was to use these techniques to enable library-use optimizations. In particular, the idea is that we express the various guarantees about using your library as axioms. For example, if I have a vector library and I add these two vectors together and get the first component, well, that first component is going to be the same thing as getting the two vectors' first components and then adding them together. Once we incorporate all these axioms into the equality saturation process, it will automatically infer optimizations specialized to using my library. I applied this to that ray tracer and was actually able to get the immutable vectors to run faster than the mutable vectors, because they were more axiom-friendly. In particular, the optimizations we learned were able to reduce the number of allocations by 40% and get rid of all those intermediate objects. So by using this technology we can make it so you can design your libraries the way you would like to and still get the performance you want, by taking advantage of these axioms. >>: This fellow [inaudible] axioms in his writing DSLs for compiling high-level specifications of things like [inaudible] transport down to [inaudible] chips and calling libraries, so I take it rewriting is strictly more powerful, right? >> Ross Tate: This is in the same language: we take Java code and rewrite it to Java code, so this is performance gained on top without even having to worry about lowering. All of the metrics I am showing you are on top of the JVM's optimizations as well, so this is optimization the JVM couldn't do, and the JVM optimizer is actually fairly advanced, we found out. >>: Sure, but I'm thinking in terms of comparison to a rewriting system, so what is the expressiveness of the things that you [inaudible]? >> Ross Tate: Well, you can essentially think of this as a rewriting system, but with the profitability heuristic that allows you to explore many rewritings simultaneously, whereas a typical rewriting system basically has the phase ordering problem.
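To illustrate the phase-ordering problem he is referring to, here is a tiny sketch with two made-up rules (not the tool's actual rule set), using strings as stand-ins for expression trees:

    import java.util.function.UnaryOperator;

    // A destructive rewriter must commit to one rule order; different orders can
    // strand it at different results.
    class PhaseOrdering {
        // R1: strength-reduce "* 4" into "<< 2".
        static final UnaryOperator<String> R1 = e -> e.replace("* 4", "<< 2");
        // R2: fold "(x * 4) * 2" into "x * 8" (a special case of reassociation).
        static final UnaryOperator<String> R2 = e -> e.replace("(x * 4) * 2", "x * 8");

        public static void main(String[] args) {
            String start = "(x * 4) * 2";
            System.out.println(R2.apply(R1.apply(start))); // R1 then R2: "(x << 2) * 2" -- stuck with two operations
            System.out.println(R1.apply(R2.apply(start))); // R2 then R1: "x * 8" -- the better result
        }
    }

A destructive rewriter that happens to fire R1 first can never recover the better form; equality saturation keeps both alternatives in the EPEG and lets the profitability heuristic choose afterwards, which is the point made next.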
>> Ross Tate: If you apply the rewrites in the wrong order you get problems, and we addressed that issue by using equality saturation, keeping this additive approach, and also figuring out how to extend it through loops and so on. Sound good? All right. >>: Sorry, the whole idea here of treating [inaudible] stuff, abstract data types, and doing higher-level transforms, it seems quite related to… >> Ross Tate: I'm not familiar with the work, so it could be… >>: Right. Essentially the idea is that if you have some knowledge of higher-level [inaudible] in your data structures, you can use commutativity and things like that, and you know the optimizations are better [inaudible] heuristics [inaudible]. >> Ross Tate: In general, the cool thing is that we've figured out--or the experiments showed--that that actually makes a difference, and we can do this stuff automatically. All we have to know is those axioms, as you're saying. My intent with all of this was to make it so people could program differently--not just take existing programs and make them run faster, but actually make it so people could write their libraries in a nicer way--so this at least substantiates that that would work. >>: You're trusting essentially the pragmas or whatever the… >>: We're trusting that the axioms are correct. >>: That's right, but are you inferring them from the type signatures? >> Ross Tate: No. Say I've written the library; as I go through it, I say okay, here are some axioms that I imagine would be useful, and then I okay them for optimization. The library writer would provide the axioms; we can't infer them automatically. In fact, ideally the… >>: Actually, it wasn't so much about the axioms; it was really about the purity, like knowing that the function was pure. >> Ross Tate: And again, that would be an axiom. The reason I like the axiom approach is that it means you don't have to have the library code, so it works practically with OO systems where you are dealing with interfaces… >>: Okay. But you are not addressing the problem of checking the [inaudible] of the axiom. >> Ross Tate: No. That is a whole other story. Yeah? >>: Does the library writer also have to write cost functions? >> Ross Tate: We didn't have them do that, no. Our [inaudible] were very naïve, but we found that naïve cost models actually work pretty well. The one place where it would be better in practice is knowing that some methods are more expensive than others; right now it just treats calls uniformly, and it would really be better if it could say no, no, calling that function is nothing like calling this function--please do more of these ones and less of those ones. But we didn't actually do that in our system. The techniques we used couldn't accommodate that, but for the amount of optimization we were going for, it wasn't something we needed. All right. So we can talk about performance, but everyone wants to talk about the performance of my own tool, right? Sorry? >>: [inaudible] aliasing, the last approach where you [inaudible] axiom [inaudible] new object [inaudible] aliases are there, you may want to [inaudible], I don't know. >> Ross Tate: Oh, so… >>: So do you syntactically [inaudible] objects or do you… >> Ross Tate: We are not rewriting objects or anything.
We are just rewriting code, so here, I mean, u and v would still stay around; there would just be this expression. So you don't have to worry about aliasing or anything like that for the optimizations we are dealing with here. The only place aliasing would be useful is in knowing whether a state operation is going to commute--basically, most of the time aliasing information is useful for knowing when a state operation is going to commute. So aliasing isn't really too big a deal for these kinds of library axioms. Going on to performance: this is the tool chain I showed you, with the various stages of the compiler, and to evaluate how efficient this tool chain was, we ran it on the SpecJVM 2006 benchmark suite, and we found that the equality saturation actually runs quite quickly. The slow part is actually the global profitability heuristic, and to figure out why, we did some investigation and found that the equality saturation was doing such a good job that, even though we stopped it early, by the time we stopped it we could find a trillion trillion trillion ways to represent a single method, on average. [laughter]. So you can imagine it takes a while to figure out the best option out of a trillion trillion trillion options, and that's why the global profitability heuristic takes a while. In light of that, even though there is good reason this takes a while, we still wanted to find a way to get over this hurdle in order to get the technology adopted. So we had the idea of combining this technology with the previous technology I showed you earlier in order to speed up optimization in general. The technique I'm going to show you works with really any advanced optimizer, not just our own, and there are a lot of these out there, such as Denali. These things are smart, but as a consequence they are slow--some of them are really slow--and you have a lot of code that you want to compile and you don't want to wait around forever. The other things out there are rewriters, which are quick and efficient but also rather naïve. What we would like is the ability to combine the intelligence of these advanced optimizers with the speed of these efficient rewriters. So the idea I had was to take just a piece of your code base and ship it off to the advanced optimizer in order to get a very well optimized version of that piece. We then send that through the optimization generalizer in order to learn optimizations from the advanced optimizer, and we tack on what we call a decomposer in order to break the proof of equivalence up into a bunch of little lemmas. From each of these lemmas we learn a bunch of mini optimizations specialized to your code base, which we then incorporate into our efficient rewriter, and we pass the rest of your code base through this rewriter using the lessons we learned from the advanced optimizer. To see whether this was effective, we ran it on that ray tracer I talked about earlier, and we found that the rewriter was actually able to produce the same high-quality code as the advanced optimizer, just using what we learned from that part of the code base, and furthermore it was able to do so 18 times faster than the advanced optimizer, so it significantly addressed that performance problem I talked about earlier.
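A skeleton of that pipeline, with hypothetical type and interface names (the actual tool's API is not shown in the talk):

    import java.util.ArrayList;
    import java.util.List;

    // Placeholder types; only the shape of the pipeline matters here: optimize one hot
    // method with the slow, smart optimizer, decompose its proof of equivalence into
    // lemmas, generalize each lemma into a cheap rewrite rule, then run the fast
    // rewriter with those learned rules over the rest of the code base.
    interface Method {}
    interface EqualityProof {}
    interface Lemma {}
    interface RewriteRule {}

    interface AdvancedOptimizer { EqualityProof optimizeAndProve(Method hotMethod); }
    interface Decomposer        { List<Lemma> decompose(EqualityProof proof); }
    interface Generalizer       { RewriteRule generalize(Lemma lemma); }
    interface FastRewriter      { Method apply(Method method, List<RewriteRule> learnedRules); }

    class LearnThenRewrite {
        static List<Method> run(Method hotMethod, List<Method> restOfCodeBase,
                                AdvancedOptimizer slow, Decomposer dec,
                                Generalizer gen, FastRewriter fast) {
            EqualityProof proof = slow.optimizeAndProve(hotMethod);   // expensive, done once
            List<RewriteRule> learned = new ArrayList<>();
            for (Lemma lemma : dec.decompose(proof)) {
                learned.add(gen.generalize(lemma));                   // the "mini optimizations"
            }
            List<Method> out = new ArrayList<>();
            for (Method m : restOfCodeBase) {
                out.add(fast.apply(m, learned));                      // cheap, done everywhere
            }
            return out;
        }
    }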
>> Ross Tate: To recap, this made it so we could infer optimizations automatically and efficiently, and overall it made it so we can actually use optimizers and take advantage of them even though they are occasionally broken; it means we can extend optimizers to address our more domain-specific needs, and it made optimizers available for new languages and also for library writers designing their libraries. By continuing this line of research, what I'm hoping to do is make it so that discussions like the one I showed at the start are no longer necessary, and we can focus on the more important things, like correctness. Something else I wanted to mention is that this is only one line of my research. Another line of research, which I've mostly done here, is type systems. In particular, thanks to Juan and Chris, I learned all about existential types for dealing with typed assembly language for C#. What Chris is working on is this operating system [inaudible] that's guaranteed to be memory safe, and the big issue is that they want to be able to take C# code, either for things like a scheduler or for user programs, and run it in their operating system. C# code is memory safe, but compilers are broken, so you want to infer types at the assembly level to make sure that even the assembly code is still memory safe, and that's what I did here. This got me introduced to existential types, and I made this big category-theoretic framework, which turned out to be very useful for dealing with Java. My students found this out for me: Java has all sorts of problems with wildcards. They had a piece of code they were writing for my class project, and they were very frustrated because the code wasn't compiling and they had no idea why. I looked at the code and found that the code was actually correct; the type checker was broken. I did some further investigation and found that wildcards in particular are something Java just does not do a good job with, so I applied this existential framework, because wildcards are actually existential types, in order to prove the algorithms I was using for type checking, and also to figure out how to refine the type system a bit; I showed that it's practical to make these refinements so that Java stays decidable. Type argument inference we can't solve, but at least the rest of it--subtyping and the basic things--we can: subtyping wasn't known to be decidable before, and now it is decidable. >>: What happened there? I thought [inaudible] was on top of all that stuff… >> Ross Tate: Wildcards are subtle. >>: [inaudible] after, where did the wildcards come in? >> Ross Tate: That was Java 5--the generics and wildcards all arriving together. >>: [inaudible] the same time? >> Ross Tate: Yeah. >>: No, no. Generic [inaudible] wildcards. >> Ross Tate: [audio begins] for Java, I'm saying, they came at the same time. >>: The generic Java prototype. >> Ross Tate: Yeah, yeah. >>: It had the wildcards [inaudible]. >>: Right. [multiple speakers]. [inaudible]. [laughter]. >>: Yeah, that's what I thought. Okay, good. >> Ross Tate: There is a good reason for wildcards and what they address; there is also some messiness, which comes up with the Ceylon work, so I guess I will go on to Ceylon. >>: Okay. >> Ross Tate: Ceylon--there is this team at Red Hat making a new language; in particular, they are people who worked on the Hibernate project, for those of you familiar with that.
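For readers less familiar with wildcards, here is a small, standard illustration (not taken from the talk) of the existential reading that makes them subtle: List<? extends Number> promises only that there exists some T that is a subtype of Number such that the list is a List<T>, without saying which T.

    import java.util.ArrayList;
    import java.util.List;

    class Wildcards {
        static double sum(List<? extends Number> xs) {
            double total = 0;
            for (Number n : xs) total += n.doubleValue(); // reading at type Number is fine
            // xs.add(Integer.valueOf(1));  // rejected: the unknown T might be Double, say
            return total;
        }

        public static void main(String[] args) {
            List<Integer> ints = new ArrayList<>(List.of(1, 2, 3));
            System.out.println(sum(ints)); // a List<Integer> is a List<? extends Number>
        }
    }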
>> Ross Tate: The Ceylon folks have had many years of experience dealing with Java code, and they have become very frustrated with some of Java's problems, like wildcards. What they wanted to do is sort of clean up Java and general OO enterprise programming, trying to learn the old lessons from these languages, so they are aiming for things like decidability, and that's how my work got involved. We figured out that wildcards have some strong aspects and some weak aspects, so we are trying to throw away the weak aspects while incorporating the strong ones, things like that. I have been working with them on making sure the type system is nice and clean and has all of the properties they told me they want. They want things like principal types, decidable typing, and decidable inference where they allow inference, things like that, so I have been helping them design their language, make everything look nice, and meet the guidelines they have given me. So that is another line of research I'm doing. On a longer-term track, very far down the line: a problem I have seen is that if you have some tool for C#, you don't really know whether that tool transfers over to Java. Similarly, if you have some cool proof for Java, you don't really know whether that proof carries over to C#. This is frustrating, because while these languages have some key differences, they also have a lot of similarities, and so you would expect that a lot of tools and proofs could be transferred across these languages, but we don't really have a way of formalizing that. What I want to do is make a sort of meta-language for programming languages in which we can formalize the requirements of proofs and tools and formalize properties of languages, in order to make it so one tool can transfer across many languages simultaneously, and similarly, so that when you make a new language, you know what kinds of properties to aim for in order to have access to the existing tools and proofs that were made for all of the other languages already out there. I've done some work on this with effects here, and I'm also looking into foundations of computation and mathematics and into resource-constrained computation, so I'm just scoping out the landscape right now, but I'd be happy to talk to any of you about that today or tomorrow. At this point I want to thank all of the people I have had a chance to work with at UCSD and here, and I guess thank all of you for the Fellowship that let me do all of this research, and then open up to any questions you may have. [applause]. I talk too fast. >>: [inaudible] translation validation, why didn't you use [inaudible] verification for those? >> Ross Tate: What do you mean by that? Sorry. >>: It seems to me like you are building your own E-graphs and everything, so instead, let's say, you could translate [inaudible] all of the transformations into formulas and you could [inaudible] graphs from [inaudible]. >> Ross Tate: Oh, okay. We tried that; it didn't go very well. This was also years ago, when these technologies were younger, so it may actually work now. The reason we think it didn't work well was in particular because we had recursive expressions, and the E-graph algorithms in the solvers--in Z3 and in Simplify and things like that--don't seem to really like recursive expressions. They just run forever, go down a rabbit hole, and you never hear back from them.
>>: Do you think it's because you [inaudible] quantifier somewhere [inaudible] extension [inaudible]? >> Ross Tate: It would--so we had that operation that distributes through the theta node, and that can go on forever and ever and ever. We think that if we had programs that didn't use that specific axiom, or didn't have any loops in them, then it would be fine. >>: [inaudible] have to be careful. >> Ross Tate: So it didn't like those kinds of things, and those were specifically the kinds of things that we wanted to have work. >>: You wrote your own quantifier extension? >> Ross Tate: Yeah--[inaudible] undergrad. So yeah, that's how we got that to work. >>: I have one more question. Those are amazing results with the optimizer, and you actually ran it on real Java bytecode, so it could be a realistic tool, but do you feel like [inaudible] you could really incorporate it in a Java compiler, for example--is it practical enough [inaudible]? >> Ross Tate: I wouldn't say it is ready for JIT compiling. >>: No, I was just saying, would I be able to build a compiler with no optimizer, just give it axioms, and integrate your framework… >> Ross Tate: There are certain optimizations that I wouldn't put in that class, particularly because this is rather high-level, so things like register allocation and all of those low-level things--once you get to this level of abstraction, you can't see the differences between those, so I wouldn't try doing those kinds of things in this system. Also, as you saw, there are performance problems. Actually, they are probably better now, because it's years later and the solvers we used have probably gotten better. What I think would be better, though, is a sort of compromise between systems: take a traditional compiler that has many of the core things in there, add the extensibility aspect to it, and then, in particular, use the strategy I showed you of learning optimizations from the super-optimizer. I think what would work best is running the super-optimizer on your code base once a month, learning all these axioms once a month, and applying them for the rest of the month; that way you have a much smaller set of rules that runs much more efficiently, and then a month later, after your code has changed enough, you learn again and go again. I think that's a better approach, because it accommodates the strengths and weaknesses of the system. >>: So I see that you have something with LLVM up there, and I was wondering if you could compare a little bit with the work on [inaudible] which formalizes the LLVM in [inaudible] representation [inaudible] and so on. What kind of semantics do you use for the LLVM? >> Ross Tate: We didn't use a formal semantics for LLVM. Mike Stepp is the one who took care of all of the LLVM-specific stuff; I did the general-purpose kind of stuff. He dealt with the bytecodes and bitcode, and he would talk to me when he ran into problems--okay, I can't figure out how to represent this, how do I represent this--and then we would talk about better ways to represent those kinds of concepts, but overall he handled the more detail-oriented part of that part of the project. Unfortunately, I can't go into the details of the LLVM side.
>>: Do you have a translation of LLVM into your representation [inaudible] LLVM semantics [inaudible]? >> Ross Tate: Yeah, and then we convert back to LLVM. >>: [inaudible]. >> Ross Tate: Yeah. Our translation from LLVM to our representation was very, very basic--here is the LLVM expression, it maps to this--very simple mappings. So it was really the axioms that incorporated the semantics of LLVM, and they were mostly things like integer axioms. There was a bit of a mess with the fact that there are many different sizes of integers, and all of the operations in LLVM are typed, so getting all of those to interact well with each other could have been messy, but we figured out how to deal with it. Yeah? >>: Is there a class of loop optimizations--optimizations related to cache [inaudible]? Have you had any thoughts on expressing that sort of thing as a cost function? >> Ross Tate: Cache locality and parallelism come up a lot when I talk about this stuff, and basically they have the same issue behind them, which is that they are generally interprocedural. For cache locality you typically have to know how memory is laid out, and that's typically not available in the function you are working on, so it's hard to make a cost model when you don't even know how the memory is laid out. At least, when we did some investigation into this, that is the problem we always ran into: you didn't know how the memory was laid out, at least not automatically. If another tool came along and said here is some information, then we might try incorporating that, but that was one where we didn't have any luck because we didn't have such a tool. >> Daan Leijen: Okay. Let's thank the speaker. [applause].