>> Ben Zorn: All right. Well, welcome, everyone. It's a great pleasure to again introduce Emery Berger from the University of Massachusetts. Emery is here for a month, and he's still here for another week, so if you want to talk to him, you're welcome to. Emery's done a lot of interesting work in systems. We heard last time about AutoMan, and this time we're going to hear about Stabilizer, which is a very creative way to slow your program down but make it better. Thanks, Emery.

>> Emery Berger: Thanks, Ben. That was great. Yeah, it turns out it's a lot easier to slow down programs than to speed them up, so I have a rich career ahead of me. All right. So this is some work on performance evaluation that just recently appeared at ASPLOS. I actually could not attend ASPLOS this year because I was, sadly, in Rome. So it was Rome versus Houston. But this is joint work; yeah, right, exactly. My student Charlie is the lead grad student on this work, the only grad student on this work, and he presented for me, and he, I hear, did a great job. So hopefully I'll do as well.

So, you know, I think most of us in this room actually care about performance. When you write your program, or your optimization, or your system, you say, well, we really would like to show that it, in fact, speeds things up, not slows things down, right? The problem is that it turns out that many of us (not necessarily everybody, but I include myself here) have been doing it wrong, all right? We've been doing performance evaluation for years and years and years, and we actually have been making some crucial errors. In particular, the things that we think are best practice, like running things repeatedly, getting error bars, and showing all these very pretty graphs, are not actually enough. And the reason is that there are things lying under the covers that we're not actually seeing that have a huge impact, and they have such a huge impact that it can make the results we're reporting meaningless.

So what do I mean? In particular, there's this problem on real systems, which is that the layout of objects in memory (where the objects include your data and your code) dramatically affects performance. And even very, very slight things that you do to your program can have a pretty serious impact on layout. If you change some of your code, it moves things around. If you change some allocations, and so on, it moves things around. And right now, when you run your program a bunch of times, there's really no way to isolate this effect, and this effect is super, super important.

So the goal of Stabilizer is to eliminate this effect. And by eliminating this effect, it's going to enable us to do what we call sound performance evaluation. I'm going to explain what that means, and not only am I going to explain what it means, I'm going to show you some case studies that demonstrate the value of using Stabilizer and taking this sound performance evaluation approach, okay?

>>: I have to argue already.

>> Emery Berger: All right, just one second. Let me finish up the slide.

>>: Just like old times.

>> Emery Berger: Yeah, really. The case study I'm going to present is an evaluation of LLVM's optimizations, and I'll show you what the impact of using Stabilizer is and how it lets you understand things differently. Katherine, go.
>>: What happens if your performance optimization is to change the layout in order to improve performance? Then randomizing layout, like, defeats the purpose of what you're trying to do.

>> Emery Berger: Yes. So I'm actually going to talk about that a little bit later. But it is true that if your goal is to do something that affects every single aspect of memory layout, not just the data placement, but also the code and also globals and also the stack frames, then Stabilizer would be undoing those things. However, if it only attacks one, which would be normal (typically data, for example, or possibly code), then Stabilizer allows you to isolate all the other effects.

>>: Okay. So I'll derail you just a little more. You have a base system that does code layout, which is trying to give you good cache locality. But then you do a data layout optimization. So do you want to screw up the finely tuned code layout as well, to see if your data layout is independent of that? Is that the key thing?

>> Emery Berger: So here's the thing. First, you know, there's this fear of optimizations like the ones you're describing.

>>: I'm not afraid of them. I want to make them.

>> Emery Berger: No, I understand. I'm not saying that they're scary. But then there are many, many other things people do with their code, right? They go and they manually try to optimize their code. Or they're writing their code and they add features, and the features make the program go slower, so this is a performance regression, right? The problem is that you run your program (and we'll get to this in a second), you see some performance change, and you draw a conclusion. And that conclusion may be completely unfounded because of all of these other confounding factors. These confounding factors are all artifacts of running on real systems today. When it comes to very, very precise things that actually depend on memory layout, the question is: does it depend on every single aspect of the memory layout or not? Things like stack frames typically are not subject to these kinds of optimizations. It depends. Code placement, definitely. Heap object placement, iffy. So there are a number of these factors, and you can sort of say: all right, I really care about these factors, don't change those, but change everything else. All right?

Okay. So I'm going to go ahead and focus first on this issue of memory layout and how it affects performance. Of course, some of you are aware of this, but that's fine. So suppose I've got a program, here are some chunks of code, and let's call this program A. And then what I'm going to do is make some modifications: I move some stuff around, I refactor some things, I modify some functions, and now I call this new program A’. All right? So say I meant to do this to make it faster. Like, oh, wait, there's something here I can really speed up; I can make this go way, way faster. Okay. So what do I do? I run it. The green one is the new one, the blue one is the old one, and it's a little hard to see the numbers here. This one says it took 90 seconds, and this one took 87.5 seconds. All right? So which one's faster?

>>: [indiscernible].

>> Emery Berger: Okay. So A’ is faster. What's that? No idea? You have no idea. Man, see? All right. So this is the problem, right? Everybody's like, it's a trick question.
I'm scared. So this is really how we formulate this: is A’ faster than A? And you look at the time and you look at the bars, and you say, oh, this bar is to the left of the other bar. Fantastic, okay? So clearly A’ is faster. And in fact, in this case, the difference is 2.8 percent. So you say, okay, 2.8 percent faster. All right. What's the first obvious objection to this?

>>: One run.

>> Emery Berger: One run, okay? I ran it one time. So the question is, well, what about variance? If there's some sort of noise in the system, is 2.8 percent really something that overcomes that noise? So now I run it 30 times; 30 turns out to be kind of a magical number for many of these studies. And now we look at the variance, and the variance is pretty tight. Everything looks good. Looks like A’ is faster. In fact, not just the means, but the extremes, the max of one and the min of the other, still show a 2.8 percent difference.

>>: Why are there two bars for the green and three for

>> Emery Berger: Yeah, there are actually three bars here and two bars here, but this is just the empirical distribution of running this code. It just happened to be that way. It's really just an artifact of binning, okay.

So why is A’ faster than A? Of course, it's because I'm a genius programmer and I made it 2.8 percent faster after three weeks of effort, right? Terrific. So it could be the code change. That's what we generally assume: I made this change, and good things happened. But it could just as easily have been the new layout. I made a bunch of changes here, moving around functions, changing the size of functions, and that had a significant impact on where these things end up in memory. One of your colleagues, Todd Mytkowicz, back in 2009, presented a paper talking about how layout biases measurement.

So what kinds of things can you do, and this is what Todd showed, that can have dramatic impacts on the performance of your program without changing code? You can go into your makefile and change the link order. Just reordering the .o files, foo.o bar.o versus bar.o foo.o, can have a substantial impact on layout: it moves all of the functions from one place to another. Environment variable size; this one is completely bizarre. It turns out that when you run a C program, it actually copies in your entire environment and then starts the program after that. And that means that changing your environment (and changing your environment can mean changing your current working directory, the time of day, the date) shifts everything on your stack in memory. So you might think, all right, come on, we're just moving the stack.

>>: What if your Java runtime has a little C component? A managed runtime has the same problem. It's not just C programs.

>> Emery Berger: That's true. Actually, Katherine has a good point; it goes even further, because not only is the Java or .NET runtime actually a C program, it turns out the runtimes actually do import the environment. And they import it into your heap, right? And a lot of the code gets compiled onto the heap, which means that your code placement and your data placement are still affected by these environmental factors. So you might think, all right, come on, how much of a big deal is this going to be? Well, it turns out, as Todd and Amer showed (Peter Sweeney was on this paper as well), that the impact of these changes can be greater than the impact of -O3.
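The environment-size effect is easy to probe for yourself. Here is a minimal sketch, not from the talk: the binary name ./app and the pad sizes are hypothetical, and any CPU-bound program can stand in. It times the same executable while padding a dummy environment variable to different lengths:

```python
import os
import subprocess
import time

# Hypothetical binary to measure; substitute any CPU-bound program.
CMD = ["./app"]

def timed_run(pad_bytes):
    """Run CMD with a dummy environment variable of the given size."""
    env = dict(os.environ)
    env["PAD"] = "x" * pad_bytes  # shifts the stack start at process startup
    start = time.perf_counter()
    subprocess.run(CMD, env=env, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start

# Try a range of environment sizes; on layout-sensitive programs the
# differences can rival typical "optimization" effects.
for pad in (0, 512, 1024, 2048, 4096):
    times = [timed_run(pad) for _ in range(5)]
    print(f"pad={pad:5d} bytes  min={min(times):.3f}s  "
          f"mean={sum(times) / len(times):.3f}s")
```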
And these are very, very coarse-grained changes, right? This is moving all of the data as one block, or taking a whole bunch of functions and moving them together. The changes that you make as an individual, your very, very fine-grained changes, can have a much deeper impact.

So what's going on? The problem here is the cache. You have these caches. Caches are awesome. They make your programs run faster, right? You have this very, very fast memory, but it's relatively small. And in some cases, you can end up with catastrophic collisions in the cache. You have your code here, and these two pieces of code are hot. They end up, unfortunately, mapping to the same cache set, and you get conflicts. They don't all fit in the cache at the same time, so it runs slower. When you change your program, by luck, you could end up actually getting an empty space in that set: the hot code has moved elsewhere, and now you have no conflicts. (There's a small sketch of this set-index arithmetic below.)

So this can happen because of caches, but caches aren't the only source of this sort of problem. Branch predictors suffer from the same sort of phenomenon: branch predictors depend on program counters to key into hash tables. Then there's the TLB, the translation lookaside buffer. Same thing; it's address based. The branch target predictor, too. I could fill up the slide, basically, with all of these things that use addresses as hash functions. So really, anything in your hardware that has a hash function suffers from this problem.

>>: I have a question. So the TLB is not fully associative?

>> Emery Berger: The TLB is fully associative, but it depends on where things lie. If you think about it, if I have a function that spans two pages, then I need two entries. But if it's all in one page, then I only need one entry. So you're right that the address itself isn't the whole issue; it's also about fixed-size resources and the placement of things in memory.

All right. So now that you've seen all this, if you hadn't seen it before, let's go back and think about it. We're asking this question, is A’ faster than A, but now we know there are all these weird confounding factors. So what do we actually do in practice? In practice, we essentially find the one layout. We don't try all the environments. We don't try all the link orders. We just ask one person. It's like we ask this guy, and it's like, hey, it seems to be faster, right? It's 2.8 percent faster. We're done, right? But here's something you would never really do in practice: you wouldn't ask one person for their opinion about something, hey, what do you think, is this faster, get yes, yes, yes, yes, yes, 30 times, and say, oh, now I have so much more confidence that it's faster, right? What you really want is to sample the space of all of these things, all these different layouts, all these different guys. Some are going to say it's slower, some could say it's the same, but you really need to cover this space to find out what reality is.

So there's an objection that people often come up with at this point, which is: hey, I like the faster guy. The faster guy is my friend. I just want it to be faster. Now, this is not just "I want it to be faster because I want to publish a paper," right? Although there's that.
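To make the cache-set story above concrete, here is a toy calculation. The cache parameters (64-byte lines, 512 sets) and addresses are made up for illustration; the point is just how two hot functions can conflict in one layout and not in a slightly shifted one:

```python
LINE = 64      # assumed cache line size in bytes
SETS = 512     # assumed number of cache sets

def cache_set(addr):
    """The set-index bits sit between the line-offset bits and the tag."""
    return (addr >> 6) & (SETS - 1)   # 6 = log2(LINE)

# Two hypothetical hot functions. Their addresses differ by a multiple of
# SETS * LINE (32 KB), so they land in the same set and conflict.
foo, bar = 0x400000, 0x408000
print(cache_set(foo), cache_set(bar))        # same set -> conflict misses

# A one-line shift in the layout (say, from an unrelated code edit)
# moves bar to a different set, and the conflict disappears.
print(cache_set(foo), cache_set(bar + 64))   # different sets -> no conflict
```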
It's also, you know, look, it ran faster; I don't care about the other layouts. Maybe I just want the good layout, the faster layout. So what if we just stick to that layout? We say, you know, this is a damn good layout. We like this layout. Let's keep it. All right? So we're only going to talk to Bob, which means, here, we're only going to use this layout, okay? So that sounds good. There are problems with this approach. Suppose you upgrade your C library. You kind of have no control over this, by the way. It's time to update, there's some bug fix; this happens all the time. The Microsoft C runtime gets upgraded, whatever. Libraries that you depend on get updated. That changes your layout, all right? If the user name changes (not you changing your user name, but somebody else running it), that changes the size of your environment. I actually had this experience myself with a student, where I proposed an optimization, and the student came back and said, doesn't work, it makes things slower. I said, that can't be true. And I ran the program, and it ran faster. He ran the program, and it ran slower. We ended up running it on the same machine, in two different windows, and his was slower and mine was faster. And his user name is longer than mine. That was the only difference. Once we changed the environment variables, everything worked out great. So this is kind of brittle, let's say. Or everybody could just canonicalize and have the same-length user name. We could do that. That's one possibility. So, yeah?

>>: Or we should find optimal user names.

>> Emery Berger: We should all, exactly, do a search. That's right. For every program on earth, that's right. I agree. Well, as you've already heard, it turns out my user name is optimal.

>>: You have to make sure that the program [indiscernible].

>> Emery Berger: Right. So, in fact, that's not so hard to do; you can actually make it so that it doesn't depend on that. But there are all of these other little factors: you change one line of code in your program, you change a library. Your libraries are getting shifted out from under you all the time. There are all these different versions of all these different DLLs, and a slight change in one changes everybody's layout, all right? A new directory is the same phenomenon again, same thing as the environment variables, et cetera. So layout is really brittle. And it's brittle not just when you're running your one program: you've got the whole execution environment, you're going to make modifications to your program, it's going to go through different versions, it's always changing. So you can't really stand on firm ground and say I got a 2.8 percent performance improvement if all this stuff is happening. All right?

So: layout biases measurement. Great. What do we do about it? It's bad news; can we eliminate it? And the answer is yes, all right? And I'm going to show you how we do it. Stabilizer is the system we built that directly addresses this problem. Memory layout affects performance; this makes it hard to evaluate. Stabilizer eliminates the effect of layout. And I'm going to show you that by doing this, you can do what is really sound performance evaluation.

So what does Stabilizer do? Katherine already spoiled the big reveal here: Stabilizer randomizes layout. What does it randomize? It randomizes pretty much everything it can.
It randomizes the placement of functions, randomizes stack frames, randomizes heap allocations, and it actually doesn't do it just once. It repeatedly re-randomizes those layouts during execution. And this turns out to be crucial, for reasons that will become obvious later. But basically, when you do this, you eliminate the effect of layout, because you're sampling the space of all these different layouts randomly. If it's completely random, it can't bias the results.

So I'm going to walk you through a few of the mechanisms at a very high level. You already know that it's possible to randomize the heap; this is something that I've been doing for a little while now, and I'd be happy to explain offline. Here's some of the other stuff we do. With stack frames, we actually modify the compiler so that instead of the usual thing, where you call function main, then foo, and the stack frames are always adjacent, Stabilizer adds these random pads. These random pads mean that the stack frames start at different positions, okay? When it comes to function placement, you have the basic function placement that the compiler gives you, where it jams everything contiguously into memory. What we do in Stabilizer is install trap instructions at the very top of these functions, and then we randomly find places to put the functions. We make copies somewhere else, randomly, in memory, and we change the trap into a jump. So if you go to execute that particular piece of code, you'll actually end up executing somewhere else. We keep this little relocation table off to the side, because all the functions that foo may reference could also move; this gives us their addresses. So when we invoke baz, baz gets moved somewhere else, and so on and so forth. (A toy sketch of this mechanism appears a bit further below.)

So while you're running your program, Stabilizer will actually go ahead and re-randomize. Re-randomizing the stack is pretty straightforward: you just use different random pads. For functions, it's a little more subtle. The timer goes off, and Stabilizer reinstalls the traps in the original locations. The old randomized locations have to stay around; they're potentially garbage, but somebody could be running in them right now. They could be on somebody's stack. So it generates these new versions, and periodically, during this re-randomization period, Stabilizer walks the stack looking to see whether those old versions can be reclaimed. Yes?

>>: So you have a little relocation table at each function. Why do you need it, if everyone is calling the original location?

>> Emery Berger: That's a good question. I'm sorry I didn't make that clear. Initially, a call goes to the original location, and then the trap gets overwritten so it jumps to the relocated version from then on. So it avoids an indirection.

>>: It's just for performance?

>> Emery Berger: It's just for performance, that's right, yeah.

>>: Why do you need to change placement at runtime? [indiscernible] every time you recompile, you'd have different versions of the program, and then

>> Emery Berger: Yeah, okay. So there's one very easy answer for that, and then there's the deep, mathematically motivated answer. Doing it statically at compile time (this is the weak answer) is, experimentally, just a pain in the ass.
If I want to test my optimization that way, I have to recompile all of my code, and every single compile corresponds to one sample. If I do it dynamically, then I actually get to sample a large space. But it turns out that there is also a statistical benefit you get from doing this re-randomization, which is really huge, and which I'm going to get to in a couple of slides. Okay? So hold that thought for a few slides.

>>: So one more, hopefully quick, question.

>> Emery Berger: Yes.

>>: Maybe you'll get to it. With the relocation table, you change how foo calls bar, because it [indiscernible] bar indirectly. It really changes; it could potentially affect branch prediction and can change

>> Emery Berger: Right, right. So when you do a jump: obviously, if it's a static jump, then there's no branch prediction necessary. However, if you do a jump through an address, then there's actually a branch target predictor, and these work very well. So that part, in terms of performance, is not really as much of a hit as you might think, and I'll get to some performance numbers very soon.

>>: I mean, you do change something?

>> Emery Berger: Yes, we're changing something, you're right.

>>: You miss initially, then, if I was [indiscernible].

>> Emery Berger: Right, right. I think your objection is somehow that we're changing the program.

>>: The randomization has overheads that will swamp the other ones. I think that's the real problem that could happen.

>> Emery Berger: Right, so the concern

>>: With your process, you're just making everything slower, then.

>> Emery Berger: Right, that's right. So the concern you're getting to is another concern that I'm going to address, which is whether making these changes actually affects your analysis. Obviously, I'm going to argue that it doesn't. But we'll get there. So now that we have this in place (and bear with me: let's everybody assume that this is all going to work out great, it's going to be reasonably fast, and everything is going to be fantastic), I'm going to talk about performance evaluation. Yes?

>>: How do you know that you've addressed all the things that create these problems that you're trying to

>> Emery Berger: Right, right. So the question really is: there are all of these different confounding factors, so how do I know that I've addressed all of them? I haven't addressed all of them. I can give you an example of one we haven't addressed, which is that you could randomize the senses of branches, for example. We don't actually go into functions and change anything about the code. Inside a function, you could imagine that there's some impact of having things in relative locations. Inside heap objects, inside stack frames, there are relative positions that we aren't disrupting. So there are things we already know we aren't doing. I don't think we're covering the whole space, but we understand the space we are covering, and we can state it up front, which is a huge advance over doing nothing, or over saying, here are a couple of selected factors that we account for. Okay? And one of the things that we observe: well, you'll see what it does to execution times. It gives us a certain characteristic of execution times, which gives us very, very strong reason to believe we are covering things very well, okay? All right. So let's go back to performance evaluation.
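Before the evaluation discussion, the trap-and-relocate scheme described earlier can be modeled in a few lines. This is a toy simulation only, not Stabilizer's actual implementation: the real system rewrites machine code in an LLVM-compiled binary, while here dictionaries stand in for entry points and code copies.

```python
import random

ADDRESS_SPACE = 1 << 32
entry = {}     # function name -> address of its current randomized copy
trapped = {}   # function name -> does its original entry still trap?

def rerandomize(functions):
    """Epoch start: copy every function somewhere random and re-install
    a trap at its original entry point."""
    for f in functions:
        entry[f] = random.randrange(0, ADDRESS_SPACE, 16)  # 16-byte aligned
        trapped[f] = True

def call(f):
    """The first call in an epoch hits the trap, which rewrites the entry
    into a direct jump to the relocated copy; later calls go straight there."""
    if trapped[f]:
        trapped[f] = False   # trap fires once, then becomes a direct jump
        print(f"trap: {f} now jumps to {entry[f]:#x}")
    return entry[f]

funcs = ["main", "foo", "bar", "baz"]
rerandomize(funcs)
call("foo"); call("foo")   # second call skips the trap
rerandomize(funcs)         # timer fires: new layout, traps reset
call("foo")                # foo has moved again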
We run our things 30 times. Now, when we run them, because of randomization, we have a bit broader of a distribution, okay? So now, is A’ faster than A? What do you think? How many people say, yeah, it's faster? I see a couple of nods and smiles.

>>: I'd say with confidence, it looks like.

>> Emery Berger: Oh, you're saying you're measuring this chunk of the curve here? It seems faster, all right. How about now? Is it faster now? People seem a little more skeptical. Looks like it. How about now? All right, now we're all like, I'm not so sure anymore. So the problem is that this is not actually the way one should proceed to say these two curves are different. Right? We all have kind of an eyeball metric: if the curves are really far apart, then it's good; if they're close, I'm a little less comfortable about it. But, in fact, this is not really the way people do things from a statistical standpoint.

So what do people do? This is what people in the social sciences do, and people in biology, and all of these folks; they're faced with these kinds of problems all the time, right? I have a drug; does this drug have a significant impact on this disease or not? What they don't do is eyeball it, okay? They don't say, seems good, everybody use this drug, right? What they do is adopt a hypothesis testing approach. The way a hypothesis test works is actually quite different: we don't ask the question, are these two different, is one faster than the other. We say: if we assume that they're the same (this is called the null hypothesis), what is the likelihood that you would actually observe this kind of a speedup due to random chance? That's the statistical approach.

So it turns out that with the assumptions of normality we often make in biology and the social sciences, it's very easy to compute the likelihood of observing your result by random chance under the null hypothesis. You've all, I imagine, seen this curve. This is the classic normal distribution. And you can say, the odds of being more than three standard deviations away from the mean due to random chance are less than 0.3 percent, all right? So doing this randomization, all the Stabilizer stuff, actually puts us into a position where we can ask the question exactly this way: what is the probability that we would observe this much difference just randomly? And the argument everybody makes, statistically, is that if this probability is low enough (for some definition of enough), then we argue that the speedup is real. And because we've randomized the layout, we know that the speedup is, in fact, not due to memory layout. Okay?

So there was this question before about why not just use a static, one-time randomization; what does re-randomization do for you? So this is an empirical result of execution times with exactly one random layout per run. And you can see that it spans some space, but it's fairly uneven. This is: we start off, we do a randomization, it's a one-time randomization, and we run with it the whole time, okay? And here's what you get when you do many random layouts per run. You can see that curve looks very, very different, right? It's actually a nice peak, it has a tail, it's unimodal. What's going on?
So the way Stabilizer works, it generates a new random layout every half second, all right? And if you think about what's happening: you've got your whole program, and it's composed of a bunch of these half-second epochs. Your total execution time is the sum of these periods. So it's the sum of a sufficient number of independent, identically distributed random variables, and those sums are approximately normally distributed. This is the central limit theorem. So doing this randomization repeatedly actually means that you can use the central limit theorem. (A small simulation below illustrates this.)

>>: So if I take this program, compile it, and ship it to a customer, it's going to run on some random [indiscernible], but that remains the same for the entire run of the program, right? Because what you generate is this [indiscernible] that is not available to anybody else.

>> Emery Berger: Wait, wait, wait. What are you saying? Is it identically distributed?

>>: I'm assuming that

>>: You could design hardware that is, you know, randomly

>> Emery Berger: Oh, you could. That's an independent question.

>>: So I can't ship with this

>> Emery Berger: You could. I have.

>>: Let's assume that that's not happening.

>> Emery Berger: Okay.

>>: Then I would claim that what I'm actually running is creating lots of static configurations and then running each one of them, like, 30 times and summing over all of that. If you sum over a large enough set of configurations, you should also get the same distribution.

>>: What is large enough?

>>: I mean, you do it until you get the strongest [indiscernible], right? Central limit theorem.

>> Emery Berger: So what is your

>>: So I think the reason why you're seeing this is just that you're able to explore a lot more random configurations, larger configurations, changing dynamically, versus what you're doing statically.

>> Emery Berger: That's right. So I think you're making an argument for using Stabilizer, right? What you're saying is, boy, you could get a lot more experiments done a lot faster by using Stabilizer than by doing a one-time randomization with a whole-program recompilation, over and over and over again.

>>: No, no. The evaluation, whether I spend, like, 20 seconds or 20 days, that's not the question, right?

>> Emery Berger: I think it's a question for a lot of people, but okay.

>>: [indiscernible].

>> Emery Berger: Yeah, but then what do you do with that result? Are you saying that we're not going to use this to do the hypothesis test, or we are?

>>: I'm just saying that having the static configuration is similar to what the customer is going to be running, so that's why we get a truer measure.

>> Emery Berger: So which run are we going to ship?

>>: You can randomize [indiscernible].

>>: His point is only that you can eliminate some of this overhead because you're doing the re-randomization, and that's closer to what the real

>> Emery Berger: Oh, well, so there's overhead and then there's randomization. I thought what you were actually going to argue when you first started talking was: I don't want a curve, right. I don't want a curve. I want some point in the space, and I want it to be one of those extreme points to the left of the space.

>>: Right. So my point is, go to the previous slide, the one that had the two [indiscernible].

>> Emery Berger: Yeah.

>>: I think you're seeing this just because you're not running enough randomly.
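The central limit theorem point is easy to see in a toy simulation. This sketch is an illustration with made-up epoch times, not the paper's experiment: each per-epoch time follows a skewed distribution, yet the total run time, a sum of many i.i.d. epochs, comes out approximately normal.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_once(epochs=200):
    """One program run = the sum of many i.i.d. half-second-ish epochs.
    Each epoch's time is drawn from a skewed (exponential) distribution."""
    return rng.exponential(scale=0.5, size=epochs).sum()

totals = np.array([run_once() for _ in range(1000)])

# By the central limit theorem, the totals are approximately normal even
# though any single epoch's distribution is decidedly non-normal.
print(f"mean={totals.mean():.1f}s  std={totals.std():.2f}s")
print(f"skewness={((totals - totals.mean())**3).mean() / totals.std()**3:.3f}")
```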
>> Emery Berger: So let me be clear. It's not just the code that's being randomized here, right? It's the code, it's the stack frames themselves, and the heap objects. The heap allocations are actually quite randomized; as randomized as DieHard, which has this very, very strong randomization property. We actually install what we call a shuffling layer on top of a conventional allocator, and it generates a lot of entropy. So there's actually a lot of randomization happening here, in the data space and in the code space.

>>: But would you get maybe a different distribution if you did [indiscernible]?

>> Emery Berger: If you just randomized the code, you wouldn't get as much of a sample of all of the layouts, because you would still be sticking with the same memory allocation and the same stack frames. So this does more extensive randomization than just code.

>>: Is this something you really measured, or is this just to

>> Emery Berger: This is an empirical result.

>>: How many runs are in the blue thing?

>> Emery Berger: It's 30 for each.

>>: But maybe [indiscernible] point is just that unless you ship your code with Stabilizer, it may be that many of those points are coming from layouts that would never correspond to any real layout, because the real layouts don't have these paddings. The real layouts don't pack in the

>> Emery Berger: Well, of course, that's possible, but anything reachable through randomization is also a possible real state; it's just maybe a low-probability state. And it's randomly exploring all of the state space. But I think that you're

>>: Your random states always have some padding between, say, the stack

>> Emery Berger: Well, but the padding can be zero.

>>: But, for example, you put in new libraries, and that changes the link, the layout. So at the deployment site, you could try to explore and move yourself into the right portion of the distribution space. It's just something you're unlikely to get randomly, but it has good performance, right? So you could try to do some optimizations that push you over there.

>> Emery Berger: That's right.

>>: But if you're just shipping something, you have to say, oh, I could be any place in this curve, rather than the place I had with my one test. Because that's just one sample.

>> Emery Berger: Right, and remember

>>: I think that's the most interesting thing. We're always over there on the

>> Emery Berger: We'd like to say we'd always be here, or something.

>>: There, in terms of performance, and that's the distribution over time.

>> Emery Berger: Right.

>>: But you can't guarantee it.

>> Emery Berger: And it turns out, as we'll show, that there are in fact many cases where throwing Stabilizer into the mix, because of the randomization, actually turned out to improve performance. It improved performance because whatever the actual compiler and the memory allocator and the libraries and the state of the world gave you was worse than the mean: it had higher-than-mean execution time, and Stabilizer tends to regress toward the mean. Yes?

>>: Do you happen to save the layout as you randomize [indiscernible]?

>> Emery Berger: Yeah, so this is something that my student, Charlie, is actually working on right now.
So Charlie is, in fact, doing more or less what Katherine described and what you're suggesting: what if we could deploy something that did this randomization and deliberately targeted the left extreme, right? It can observe these things; essentially, it's doing online experimentation and then steering the execution that way. But this is unpublished work that's not out yet.

>>: Is it submitted?

>> Emery Berger: It is not in submission right now. So everybody jump on board; just download Stabilizer and beat us to the punch. Actually, I will say that using Stabilizer to do this, because it does add some overhead, turns out not to be the best approach. So Charlie naturally wrote his own dynamic binary instrumentation system. That's not easy.

>>: Yes, so good luck with that. [inaudible].

>> Emery Berger: Yeah, yeah, but it has to be, anyway. I know. So let me go ahead and show you what happens when we use Stabilizer. We decided to try Stabilizer with SPEC and LLVM, and in particular, we wanted to understand the effects of these different optimization levels. I think most people have a sense that, for optimization, -O1 is pretty good, pretty fast, and doesn't take long to compile; -O2 takes longer to compile but produces better code; and -O3 takes a long, long time to compile and produces somewhat better code. There's a sense that it's not a linear increase in speed, and certainly not a linear increase in compile time, but that it does something good. So we wanted to see whether that was, in fact, true. We ran this across the SPEC benchmark suite: on each benchmark individually, and then also across the entire suite, okay?

So the first thing we did is build the benchmarks with Stabilizer. Stabilizer is a plug-in for LLVM. You can invoke it just like an ordinary C compiler; it's szc. If you run it like this, it randomizes everything. However, you can optionally choose which things to randomize, one at a time. This addresses Katherine's concern: what if you care about a particular thing and not all of the possible randomizations? The default is that all of them are on, so that corresponds to code, heap, and stack frame randomization.

>>: What about [indiscernible]?

>> Emery Berger: It turns out that there are good reasons not to randomize globals, and it's a pain in the neck. But

>>: Addresses are in the code.

>> Emery Berger: That's right, so it's actually a much harder problem to randomize. So now we run the benchmarks, okay. We run them as usual, 30 times each. But this time, we drop the results into R. R is this statistics language; all the graphs you see in this presentation were actually generated by R. R produces lovely, lovely graphs. It is the tool that statisticians and many, many social scientists and biologists and so on use to do statistical analysis. So we get this result, all right: is A’ faster than A? Obviously, eyeballing it is the wrong way to do things, so we do the null hypothesis construction. We say: if A’ equals A, then we're going to measure the probability of detecting a difference at least this large. And the way we do this in R is with something called the Student's t-test. This is how you invoke it in R; pretty simple. It lets you ask whether the probability is low enough, and the standard threshold everywhere is this arbitrary cutoff of five percent; you can choose whatever you like. (There's a sketch of this kind of test just below.)
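The slide shows the one-line R invocation; purely as an illustration (the run-time numbers below are fabricated, and Python's SciPy stands in for R's t.test), the same test looks like this:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Fabricated example: 30 runtimes each for A and A' under randomized layouts.
a      = rng.normal(loc=90.0, scale=1.0, size=30)   # program A
aprime = rng.normal(loc=87.5, scale=1.0, size=30)   # program A'

# Welch's t-test; the null hypothesis is that the two means are equal.
t_stat, p_value = stats.ttest_ind(a, aprime, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.2g}")

# Reject the null at the conventional five percent threshold:
print("significant" if p_value < 0.05 else "not significant")
```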
So if this probability is below five percent, then we reject the null hypothesis, the null hypothesis being that they're the same. That's the name of the game: we're going to try to reject, with very high confidence, the null hypothesis, okay? And what that means, in this case, is that whatever we're observing is not due to random chance. In other words, the difference is real.

So here are some run times for -O2 versus -O1. These are speedups on the Y axis; the X axis is all of the SPEC benchmarks, ordered by increasing speedup. All of these things are green, and these are the error bars; I actually don't recall the exact interval around them, I think it's one standard deviation. So all of these are statistically significant. You can see that in some cases you get a statistically significant, huge increase in performance; that's astar. In some cases, here, actually, I can't see if it's red or not; I think that might be the one that's not statistically significant. Not too surprising, right? It's a very small difference. But here, we actually get a statistically significant performance degradation from using -O2. All right. And it just turns out that the layouts these benchmarks ended up with, well, there's this huge space, and they end up in some layout. Maybe that layout was lucky back when the person implemented -O2, but it turns out not so good right now.

>>: No, because you have some set of benchmarks, and sometimes you slow some down, right? And on average, you have all these benchmarks.

>> Emery Berger: All right, so, all right.

>>: But you could still choose to turn this optimization on, even though it doesn't speed everything up.

>> Emery Berger: I see your point. So your argument is that, well, across a suite, it's not going to have this much impact. Maybe it doesn't improve everything.

>>: Right.

>> Emery Berger: Some things it degrades. So I think that for -O2 versus -O1, this is pretty surprising, because -O2 is pretty bread-and-butter optimization. If you presented the -O2 optimizations, versus -O1, to almost anybody, you would think these are going to improve performance, and they don't. Yes?

>>: So this graph is with Stabilizer?

>> Emery Berger: This is all with Stabilizer.

>>: How did the graph look without Stabilizer?

>> Emery Berger: So, yeah, we actually ran this experiment. I'm trying to remember; I would have to look at the paper. I can't remember whether the speedups are with respect to the Stabilizer build or with respect to the actual original execution. But we definitely observed cases where these optimizations slow things down, and when you throw Stabilizer at them, it makes them run faster. So it's a bit of a weird issue. But yeah, I would have to check.

>>: Pretty disappointing. I thought the improvement was much more.

>> Emery Berger: Yeah, it's not a huge amount, I agree. But all right, well, let's go to -O3. We can crank it up.

>>: This is on one machine, one configuration?

>> Emery Berger: That's right.

>>: Who knows when they wrote these and what the machine looks like.

>> Emery Berger: Yeah, yeah. That's an excellent point, and that's something that's totally out of reach for Stabilizer.
You could do this with a simulator in conjunction with Stabilizer, something like that. But Stabilizer itself is still observing the execution on your actual machine, and it's having this effect of disrupting the memory layout. If you go from a machine that has a one megabyte cache to a machine that has a 16 megabyte L3, the performance is going to change dramatically, right? And there's no way to account for that.

>>: So -O1 is sort of the debug?

>> Emery Berger: -O0 is actually the debug level. -O0 does nothing. -O1 does some very simple things. -O2 does more advanced things, especially register allocation, as well as [indiscernible].

>>: Did you measure -O0?

>> Emery Berger: Yeah, -O0 turns out to be so slow that measuring it for the entire suite would take months. So we didn't do it. Yeah, it's bad. Okay. So here's -O3 versus -O2. This is the same axis as before, but I'm going to magnify it ten times so we can actually see what the differences are.

>>: Great for research papers.

>> Emery Berger: Yeah, yeah. So anyway, you can see that these performance differences are quite small. Now, one of the things that's interesting: this is a very, very small difference, but because we're using Stabilizer, we can actually say this one is statistically significant and this one is not. By the eyeball test, especially this eyeball test, you'd say it's totally insignificant, right, because it's so small. But that's not really how these things work. So we actually get some statistically significant improvements, and again some statistically significant degradations on the left, and it's kind of a wash in the middle. It's interesting that there's this one, which appears quite large but is actually not statistically significant, okay?

All right. So what do these results mean? So far, I've presented results on a per-benchmark basis, okay? But that doesn't actually tell you what the difference is between -O2 and -O3 in general, because this is a pointwise comparison. So what we actually do is run all of these things 30 different times each (lbm, astar, et cetera, et cetera), we get a sequence of graphs, and then we aggregate them. Before, with the results I just showed you, we were looking at one benchmark at a time, comparing that benchmark with and without the treatment. Now I want to know something about -O3 versus -O2 in general. What I showed you just now is not the way to do it. This is what we want to know: is -O3 faster than -O2? And, you know, it looks like it's slower here, faster here, and you say, well, sometimes it's good, sometimes it's not. But that is not the way to do this either. Again, we have to go back to the null hypothesis approach: we say, if these two were equal, what would be the likelihood of measuring these differences? And to do this, there's a generalization of the Student's t-test for more than one thing; there's actually a whole family of these tests. It's analysis of variance, all right? You can again invoke it with R, in the beautiful R syntax, as you see there, and you get the same sort of result. You can say: if the P value is below five percent, then we reject the null hypothesis, okay? So we're going to go ahead and say that these things are, in fact, different. But when we compare -O3 and -O2, we get a P value of 26.4 percent. We wanted it to be below five. (A sketch of this kind of comparison follows below.)
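Purely as an illustration of that step (fabricated numbers again, with SciPy's one-way ANOVA standing in for the R invocation on the slide; the paper's actual analysis is a richer model over all benchmarks):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Fabricated runtimes: 30 randomized-layout runs per optimization level.
o2 = rng.normal(loc=50.0, scale=2.0, size=30)   # built with -O2
o3 = rng.normal(loc=49.8, scale=2.0, size=30)   # built with -O3

f_stat, p_value = stats.f_oneway(o2, o3)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
# p above 0.05 -> cannot reject the null hypothesis that -O2 and -O3 have
# the same mean runtime; the observed gap may just be noise.
```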
That means that we're only 73.6 percent confident that these results were not due to random chance. In other words, roughly one in four such experiments would show an effect that doesn't exist. So this is a classic failure to reject the null hypothesis. That is, you can't reject the null hypothesis that -O3 and -O2 are the same, that all the effects you observe are due to randomness, okay? And, colloquially, we say the effect of -O3 versus -O2 is indistinguishable from noise. Yes?

>>: I agree with the [indiscernible] null hypothesis, but I didn't see you addressing the case where you have the cutoff point that actually tells you the sensitivity or specificity of the one test against [indiscernible].

>> Emery Berger: Are you talking about the effect size? Is that the question?

>>: Where is the turning point, the cutoff point? [indiscernible]. That cutoff point is very important. It tests, by moving [indiscernible], whether your test is really sensitive enough that you [indiscernible]. I didn't see you addressing it.

>> Emery Berger: Well

>>: I agree [indiscernible]. But another possibility is [indiscernible] too, because you are choosing

>> Emery Berger: Of course. So the choice of a P value is always, I mean, this is just the way it works, right? You pick some P value. Theoretically, the way this whole thing is supposed to proceed is you pick a sample size in advance, you pick a P value in advance, and then you go and do your test, right? In fact, with this presentation of the P value here, 26.4 percent, you don't use that value; all you do is say, we can't reject the null hypothesis. So that is the standard statistical methodology I'm employing. You clearly know much more about this than I do; you've probably forgotten more about it than I've ever learned. I'm more than happy to talk to you about this after. Okay? Terrific.

Okay. So there is this concern; it's not really a probe effect, but an effect of using a tool. Is there something about Stabilizer that's hiding the effect? Is there some systematic bias in Stabilizer that's changing the effect we're observing? It turns out that when we run with -O3, -O2, -O1 or -O0, Stabilizer, independently of the randomization it employs, adds the same sort of impact to all of them: a fixed additive increase in the number of cycles, the number of branches taken, the length of the path, et cetera, et cetera. It clearly disrupts the memory layout, but that's the point. At every other point in this space, where we're counting cycles or counting reads and writes and so on, it stays constant. So we end up getting basically the same additive increase for all of them.

>>: I'm not sure you can claim it's the same additive increase, right? If it's not optimized, [indiscernible] one of your big points is about jumping to the, you know, relocated copy for that method. You're not doing inside-method

>> Emery Berger: Oh, right, right. So do we actually prevent optimization from happening? We don't, because this all happens after the optimizations. So it's an excellent question: all of this stuff that happens to the code is post-optimization.

>>: It doesn't change the inline decision?
>> Emery Berger: Right, all the inline decisions, all of the optimization decisions have happened, and then it goes and does this

>>: And there are fewer; if the methods are inlined, then they're not jumping to the method, so they're not experiencing the randomization of

>> Emery Berger: That part is true.

>>: So I'm just not sure how these are constant across the different

>> Emery Berger: So the issue really is: if we take the code that's been produced by LLVM, is Stabilizer doing something to it that's disrupting the optimizations? And the answer is no, because all of the optimizations happen first, and then it goes and does all the stuff to instrument it with randomization.

>>: But the point is valid for inlining. By performing inlining, you've reduced the potential insertion points for Stabilizer.

>> Emery Berger: That's right.

>>: So there's less disruption.

>> Emery Berger: That part is true. That's right.

>>: If you compare -O0 with -O3

>> Emery Berger: Agreed. One of the good things is, if you go and actually observe the run times of all of these executions (I don't have these graphs here, but they're in the paper), for all but two of the cases, if I recall correctly, we get normal distributions of execution time. So you can still do all of your hypothesis testing. The reason you might not get normality is directly related to this problem, though it's not actually so much because of code. How could you fail to get a normal distribution? Well, if you're not actually getting any independent randomizations, then you'll get none. So how can that happen?

>>: One giant main. That [inaudible].

>> Emery Berger: Right, one giant main means no randomization of any functions. But a custom memory allocator is actually the big problem, okay. Some of the SPEC benchmarks have custom memory allocation; actually, many of them have custom memory allocators. And if all you do is spend all of your time in one giant array allocated on the heap, then you actually can't randomize within it. That's the biggest problem. Luckily, in almost all circumstances, it doesn't matter; there's enough other allocation and enough other functions to obscure this. Yeah?

>>: But code layout optimizations, like hot code placement, are going to be totally disrupted.

>> Emery Berger: That's right. This goes back to Katherine's point. In that case, what you do is run this without -R code; you say -R data, -R stacks.

>>: Do you see any effect from that? Like -O2 versus -O3?

>> Emery Berger: No, they don't actually do this.

>>: Microsoft compilers can.

>> Emery Berger: I know. LLVM does not, all right? So this is just the performance of Stabilizer itself. In some cases, it slows things down considerably. Like I said, it slows things down sort of uniformly, but it does slow things down. Thank you, Katherine and Ben. Perl is a disaster. Well, the benchmark is ridiculous in many ways, but Perl is a disaster because it's a giant switch statement with function calls. That's all it does, all right? And so if the functions are placed randomly, then they don't all fit in the TLB. That's really what we see: TLB pressure here. If we had a hierarchical TLB, or a bigger TLB, these problems would go away. But that's the bulk of where that cost comes from. And most of them are low. The average is about ten percent.

>>: If you didn't randomize, they would fit?
>> Emery Berger: Yeah, so what happens is that all the functions get laid out just function, function, function, function. So they're very compact.

>>: They do this on purpose already?

>> Emery Berger: No, it's just an accident of the way the code gets laid out. Nobody randomizes code. So the code is just there.

>>: It might not be an accident that it fits in the TLB.

>> Emery Berger: Oh, there's not that much happening; I highly recommend you take a look at the Perl code.

>>: Interpreters are structured the same, right?

>> Emery Berger: It's a classic interpreter design. That's right.

>>: Anytime you take something that fits in a fixed-size structure and do anything to it, you no longer fit in the fixed-size structure, and TLBs haven't grown in years.

>> Emery Berger: That's right.

>>: What about cactus? Like, I totally expected Perl and GCC to be over there.

>> Emery Berger: Yeah, so what's happening here, if I recall correctly, this one is actually the same problem with the TLB. The TLB is what kills you, but this is for the heap: randomizing heap locations.

>>: All right.

>>: I thought there was an overhead with Perl just for randomization, for DieHard.

>> Emery Berger: Yeah, there was a bunch of overhead anyway. Again, it's the TLB; all of this overhead is basically attributable to the TLB. There's a second-order effect, which is L3 misses, but the TLB kills you.

>>: So you need superpages?

>> Emery Berger: We need superpages. We thought about that, actually, but we decided not to go that route, mostly because Linux makes it a real, real pain in the neck. If you boot your system with superpages, everything is fine. If you want to actually allocate chunks of memory as superpages at runtime, it's a mess. It's really very bad. Yeah.

>>: So it looks like an alternate design would be to still have compact code but randomize the functions within that, right?

>> Emery Berger: Yes, that's an interesting observation. Undoubtedly, we have too much randomization, okay? And we could randomize in a compact way. It is possible; I know how to do it; we didn't do it. One of the reasons is that what we actually need to randomize is a certain range of address bits, the bits that are used in these hash functions. And those are actually not the low-order bits: the hash functions leave off the low-order bits, they leave off the very high-order bits, and they grab some in the middle. So it's very important that you randomize those, and that is going to lead to things being on different pages.

>>: That's for the TLB?

>> Emery Berger: Yeah.

>>: So what does it mean if you slow down 40 percent, and now you measure -O2, right?

>> Emery Berger: Yeah.

>>: I think what people have been saying is, the original one is really the one that I'm interested in. This is, like, such a huge effect.

>> Emery Berger: It has a huge effect. So it's important to understand that the effect Stabilizer has, that is, the dilation effect, does not actually affect its ability to discern very, very small differences. It's just a question of running it more times. Okay? Imagine that all I'm doing is looking for signal to separate from the noise. If there's a very, very small amount of signal, I will be able to discern it. I just need to run more tests.
So having the test itself be slower doesn't actually alter our ability to do statistically sound performance evaluation. Now, obviously, when you run Stabilizer, you probably don't want to ship with it, but that's a whole separate question. The question here is whether the performance differences we see are real or not. If both of them get massively dilated, that's okay: even if those effects are very small, if they're consistent, then we'll detect them.

>>: So your audience is the person who is choosing -O2 or -O3 and making the choice: should I spend the time to do -O3 versus -O2? Is it worth it, right? It's not, in some sense, the end person who is running the program.

>> Emery Berger: It's certainly not the end person, but I would say the audience is much broader than the people choosing between -O2 and -O3. My audiences are the following. One, developers who ran their code, and their code seems to run a little slower. They think, oh my God, I got a two percent performance regression, right? Well, before you go chasing that down, find out whether that two percent matters or whether it's going to go away tomorrow when you modify some more code. So, performance debugging.

>>: Performance regression.

>> Emery Berger: Well, both directions are meaningful. Certainly, if you're like: I have this crazy, super complicated way of doing something, and if I plug this into Internet Explorer it will make it run 0.5 percent faster, but it's going to be a source of bugs and a maintenance nightmare and all of this sort of thing. Do I really want to do this or not? Is that meaningful? You can decide whether 0.5 percent is meaningful or not as an effect size, but you need to know whether it's statistically significant or not. But the other audience is really researchers, right? Researchers publish lots of papers on performance optimizations of all stripes. Not just compiler things: runtime system stuff, different algorithms, lots of things. And there's a kind of "seems to work" or "the number is large" standard. You might say, oh, if it's over 20 percent, that's clearly good enough. But actually, we see larger than 20 percent swings just from static things, like changing the link order or changing the environment variables. You can easily get a 30 percent swing in performance just with that. So it's important that people, when they produce these results, know that they're real. But is it for grandma? No, it's not really for grandma, right?

>>: Or grandpa.

>> Emery Berger: Or grandpa. I'm definitely happy to include both grandma and grandpa in this. It's definitely ageism, but certainly my mother and my father have no idea what's going on. So anyway.

>>: So when the effect is so large.

>> Emery Berger: Which effect? Of your optimization?

>>: No, no. When the effect of adding this tool to your runtime is so large, how do you control for its variation? Of course, more runs, right?

>> Emery Berger: So its only source of variance is its effect on memory layout, right? It doesn't actually do anything different itself.

>>: But there's the observer effect, right? It has two parts to it. One is, because it's code and it's in the runtime with your program, this is something that wouldn't have been in the runtime otherwise, so there's some effect.

>> Emery Berger: That part is true.

>>: And there's the null effect, where it gives you the same layout and it has some performance impact, right?
>>: Let's let Emery
>> Emery Berger: I'll wrap it up. I got to the punch line a long time ago. Ta da! The punch line, I know you've been waiting: and then the cat says to the dog. So anyway. Memory layout, as I hope you all will agree, affects performance, and that makes it difficult to do performance evaluation. Stabilizer controls for the effect of layout by randomizing it. This lets you do sound performance evaluation, and we did a case study. The case study shows that, in fact, O3 does not have a statistically significant impact across our benchmarks. And you can actually download this today. It's at stabilizertool.org. I want to thank the NSF and, sorry, guys, I want to thank Google for helping to fund this. Google is also funding Charlie's Ph.D.; he got a Ph.D. fellowship from Google. So anyway, that's it. Thanks for your attention. And I'll put up one backup slide here. So this is O2 versus O3.
>>: So did you verify that stabilizer is [indiscernible] if you change the amount of variables?
>> Emery Berger: You mean the performance?
>>: You run it, your student runs it in his name and your name, and you get exactly the same results.
>> Emery Berger: So yeah. It does, it does, because the start and the position of everything really is totally disrupted by all the randomization, right?
>>: You verified that?
>> Emery Berger: We verified it. I'm pretty damn sure we verified it. I should say one thing, so this is the little secret about stabilizer: stabilizer gives you performance predictability. Right now, the cost is too high. But it gives you this performance predictability, and you can argue, look, the chances of me being more than three standard deviations away from the mean are less than 0.1 percent, right? And so the fact that you get this actually gives you very, very predictable performance. So it would be completely shocking if you observed anything different. But it turns out that it's immune. So I'm pretty sure the paper actually has an experiment to this effect. Yeah.
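A quick check on the three-standard-deviations figure, under the normality that the randomization is meant to induce: for a normal distribution the two-sided tail at three sigma works out to roughly 0.27 percent, the same ballpark as the bound quoted above.

    import math

    def two_sided_tail(k: float) -> float:
        # P(|X - mu| > k * sigma) for a normally distributed X.
        return math.erfc(k / math.sqrt(2.0))

    print(two_sided_tail(3.0))  # ~0.0027, i.e., roughly 0.27 percent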
>>: Well, I mean, one thing that's interesting, and it goes to this question, is that now there are changes in the environment, and some will affect the performance and some won't. So the claim is that some of them very well shouldn't affect performance. If I change the library, maybe it will, maybe it won't. It might make something faster, right? But you can actually evaluate that. Like, every different piece of hardware, you can look at now with stabilizer; you can say, how do these compare with stabilizer versus without stabilizer? So it seems like, you know, you have a better way to understand the real effect of the environment, which I'd be curious to see. Like Katherine did this great work looking at like 20 different architecture things that affect energy and things like that. You could do the same thing with stabilizer to see how much difference
>> Emery Berger: That's an interesting point. I mean, I would be leery of doing this. Well, it's an interesting question. I'm not sure about the energy impact of the memory layouts; I would have to imagine that it would be substantial. But you could certainly detect whether there's a difference. I would have to think very carefully, though, about what the meaning would be of running your stabilizer results on machine one and running them on machine two and comparing them. You need to really have a pretty clear null hypothesis. So I guess the null hypothesis would be that these machines have no impact. That would be a very surprising null hypothesis.
>>: For P3 and P4. I mean, the Pentium 4, it wasn't clear that it was really better, right?
>> Emery Berger: Oh, yeah, sure.
>>: So I'm sure there are cases where you could actually confirm that.
>> Emery Berger: Right, right. The trace instruction cache. Go ahead, Tom.
>>: The other question I had is, you could adapt your sample rate, or how often you rerandomize your layout
>> Emery Berger: Yes.
>>: to reduce your overhead. And if you still got acceptable sort of experimental results... so have you looked at that?
>> Emery Berger: So we deliberately chose this number because we wanted to guarantee that we got 30 samples for any reasonable execution. And you do need to have some number of samples to make using these hypothesis tests meaningful. We are looking at altering the rate at which we do these things based on performance counters for this other tool that Charlie is working on. Yes?
>>: So you're randomizing layout stuff here, and I can see why, because that's a large effect. But are there other possible effects, other than layout, that could affect, you know, performance, even with small changes in your code? Non-layout related.
>> Emery Berger: Oh, non-layout related. So I believe that there are, but it's tricky. I've spoken to one of my colleagues, Daniel Jiménez, who is an expert on branch predictors, and he says that, in fact, there are other things you might want to do beyond just laying things out that would actually alter the way that the code behaves in the branch predictor. I'm sure there are many more very subtle things. But he also thinks that those are second order, that the caching and the addresses themselves are going to be where the big hit is. There is the question, however, of randomizing things within objects. Especially for a function and its code, the relative positions inside it, because it all moves together, mean that there's a lot less randomization happening. So getting deeper inside would be a way of getting, you know, at more possible layouts. Yes?
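On the 30-samples remark above, a back-of-the-envelope sketch, not the tool's actual procedure, of how the number of runs needed grows as the effect to be detected shrinks, using the standard normal-approximation rule:

    def runs_needed(effect_sigmas: float) -> int:
        # Per-group runs to detect a mean shift of `effect_sigmas` standard
        # deviations; 1.96 and 0.84 are the usual normal quantiles for ~5%
        # significance and ~80% power. A textbook approximation only.
        return max(2, round(2 * ((1.96 + 0.84) / effect_sigmas) ** 2))

    print(runs_needed(1.0))   # ~16 runs for a large, one-sigma effect
    print(runs_needed(0.25))  # ~251 runs for a small, quarter-sigma effect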
>>: You mentioned offhand that the randomizing stuff [indiscernible], which means they could be really [indiscernible] optimization you want to do to your code. You think this is probably going to speed things up because of [indiscernible]; you'd like to factor out the [indiscernible] by randomizing that and making sure that your optimization is getting the speedup from the source [indiscernible].
>> Emery Berger: Sure, sure. So there's actually some work. I think it's Mark Hill, if I recall correctly. It's either Mark Hill or David Wood, somebody in Wisconsin. They actually do inject some randomization into simulation to try to get some error bars, because otherwise everything's exactly the same all the time. But to my knowledge, nobody has done anything like this within the simulator. You know, it begs the other question, obviously, about simulators.
>>: Not on purpose?
>> Emery Berger: Yeah, right, not on purpose. That's right. Yeah, it begs the question of, you know, simulator fidelity to the real platform. But that's always a question. Yes?
>>: My question [indiscernible]. You are randomizing the layout here. Did you consider all factors, and how do you [indiscernible] variability that might have come from there?
>> Emery Berger: So I can't say that we've considered every factor, right. What we did is we said, you know, we
>>: You said... what factors did you consider, and how did you come to choose layout?
>> Emery Berger: Oh, beyond layout. So it turns out we focused on layout specifically because it has two very direct consequences on very, very performance-critical pieces of your hardware. One is the cache and one is the branch predictor, right? The cache is huge: if you miss in the cache, you go all the way out to RAM, which is a hundred times slower than if it were in L1. So this is a gigantic, gigantic impact. Branch prediction has a much less dramatic impact, but it also has an impact on performance. So that's why we focused on layout.
>>: Okay. So within a function, the branch layout matters a lot. Like, that's an optimization that has a lot of impact on performance. So how would you generalize what you did? You'd have to change the compiler, because the offsets are baked into the code; you don't want to look up the offset of each branch, right?
>> Emery Berger: Absolutely right. So Charlie and I have talked about this. I guess we should wrap this up very quickly. Briefly, the idea is that you generate different variants of the code. You generate different functions, and then you sample those at execution time. When you're doing rerandomization, you don't just take, you know, this function and move it somewhere else. You take function version K. And so
>>: K is good.
>> Emery Berger: K is fantastic. K is always good. Thank you.
>> Ben Zorn: We want to thank Emery.
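A minimal sketch of the closing "function version K" idea: pre-generate several semantically identical variants of a hot function and have each rerandomization epoch dispatch to one of them through an indirection slot. All names here are hypothetical, and in compiled code the variants would differ in internal layout, which this sketch can only gesture at.

    import random

    # Semantically identical variants; in compiled code they would differ in
    # internal layout (basic-block order, padding), not in behavior.
    def hot_function_v0(x: int) -> int:
        return x * x

    def hot_function_v1(x: int) -> int:
        return x * x

    VARIANTS = [hot_function_v0, hot_function_v1]
    hot_function = random.choice(VARIANTS)  # pick version K for this epoch

    def rerandomize() -> None:
        # On each epoch, repoint the dispatch slot at a random variant,
        # analogous to relocating the function during rerandomization.
        global hot_function
        hot_function = random.choice(VARIANTS)

    rerandomize()
    print(hot_function(7))  # calls whichever variant this epoch selected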