>> Ben Zorn: All right. Well, welcome, everyone. It's a great
pleasure to again introduce Emery Berger from the University of
Massachusetts. Emery is here for a month and he's still here for
another week so if you want to talk to him, you're welcome to.
Emery's done a lot of interesting work in systems. We heard last
time about AutoMan, and this time we're going to hear about
stabilizer, which is a very creative way to slow your program
down but make it better. Thanks, Emery.
>> Emery Berger: Yeah... thanks, Ben. That was great. Yeah, it
turns out it's a lot easier to slow down programs than to speed
them up. So I have a rich career ahead of me. All right.
So this is some work on performance evaluation that just recently
appeared at ASPLOS. I actually could not attend ASPLOS this year
because I was, sadly, in Rome. So it was Rome versus Houston.
But this is joint work with... yeah, right, exactly. So my
student Charlie is the lead grad student on this work, the only
grad student on this work, and he presented for me. And he, I
hear, did a great job. So hopefully, I'll do as well.
So, you know, I think most of us in this room actually care about
performance, right? A lot of people in general, I think, when
you write your program, or your optimization, or your system, you
say, well, we really would like to show that it, in fact, speeds
things up, not slows things down, right? So the problem is, it
turns out that largely (this isn't necessarily everybody,
although I include myself here) many of us have been doing it
wrong, all right? So we've been doing performance evaluation for
years and years and years, and we actually have been making some
crucial errors. In particular, things that we think are best
practice, like running things repeatedly and getting error bars
and showing all these very pretty graphs, that's not actually
enough.
And the reason is that there are things that really are lying
under the covers that we're not actually seeing that have a huge
impact, and they have such a huge impact that it actually makes
the results that we're reporting potentially meaningless.
So what do I mean? In particular, there's this problem on
systems, which is that the layout of the objects in memory, where
the objects include your data and your code, dramatically affects
performance. And even very, very slight things that you might do
to your program actually have a pretty serious impact on layout,
right? So if you change some of your code, it moves things
around. If you change some allocations and so on, it moves
things around.
And right now, when you run your program a bunch of times,
there's really no way to isolate this effect, and this effect is
super, super important. So the goal of stabilizer is to
eliminate this effect. And by eliminating this effect, it's
going to enable us to do what we call sound performance
evaluation. I'm going to explain what that means, all right?
And not only am I going to explain what it means, I'm going to
show you some case studies that demonstrate the value of using
stabilizer and taking this sort of sound performance evaluation
approach, okay?
>>: I have to argue already.
>> Emery Berger: All right, just one second. Finish up the
slide.
>>: Just like old times.
>> Emery Berger: Yeah, really. The case study I'm going to
present is an evaluation of LLVM's optimizations, and so I'll
show you what the impact is of using stabilizer and how it lets
you understand things differently. Katherine, go.
>>: What happens if your performance optimization is to choose a
layout in order to improve performance? So then randomizing the
layout, like, defeats the purpose of what you're trying to do.
>> Emery Berger: Yes. So I'm actually going to talk about that
a little bit later. But it is true that if your goal is to do
something that affects every single aspect of memory layout, not
just the data placement, but also the code and also globals and
also the stack frames, then stabilizer would be undoing those
things. However, if it only attacks one, which would be normal,
like typically data, for example, or possibly code, then
stabilizer allows you to isolate all the other effects.
>>: Okay. So you have a base. So I'll derail you just a little
more. You have a base system that does code layout, which it's
trying to give you good cache locality. But then you do a data
layout optimization. And so do you want to screw up the finely
tuned code layout as well to see if your data layout is
independent of that? Like that's the key thing?
>> Emery Berger: So here's the thing. So first, I think, you
know, there's this fear of optimizations like the ones you're
describing.
>>: I'm not afraid of them. I want to make them.
>> Emery Berger: No, I understand. I'm not saying that they're
scary, you know. But there are many, many other things people do
with their code, right? So they go and they manually try to
optimize their code. Or they're writing their code and they add
features, and the features make the program go slower, right? So
this is a performance regression, right?
So the problem is that you run your program, and we'll get to
this in a second. You run your program and you see some
performance change and you draw a conclusion. And that
conclusion may be completely unfounded because of all of these
other confounding factors, right. So these confounding factors
are all of these artifacts of running on real systems today.
When it comes to very, very precise things that actually depend
on memory layout, the question is, does it depend on every single
aspect of the memory layout or not? Things like stack frames
typically are not subject to these kinds of optimizations. It
depends. Code placement, definitely. Heap object placement,
iffy, right? So there are a number of these factors, and you can
sort of say, all right, I really care about these factors, don't
change those, but change everything else. All right? Okay.
So I'm going to go ahead and focus first on this issue of memory
layout and how this affects performance, okay? So, of course,
some of you are aware of this, but that's fine. So suppose I've
got a program, so here are some chunks of code, and let's call
this program A. And then what I'm going to do is I'm going to
actually make some modifications, right? So I move some stuff
around, I refactor some things, I modify some functions, and now
I call this new program, program A’. All right? So the question
is, say I meant to do this to make it faster, all right? Like, I
was like, oh, wait, there's something here I can really speed up.
I can make this go way, way faster. Okay.
So what do I do? So I run it, right. The green one is the new
one, the blue one is the old one, and it's a little hard to see
the numbers here. This one says it took 90 seconds, and this one
took 87.5 seconds. All right? So which one's faster?
>>: [indiscernible].
>> Emery Berger: Okay. So A’ is faster. What's that? No idea?
You have no idea. Man, see? All right. So this is the problem,
right? Everybody's like it's a trick question. I'm scared. So
this is really how we formulate this, right. We say is A’ faster
than A. And you look at the time and you look at the bars, you
say oh, this bar is to the left of the other bar. Fantastic,
okay? So clearly, A’ is faster.
And, in fact, in this case, the difference is 2.8 percent faster.
So you say, okay, 2.8 percent faster. All right. What's the
first obvious objection to this?
>>: One run.
>> Emery Berger: One run, okay? I ran it one time. So the
question is, well, what about variance? What if there's some
sort of noise in the system, is 2.8 percent really something that
overcomes that noise? So now I run it 30 times. 30 turns out to
be kind of a magical number for many of these studies. So we run
it 30 times, and now we look at the variance, and the variance is
pretty tight. Everything looks good. Looks like A’ is faster.
In fact, not just the means, but the extremes, the max of one and
the min of the other, are still 2.8 percent apart.
>>: Why is it two bars for the green and three for...
>> Emery Berger: Yeah, there's actually three bars here and two
bars here, but this is just the distribution, the empirical
distribution of running this code. Just happened to be that way.
It's really just an artifact of binning, okay.
So why is A’ faster than A? Of course, it's because I'm a genius
programmer and I made it 2.8 percent faster after three weeks of
effort, right? Terrific. So it could be the code change.
That's what we generally assume, right? Like I made this change,
and good things happened. But it could just as easily have been
the new layout, right. I made a bunch of changes here, moving
around functions, changing the size of functions, that had a
significant impact on where these things end up in memory, and
one of your colleagues, Todd Mytkowicz, back in 2009, presented a
paper talking about how layout biases measurement.
So what kind of things can you do, and this is what Todd showed,
that can have dramatic impacts on the performance of your program
without changing code? So you can go into your makefile and
change the link order. Just reordering the .o files, like foo.o
bar.o versus bar.o foo.o, can have a substantial impact on
layout, right? It moves all of the functions from one place to
another.
Environment variable size. This one is completely bizarre. So
it turns out that when you run a C program, it actually copies in
your entire environment and then starts the program after that.
And that means that changing your environment (and changing your
environment can mean changing your current working directory, the
time of day, the date) actually moves things around in memory.
So you might think, all right, come on. We're just moving the
stack...
>>: What if your Java run time has a little C component and
managed run time has the same problem? It's not just the C
program.
>> Emery Berger: That's true. Actually, Katherine has a good
point; it goes even further, because not only is Java or .NET
actually a C program, it turns out that the run times actually do
import the environment. And they import it into your heap,
right? And a lot of the code gets compiled onto the heap, which
means that your code placement and your data placement are still
affected by these environmental factors.
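To make the environment-size effect concrete, here's a tiny
hypothetical demo (my addition, not from the talk): on Linux, the
environment strings are copied to the top of the stack at
startup, so padding the environment shifts every stack address.
Run it with ASLR disabled so the environment is the only thing
changing.

    /* env_shift.c: hypothetical demo, not from the talk.
       Build: cc env_shift.c
       Compare (with ASLR off so only the environment varies):
         setarch $(uname -m) -R ./a.out
         PAD=$(head -c 4096 /dev/zero | tr '\0' x) setarch $(uname -m) -R ./a.out */
    #include <stdio.h>

    int main(void) {
        int local = 0;
        /* The environment sits above the stack, so a bigger
           environment pushes this address down. */
        printf("address of a stack variable: %p\n", (void *)&local);
        return 0;
    }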
So you might think, all right, come on. How much of a big deal
is this going to be? Well, it turns out, as Todd and Amer
showed, and Peter Sweeney was on this paper as well, that the
impact of these changes can be greater than the impact of -O3.
And these are very, very coarse-grained changes, right? This is
moving all of the data in one block, or taking a whole bunch of
functions and moving them separately. The changes that you make
as an individual, your very, very fine-grained changes, can have
a much deeper impact.
So what's going on? The problem here is the cache. So you have
these caches. Caches are awesome. They make your programs run
faster, right? You have this very, very fast memory, but it's
relatively small. And in some cases, you can end up with some
sort of catastrophic collision in the cache, right? So you go
and you have your code here. These two pieces of code are hot.
So they end up, unfortunately, mapping to the same cache set, and
you get conflicts.
So they don't all fit in cache at the same time. It runs slower.
When you change your program, by luck, you could end up actually
getting an empty space in that set, right? So the code is not
there anymore. The hot code has moved elsewhere, and now we have
no conflicts.
All right. So this can happen because of caches. But caches
aren't the only source of this sort of problem. Branch
predictors suffer from the same sort of phenomenon. Branch
predictors use program counters to key into hash tables.
Then there's the TLB, right, the translation lookaside buffer.
Same thing. It's address based. The branch target predictor,
too. I could fill up the slide, basically, with all of these
things that use addresses as hash keys. So really, anything in
your hardware that has a hash function suffers from this problem.
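To see what "addresses as hash functions" means in practice,
here's a small illustrative sketch (my own, with typical
parameters; not from the talk) of how a set-associative cache
picks a set from an address:

    /* cache_set.c: illustration only; parameters are typical, not
       quoted from the talk. The middle bits of the address select
       the set, so moving a hot function by a few kilobytes changes
       which other code it collides with. */
    #include <stdint.h>

    unsigned cache_set(uintptr_t addr) {
        const unsigned line_bits = 6;   /* 64-byte cache lines */
        const unsigned num_sets  = 64;  /* e.g., a 32 KB, 8-way L1 */
        return (addr >> line_bits) & (num_sets - 1u);
    }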
>>: I have a question. So the TLB is not fully associative?
>> Emery Berger: So the TLB is fully associative, but it depends
on where things lie, right? So if you think about it, if I have
a function that spans two pages, right, then I would need two
entries. But if it's all in one, then I would only need one
entry. So you're right about the address not being the issue,
but it's about fixed-size resources as well, and the placement of
things in memory.
All right. So now that you've seen all this, if you hadn't seen
it before, let's go back and think about this. We're asking this
question, is A’ faster than A, but now we know there are all
these weird confounding factors. So what do we actually do in
practice?
In practice, we essentially go and we find the one layout, right.
We don't try all the environments. We don't try all the link
orders. We just ask one person. So it's sort of like we ask
this guy, and it's like, hey, it seems to be faster, right? It's
2.8 percent faster. We're done, right? But here's something you
would never really do in practice, right? You wouldn't go ask
one person for their opinion about something, say, hey, what do
you think, is this faster, and get yes, yes, yes, yes, yes,
right? Do it 30 times: oh, I have so much more confidence that
it's faster, right?
So what you really want is to sample the space of all of these
things, right, all these different layouts, all these different
guys. Some are going to say it's slower, potentially some could
say it's the same, but you really need to cover this space to
find out what reality is.
So there's an objection that people often come up with at this
point, which is hey, I like the faster guy. The faster guy is my
friend. I just want it to be faster. Now, this is not just like
I want it to be faster because I want to publish a paper, right?
Although there's that. It's also, you know, look, you ran it
faster, it's like, well, I don't care about the other layouts.
Maybe I just want the good layout. I want the faster layout.
So what if we just sort of stick to that layout. We say, you
know, this is a damn good layout. We like this layout. Let's
keep it. All right? So we're only going to talk to Bob, which
means here, we're only going to use this layout, okay? So that
sounds good.
There are problems with this approach. Suppose you upgrade your
C library. You kind of have no control over this, by the way.
It's like time to update, there's some bug fix. This happens all
the time. Microsoft C run time gets upgraded, whatever.
Libraries that you depend on get updated. That changes your
layout, all right?
If you change your user name... not you changing your user name,
but somebody else with a different user name runs it, right, that
changes the size of your environment. I actually had this
experience myself with a student, where I proposed an
optimization, and the student came back and said, it doesn't
work, it makes things slower. I said, that can't be true. And I
ran the program, and it ran faster. He ran the program, and it
ran slower. We ended up running it on the same machine, in two
different windows, and his was slower and mine was faster. And
his user name is longer than mine. That was the only difference.
Once we changed the environment variables, everything worked out
great. So this is kind of brittle, let's say. Or everybody
could just canonicalize and have the same length user name. We
could do that. That's one possibility.
So, yeah?
>>: Or we should find optimum user names.
>> Emery Berger: We should all, exactly, do a search. That's
right. For every program on earth, that's right. I agree.
Well, as you've already heard, it turns out my user name is
optimal.
>>: You have to make sure that the program [indiscernible].
>> Emery Berger: Right. So, in fact, that's not so hard to do,
right? You can actually make it so that it doesn't depend on
that. But all of these other little factors: you change one line
of code in your program, you change a library, right, your
libraries are getting shifted out from under you all the time.
There are all these different versions of all these different
DLLs, and a slight change in one changes everybody's layout, all
right? The current directory is another phenomenon; again, same
thing with the environment variables, et cetera, all right? So
layout is really brittle. And it's brittle not just because,
you know, I'm running my one program; you've got the whole
execution environment. You're going to make modifications to
your program, it's going to go through different versions, it's
always changing.
So you can't really stand on firm ground and say, I got a 2.8
percent performance improvement, right, if all this stuff is
happening. All right? So layout biases measurement, great.
What do we do about it? It's bad news. Can we eliminate it?
And the answer is yes, all right? And I'm going to show you how
we do it.
So stabilizer is the system that we built that directly addresses
this problem, right. Memory layout affects performance. This
makes it hard to evaluate. Stabilizer eliminates the effect of
layout. And I'm going to show you that by doing this, you can do
what is really sound performance evaluation. So what does
stabilizer do? So Katherine already spoiled the big reveal here.
Stabilizer randomizes layout. What does it randomize? It
randomizes pretty much everything it can. So it randomizes the
placement of functions, randomizes stack frames, randomizes heap
allocations, and it actually doesn't do it just once. It
repeatedly re-randomizes those layouts during execution. And
this turns out to be crucial for reasons that will become obvious
later. But basically, when you do this, you eliminate the effect
of layout, because you're sampling all of these different layouts
randomly, right? If it's completely random, it can't bias the
results.
So I'm going to walk you through a few of the mechanisms at a
very high level. You already know that it's possible to
randomize the heap. This is something that I've been doing for a
little while now; I'd be happy to explain offline. Here's some
of the other stuff we do. So with stack frames, we actually
modify the compiler so that instead of doing the usual thing,
where you call function main, then foo, and the stack frames are
always adjacent, stabilizer actually adds these random pads. The
random pads mean that the stack frames start at different
positions, okay?
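Conceptually, the inserted pad looks something like this (a
minimal sketch of the idea, my own; stabilizer's actual compiler
pass is more involved):

    /* pad_call.c: conceptual sketch only, not stabilizer's code. */
    #include <alloca.h>
    #include <stdlib.h>

    extern void foo(void);

    void call_with_random_pad(void) {
        /* Pad by a random multiple of 16 bytes to preserve
           alignment; foo's frame then begins at a randomized
           offset. */
        volatile char *pad = alloca((size_t)(rand() % 16) * 16);
        (void)pad;
        foo();
    }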
When it comes to function placement, you have your basic function
placement that the compiler gives you, where it jams everything
contiguously into memory. What we do in stabilizer is we install
trap instructions at the very top of these functions, and then we
randomly find places to put the functions. We make copies
somewhere else randomly in memory, and we change the trap into a
jump. So if you go to execute that particular piece of code,
you'll actually end up executing somewhere else. We keep this
little relocation table off to the side because all the functions
that foo may reference could also move. So this will give us
their addresses.
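The trap-to-jump rewrite might look roughly like this on x86-64
(a heavily simplified sketch of the idea, my own, not
stabilizer's implementation; it ignores page-boundary and
thread-safety issues):

    /* relocate.c: simplified illustration. Once the trap at a
       function's entry fires, overwrite the entry with a 5-byte
       relative jump to the randomly placed copy (assumed to be
       within +/- 2 GB). */
    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>

    void install_jump(uint8_t *orig, uint8_t *copy) {
        /* Make the page holding the entry point writable. */
        uintptr_t page = (uintptr_t)orig & ~(uintptr_t)4095;
        mprotect((void *)page, 4096,
                 PROT_READ | PROT_WRITE | PROT_EXEC);
        int32_t rel = (int32_t)(copy - (orig + 5));
        orig[0] = 0xE9;                /* 'jmp rel32' opcode */
        memcpy(orig + 1, &rel, 4);     /* displacement from next insn */
    }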
So when we go and we invoke baz, baz gets moved somewhere else,
and so on and so forth. And while you're running your program,
stabilizer will actually go ahead and do re-randomization.
Re-randomizing the stack is pretty straightforward: you just use
different random pads. For functions, it's a little more subtle.
So the timer goes off, and it goes and reinstalls the traps in
the original locations. These are the old randomized locations.
These have to stay around. They're potentially garbage, but
somebody could be running in them right now. They could be on
somebody's stack. So it generates these new versions, and
periodically, during this re-randomization period, stabilizer
walks the stack looking to see if these things can be reclaimed.
Yes?
>>: So you have a little relocation table at each function. Why
do you need it? Because everyone is calling the original
location.
>> Emery Berger: That's a good question. So I'm sorry I didn't
make that clear. When you're in this thing right here, initially
it goes to the original location. And then it overwrites it and
jumps to the new location all the time. So it avoids an
indirection.
>>: It's just for performance?
>> Emery Berger: It's just for performance, that's right, yeah.
>>: Why do you need to change placement dynamically?
[indiscernible] so that every time you recompile, you have
different versions of the program, and then...
>> Emery Berger: Yeah, okay. So there's one very easy answer for
that, and then there's the deep, mathematically motivated answer.
Doing it statically, at compile time... this is the weak answer:
experimentally, it's just a pain in the ass. If I want to go
ahead and test my optimization, I have to recompile all of my
code, and every single recompile will correspond to one sample.
So if I do it this way, then I actually get to sample a large
space. But it turns out that there is a statistical benefit that
you get, which is really huge, from doing this re-randomization,
which I'm going to get to in a couple slides. Okay? So just
hold that thought for a few slides.
>>: So one more, hopefully quick, question.
>> Emery Berger: Yes.
>>: Maybe you'll get to it. So with the relocation table, you
can change how foo prime calls bar, because it [indiscernible]
bar indirectly. It really changes, as in it could potentially
affect branch prediction and can change...
>> Emery Berger: Right, right. So when you do a jump, the branch
predictor... obviously, if it's a static jump, then there's no
branch prediction necessary. However, if you do a jump through
an address, then there is actually a branch target predictor.
These work very well. So, I mean, that part, in terms of
performance, is not really as much of a hit as you might think,
and I'll get to some performance numbers very soon.
>>: I mean, you do change something?
>> Emery Berger: Yes, we're changing something, you're right.
>>: You miss initially, then, if I was [indiscernible].
>> Emery Berger: Right, right. I think that your objection is
somehow that we're changing the program.
>>: The randomization has overheads that will swamp the other
ones. I think that's the real problem that could happen.
>> Emery Berger: Right.
>>: But your process, you're just making everything slower than...
>> Emery Berger: Right, so the concern, that's right. So the
concern that you're getting to is another concern that I'm going
to address, which is whether doing these changes actually affects
your analysis. Okay? So obviously, I'm going to argue that it
doesn't. But we'll get there.
So now that we have this in place, this whole thing... and bear
with me, let's everybody assume that this is all going to work
out great, it's going to be reasonably fast, and everything is
going to be fantastic. I'm going to talk about performance
evaluation. Yes?
>>: How do you know that you've addressed all the things that
create these problems that you're trying to...
>> Emery Berger: Right, right. So the question really is, there
are all of these different confounding factors; how do I know
that I've addressed all of them? So I haven't addressed all of
them. I can give you an example of one that we haven't
addressed, which is that you could randomize the senses of
branches, for example. We don't actually go into functions and
change anything about the code.
Inside a function, right, you could imagine that there's some
impact of having things in relative locations. Inside heap
objects, inside stack frames, there are relative positions that
we aren't disrupting. So there are things we already know we
aren't doing. I don't think we're covering the whole space, but
we understand the space we're covering, and we can address it up
front, which is a huge advance over doing nothing, or over
saying, here are a couple selected factors that we account for.
Okay?
And one of the things that we observe, well, you'll see what it
does to execution times. It gives us a certain characteristic of
execution times, which gives us very, very strong reason to
believe we are covering things very well, okay? All right. So
let's go back to performance evaluations. We run our things 30
times. Now, when we run them, because of randomization, we have
a bit broader of a distribution, okay? So now, is A’ faster than
A? What do you think? How many people say, yeah, it's faster?
I see a couple nods and smiles.
>>: I'd say confidence, it looks like.
>> Emery Berger: Oh, you're saying you're measuring this chunk
of the curve here? It seems faster, all right. How about now?
Is it faster now? People seem a little more skeptical. Looks
like it. How about now? All right. Now we're all like, I'm not
so sure anymore.
So the problem is that this is not actually the way that one
should proceed to say these two curves are different. Right? It
seems like, you know, we all have kind of an eyeball metric,
right? We say, oh, if the curves are really far apart, then it's
good. If they're close, I'm a little less comfortable about it.
But, in fact, this is not really the way that people do things
from a statistical standpoint.
So what do people do? This is what people in the social sciences
do, and people in biology, and all of these folks; they're faced
with these kinds of problems all the time, right? I have a drug.
Does this drug have a significant impact on this disease or not?
What they don't do is eyeball it, okay? They don't say, seems
good, everybody use this drug, right? So what they do is they
adopt this hypothesis testing approach. The way a hypothesis
test works is actually quite different: we don't ask the
question, are these two different, right, is this one faster than
the other. We say, if we assume that they're the same, okay
(this is called the null hypothesis), assuming they are, in fact,
identical, what is the likelihood that you would actually observe
this kind of a speedup due to random chance?
That's the statistical approach. So it turns out that for this
sort of thing, you know, we often make these assumptions of
normality in biology and in the social sciences, and then it's
very easy to compute the likelihood that you would end up with
this observation by random chance while the null hypothesis is
true. You all have, I imagine, seen this curve. This is the
classic normal distribution. And you say, you know, the odds of
being more than three standard deviations away from the mean due
to random chance are less than 0.1 percent, all right?
So this randomization, all the stabilizer stuff, is actually
going to put us into a position where we can ask the question
exactly this way. What we're going to ask is, what is the
probability that we would observe this much difference just
randomly? All right? And the argument that everybody makes
statistically is, if this probability is low enough, for some
definition of enough, right, then we argue that the speedup is
real. And because we've randomized the layout, we know that the
speedup is, in fact, not due to memory layout. Okay?
So there was this question before about why not just use a
static, one-time randomization, right? So what does
re-randomization do for you? This is an empirical result of
execution times with exactly one random layout per run. And you
can see that it spans some space, but it's fairly uneven. So
this is just, you know, we start off, we do a randomization, it's
a one-time randomization, and we run with it the whole time,
okay?
Here's what you get when you use many random layouts per run.
You can see that curve looks very, very different, right? It's
actually a nice peak, it has a tail, it's unimodal. What's going
on? The way stabilizer works, it generates a new random layout
every half second, all right? And if you think about it, what's
happening? You've got your whole program, and it's composed of a
bunch of these half-second epochs. Your total execution time is
the sum of these periods. So it's the sum of a sufficient number
of independent, identically distributed random variables, right?
Those are approximately normally distributed. This is the
central limit theorem.
So doing this randomization repeatedly actually means that you
can use the central limit theorem.
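In symbols (just restating his argument): if the epoch times X_1
through X_n are i.i.d. with mean mu and variance sigma squared,
then

    % LaTeX: total runtime as a sum of i.i.d. epoch times
    T = \sum_{i=1}^{n} X_i, \qquad
    \frac{T - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1)

so for a long enough run, T is approximately normal, which is
exactly the unimodal shape the second histogram shows.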
>>: So if I take this program, compile it and ship it to a
customer, it's going to run on some random [indiscernible], but
that remains the same for the entire run of the program, right?
Because what you generate is this [indiscernible] that is not
available to anybody else.
>> Emery Berger: Wait, wait, wait. Is that what you're saying?
Is it identically distributed?
>>: You can design hardware that is, you know, randomly...
>> Emery Berger: Oh, you could. That's an independent question.
>>: So I can't ship with this...
>> Emery Berger: You could. I have.
>>: Let's assume that that's not happening.
>> Emery Berger: Okay.
>>: Then I would claim that what I'm actually running is creating
lots of static configurations and then running each one of them
like 30 times and then doing the sum over all that. If you do
the sum over a large enough set of configurations, you should
also get the same normal distribution.
>> Emery Berger: What is large enough?
>>: I mean, you do it until you get the strongest
[indiscernible], right? Central limit theorem.
>> Emery Berger: So what is your...
>>: So I think the reason why you're seeing this is just that
you're able to explore a lot more random configurations, larger
configurations, changing dynamically, versus what you're doing
statically.
>> Emery Berger: That's right. So I think you're making an
argument for using stabilizer, right? What you're saying is,
boy, you could get a lot more experiments done a lot faster by
using stabilizer than by getting a one-time randomization with a
whole-program recompilation and doing that over and over and over
again.
>>: No, no, for the evaluation, whether I spend like 20 seconds
or 20 days, that's not a question, right?
>> Emery Berger: I think it's a question for a lot of people, but
okay.
>>: [indiscernible].
>> Emery Berger: Yeah, but so then what do you do with that
result? I think... are you saying that we're not going to use
this to do the hypothesis test, or you are?
>>: I'm just saying that, you know, actually changing... having
the static configuration is similar to what the customer is going
to be running, so that's why we get a truer measure.
>> Emery Berger: So which run are we going to ship?
>>: You can randomize [indiscernible].
>>: His point is only that you can eliminate some of this
overhead, because you're doing the re-randomization, and that's
closer to what the real...
>> Emery Berger: Oh, well, so there's overhead and then there's
randomization. I thought what you were actually going to argue
when you first started talking was that you're going to say, I
don't want a curve, right? I don't want a curve. I want some
point in the space, and I want that point to be probably one of
these extreme-to-the-left points in the space.
>>: Right. So my point is, go to the previous slide. There is
one that had two [indiscernible].
>> Emery Berger: Yeah.
>>: I think you're seeing this just because you're not running
enough random layouts.
>> Emery Berger: So let me be clear. It's not just the code
that's being randomized here, right? It's the code, it's the
stack frames themselves, and the heap objects. The heap
allocations are actually quite randomized; quite as randomized as
DieHard. DieHard has this very, very strong randomization
property. We actually install what we call a shuffling layer on
top of a conventional allocator, and it generates a lot of
entropy.
So there's actually a lot of randomization happening here, in the
data space and in the code space.
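A toy sketch of a shuffling layer (my simplification, assuming a
single size class; the real layer is per size class and more
careful):

    /* shuffle_alloc.c: toy sketch only. Each request trades a
       fresh block from the underlying allocator for a randomly
       chosen cached one, so the order in which addresses are
       handed out is randomized. */
    #include <stdlib.h>

    #define SLOTS 256
    static void *slots[SLOTS];

    void *shuffled_malloc(size_t sz) {
        void *fresh = malloc(sz);
        unsigned i = (unsigned)rand() % SLOTS;
        if (slots[i] == NULL) {        /* buffer still warming up */
            slots[i] = fresh;
            return malloc(sz);
        }
        void *out = slots[i];          /* hand out a random earlier block */
        slots[i] = fresh;              /* cache the new one in its place */
        return out;
    }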
>>: But would you get maybe a different distribution if you did
[indiscernible]?
>> Emery Berger: If you just randomized the code, you wouldn't
get as much of a sample of all of the layouts, because you would
still be sticking with the same memory allocation. You'd still
be sticking with the same stack frames. So this does more
extensive randomization than just code.
>>: This is something you really got, or is this just to...
>> Emery Berger: This is an empirical result.
>>: How many runs are in the blue thing?
>> Emery Berger: It's 30 for each.
>>: But maybe [indiscernible] point is just that unless you ship
your code with stabilizer, it may be that so many of those points
are coming from places where you're going to have layouts that
would never correspond to any real layout, because the real
layouts don't have these paddings. The real layouts don't pack
in the... it's somebody...
>> Emery Berger: Well, of course, it's possible, right, because
anything that's available through randomization is also a
possible real state, right? It's just maybe a low-probability
state. But it's randomly exploring all of the state space. But
I think that you're...
>>: Your random states always have some padding between, say, the
stack...
>> Emery Berger: Well, but the padding can be zero.
>>: But, for example, you put in new libraries, the link, the
layout... So at the deployment site, you could try to explore
and move yourself into the right portion of the distribution
space. It's just something that's unlikely to be reached
randomly, but has good performance, right? So you could try to
do some optimizations that push you over there.
>> Emery Berger: That's right.
>>: But if you're just shipping something, you have to say, oh, I
could be any place in this curve, rather than the place I have
with my one testing. Because that's just one sample.
>> Emery Berger: Right, and remember...
>>: I think that's the most interesting thing. We'd like to say
we're always over there on the...
>> Emery Berger: We'd always like to be here or something.
>>: There, in terms of performance, and that's the distribution
over time.
>> Emery Berger: Right.
>>: But you can't guarantee it.
>> Emery Berger: And it turns out, as we'll show, that there are
many cases where throwing stabilizer into the mix, because of the
randomization, actually turned out to improve performance. It
improved performance because the layout produced by the actual
compiler and memory allocator and libraries and the state of the
world was actually worse than the mean: it was higher than the
mean execution time, and stabilizer tends to regress towards the
mean. Yes?
>>: Do you happen to save the layout as you randomize
[indiscernible]?
>> Emery Berger: Yeah, so this is something that my student,
Charlie, is actually working on right now. So Charlie is, in
fact, doing more or less what Katherine described and what you're
sort of suggesting, which is what if we could deploy something
that did this randomization and actually deliberately targeted
the left extreme, right? And so it can observe these things.
Essentially, it's doing online experimentation, and then steering
the execution that way. But this is unpublished work that's not
out yet.
>>: Is it submitted?
>> Emery Berger: It is not in submission right now. So everybody
jump on board; just download stabilizer and beat us to the punch.
Actually, I will say that using stabilizer to do this, because it
does add some overhead, turns out to be not the best approach.
So Charlie naturally wrote his own dynamic binary instrumentation
system.
>>: That's not easy. So good luck with that. [inaudible].
>> Emery Berger: Yeah, yeah, but it has to be done anyway. I
know.
So let me go ahead and show you what happens when we use
stabilizer. So we went and we decided to try stabilizer with
SPEC, all right, and LLVM. And in particular, we wanted to
understand the effect of these different optimization levels.
So I think that most people have a sense of what optimization
means; essentially, the layman thinks -O1: pretty good, pretty
fast, doesn't take long to compile. -O2: takes longer to
compile, but produces better code. -O3: takes a long, long time
to compile, produces somewhat better code, right? There's a
sense that it's not a linear increase in speed, and certainly not
a linear increase in compile time, but that it does something
good.
So we wanted to see if that was, in fact, true. So we ran this
across the SPEC benchmark suite. We did it on each benchmark,
and then we also did it across the entire suite, okay? So the
first thing we did is we built the benchmarks with stabilizer.
Stabilizer is a plug-in for LLVM. You can invoke it just like an
ordinary C compiler; it's called szc. If you run it like this,
then it randomizes everything. However, you can optionally
choose one thing that you want to randomize at a time. So this
addresses Katherine's concern: what if you care about a
particular thing and not all of the possible randomizations. The
default is all of them are on, so that corresponds to code, heap,
and stack frame randomization.
>>: What about [indiscernible]?
>> Emery Berger: So it turns out that there are good reasons not
to randomize globals, and it's a pain in the neck. But...
>>: Addresses are in the code.
>> Emery Berger: That's right, so it's actually a much harder
problem to randomize. So now we run the benchmarks, okay? We
run them as usual, 30 times each. But this time, we drop the
results into R. R is this statistics language; all the graphs
you see in this presentation were actually generated by R. R
produces lovely, lovely graphs. It is the tool that
statisticians and many, many social scientists and biologists and
so on use to do statistical analysis.
So we get this result, all right. Is A’ faster than A?
Obviously, asking it that way is the wrong way to do things, so
we do the null hypothesis construction. We say, if A’ equals A,
right, then we're going to measure the probability of detecting a
difference at least this large.
So what's the way that we do this in R? We use something called
Student's t-test. This is how you invoke it in R; pretty simple.
It allows you to say, well, if the probability is low enough, and
the standard threshold everywhere is this arbitrary cutoff of
five percent, though you can choose whatever you like. So if
this probability is below five percent, then we reject the null
hypothesis. The null hypothesis is that they're the same. So
that's the name of the game. The name of the game is we're going
to try to reject, with very high probability... high confidence,
I should say... the hypothesis that they're the same, okay?
And what that means is that, in this case, whatever we're
observing is not due to random chance. All right? In other
words, the difference is real.
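For reference, the statistic behind the two-sample test he's
describing is the following (Welch's form, which is what R's
t.test uses by default; my note, not the slide's contents):

    % LaTeX: two-sample t statistic for runs of A and A'
    t = \frac{\bar{x}_{A'} - \bar{x}_A}
             {\sqrt{ s_{A'}^2 / n_{A'} + s_A^2 / n_A }}
    % Reject the null hypothesis "A = A'" when the p-value from
    % the t distribution falls below the chosen threshold (5%).

Here the x-bars are the sample means, the s-squareds the sample
variances, and the n's the number of runs (30 each).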
So here are some run times for -O2 versus -O1. These are
speedups on the Y axis. The X axis is all of the SPEC
benchmarks, ordered by increase in speed. And you can see that
all of these things are green. These are the error bars. I
actually don't recall the percentage for the error bars; I think
it's one standard deviation.
So all of these are statistically significant. And so you can
see that in some cases, you get a statistically significant, huge
increase in performance. That's for astar. In some cases, here,
actually, I can't see if it's red or not; I think that might be
the one that's not statistically significant. Not too
surprising, right? It's a very small difference. But here, we
actually get a statistically significant performance degradation
from using -O2. All right. And it just turns out that the
layouts that these guys ended up with were, you know... well,
there's this huge space; they end up in some layout, and that
layout turns out to be... maybe it was lucky back when the person
did the implementation of -O2, but it turns out not so good right
now.
>>: No, because you have some set of benchmarks and sometimes you
slow some down, right? And on average, you have all these
benchmarks.
>> Emery Berger: All right, so, all right.
>>: But you could still choose to turn this optimization on, even
though it doesn't speed up.
>> Emery Berger: I see your point. So your argument is that,
well, all right, across a suite, it's not going to have this much
impact, right? Maybe it doesn't improve everything.
>>: Right.
>> Emery Berger: Some things it degrades. So I think that for
-O2 versus -O1, this is pretty surprising, because -O2 is pretty
bread-and-butter optimizations. I think that if you presented
the optimizations in -O2 versus -O1 to almost anybody, you would
think these are going to improve performance, and they don't.
Yes?
>>: So this graph is with stabilizer?
>> Emery Berger: This is all with stabilizer.
>>: How did the graph look without stabilizer?
>> Emery Berger: So, yeah, we actually ran this experiment. I'm
trying to remember; I would have to look at the paper. I can't
remember if the speedups are with respect to the stabilizer build
or with respect to the actual original execution. But we
definitely observed cases where these optimizations slow things
down, and when you throw stabilizer at them, it makes them run
faster. So it's a bit of a weird issue. But yeah, I would have
to check.
>>: Pretty disappointing. I thought the improvement was much
more.
>> Emery Berger: Yeah, it's not a huge amount, I agree. But all
right, well, let's go to -O3, all right? We can crank it up.
>>: This is on one machine, one configuration?
>> Emery Berger: That's right.
>>: Who knows when they wrote these and what the machine looks
like.
>> Emery Berger: Yeah, yeah, so that's an excellent point, and
that's something that's totally out of reach for stabilizer. So
you could do this with a simulator in conjunction with
stabilizer, something like that. But stabilizer is still
observing the execution on your actual machine, and it's having
this effect of disrupting, you know, the memory layout. But if
you have a machine that has a one megabyte cache and then you go
to a machine that has a 16 megabyte L3, the performance is going
to change dramatically, right? And there's no way to account for
that.
>>: So -O1 is sort of the debug level?
>> Emery Berger: -O0 is actually the debug level; -O0 does
nothing. -O1 does some very simple things. -O2 does more
advanced things, especially register allocation, as well as
[indiscernible].
>>: Did you measure -O0?
>> Emery Berger: Yeah, -O0 turns out to be so slow that measuring
it for the entire suite would take months. So we didn't do it.
Yeah, it's bad. Okay. So here's -O3 versus -O2. This is the
same axis as before, but I'm going to make the axis ten times
larger so we can actually see what the differences are.
>>: Great for research papers.
>> Emery Berger: Yeah, yeah. So anyway, you can see that these
performance differences are quite small. Now, one of the things
that's interesting is this is a very, very small difference,
right? But because we're using stabilizer, we can actually say
this one is statistically significant and this one is not. And
by the eyeball test, you'd be like, you know, especially for this
one: totally insignificant, right? Because it's so small. But
that's not really how these things work, right?
So we actually get these statistically significant improvements,
and again some statistically significant degradations on the
left, and it's kind of a wash in the middle. It's interesting
that there's this one, which appears quite large but is actually
not statistically significant, okay?
All right. So what do these results mean? So far, I've
presented results on a per-benchmark basis, okay? But that
doesn't actually tell you what the difference is between -O2 and
-O3, because this is a pointwise comparison, all right? So what
we actually need to do is go ahead and run all these things 30
different times, right, lbm, astar, et cetera, et cetera; we get
a sequence of graphs, and this is when we aggregate them.
Before, in the results I just showed you, we were actually
looking at: here's one benchmark; compare the benchmark with and
without this treatment.
I want to know something about -O3 and -O2 in general. What I
showed you just now was actually not the way to do it, right?
This is sort of what we want to know, right? Is -O3 faster than
-O2? And, you know, it looks like it's slower here, faster here,
and you're like, well, sometimes it's good, sometimes it's not.
But this is not actually the way to do this. Again, we have to
go back to the null hypothesis testing approach, right?
We say, if these two were equal, what would be the likelihood of
measuring these differences, all right? And to do this, there's
a generalization of Student's t-test for more than one thing;
there's actually a whole family of these tests. It's analysis of
variance, all right? So you can again invoke it with R, in the
beautiful R syntax, as you see there. And you get the same sort
of results. You can say, if the P value is below five percent,
then we reject the null hypothesis, okay?
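For the curious, the F statistic behind one-way analysis of
variance looks like this (my addition, not the slide): with k
treatments, say the optimization levels, and n runs per
treatment,

    % LaTeX: one-way ANOVA F statistic; group means \bar{x}_j,
    % grand mean \bar{x}, observations x_{ij}
    F = \frac{ \sum_{j=1}^{k} n (\bar{x}_j - \bar{x})^2 / (k - 1) }
             { \sum_{j=1}^{k} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 / (k(n - 1)) }
    % Between-treatment variance over within-treatment variance:
    % a large F (a small p-value) means the differences exceed noise.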
So we're going to go ahead and say, you know, that these things
are, in fact, different. So when we compare -O3 and -O2, we get
a P value of 26.4 percent. We wanted it to be below five. That
means that we're only 73.6 percent confident that these results
were not due to random chance. In other words, one in four
experiments would show an effect that doesn't exist. So this is
a classic failure to reject the null hypothesis. That is, you
can't reject the null hypothesis that -O3 and -O2 are the same,
that all the effects you observe are due to randomness, okay?
And, you know, colloquially, we say the effect of -O3 versus -O2
is indistinguishable from noise. Yes?
>>: I agree with the [indiscernible] null hypothesis, but I
didn't see you addressing the case where you have the cut-off
point that actually tells you the sensitivity and specificity of
the test against [indiscernible].
>> Emery Berger: Are you talking about the effect size? Is that
the question?
>>: Where is the turning point, the cut-off point?
[indiscernible]. That cut-off point is very important. It tells
you, by moving [indiscernible], whether your test is really
sensitive enough that you [indiscernible]. I didn't see you
addressing it.
>> Emery Berger: Well...
>>: I agree [indiscernible]. But another possibility is
[indiscernible] too, because you are choosing...
>> Emery Berger: Of course. So the choice of a P value is
always... I mean, this is just the way it works, right? You
pick some P value. Theoretically, the way this whole thing is
supposed to proceed is you pick a sample size in advance, you
pick a P value in advance, and then you go and you do your test,
right?
In fact, with this presentation of the P value here of 26.4
percent, you don't use that value. All you do is you say, we
can't reject the null hypothesis, right? So that is the standard
statistical methodology that I'm employing. Clearly you know
much more about this than I do, right? You've probably forgotten
more about it than I've ever learned. I'd be happy, more than
happy, to talk to you about this after. Okay? Terrific.
Okay. So there is this concern about... this is not really a
probe effect, but it's an effect of using a tool. Is there
something about stabilizer that's hiding the effect? Is there
some systematic bias in stabilizer that's changing the effect
that we're observing? It turns out that, you know, we observe
all of these speedups when we run it with -O3, -O2, -O1 or -O0,
and stabilizer, independently of the randomization it employs,
adds the same sort of impact to all of them. So it has a fixed
additive increase in the number of cycles, the number of branches
taken, the length of the path, et cetera, et cetera.
It clearly disrupts the memory layout, but that's the point. At
every other point in this space, where we're counting cycles or
counting reads and writes and so on, it stays constant. So we
end up getting basically the same sort of additive increase.
>>: I'm not sure you can claim it's the same additive increase,
right? So if it's not optimization... one of your big points is
about jumping to the, you know, stack frame for that method.
You're not doing inside-method...
>> Emery Berger: Oh, right, right. So do we actually prevent
optimization from happening? We don't, because this actually
happens after all the optimizations. So it's an excellent
question. All of this stuff that happens to the code is
post-optimization.
It doesn't change the inline decision?
>> Emery Berger: Right, all the inline decisions, all of the
optimization decisions have happened, and then it goes and it
does this...
>>: And there's fewer... if they're inlined, the methods, then
they're not jumping to the method, so they're not experiencing
the randomization of...
>> Emery Berger: That part is true.
>>: So I'm just not sure about how these are constant across the
different...
>> Emery Berger: So the issue really is, if we take the code
that's been produced by LLVM, is stabilizer doing something to it
that's disrupting the optimizations, and the answer is no,
because all of the optimizations happen first. And then it goes
and does all the stuff to instrument it with randomization.
>>: But the point is valid for inlining. By performing inlining,
you've reduced the potential insertion points for stabilizer.
>> Emery Berger: That's right.
>>: So there's less disruption.
>> Emery Berger: That part is true.
>>: In these other... if you pair -O0 with -O3.
>> Emery Berger: That's right.
>> Emery Berger: Agreed. One of the good things is, if you go
and you actually observe the run times of all of these executions
(I don't have these graphs here, but they're in the paper), for
all but two of the cases, if I recall correctly, we get normal
distributions of execution time. So you can still do all of your
hypothesis testing. The reason that you might not get a normal
distribution is directly related to this problem, though it's not
actually so much because of code.
So how could you fail to get a normal distribution? Well, if
you're not actually getting any independent randomizations, then
you'll get none. So how can that happen? One giant main. That
means no randomization of any functions.
>>: [inaudible].
>> Emery Berger: Well, there are custom memory allocators.
That's actually the big problem, okay? So some of the SPEC
benchmarks have custom memory allocation; actually, many of them
have custom memory allocators. And if all you do is spend all of
your time in one giant array allocated on the heap, then you
actually can't randomize within it. That's the biggest problem.
Luckily, in almost all the circumstances, it doesn't matter.
There's enough other allocation and enough other functions to
obscure this. Yeah?
>>: But code layout optimizations, like hot code placement, are
going to be totally disrupted.
>> Emery Berger: That's right. This goes back to Katherine's
point. So in that case, what you do is you run this without
-Rcode; you say -Rdata, -Rstack.
>>: Do you see any effect from that? Like -O2 versus -O3?
>> Emery Berger: No, they don't actually do this.
>>: Okay. Microsoft compilers can.
>> Emery Berger: I know. LLVM does not, all right? So this is
just the performance of stabilizer itself. In some cases, it
slows things down considerably. Like I said, it slows things
down sort of uniformly, but it does slow things down. Thank you,
Katherine and Ben. Perl is a disaster. Well, the benchmark is
ridiculous in many ways, but Perl is a disaster because it's a
giant switch statement with function calls. That's all it does,
all right.
And so if the functions are placed randomly, then they don't all
fit in the TLB. That's really what we see: we see TLB pressure
here. So if we had a hierarchical TLB, or a bigger TLB, these
problems would go away. But that's the bulk of where that cost
comes from. And most of them are low. The average is about ten
percent.
>>: If you didn't randomize, they would fit?
>> Emery Berger: Yeah, so what happens is that all the functions
are getting laid out, right. They're just function, function,
function, function. So they're very compact.
>>: They do this on purpose already?
>> Emery Berger: No, it's just an accident of the way that the
code gets laid out, right. Nobody randomizes code. So the code
is just there.
>>: It may not be an accident that it fits in the TLB.
>> Emery Berger: Oh... okay, so I highly recommend you take a
look at the Perl code. There's not that much happening.
>>: Interpreters are structured the same, right?
>> Emery Berger: It's a classic interpreter design. That's
right.
>>: Anytime you take something that fits in a fixed-size
structure and do anything to it, you no longer fit in the
fixed-size structure, and TLBs haven't grown in years.
>> Emery Berger: That's right.
>>: Right. What about cactus? Like, I totally expected Perl and
GCC to be over there.
>> Emery Berger: Yeah, so what's happening here, if I recall
correctly, this one is actually the same problem with the TLB.
The TLB is what kills you, but this is for the heap. So it's the
randomizing of heap locations.
>>: All right.
>>: I thought there was an overhead with Perl just from the
DieHard-style randomization.
>> Emery Berger: Yeah, it was...
>>: There was a bunch of overhead anyway.
>> Emery Berger: Again, it's the TLB. So yeah, all of this
overhead is basically attributable to the TLB. There's a
second-order effect, which is like L3 misses, but the TLB kills
you.
>>: So you need super pages?
>> Emery Berger: We need super pages. We thought about that,
actually. But we decided not to go that route, mostly because
Linux makes it a real, real pain in the neck. It's not like you
just say, I want to do it. If you boot into your system with
super pages, everything is fine. If you want to actually
allocate chunks of memory in super pages, it's a mess. It's
really very bad. Yeah.
>>: So looks like what you need to do, another alternate design
would be to actually still have compact code but randomize
functions within that, right?
>> Emery Berger: Yes, so that's an interesting observation. So
part of the problem here is what we need to randomize.
Undoubtedly, we have too much randomization, okay? And we could
randomize in a compact way. It is possible. I know how to do
it. We didn't do it. One of the reasons is that what we
actually need to randomize is a certain range of address bits:
the bits that are used in these hash functions. And those are
actually not the low-order bits, right? So they leave off the
low-order bits, they leave off the very high-order bits, and they
grab some in the middle. So it's very important that you
randomize those. And that is going to lead to things being on
different pages.
>>: That's for the TLB?
>> Emery Berger: Yeah.
>>: So what does it mean, like, if you slow down 40 percent, and
now you measure -O2, right?
>> Emery Berger: Yeah.
>>: I think what people have been saying is, the original one is
really the one that I'm interested in. This is like, it's such a
huge effect.
>> Emery Berger: It has a huge effect. So it's important to
understand that the effect stabilizer has, that is, the dilation
effect, does not actually affect its ability to discern very,
very small differences. It's just a question of running it more
times, okay? So if you imagine, all I'm doing is looking for
signal to separate from the noise. If there's a very, very small
amount of signal, I will still be able to discern it; I just need
to run more tests. So having the test itself be slower doesn't
actually alter our ability to do statistically sound performance
evaluation.
Now, as for the result that you get, I mean, obviously, when you
run stabilizer, you probably don't want to ship with it, right?
But that's a whole separate question. The question is, we're
trying to understand whether these effects that we see as
performance differences are real or not. If both of them get
massively dilated, that's okay; even if those effects are very
small, if they're consistent, then we'll detect them.
>>: So your audience is the person who is choosing -O2 or -O3 and
making the choice, should I spend the time to do -O3 versus -O2,
is it worth it, right? It's not, in some sense, the end person
who is running the program.
>> Emery Berger: It's certainly not the end person, but I would say the audience is much broader than the people running O2 or O3. So my audiences are the following. One, developers who ran their code and their code seems to run a little slower. They think, oh my God, I got a two percent performance regression, right. Well, before you go chasing that down, find out if that two percent matters, or if it's going to go away tomorrow when you modify some more code. So, performance debugging.
>>: Performance regression.
>> Emery Berger: Well, both directions are meaningful. Certainly, if you're like, I have this crazy, super complicated way of doing something, and if I plug it into Internet Explorer it will make it run 0.5 percent faster. But it's going to be a source of bugs and a maintenance nightmare and all that sort of thing. Do I really want to do this or not, right? Is that meaningful?
You can decide whether 0.5 percent is meaningful or not as an effect size, but you need to know whether it's statistically significant or not. The other audience is really researchers, right? Researchers publish lots of papers on performance optimizations of all stripes. Not just compiler things, right: runtime system stuff, different algorithms, lots of things.
And there's a kind of, you know, "well, it seems to work," or "the number is large." You might say, oh, if it's over 20 percent, that's clearly good enough. But actually, we see larger-than-20-percent swings just from static changes at link time, like changing the link order, or changing the environment variables, right. You can easily get a 30 percent swing in performance just with that.
So it's important that people, when they go ahead and produce these results, know that they're real. But is it for grandma? No, it's not really for grandma, right?
>>: Or grandpa.
>> Emery Berger: Or grandpa. I definitely am happy to include
both grandma and grandpa in this. It's definitely ageism, but
certainly my mother and my father have no idea what's going on.
So anyway.
>>: So when the effect is so large...
>> Emery Berger: Which effect? Of your optimization?
>>: No, no. When the effect of adding this tool to your runtime is so large, how do you control for its variation? Of course, more runs, right?
>> Emery Berger: So its only source of variance is its effect on
memory layout, right? It doesn't actually do anything different
itself.
>>: But it's the observer effect, right? It has two parts to it, right? One is, because it's code and it's in the runtime with your program, something that wouldn't have been in the runtime otherwise, there's some effect.
>> Emery Berger: That part is true.
>>: There's the null effect where it gives you the same layout
and it has some performance impact, right? Have you tested that?
>> Emery Berger: So if it gives you...
>>: Run stabilizer and make it generate exactly what it did
before and then what's the overhead there?
>> Emery Berger: Yeah, so the overhead is, again, totally swamped by these memory layout effects, right? There's almost...
>>: It's just interesting, how much overhead effect there is.
>> Emery Berger: So we measured that. I think it's on the order of two percent, and that's exclusively from indirection. Actually, Charlie has a faster way of doing this that should make it go away. Then there's taking the traps: every time you take a trap, you take some hit, and it's tied to the frequency with which you do the relocation. The longer your time delay is, the less you see. There is a small effect on straight-line execution of code, without randomization, but it's very, very small.
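A back-of-the-envelope version of that tradeoff, with made-up numbers; the only point is that the trap overhead scales inversely with the rerandomization interval:

    # Hypothetical costs: each relocation trap burns some fixed time, so
    # the fraction of runtime lost is the trap cost over the interval.
    TRAP_COST_US = 100.0   # assumed cost of one relocation trap
    INTERVAL_MS = 500.0    # assumed rerandomization period

    overhead = TRAP_COST_US / (INTERVAL_MS * 1000.0)
    print(f"overhead ~ {overhead:.4%}")  # doubling the interval halves it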
>>: Let's let Emery...
>> Emery Berger: I'll wrap it up. I got to the punch line a long time ago. Ta da! The punch line is... I know you've been waiting. And then the cat says to the dog. So anyway: memory layout, as I hope you will all agree, affects performance. That makes it difficult to do performance evaluation. Stabilizer controls for the effect of layout by randomizing it. This lets you do sound performance evaluation, and we did a case study. The case study shows that, in fact, O3 does not have a statistically significant impact across our benchmarks.
And you can actually download this today. It's at stabilizertool.org. I want to thank the NSF and, sorry, guys, I want to thank Google for helping to fund this. Google is also funding Charlie's Ph.D.; he got a Ph.D. fellowship from Google. So anyway, that's it. Thanks for your attention. And I'll put up one backup slide here. So this is O2 versus O3.
>>: So did you verify that stabilizer is [indiscernible] if you change the environment variables?
>> Emery Berger: You mean the performance?
>>: You run that, your student runs it under his name and you under your name, and you get exactly the same results.
>> Emery Berger: So yeah, it does, it does, because the start and the position of everything really is totally disrupted by all the randomization, right.
>>: You verified that?
>> Emery Berger: We verified it. I'm pretty damn sure we verified it. I should say one thing, the little secret about stabilizer: it gives you performance predictability. Right now, the cost is too high. But it gives you this performance predictability, and you can argue, look, the chances of me being more than three standard deviations away from the mean are less than 0.1 percent, right. So the fact that you get this actually gives you very, very predictable performance. It would be completely shocking if you observed anything different. But it turns out that it's immune.
So I'm pretty sure the paper actually has an experiment to this
effect. Yeah.
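For reference, the three-standard-deviations figure is just the Gaussian tail, and it is easy to check with the standard library; a quick sketch:

    from statistics import NormalDist

    one_sided = 1 - NormalDist().cdf(3)  # mass above +3 sigma
    print(f"{one_sided:.4%} per tail, {2 * one_sided:.4%} two-sided")
    # Roughly 0.13 percent per tail, about 0.27 percent two-sided.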
>>: Well, I mean, one thing that's interesting, and it goes to this question, is that there are changes in the environment, and some will affect performance and some won't. So the claim is that environment variables really shouldn't affect performance. If I change the library, maybe it will, maybe it won't; it might make something faster, right? But you can actually evaluate that. With every different piece of hardware, you can now say: how do these compare with stabilizer versus without stabilizer? So it seems like you have a better way to understand the real effect of the environment, which I'd be curious to see. Like, Katherine did this great work looking at 20 different architectural things that affect energy and things like that. You could do the same thing with stabilizer to see how much difference...
>> Emery Berger: That's an interesting point. I mean, I would be leery of doing this... well, it's an interesting question. I'm not sure what the energy impact of the memory layouts would be; I would have to imagine that it would be substantial. But you could certainly detect whether there's a difference. I would have to think very carefully, though, about what the meaning would be of running your stabilizer results on machine one, running them on machine two, and comparing them. You need to have a pretty clear null hypothesis.

So I guess the null hypothesis would be that these machines have no impact. That would be a very surprising null hypothesis.
>>: For P3 and P4. I mean, with the Pentium 4, it wasn't clear that it was really better, right?
>> Emery Berger: Oh, yeah, sure.
>>: So I'm sure there are cases where you could actually confirm
that.
>> Emery Berger: Right, right.
>>: The trace instruction...
>> Emery Berger: Go ahead, Tom.
>>: The other question I had is, you could adapt your sampling, or where your layout...
>> Emery Berger: Yes.
>>: ...to reduce your overhead, and see if you still got acceptable experimental results. Have you looked at that?
>> Emery Berger: So we deliberately chose this number because we wanted to guarantee we got 30 samples for any reasonable execution. You do need to have some number of samples to make these hypothesis tests meaningful. We are looking at altering the rate at which we do these things based on performance counters, for this other tool that Charlie is working on. Yes.
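A sketch of the sample-size arithmetic behind that kind of choice, using the standard two-sample normal approximation; the parameter names and numbers here are illustrative, not the tool's:

    from statistics import NormalDist

    def runs_needed(sigma: float, delta: float,
                    alpha: float = 0.05, power: float = 0.8) -> float:
        # Runs per configuration to detect a true difference of `delta`
        # given run-to-run standard deviation `sigma`.
        z = NormalDist()
        z_alpha = z.inv_cdf(1 - alpha / 2)
        z_beta = z.inv_cdf(power)
        return 2 * ((z_alpha + z_beta) * sigma / delta) ** 2

    # Halving the detectable effect quadruples the runs required.
    print(runs_needed(sigma=0.5, delta=0.2))  # ~98 runs
    print(runs_needed(sigma=0.5, delta=0.4))  # ~25 runs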
>>: So you're randomizing layout stuff here, and I can see that, because that's actually a large effect. But are there other possible effects, other than layout, that could affect performance, even with small changes in your code? Non-layout related.
>> Emery Berger: Oh, non-layout related. So I believe that there are, but it's tricky. I've spoken to one of my colleagues, Daniel Jiménez, who is an expert on branch predictors, and he says, in fact, there are other things you might want to do beyond just laying things out that would actually alter the way the code behaves in the branch predictor. I'm sure there are many more very subtle things. But he also thinks those things are second order: the caching and the addresses themselves are going to be where the big hit is.
There is the question, however, of randomizing things within objects. For a function and its code especially, everything inside moves together, so the relative positions stay fixed, which means there's a lot less randomization happening. Getting deeper inside would be a way of getting at more possible layouts. Yes.
>>: You mentioned offhand that the randomizing stuff [indiscernible], which means they could be really [indiscernible]. For an optimization you want to do to your code, you think it's probably going to speed things up because of [indiscernible]; you'd like to factor out the [indiscernible] by randomizing that, and make sure that your optimization is getting the speedup from the source [indiscernible].
>> Emery Berger: Sure, sure. So there's actually some work, I think it's Mark Hill, if I recall correctly, either Mark Hill or David Wood, somebody at Wisconsin. They actually do inject some randomization into simulation to try to get some error bars, because otherwise everything's exactly the same all the time.

But to my knowledge, nobody has done anything like this within the simulator. You know, it raises the other question, obviously, about simulators.
>>: Not on purpose?
>> Emery Berger: Yeah, right, not on purpose. That's right. Yeah, it raises the question of, you know, simulator fidelity to the real platform. But that's always a question. Yes?
>>: My question [indiscernible]. You are randomizing the layout here. Did you consider all the factors, and how do you [indiscernible] variability that might have come from there?
>> Emery Berger: So I can't say that we've considered every factor, right. What we did is we said, you know, we...
>>: I said, which factors did you consider, and how did you come to choose layout?
>> Emery Berger: Oh, beyond layout. So it turns out we focused on layout specifically because it has two very direct consequences for very, very performance-critical pieces of your hardware. One is the cache and one is the branch predictor, right. So the cache is huge, right: if you miss in the cache, you go all the way out to RAM, which is a hundred times slower than if it were in L1. So this is a gigantic, gigantic impact. Branch prediction has a much less dramatic impact, but it also has an impact on performance. So that's why we focused on layout.
>>: Okay. So within a function, the branch layout matters a lot. Like, that's an optimization that has a lot of impact on performance. So how would you generalize what you did? You'd have to change the compiler? Because the offsets are baked into the code, because you don't want to look up the offset of the branch, right?
>> Emery Berger: Absolutely right. So Charlie and I have talked about this. I guess we should wrap this up very quickly. Briefly, the idea is that you generate different variants of the code. You generate different versions of each function, and then you sample those at execution time. When you're doing rerandomization, you don't just take this function and move it somewhere else; you take function version K. And so...
>>: K is good.
>> Emery Berger: K is fantastic.
>>: K is always good.
>> Emery Berger: Thank you.
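A minimal sketch of the variant idea just described; the table and function bodies are hypothetical stand-ins, since the real variants would be behaviorally identical compiled copies of a function that differ only in internal layout:

    import random

    def lookup_v0(x): return (x * 2654435761) % 2**32  # stand-ins: real
    def lookup_v1(x): return (x * 2654435761) % 2**32  # variants differ in
    def lookup_v2(x): return (x * 2654435761) % 2**32  # layout, not behavior

    variants = {"lookup": [lookup_v0, lookup_v1, lookup_v2]}
    current = {}

    def rerandomize():
        # At each rerandomization step, repoint the indirection table at
        # freshly chosen variants; callers always go through `current`.
        for name, choices in variants.items():
            current[name] = random.choice(choices)

    rerandomize()
    print(current["lookup"](42))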
>> Ben Zorn: We want to thank Emery.