>>: Okay. It's a great pleasure to introduce Todd Austin. Todd got his PhD at the
University of Wisconsin in 1996. He's currently a professor at the University of
Michigan in the electrical engineering and computer science department. Todd's been
doing lots of really interesting work in architecture for many years. One of his great contributions is the SimpleScalar toolset, which is widely used by the entire architecture community. Todd also got the Maurice Wilkes Award for innovative contributions to computer architecture in 2007, I think. And today he's going to be talking about using
hardware to find security bugs. So that's something of interest to many of us here, and
please welcome Todd.
>> Todd Austin: Thank you. Thanks, it's a pleasure to be here. Actually it's been four
years since my last visit. Yeah, so most people think my background is in computer architecture, but really my research is all about finding and fixing bugs and faults. So I work a lot on fault-tolerant systems; I work on finding software bugs. And when you work in software verification, you can't help but
overlap heavily with security because a lot of security bugs--a lot of vulnerabilities are
the result of software bugs.
So today I want to talk about some work that two of my PhD students have been working on. A few years ago Eric Larson, who is now at Seattle University, did some work on security vulnerability analysis of programs. I'll talk a little bit about that.
And then my current graduate student, Joe Greathouse, has been working on developing
techniques to scale the performance of super-heavyweight, but super-powerful, security
vulnerability analyses. And I'll talk about that work. That's called Testudo.
This is also joint with other students and faculty as well. All right. So I'm sure I don't
have to spend much time on this slide but, you know, security vulnerabilities are a big
problem, and I'm sure at Microsoft everybody knows this. But I will show you one thing.
Yes, I picked on Microsoft Windows, but that's not the only system that has bugs in it that people can exploit. Linux is more and more being found to have lots of security bugs. And even simple devices that don't even need an operating system, simple devices like RFIDs, can be attacked as well.
Many security vulnerabilities are the result of bugs in software. So if you can fix these bugs, you can eliminate the vulnerability, as opposed to trying to detect the vulnerability and stop it when it occurs. If you fix the bug, you don't carry the payload of trying to find that vulnerability and fix it at runtime.
Let's take a look at an example of one security vulnerability bug. A classic one is the buffer overflow attack. So I've got a piece of code here: a function that's got a buffer on the stack, and some local variables as well. And it reads some input from an external source. The reason why this is a bug is that this particular function doesn't limit the amount it reads to 256. Now, if somebody comes and injects data into this read-input call here and it's less than 200 integers, it only partially fills this buffer and everything's fine. The program runs as it was intended.
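To make the setup concrete, here is a minimal sketch of that kind of function, assuming C; the read_input name and the exact read loop are illustrative, not the slide's actual code.

    #include <stdio.h>

    /* Reads integers from an external source into buf until the input runs
     * out, returning how many were stored. BUG: nothing limits the read to
     * the buffer's 256-entry capacity. */
    static int read_input(int *buf)
    {
        int n = 0, v;
        while (scanf("%d", &v) == 1)
            buf[n++] = v;       /* no bounds check: the classic overflow */
        return n;
    }

    int main(void)
    {
        int buf[256];           /* fixed-size stack buffer                */
        int count;              /* local variable living above the buffer */

        count = read_input(buf);
        printf("read %d values\n", count);
        return 0;
    }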
But if somebody reconstructs the protocol, for example, and just injects more data into it -- even violating the purpose of the protocol, say injecting more than 256 integers in here -- what happens in this particular attack is that it overwrites the buffer, then the local variables above it, and eventually the return address. The nature of this particular attack is to make the data that you injected into the program be code, and then to figure out the address to jump back into your own injected data. Once you've done that, you've done what you fundamentally need to do to implement this attack, which is redirect control from external data.
And when that's done, you can take over the machine; there's a variety of things you can do in this buffer to take over the machine. So how do you fix these vulnerabilities?
vulnerabilities? What's the classic approach that's used most widely today to fix these
bugs that become security vulnerabilities? And it's as follows: write your application.
Deploy your application to your customers. Let people attack your customers.
Customers get upset, complain to you. Debug the attacks. Then fix the software, and
repeat.
Now, this is effective, but it has some downsides. One, you get upset customers when they get attacked. And two, you get embarrassed software vendors, because some of these bugs can, frankly, cause a lot of exposure.
So what I want to talk about today is a better way to attack these bugs, and that's through
security vulnerability analysis. Let's try and find the bugs that hackers love before we
release the software. There are some challenges to this, but here's the basic approach. Develop your program and employ security vulnerability analyses. There's a variety of these, many different techniques; I'm going to talk about one today that I helped to develop. Employ these in the lab, debug whatever exposed vulnerabilities you find, continue to develop your program, and then iterate around this cycle until you decide to deploy. The advantage of this approach is that you'll find those bugs before you put them out into the field.
But as we're going to see, there are a couple of problems with this approach. Well, it's good for your customers, because you get more of those bugs out. But the downside is that security vulnerability analyses tend to be extremely heavyweight. And when you're running very heavyweight analyses that slow down programs by hundreds of times, you tend to be myopic in what you can test, because you simply don't have the resources to fully test the program.
What does it mean to fully test a program? It means to cover all of the feasible paths that the program can possibly execute. And there are billions of paths in non-trivial programs. So if this is slowing down your program hundreds of times, you probably won't even be able to get through your own test suites, let alone a test suite that -- to give you a specific example, a fuzz-testing suite -- is getting good coverage on paths in your program.
What I am going to talk about today is a technology called Testudo, which pushes those analyses out into the customer base and runs them on every machine, every time the program is run, using a data flow sampling technique that limits the amount of CPU and memory resources it takes to do these tests. So forward progress is made, but it's distributed across a large customer base. The great thing about this approach is that it works well with the hacker bug economy, which is as follows: hackers want bugs that occur on many, many machines, so that when they devise an attack, they can grab as many machines as possible. In the same scenario, our analyses are run on many, many, many machines, so popular programs will get the most coverage from the security vulnerability analyses. And my conjecture is that, using this technology, we can get ahead of the attackers and find bugs before they do. Yes?
>>: What if you have your time? What if you're using someone else's machine
[inaudible]?
>> Todd Austin: Yeah well, so like all technologies for finding bugs, you can use them
both for good or bad. Right? So attackers could use these technologies as well. One
advantage that I have over the attackers is that I have the advantage of my entire
customer base. But an attacker could definitely use this technology also, say to run these
analyses on a botnet for example. Good point. And the key point here is, take the
criminal out of your design cycle. I think this is a really good approach to developing
software and one that we should all strive for. All right. So here's what I'm going to talk about today: three technologies that I've worked on in my career. First I'm going to talk about Metadata. Metadata is an important aspect of security vulnerability analysis. It is information that is attached to program data that helps restore programmer intent in the program.
As we run programs we throw away a lot of information, and we need to re-materialize a lot of it; Metadata storage locations are where we're going to keep it. Then I'm going to talk about one security vulnerability analysis that I helped to develop a few years ago, called input bounds checking, which tries to determine whether the checks on data coming from external sources are sufficient to limit the possibility of dangerous memory accesses, or of the control of your program changing without your knowledge. And then finally I'm going to talk about my most recent work, which is the Testudo project -- this dynamic distributed debugging. How do we take these super-heavyweight analyses and push them out into the field, to scale their performance and see more of those feasible paths that our customers are going to execute, to try and find more bugs?
Let's take a look at Metadata. When you run a program, the programmer has put a lot of information at the source level about what it is that he or she wants to do, but unfortunately it gets unceremoniously discarded by the compiler and the runtime system. To give you an example, here's a Metadata strategy I published many, many years ago called fat pointers, which tries to rematerialize the intent behind how programmers use pointers. In a language like C, for example, a pointer just contains an address to a piece of storage. But there really is a lot more information there that would be useful to store if we could. For example: what is the variable that was intended to be pointed to? Whether or not the variable is live, whether or not the variable has some externally derived information in it, and whether or not that externally derived information has been checked. For all these kinds of properties, it's nice to have a place to store them. So what you'll see in security vulnerability analyses is a need to declare, store and efficiently manage Metadata. We'll see some of that later in the talk when we look at the Testudo hardware implementation.
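As a rough illustration, a fat pointer might carry fields like these. This is a minimal C sketch of the kind of Metadata being described, not the published fat-pointer layout.

    #include <stddef.h>
    #include <stdbool.h>

    /* One possible shape for a fat pointer's Metadata. */
    struct fat_ptr {
        void  *addr;     /* the raw address an ordinary C pointer holds       */
        void  *base;     /* start of the variable it was intended to point to */
        size_t size;     /* extent of that variable, for bounds checks        */
        bool   live;     /* is the referent still allocated?                  */
        bool   external; /* does it carry externally derived data?            */
        bool   checked;  /* has that external data been bounds-checked?       */
    };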
All right. With the Metadata we can start to do a security vulnerability analysis, and there are many of these available. Let me show you one that I worked on called input bounds checking. All right. So how do attackers look at programs? They basically look at them like big black boxes with a bunch of knobs on them. What are the knobs? The knobs are the inputs that I can feed into the program. And the vast majority of attacks are simply finding the correct sequence of turns of these knobs to cause the program to access memory it didn't intend, or change control in a way that it did not originally intend. That's the reason why fuzz testing, for example, works so effectively at finding bugs in programs: with fuzz testing you're just turning knobs randomly and looking to see if the program crashes.
So a great source of bugs that we can find, that can stop security vulnerabilities, is just to find stuff that comes in from the outside world, through these knobs, that flows to potentially dangerous operations -- like indirect pointer accesses to memory, or changes of control in your program -- and to find out whether it has been properly bounded. If we can find those bugs, we've found a significant number of the bugs that hackers can exploit.
So the approach that we use with this input bounds vulnerability analysis technique is we take a program, which at the low level operates on, you know, integer data, and we combine it with a symbolic copy of the same program that does the same exact operations -- the adds, the subtracts, the loads and the stores -- but instead of operating on concrete data, operates on symbolic data. So we take every variable, say some variable X that has the value of two, a concrete value, and we clone it: we include Metadata with X which captures the range of X, a symbolic value of X. This doesn't declare any particular value of X, but instead declares a predicate which describes the possible values of X. Okay. So X can be this value in the program, and through the Metadata we know what possible values it can be. And we'll see that by propagating these symbolic values around and pushing them through the computation, in a symbolic fashion, we can determine the exact ranges of values of X whenever we do a potentially dangerous operation. Now, if we also track the type information as well -- which in some languages we may have, and may need to rematerialize inside of our Metadata -- then whenever we have a potentially dangerous operation, we can apply a proof engine at the dereference or the change in control, and try to prove: does this predicate violate the constraints of that type or that variable? And you can see it here: this is a C array whose valid indices are 0, 1, 2, 3, 4, and this X has a range of 1 to 5, and we can see that, yeah, there is a possibility that a bug could occur here.
Now, what's really powerful with this analysis is that there's no attack in this 2; this 2 is perfectly valid. But on the symbolic side of the program we can see that an attack exists. And that's the most powerful aspect of this particular analysis: it finds attacks without an active exploit. Because the symbolic side of the computation tracks the range over which the values could exist, if there does exist a value that could provide an exploit, it will find it. To find all of the exploits for improperly bounded values, we simply have to have complete control flow coverage of the program. So we simply have to hit all the feasible paths, and we will find all of these particular exploits.
I don't want to go into all the details of this analysis, but I want to show you by example how it works, so you can see how symbolic computation in the background is a powerful way to find an exploit.
Here's another program which has a bug in it. We've got an array of five elements. We're going to get some input. We're going to check a constraint on the value of the input, to see if it's within bounds, then increment the value -- and that's where the bug is, because the value no longer is within the bounds of the array -- and then we have a potentially buggy operation. So we run the program, and we get the following values: 2, which is legal, passes our test; increment to three; no bug. But now look at the symbolic computation which occurs in tandem with the running program. When we get the value of X, we see that the range of X is any legal integer. When we hit the predicate, we know from the direction of the predicate what the range of X is: X is now greater than or equal to zero, and less than or equal to four. And we can determine this by telling the direction of the branch, understanding the predicate that's associated with the branch, and then intersecting that predicate with this symbolic value to get the range.
Then we increment X, and there's a calculus for manipulating all of the symbolic values, so that when I increment X, I simply increment, according to the calculus, the lower and upper bounds of X. So now I know X is between one and five. Then I get to this operation that dereferences the array. I run my proof engine, and I ask: can this predicate produce a value which violates the type of that array? And yeah, it can -- the value of five. So with no active exploit, I find my bug.
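Here is a compact sketch of that example in C, with the shadow interval computation shown in comments. The variable names are assumptions; the point is that the concrete run (2, then 3) is fine, while the interval Metadata exposes the out-of-bounds 5.

    #include <stdio.h>

    int main(void)
    {
        int a[5];                    /* valid indices 0..4                  */
        int x;
        if (scanf("%d", &x) != 1)    /* shadow: x in [INT_MIN, INT_MAX]     */
            return 1;

        if (x >= 0 && x <= 4) {      /* taken branch intersects the range:  */
                                     /*   shadow: x in [0, 4]               */
            x = x + 1;               /* the calculus shifts both bounds:    */
                                     /*   shadow: x in [1, 5]               */
            a[x] = 0;                /* proof engine: can [1, 5] violate    */
                                     /* a's type (indices 0..4)? Yes: 5.    */
                                     /* Bug found, even though the concrete */
                                     /* run never faults.                   */
        }
        return 0;
    }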
That's a very powerful technique. The way we implemented it is we took GCC, and we instrumented the code such that as it produced the individual operations, it also coupled them with the symbolic version of the same low-level operation. We then run a test suite, look at our error reports, and then go back, fix the code, and iterate around the cycle. So now we're doing that security vulnerability analysis to try and get rid of our bugs.
And in the world of security vulnerability analysis, when you read papers in that domain, every once in a while you see a paper that actually finds a really good bug. And our paper found what I consider two super-high-quality bugs. We found two bugs in OpenSSH that were fixed almost immediately. One was a buffer overflow attack; the other was an attack where you can use integer overflow to attack the system. But here's the downside of this analysis. Yes?
>>: So this technique can find bugs [inaudible] if you actually leech the predicate, that
would create the bug [inaudible] switch to [inaudible].
>> Todd Austin: I can combine my predicates, at the --I mean -- so I don't quite
understand your point.
>>: I think he's saying that you only find the bugs on branches that were followed during
the program execution you've analyzed?
>> Todd Austin: Yes.
>>: And that's driven by the [inaudible]? Concrete values…
>> Todd Austin: Correct. But let's say I cover all of my feasible paths. Will I find all the bugs? So -- good point. So that's the challenge here: finding all the feasible paths.
>>: But there are exponentially many paths.
>> Todd Austin: There are many paths. I mean, I don't know how many feasible -- nobody, nobody can -- I mean, we really don't know how many feasible paths are in programs. Yeah. They're exponential in the depth of the control graph. There's a lot.
So I need a highly scalable technique to implement this analysis.
>>: What is your [inaudible]?
>> Todd Austin: I'll get to some of that. Yes?
>>: Are you familiar with type state checking?
>> Todd Austin: Type state checking?
>>: Yes. [inaudible]. It's a static compiler that proves programs correct with respect to this kind of analysis.
>> Todd Austin: Yes. Yes. Yeah. I'm familiar with that work, and with the PREfix work, and a variety of techniques. And those techniques are very similar to this, right? The difference is that I'm analyzing paths that are run in the program, and those techniques are essentially symbolically running the program. Yes?
>>: No. They are static provers. They compile a program and prove it correct.
>> Todd Austin: But they have to do some level of symbolic computation to figure out
what paths are possible in the program.
>>: No. They are path insensitive.
>> Todd Austin: Well, if they are path insensitive, they're going to have tons of false positives.
>>: False positives in the program that is a type state save would be rejected as possible
I'd say.
>> Todd Austin: Okay. So we can talk more about that. But in general static techniques
cannot complete on nontrivial programs. Will you give me that one?
>>: No.
>>: I would give you that one if you admit that a dynamic can't either, right? I mean you
can…
>> Todd Austin: Exactly. A dynamic can't either. So what I'm talking about today is a
scalable dynamic technique that can go further than any past dynamic technique. And
with static techniques, we can do the same thing as well. Maybe we can someday meet in
the middle, and find all the feasible paths in the program. So good comments, thank you
very much. Yes?
>>: So is the only domain you use intervals? [inaudible]
>> Todd Austin: Yeah, primarily intervals, although we use a different style of analysis for strings. [inaudible] So the downside of this approach is it's really slow. These are some programs -- these are their original runtimes, their runtimes with full instrumentation, and this is how many times slower they run with that instrumentation load on them. And, you know, in the best case about 43% slower, and in the worst case about 200 times slower. So there's a significant payload in the background there to do that symbolic computation, and it really limits the number of paths that you can analyze when the program's running hundreds of times slower. So the point of this is that this analysis is very effective, but it's extremely expensive. What I want is an analysis that I can hit all of my feasible paths with.
So now let's take a look at the work that I've been doing recently, yes?
>>: Okay, so if it's so expensive -- but is it still a lot cheaper than doing a full static analysis that would, you know, find all possible paths?
>> Todd Austin: Well, I'm unaware of a full static analysis that finds all possible paths without doing so much abstraction that you either have a lot of false positives or don't find certain kinds of bugs. And I'm unaware of any dynamic analysis that can hit all the feasible paths, because it's so expensive to do that kind of analysis. So what I'm going for is in the middle: dynamic analysis that scales to many, many machines, so I can hit more feasible paths than anyone has in the past.
>>: [inaudible] I mean expression generation [inaudible].
>> Todd Austin: Yes. At the very end of the talk, I'm going to talk about some very new work that I've been working on, which tries to take the set and the probability of feasible paths, and help static analysis drive better with that knowledge. Because I think ultimately the best solution is going to be partially dynamic, to find out what customers do and provide a lot of predicate information about feasible paths, and then use that in the static domain to try and find paths that are almost impossible to expose without really clever inputs. Yes?
>>: I guess one thing that's unclear is the…
>>: If he runs the static analysis and you, you know, you don't want to reject every program, and so you make some compromises. And so you're not going to find every bug, but you're going to find some bugs. And I guess the one question that comes to mind is how much overlap? How many of the things that you've found here would have been found by a reasonable static analysis? Like a PREfix or whatever. Maybe not just that one, but there are these abstract [inaudible].
>> Todd Austin: Yeah. I don't know the answer to that question. We actually tried to answer that for Eric Larson's thesis, which was back in about 2005. And he tried to gather a lot of tools to do this sort of overlap coverage analysis. In the end these tools tend to be really fragile, and it takes a lot of deep knowledge to get different codes running on them. I think today with tools like [inaudible] and stuff, we could probably do a better job of that. Although now I'm sort of stuck over in this [inaudible].
>>: Let's say [inaudible] actually does all those things you say can be done, but it only does it on squeaky clean languages, like none of the ones that we use. How much of the pervasive programming language, grungy code stuff can you handle?
>> Todd Austin: Well, today I do the analyses at the instruction level. So I'm really looking at the lowest level of the machine; I'm looking at how information flows at that level. Yes?
>>: Okay.
>> Todd Austin: You could do this analysis at higher and higher levels, and as you have more type information, you have to check less, because the language itself provides you guarantees. Unless the language has bugs in it, in which case you don't have those guarantees.
>>: Like C.
>> Todd Austin: Exactly.
>>: You do have to preserve type information like [inaudible] and stuff like that. So the
language does have to provide enough information that you can check it.
>>: We'll all be doing binaries, there is no like…
>>: Well and that's what I'm curious about if you're doing [inaudible] binary…
>> Todd Austin: Binaries -- so if we have debug information and we have instrumentation in the [inaudible], we can rematerialize most of the type information. Some of the type information we generate on demand, based on how you access the variable. Yes?
>>: I'm trying to [inaudible] seems like maybe one [inaudible] is the actual checks that
are being put in, if found what we were discussing…
>>: We are using all these techniques. They are running on 100 machines as we speak in the [inaudible], but in addition to that we address the other hard problem, which is test generation -- how do you get, basically -- so that's why we regenerate [inaudible], but also we generate new tests.
>>: Right.
>>: Also, another thing here: you can have false alarms, because your symbolic execution of a very hard program could basically -- you could have [inaudible] -- so in addition you generate the test, then you run the program, and only then, if you find the bug, do you tell the developer, because of the [inaudible] false alarm.
>> Todd Austin: There are a few cases of false alarms.
>>: We are using these techniques quite effectively, and we have been trying to extend them and combine them into the next generation. So I mean, it's really -- I mean, it's very related.
>> Todd Austin: This work was done in 2002, just to put it in context. I think I'm one of the earliest people to do this [inaudible] style of execution. I only present it here to put context around the work that I'm doing today. But I know people have gone way past this stuff, and I'm sure people here…
>>: [inaudible] so I'm sure that you proceed [inaudible] to extend this work. I mean one
is to test the new generation in a closed lab [inaudible] what about in the field, not
guaranteed of doing so we have…
>> Todd Austin: So let's see my proposal for going to the field, okay? I definitely want -- I know I'm meeting with a bunch of you out there, and I definitely want to hear more about what's going on here. That's why I'm here. I want you guys to tell me what's crap about this and what's good. Because I knew that I would get that here.
[laughter]
>> Todd Austin: All right, so, Testudo. The approach is different from traditional heavyweight analysis techniques, where we take a program, send it into our instrumentation framework, get something that's fully instrumented, send it to the in-house server, and run those analyses. It takes quite a while; we find some bugs, and we fix those bugs. With Testudo, we're going to take a program, instrument it, deploy it to all the customers, and use a control system to limit the amount of analysis that occurs to a set amount of CPU and memory overhead. So we have to devise an analysis that we can decompose sufficiently that we can throw away information, but guarantee forward progress over time while also limiting the amount of overhead. And I'll show you that for this particular input bounds analysis.
Over time, running at virtually full speed, customers will run into bugs. The approach we've taken today is completely uncoordinated, completely random. If people stumble over bugs, they have the option of reporting them back, and then we can push out updates and fix the bugs across the whole customer base. And customers, of course, are never happy, but hopefully using this technique their frustration will start subsiding.
So let's take a look at this. I want to present another piece of code that I'm going to analyze. It's going to read some external input and then just do some computation. But this time I'm going to present it as a data flow. And the reason I'm going to do that is because the way we're going to optimize these analyses is by recognizing that the vast majority of security vulnerability analyses are tracking data as it flows through your program, and by sampling paths in data flows, we can make forward progress while limiting the amount of work we do.
So I read some data into X. I compute the value of Y, I compute the value of A, and Z is Y plus A, okay. So note, this was my externally input data, and these are all the particular operations I check. But note that if I just follow one of these paths from the start to the end, and ignore everything else, I actually make forward progress on my analysis. And that's the decomposition mechanism I am going to use. I'm going to manage my Metadata in a way where I throw away information if I have too much load on the system, and I make forward progress on at least one path every once in a while, limiting CPU and memory overheads so that users don't complain about our analyses. And then push that out as widely as possible. And then we are going to see how many machines I need to implement that.
So, for example, doing a sampled data flow analysis: you know, if I analyze X and skip Y and A, I can no longer analyze Z, because I have no Metadata. I can't analyze this Z, and when I get to that Z I can analyze its Metadata if I choose to do so. The asterisks show how far I got. I can run this again: if I get the same input to the same piece of code, I hit this data flow again, and this time I go down this path. I analyze the A, I analyze the Z, decide not to analyze Y, decide not to analyze this Z. And then in the third pass, I do it again. Now, because I'm uncoordinated with this initial approach, I can get a lot of overlap here; I can analyze things a lot of times, over and over again. But over time I should get very good coverage on the paths that are being executed in the program.
How do I limit the cost of storage? Well, I only need one Metadata value in the system to make forward progress on the analysis -- one single Metadata value. So, for example, suppose I have a structure, which I call the sample cache, that is tracking one particular variable in the system. If I'm tracking nothing here, and then I track X, I can then overwrite that with the Metadata for Y. I no longer have the Metadata for X, but I did analyze this node here, and now I have Metadata here. I decide not to analyze A. I replace the Metadata for Y with Z's. Now I have Metadata here, and I can't analyze Y, because I no longer have Metadata for Y; I can't analyze this Z, because I no longer have Metadata for X. But with a single location I've done one path, right? Because I just need one location to hit each one of these paths.
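A toy sketch of that sample-cache mechanism, assuming a C implementation; the entry count, the replacement policy, and the use of rand() in place of a true random source are all illustrative.

    #include <stdlib.h>

    #define SAMPLE_CACHE_ENTRIES 4

    struct entry {
        void *addr;       /* address of the tracked data value        */
        void *metadata;   /* pointer to its Metadata (kernel-managed) */
        int   valid;
    };

    static struct entry cache[SAMPLE_CACHE_ENTRIES];

    /* On a propagation dst = f(src): if src is tracked, non-deterministically
     * decide whether to follow the flow, overwriting a random entry. A real
     * system would use a true random source (e.g., thermal noise), not rand(). */
    void propagate(void *dst, void *src)
    {
        for (int i = 0; i < SAMPLE_CACHE_ENTRIES; i++) {
            if (cache[i].valid && cache[i].addr == src) {
                if (rand() & 1) {
                    int victim = rand() % SAMPLE_CACHE_ENTRIES;
                    cache[victim].addr     = dst;
                    cache[victim].metadata = cache[i].metadata;
                    cache[victim].valid    = 1;
                }
                return;
            }
        }
    }

    /* Under load, invalidate entries -- but always keep at least one, so
     * the run still completes one sampled data flow. */
    void shed_load(void)
    {
        int live = 0;
        for (int i = 0; i < SAMPLE_CACHE_ENTRIES; i++)
            live += cache[i].valid;
        for (int i = 0; live > 1 && i < SAMPLE_CACHE_ENTRIES; i++)
            if (cache[i].valid) { cache[i].valid = 0; live--; }
    }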
Now, if I build this sample cache and I replace entries in it randomly and non-deterministically, then with a population of users I can get coverage on all of the data paths. But it's got to be random -- different things every time -- and nondeterministic, so that if two users run the program with the same inputs, they don't select the same paths. If I have more than one entry, I'll get better coverage, because I can store more paths at the same time; each entry allows me to track one path at a time. But if I get too much load on the system, I can always choose to invalidate entries out of my Metadata cache to reduce the amount of analysis work I have. So I have a mechanism to throw stuff away and do less work, until I get down to one entry, and I'll keep that one entry around just to make sure that I make forward progress on my analyses. Yes?
>>: So in your example, you've got these unique names for everything, right? You've got a Z there and something else. Don't you have to track the full path to know whether you've seen this particular Z before?
>> Todd Austin: Well, I know -- let's say I hold Y here. If I have only one Metadata value, I know that I have reached this particular value on a path from some input that I declared interesting. And as I see Y propagate to other values, I can randomly choose whether or not to take those new values or hold onto the value of Y.
>>: But there might be more than one way to reach the Y equals X times 1024, and you
want…
>> Todd Austin: That's true. And that's another data flow. And as I see that other
dataflow I can get coverage on that as well.
>>: I guess what I'm confused about is you say you're keeping a single value there, but it seems to me like you have to [inaudible] path you're looking for is. And it's larger than a single [inaudible].
>> Todd Austin: No, I don't have any information about it. I'm just randomly selecting where to go next. And, you know, what this becomes is the classic statistical problem called the coupon collector's problem. With no state, but randomly selecting where to go next in a graph, I always get to all the leaf nodes eventually. It's not very efficient, and the approach I have here isn't particularly efficient; my students are working on better techniques, randomized algorithms that could do a better job of covering this. But I just want to make the point today: with one data value I can make coverage progress.
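For a feel of the scaling, here is a tiny C program computing the coupon-collector expectation n times H(n) -- the expected number of uniform random draws needed to see all n items at least once. The path counts are made-up numbers, just to show why full coverage takes many runs.

    #include <stdio.h>

    int main(void)
    {
        for (int n = 32; n <= 1024; n *= 2) {
            double h = 0.0;
            for (int k = 1; k <= n; k++)
                h += 1.0 / k;                 /* harmonic number H(n) */
            printf("%5d paths -> ~%.0f random draws to see them all\n",
                   n, n * h);
        }
        return 0;
    }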
All right. So the point here is that individual analyses are very cheap. I'm going to scale performance with many runs on many customer machines, and I'm going to increase the size of this cache to cover more flows at the same time. And invalidating that cache is a powerful mechanism to reduce the amount of analysis that I do, and to reduce the cost of analysis on any individual machine. Now, there are two implementations of Testudo. There is the published one, which is a hardware implementation -- it was in MICRO '08 -- and then we're publishing a paper next year on the software version. I'm going to talk a little bit about the software-only version, but first I just want to present the hardware approach to Testudo.
I'm just walking down the pipeline, showing you what I need to implement [inaudible]. First, I need Metadata for register values: anything that lives in a register, I want to attach Metadata to. And we don't sample the data in registers; every register gets its own Metadata. And what this information is, is just a pointer to kernel memory that tells us where the actual Metadata is stored. Because we don't really presuppose what the Metadata is in the system; we just track whether any particular data value has Metadata, and what the pointer to it is.
In the execute stage, we need the ability to propagate Metadata and to remember how it was propagated, so that if we get a variable A added to Y, we can produce the Metadata for the result. Now, sometimes this is very simple: for example, if you're doing taint analysis and one input is tainted, the output is tainted and you're done. If you're doing something like symbolic analysis, you've got to go and compute what the calculus of A + Y does to this Metadata, and what my new Metadata is. And we'll see that later in the pipeline we have a place where we can initiate a kernel call to actually compute that data. So we don't really presuppose what that analysis is.
>>: So you're doing this at the instruction level, right? So you don't necessarily have a
connection between a register value and a variable, right? So don't you have to look up
the debug information every time you do that?
>> Todd Austin: Yes. So, for example, where Metadata materializes is where we'll get most of that information. When we take the address of something, or when we create some new piece of storage, that's where we'll get the majority of that Metadata. And when the Metadata mixes within instructions, we've got these kernel routines that decide how to put stuff together. I'll invite you to look at the USENIX Security paper, and you'll see that there's a big table that shows for every instruction what that [inaudible] is.
In the [inaudible] we have a sample cache, which is simply a hardware cache that holds Metadata pointers, each associated with a particular physical address. So if there's some Metadata attached to a particular memory address and we do a load or store, that Metadata materializes in the registers. And this sample cache is small -- it's typically on the order of 128 or 256 entries -- and it's randomly replaced. So we select the entry randomly and non-deterministically. When we're manipulating this, we have to have some source of true random information in the system, so that when we have separate runs on separate machines, we don't see the same updates in that cache, and we get good coverage on the data flows.
Fortunately, in much hardware today there are excellent sources of random information; for example, many Intel processors have the ability to turn thermal noise into random numbers, and those are very useful in this approach. And then finally, at the end of the pipeline, when we retire the instruction, the instruction may have done something that is beyond the scope of what the pipeline can do. So we have this policy map, which basically allows us to say, for example, if you have two pieces of Metadata on this opcode, then I want a kernel interrupt that goes to this address and allows you to emulate more complex manipulation of Metadata as instructions retire.
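A toy sketch of the policy-map idea in C: per-opcode entries either compute the result's Metadata directly (trivial cases like taint union) or escalate to a kernel handler. The structure and the opcode names are assumptions, not the MICRO '08 design.

    typedef void *(*meta_handler)(void *src1_meta, void *src2_meta);

    struct policy_entry {
        int          escalate;  /* 1: raise a kernel interrupt for this opcode */
        meta_handler handler;   /* routine computing the result's Metadata     */
    };

    /* For taint-style analysis, propagation is trivial: the result carries
     * Metadata if either input does. */
    static void *taint_union(void *a, void *b) { return a ? a : b; }

    enum { OP_ADD, OP_DIV, NUM_OPCODES };   /* hypothetical opcode ids */

    static struct policy_entry policy[NUM_OPCODES] = {
        [OP_ADD] = { 0, taint_union },
        [OP_DIV] = { 1, 0 },     /* complex case: hand off to the kernel */
    };

    /* Called when an instruction retires with Metadata on its inputs. */
    void *propagate_meta(int opcode, void *m1, void *m2)
    {
        struct policy_entry *e = &policy[opcode];
        if (e->escalate || e->handler == 0)
            return 0;            /* stand-in for the kernel-interrupt path */
        return e->handler(m1, m2);
    }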
Software support for Testudo: first, we have an OS-level controller, which is going to non-deterministically limit new analyses and fan-out by watching the overheads in the system. If the overheads are low, we're going to try to increase flow selection, so that when we see new data created we start following it, and we're going to preserve fan-out, so that when we see new values coming out of a particular value, we create new Metadata for them.
When the overhead gets too high, we're going to decrease flow selection -- it's less likely that a new flow will get analyzed -- and we're going to reduce fan-out by invalidating entries of the sample cache, until we get to the point where we only have one data flow, and we preserve that one data flow, even to the point of violating the constraints on CPU and memory overhead. And then, once that's done, we'll wait until we get back below that max load and go back to deciding whether to increase or reduce the flows. In addition, there are special instructions in the architecture that let us mark things that should have Metadata initially. In our implementation, those are in the device drivers: when stuff comes in from the network or from the keyboard or from other external sources, it gets marked.
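A minimal sketch of that control loop, in C. The 5% budget matches the figure quoted later in the talk; the tick-based design, step sizes, and names are assumptions.

    void shed_load(void);          /* invalidate sample-cache entries, keeping
                                      one -- see the earlier cache sketch    */

    #define MAX_OVERHEAD_PCT 5     /* cap analysis at ~5% CPU/memory overhead */

    static int select_prob = 50;   /* % chance a newly seen input flow is tracked */

    void control_tick(int observed_overhead_pct)
    {
        if (observed_overhead_pct < MAX_OVERHEAD_PCT) {
            if (select_prob < 100)
                select_prob += 5;      /* overheads low: pick up more flows     */
        } else {
            if (select_prob >= 10)
                select_prob -= 10;     /* overheads high: stop taking new flows */
            shed_load();               /* and reduce fan-out under load         */
        }
    }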
All right. How do we do analysis of this? So we took Virtutech Simics and we ran a bunch of programs that had some exploits that we could create in them. And we ran them on our simulator. The problem with our simulator is that it tends to be slow, and we wanted to get coverage over many, many, many thousands of runs. So what we did is, as we executed these experiments, we wrote out the data flows that we saw -- just the data flows themselves -- and then brought them into a Monte Carlo simulator, which would implement many different analyses of those particular data flows against the sample cache. So one particular run produces a payload of data flows, and then we do thousands and thousands of analyses in this Monte Carlo simulator for the sample cache, to see how we get coverage over time.
How many runs do I need? For some programs I don't need a lot of runs to get full coverage on the data flows I saw -- on the order of hundreds of machines. And with a larger sample cache, 64 entries instead of 32, even less, because I can track more data flows at a time. For other programs I need more: like this SQL injection, where I need as many as 17,000 runs to fully cover the data flows that I saw in the original run of the program. This number right here is for 95% confidence that you've covered all of the data flow paths that you saw in the original run of the program. So if you run your experiment, you've got a 95% probability that this is the case.
>>: So what is the [inaudible] why so many in the case of [inaudible] cases?
>> Todd Austin: It has to do with the depth of the data flows, the bushiness of the data flows, the size of the sample cache, and the amount of analysis that you have to do -- the payload of analysis. If you have unlimited analysis and a huge sample cache, you can get very good coverage. But as you start to tighten the amount of stuff you can look at, tighten the amount of overhead that you can tolerate, you need more and more runs. Yes?
>>: So on the initial run of PDF how many [inaudible] did you run?
>> Todd Austin: PDF, we just ran one execution of the program with one exploit.
>>: Okay.
>> Todd Austin: So this is--how many times to get coverage on that particular run?
>>: I see. Okay. But there could be other as [inaudible].
>> Todd Austin: Many, many, many, many [inaudible], right? So I want to cover many,
many, many paths.
>>: So let's just try and get through with what the explosion factor is [inaudible].
>> Todd Austin: Exactly.
>>: Okay. Thank you.
>> Todd Austin: So the leverage that I'm going to get in this system is, you know -- for example, Apache Web servers start 570,000 times a second. So if I can put this technology into a widely used program, over time I'm going to get a huge amount of leverage. And over time I'm going to work to try and make this even more efficient.
>>: So you have a false positive problem though right?
>> Todd Austin: Not a very large false positive problem.
>>: But I think in the end the question becomes: you're having hundreds of thousands of these data flows coming in -- how do you prioritize? How do you know that something is really a good bug to go after, versus something like the other hundred million that you [inaudible] that are not…
>> Todd Austin: So one thing we have is excellent information about how likely the path you're on is, which is a very powerful piece of information, and which gives us some priority information. Yes?
>>: One of the fundamental assumptions here seems to be that control paths don't change before the exploit would happen. Whereas a lot of [inaudible] -- we've got IE 9, and it still has gopher support in it, and it turns out that no one's used gopher in 15 years, and so I'm going to come up with some bug in the way [inaudible] gopher parses something. And so in that case, just by sending gopher [inaudible], or whatever it is, the attacker has already taken you off any path that any non-adversarial user would actually take you on.
>> Todd Austin: Excellent point. So this really only gives you good coverage on paths that customers run, or on widely exploited bugs -- so you actually see good coverage on the attacks themselves. But I've got a foot in that game as well, because I get excellent information about what customers do, and I can use that information with static analysis to figure out feasible paths that customers don't execute. And I think ultimately I can leverage the information that I gather in the field to really understand what the possible paths are that are not executed, which I think are another great source of bugs, as you point out. And then once you -- then you've covered all the paths, right?
>>: [inaudible] I mean wouldn't you say that this is not so much [inaudible] functionality
of the bugs [inaudible] look up bugs and nobody else [inaudible] do you think that
[inaudible] functionality of the bugs [inaudible].
>> Todd Austin: I guess I don't see the distinction. What's the distinction between vulnerability bugs and functionality bugs? I don't understand the terminology, but I do understand that this technology is going to be limited to paths that get executed.
>>: [inaudible] the super duper say symbolic [inaudible] how the users at large are using
these specific things that a lot of them are some that you can get leverage off that. But
the security may not be what you're after for that machinery, because many secret
[inaudible] never, never, never executed, but still being shipped for all kind of reasons
and they race a signature that hitting those in a while actually can be rare in some cases
and I was going to ask you do you have evidence of that. I mean for instance to you
know of any study linking [inaudible]? And how many of them were in the 350 try verse
paths versus [inaudible]? I mean I do not know.
>> Todd Austin: I don't know that either. And I think that that's an excellent question.
>>: [inaudible] what we already have encapsulated in a test case which already existing
technology would actually...
>> Todd Austin: Well, that's true if your test coverage is equal to your customer coverage.
>>: So this actually is, this is how you build a Testudo [inaudible] functionality?
>>: [inaudible] there are so many different zillions of patterns that define a [inaudible]
whatever you name it, we have, I mean it's very hard to test, to know exactly what's going
to, and so this actually is going to…
>> Todd Austin: You guys can answer this question, right? How many bugs do you fix in popular parts of the code, and how many bugs do you fix in the crufty code? You ought to have those stats, right?
>>: But we have agreed to [inaudible] and so that's another story.
>> Todd Austin: I want to hear about that.
>>: [inaudible]
>> Todd Austin: Pardon?
>>: What he's saying is we know where the bugs in the popular code are. So it's very
easy to fix, well it's not easy to fix those, but there's a long queue of those to get fixed.
>> Todd Austin: So all of the new bugs, all of the new bugs you are seeing, are in the crufty code?
>>: No, no…
>> Todd Austin: I'm seeing some no's and some yes's. [laughter].
>>: Well, part of this goes back to the distinction [inaudible], and we're trying to find functionality bugs, and so if vulnerability [inaudible] computers usually running code [inaudible]. A functionality bug could be something as simple as a misspelling on this form. It has a bad customer impact because it makes us look like idiots, but you know it's not going to let an adversary get straight into the computer.
>>: So it's important to understand a little bit about [inaudible]: it's basically an online crash analysis, so whenever something crashes it will send something back to Microsoft if you say yes, okay? So what you see there are inputs that actually caused the thing to crash. You're seeing things that aren't the ones that caused the crash but could have; we're seeing the ones that really did. From the ones that actually caused it to crash, you see a distribution, and so we can see the numbers just like you will see the numbers. And inside, this one seems to curl off and this one doesn't [inaudible], and that allows us to prioritize. But given that, we have real crashing things and we know how frequently they happen. We can't fix all of them. So we're already well aware: here's where we have 10,000, and here's the top 100 that we're going to fix. I guess my question is, if we had this initial information to feed into that pool…
>> Todd Austin: I see. I see your point.
>>: Yeah, really would it help…
>> Todd Austin: That's an interesting thing to think about. That's a very good piece of
advice.
>>: He's got all the information about -- you know, he's doing [inaudible] analysis, so he'll see inputs that you never saw in practice that are still causing [inaudible].
>>: Something we may want to think about [inaudible] streaming them back to the mothership over time [inaudible] build up a bug symbolic sketch [inaudible] over time [inaudible] and maybe that'll help you find…
>> Todd Austin: That's something I'm definitely moving towards here. Okay. Good, thank you very much. Excellent feedback, excellent feedback -- that's why I'm here. All right. How much does it cost? The hardware for this isn't that expensive. We looked at the cost of 256-entry to 1K-entry caches relative to the size of an AMD Phenom and an UltraSPARC; it doesn't really hit cycle time that much, and it's fairly low in terms of percentage area cost. I was at Intel Israel two weeks ago telling them about this technology. You know Intel; all of the action is in Israel -- that's where they do the processors now. And, you know, I was trying to push this technology, and somebody pulled me aside and said, you know, we're building this already, for software-based transactional [inaudible]. So there may be some synergies between that and this that will allow technologies like this to be rolled out. [inaudible] pretty excited about that possibility.
Let me talk to you a little bit about future work on this. We just recently got a paper into CGO 2011, which is a 100% software implementation of this technology. One of the challenges of seeing how well this works is that you've got to really roll it out and see; it's hard to do with just Monte Carlo analysis. So what we built is a LAMP stack that uses this technology. We've got Linux, we're connected into the LAMP stack through analysis-aware drivers that can mark data, and then this runs on top of the Xen hypervisor, which has a sample cache implemented with shadow paging. So I just use the virtual memory system to track my data. And it's a little more cumbersome, because I have to throw away whole pages of data when I want to throw away analysis, or pare down a page to a single value if it's my last one. My load controller decides when to throw away data, and when not to initiate new analyses, to control my load. And then when I want to do analyses, I shift over to QEMU for demand-driven analysis -- so I'm actually going to do interpretation under the kernel to figure out how to propagate that information through registers.
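A rough sketch of the page-protection mechanism in C, using mprotect and a SIGSEGV handler in place of hypervisor shadow paging; a real implementation would interpret the faulting instruction (the talk's version shifts to QEMU) rather than simply unprotecting the page.

    #include <signal.h>
    #include <sys/mman.h>
    #include <stdint.h>

    #define PAGE 4096UL

    static void *tracked_page;   /* page holding the sampled data flow */

    /* Any access to the protected page traps here. This sketch just
     * unprotects and lets the access proceed; a real system would
     * propagate Metadata before doing so. */
    static void on_fault(int sig, siginfo_t *si, void *ctx)
    {
        (void)sig; (void)si; (void)ctx;
        mprotect(tracked_page, PAGE, PROT_READ | PROT_WRITE);
    }

    /* Start tracking the page containing addr. */
    void track(void *addr)
    {
        struct sigaction sa;
        sa.sa_sigaction = on_fault;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, 0);

        tracked_page = (void *)((uintptr_t)addr & ~(PAGE - 1));
        mprotect(tracked_page, PAGE, PROT_NONE);  /* any access now traps */
    }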
The downside of this is that it's, you know, quite a bit more expensive, and I'm going to need more runs to get the same amount of coverage. The advantage is that eventually I'll be able to deploy this. Another thing that I'm very interested in engaging on, and I'm currently starting work on, is using the information that we can harvest in the field to find unlikely feasible paths using static analysis. Because I think I can gain a lot of very good information about how the program is used, and a lot of symbolic information, to help you find feasible paths in the program.
And then, generally, what I need to implement this technology -- and what you need to implement a variety of vulnerability analyses -- is efficient fine-grain memory protection. What we really need is to abandon 4K pages; I want to be able to mark bytes as code and data, and I want an efficient mechanism for it. So my student Joe Greathouse, at the end of his thesis, is really working on coming up with efficient fine-grain memory protection techniques. And there's a variety of things you can implement with that: you can do garbage collection, you can do software-based transactional memory, you can do security analyses, you can do security attack prevention -- varieties of things.
So I think there's a huge benefit to revisiting the virtual memory protection system and trying to make it more fine-grain. Yes?
>>: Yes, there was a recent Intel workshop where the discussion was around whether Intel should support some kind of [inaudible], and it was sort of more broad [inaudible] work. But the bottom line is, yeah, any interaction that you can have to give Intel reasons to do this would be great. I mean, I think the fact that they sponsored the workshops means that some believe internally that they need to be looking at [inaudible]. But it's been a hard sell. I mean, this same story was true 15 years ago, right? And they've heard it. It's not like they don't know that fine-grain memory [inaudible] mechanisms are going to give some [inaudible] software, but I think the economic argument around how it helps customers is still not there.
>>: Well it is true that all the benefits were there 15 years ago but this is unfair and
exaggerated [inaudible]. They no longer have anything better to do.
[laughter]
>>: But given that, they're still asking the question: who's going to buy this? I mean, fundamentally -- and in fact the discussion at this workshop was more on whether we could create a SKU of our processor that we sold for more money and only targeted at developers. In other words, because developers are the ones that really want this, and your average customer doesn't see the value. So how much more would a developer pay, or would Microsoft pay…
>> Todd Austin: You know, you know Mark…
>>: He is not the Microsoft employee…
>> Todd Austin: I know who he is. I know who he is.
>>: [inaudible multiple speakers]
[laughter]
>>: Just kidding. This is being recorded here, right?
[laughter]
>>: Who cares about Intel, right? [inaudible]
>> Todd Austin: I know, I know, I know. So when I presented this to Intel, they felt uncomfortable: we wouldn't put something to help find security bugs in our processor, because that would just imply that we have a lot of security bugs on the platforms that use our processors. Which seems like an odd thing to say. But from a marketing standpoint, you know, people might fear a system that has to fix bugs, right? So my idea is, what you should do is brand this as extra security, extra safety technology.
>>: I think that's the fundamental issue: if you could show a benefit to an individual customer, that's really good -- they'll put it in, and they could go to market with that. But what you're talking about are developer tools, something that makes the software better, but only indirectly. It's not like I only get the benefit of this if I pay more for my processor; everyone gets it.
>> Todd Austin: Here's the way you get the benefit to the customer: we've got great information on the likelihood of the path after the bug -- the potential bug -- is found. And what you do is, if the likelihood drops below a certain level, you just say, hey, here's a $50-off coupon for your next Microsoft product if you hit send. And then it becomes a valuable [inaudible].
>>: So you're saying [inaudible multiple speakers].
>> Todd Austin: The problem for the attackers is they're just one, right? The attacker isn't the hive; they're just one person in the hive. So their probability of finding these is very, very [inaudible].
>>: [inaudible] talking about the software [inaudible] and what you think it could get
down to if you really [inaudible]?
>> Todd Austin: So I'm not going to go into details on this yet, but it's about three times
as many runs to reach the same level of coverage.
>>: So for me, as someone running on top of this, do I see a 2X slowdown or a 5X slowdown?
>> Todd Austin: All of our experiments are limited to 5% memory and 5% CPU.
>>: Okay great. Thank you.
>> Todd Austin: So it's just a shaving.
>>: Great. Thank you.
>>: Okay, so over the last five -- I don't know, seven -- years or so, we've been seeing [inaudible] these low-level memory [inaudible] an increase in vulnerabilities like, well, [inaudible] problems now and so on and so forth, and it keeps evolving. So what are your thoughts on applying techniques that [inaudible]?
>> Todd Austin: So, anywhere -- this work is valuable anywhere you have data flow and you have invariants that you can either infer or that already exist. In my case they exist; maybe you can infer those invariants -- I know there's a lot of work in that domain. And overheads are a concern. But I think these kinds of sampling techniques can work on anything, and it's a dynamic analysis.
>>: [inaudible] it is the same story, and actually in particular the users [inaudible]: they will go to the homepage, they will go to the Gmail homepage. They're not going to explore the preferences [inaudible]. It is the same story.
>>: So can you say anything about the relationship between [inaudible] and this kind of stuff? [inaudible] security specifically, but just in a general way. So I guess the question is, could you implement much of what you are talking about with the same mechanism…
>> Todd Austin: I'm not 100% familiar with lifeguards. What is -- give me just a…
>>: So the high-level takeaway is that it's a hardware channel that allows information about things that are happening -- instructions, you know, things like which addresses are being accessed, etc. -- to be sent over to a separate core on the multi-core and be handled there, essentially. So it's, you know, a high-bandwidth way to collect data in the running processor…
>> Todd Austin: So that would be a good mechanism to implement this kind of analysis, right? So in a sense all those policy map calls could go over to another [inaudible]. Okay. And then, lastly, I'm really interested in adapting this to other analysis techniques. The core of the Testudo work is not really security -- we've applied it to security, but it's really data flow analysis. Currently Joe, my student, is applying this to finding race bugs in parallel programs, which is another very good data flow analysis that's super heavyweight. Yes?
>>: So let's bring this full circle. [inaudible] So what are your thoughts on moving this into the data center [inaudible]?
>> Todd Austin: I mean, it's even more ideal, right? Because you have lots of systems running many copies of your software, and in addition it's in your own domain, so you have more ability to provide the information that's necessary to improve the analysis and reduce its cost. For example, I can provide a lot of type information, a lot of invariants that I've collected over time, that I wouldn't push out to a customer, because that information might be privileged or important intellectual property that I don't release; but now, since it's local, I can do even better. So I see this in the data center as being an even more powerful technology.
All right, to conclude: if you want to beat the hackers, find the bugs they love and fix them before they do. Get rid of those zero-day exploits. I talked about three technologies. Metadata restores programmer intent, and we need a mechanism to manage it efficiently. Input bounds checking is an example of a security vulnerability analysis that finds those bugs before you release software; the advantage of this particular technology is that it finds exploits without an active attack. The problem with these technologies is that they're too expensive -- they really slow down programs. So Testudo is a technology that can roll these analyses out into the field and use the customer base as a massively parallel system to find these bugs, in an uncoordinated, random way. So thank you very much. By the way, I want to tell you what a Testudo is. This is a Testudo. It's a Roman legion formation, where they take each of their shields and lock them together to form one single protection for the entire mass. So that's why we call it Testudo.
>>: [inaudible]
[laughter]
>> Todd Austin: Ah, yeah. Of course. This is -- yeah, these are your device drivers and your hypervisor, which is unprotected. Thank you very much.
>>: Yes, so I just have a question. You're doing a lot of buffer overflows and certain array out-of-bounds errors, but use-after-free is actually a major source of attacks. In fact, it's one of the things [inaudible] -- even if you completely remove all the buffer overflows, you still have a lot of vulnerabilities [inaudible].
>> Todd Austin: I mean that would fit great into this framework.
>>: My question is, can you do anything about that?
>> Todd Austin: That's a data flow analysis right there. The Metadata I need is whether storage is live or not -- like some capability associated with the variable. Let's say I did it like we did when I worked on SafeC years ago: we generated capabilities for all heap storage, we attached them to the pointers, and we propagated those capabilities. And those capabilities were destroyed when the data was freed. It's a data flow analysis that I could sample in the system: when I dereference, does that particular capability, that mark, still exist? That would fit into this style of analysis.
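A hedged sketch of that capability scheme in C: each allocation gets a capability, freeing revokes it, and a (possibly sampled) check at dereference time catches stale pointers. The table size and names are illustrative, and a real system would avoid capability reuse.

    #include <stdio.h>
    #include <stdlib.h>

    struct cap_ptr {
        void    *addr;
        unsigned cap;      /* capability id issued at allocation */
    };

    #define MAX_CAPS 1024
    static unsigned char live[MAX_CAPS];   /* capability table */
    static unsigned next_cap;

    struct cap_ptr cap_malloc(size_t n)
    {
        unsigned c = next_cap++ % MAX_CAPS;   /* toy: ids eventually reused */
        live[c] = 1;
        return (struct cap_ptr){ malloc(n), c };
    }

    void cap_free(struct cap_ptr p)
    {
        live[p.cap] = 0;   /* revoke: any later use is a bug */
        free(p.addr);
    }

    void *cap_deref(struct cap_ptr p)
    {
        if (!live[p.cap]) {
            fprintf(stderr, "use-after-free detected\n");
            abort();
        }
        return p.addr;
    }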
Thank you very much.
[applause]