>>: Okay. It's a great pleasure to introduce Todd Austin. Todd got his PhD at the University of Wisconsin in 1996. He's currently a professor at the University of Michigan in the electrical engineering and computer science department. Todd's been doing lots of really interesting work in architecture for many years. One of his great contributions is the SimpleScalar toolset, which is widely used by the entire architecture community. Todd also got the Maurice Wilkes Award for innovative contributions to computer architecture in 2007, I think. And today he's going to be talking about using hardware to find security bugs. So that's something of interest to many of us here, and please welcome Todd. >> Todd Austin: Thank you. Thanks, it's a pleasure to be here. Actually it's been four years since my last visit. Yeah, so my background is in computer architecture. Most people think my background is in computer architecture, but really my research is all about finding and fixing bugs and faults. So I work a lot on fault-tolerant systems; I work on finding software bugs. And when you work in software verification, you can't help but overlap heavily with security, because a lot of security vulnerabilities are the result of software bugs. So today I want to talk about some work that two of my PhD students have been working on. A few years ago Eric Larson, who is now at Seattle University, did some work on security vulnerability analysis of programs. I'll talk a little bit about that. And then my current graduate student, Joe Greathouse, has been working on developing techniques to scale the performance of super-heavyweight, but super-powerful, security vulnerability analyses. And I'll talk about that work. That's called Testudo. This is also joint with other students and faculty as well. All right. So I'm sure I don't have to spend much time on this slide but, you know, security vulnerabilities are a big problem, and I'm sure at Microsoft everybody knows this. But I will show you one thing. Yes, I picked on Microsoft Windows, but that's not the only system that has bugs in it that people can exploit. Linux is more and more being found to have lots of security bugs. And even simple devices, devices that don't even need an operating system, like RFIDs, can be attacked as well. Many security vulnerabilities are the result of bugs in software. So if you can fix these bugs, you can eliminate the vulnerability, as opposed to trying to detect the vulnerability and stop it when it occurs. If you fix the bug, you don't carry the payload of trying to find that vulnerability and fix it at runtime. Let's take a look at an example of one security vulnerability bug. A classic one is the buffer overflow attack. So I've got this piece of code here, which is a function that's got a buffer on the stack, and some local variables as well. And it reads some input from an external source. And the reason why this is a bug is because this particular function doesn't limit the amount it reads to 256. Now if somebody comes and injects data into this read-input call here and it's less than 200 integers, it only partially fills this buffer and everything's fine. The program runs as it was intended. But if somebody reconstructs the protocol, for example, and just injects more data into it, and even violates the purpose of the protocol, say injects more than 256 integers in here, what happens in this particular attack is it overwrites the buffer, then the local variables above it, and eventually the return address.
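[Editor's note: the slide's code is not reproduced in the transcript. A minimal sketch of the kind of function being described -- a fixed-size stack buffer filled from external input with no length check -- might look like the following; the function and variable names are illustrative, not from the talk.]

    #include <stdio.h>

    void handle_request(FILE *in) {
        int buf[256];              /* fixed-size local buffer on the stack */
        int n = 0, v;
        /* Bug: nothing limits n to 256. Extra input keeps writing past
           the end of the buffer, over the other locals, and eventually
           over the saved return address -- the classic stack smash. */
        while (fscanf(in, "%d", &v) == 1)
            buf[n++] = v;
    }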
The nature of this particular attack is to make the data that you injected into the program be code, and then to figure out the address to jump back into your own injected data. Once you've done that, you've done what you fundamentally need to do to implement this attack, which is redirect control from external data. And when that's done, you can take over the machine. There's a variety of things you can do in this buffer to take over the machine. So how do you fix these vulnerabilities? What's the classic approach that's used most widely today to fix these bugs that become security vulnerabilities? It's as follows: write your application. Deploy your application to your customers. Let people attack your customers. Customers get upset and complain to you. Debug the attacks. Then fix the software, and repeat. Now this is effective, but it has some downsides. One, you get upset customers when they get attacked. And two, you get embarrassed software vendors, because some of these bugs, frankly, can cause a lot of exposure. So what I want to talk about today is a better way to attack these bugs, and that's through security vulnerability analysis. Let's try and find the bugs that hackers love before we release the software. There are some challenges to this, but here's the basic approach. Develop your program, and employ security vulnerability analyses. There's a variety of these; there are many different techniques, and I'm going to talk about one today that I helped to develop. Employ this in the lab, and then debug whatever exposed vulnerabilities you find, continue to develop your program, and then iterate around this cycle until you decide to deploy. The advantage of this approach is that you'll find those bugs before you put them out into the field. But as we're going to see, there are a couple of problems with this approach. Well, it's good for your customers, because you get more of those bugs out. But the downside is that security vulnerability analyses tend to be extremely heavyweight. And when you're running very heavyweight analyses that slow down programs by hundreds of times, you tend to be myopic in what you can test, because you simply don't have the resources to fully test the program. What does it mean to fully test a program? It means to cover all of the feasible paths that the program can possibly execute. And there are billions of paths in non-trivial programs. So if this is slowing down your program hundreds of times, you probably won't even be able to get through your own test suites, let alone a test suite that, to give you a specific example, a fuzz-testing suite, that gets good coverage on paths in your program. What I'm going to talk about today is a technology called Testudo, which pushes those analyses out into the customer base and runs them on every machine, every time the program is run, using a data flow sampling technique that can limit the amount of CPU and memory resources it takes to do these tests. So forward progress is made, but it's distributed across a large customer base. The great thing about this approach is it works well with the hacker bug economy, which is as follows: hackers want bugs that occur on many, many machines, so that when they devise an attack, they can grab as many machines as possible. In the same scenario, our analyses are run on many, many, many machines.
So popular programs will get the most coverage with the security vulnerability analyses. And my conjecture is that, using this technology, we can get ahead of the attackers and find bugs before they do. Yes? >>: [inaudible] What if you're using someone else's machine [inaudible]? >> Todd Austin: Yeah well, like all technologies for finding bugs, you can use them both for good or bad, right? So attackers could use these technologies as well. One advantage that I have over the attackers is that I have the advantage of my entire customer base. But an attacker could definitely use this technology also, say to run these analyses on a botnet, for example. Good point. And the key point here is: take the criminal out of your design cycle. I think this is a really good approach to developing software and one that we should all strive for. All right. So here's what I'm going to talk about today: three technologies that I've worked on in my career. First I'm going to talk about Metadata. Metadata is an important aspect of security vulnerability analysis. It is information that is attached to program data that helps restore programmer intent in the program. As we run programs we throw away a lot of information, and we need to re-materialize a lot of that information, and Metadata storage locations are where we're going to store it. Then I'm going to talk about one security vulnerability analysis that I helped to develop a few years ago, called input bounds checking, which is going to try to determine whether the checks on data coming from external sources are sufficient to limit the possibility of dangerous memory accesses or changes of control in your program without your knowledge. And then finally I'm going to talk about my most recent work, which is the Testudo project, which is this dynamic distributed debugging. How do we take these super-heavyweight analyses and push them out into the field, to try and scale the performance and see more of those feasible paths that our customers are going to execute, to try and find more bugs? Let's take a look at Metadata. When you write a program, the programmer puts a lot of information at the source level about what it is that he or she wants to do, but unfortunately it gets unceremoniously discarded by the compiler and the runtime system. To give you an example, here's a Metadata strategy I published many, many years ago called fat pointers, which just tries to rematerialize the intent behind how programmers use pointers. In a language like C, for example, a pointer just contains an address to a piece of storage. But there really is a lot more information there that would be useful to store if we could. For example: what is the variable that was intended to be pointed to? Whether or not the variable is live, whether or not the variable has some externally derived information in it, and whether or not that externally derived information has been checked. For all these kinds of properties, it's nice to have a place to store them. So what you'll see in security vulnerability analyses is a need to declare, store, and efficiently manage Metadata. And we'll see some of that later in the talk when we look at the Testudo hardware implementation. All right. With Metadata we can start to do a security vulnerability analysis. And there are many of these available. Let me show you one that I worked on called input bounds checking. All right. So how do attackers look at programs?
They basically look at them like big black boxes with a bunch of knobs on them. What are the knobs? The knobs are the inputs that I can feed into the program. And the vast majority of attacks amount to simply finding the correct sequence of turns of these knobs to cause the program to access memory it didn't intend to, or change control in a way that it did not originally intend. That's the reason why fuzz testing, for example, works so effectively at finding bugs in programs: with fuzz testing you're just turning knobs randomly and looking to see if the program crashes. So a great source of bugs that we can find, that can stop security vulnerabilities, is just to find data that comes in from the outside world, these knobs, that flows to potentially dangerous operations like memory accesses, indirect pointer accesses, or changes of control in your program, and hasn't been properly bounded. If we can find those bugs, we've found a significant number of the bugs that hackers can exploit. So the approach that we use with this input bounds vulnerability analysis technique is we take a program, which at the low level operates on, you know, integer data, and we combine it with a symbolic copy of the same program, that does the same exact operations, the adds, the subtracts, the loads and the stores, but instead of operating on concrete data, operates on symbolic data. So we take every variable, say some variable X that has the value of two, a concrete value, and we clone it: we include Metadata with X which captures the range of X, a symbolic value of X. This doesn't declare any particular value of X, but instead declares a predicate which describes the possible values of X. Okay. So X can be this value in the program, and through the Metadata we know what possible values it can be, and we'll see that by propagating these symbolic values around and pushing them through the computation, in a symbolic fashion, we can determine the exact ranges of values of X whenever we do a potentially dangerous operation. Now, if we also track the type information as well, which in some languages we may have, and in others we may need to rematerialize inside of our Metadata, then whenever we have a potentially dangerous operation, we can apply a proof engine at the dereference or the change in control, and try to prove: does this predicate violate the constraints of that type or that variable? And you can see it here. This is a C array whose valid indices are 0 through 4, and this X has a range of 1 to 5, and we can see, yeah, there is a possibility that a bug could occur here. Now what's really powerful with this analysis is there's no attack in this 2. This 2 is perfectly valid. But in the symbolic side of the program we can see that an attack exists. And that's the most powerful aspect of this particular analysis: it finds attacks without an active exploit. Because the symbolic side of the computation tracks the range over which the values could exist, if there does exist a value that could provide an exploit, it will find it. To find all of the exploits for improperly bounded values, we simply have to have complete control flow coverage of the program. So we simply have to hit all the feasible paths, and we will find all of these particular exploits. I don't want to go into all the details of this analysis, but I want to show you by example how it works, so you can see how symbolic computation in the background is a powerful way to find an exploit.
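[Editor's sketch: the talk does not show the instrumentation itself. One plausible way to realize the symbolic shadow computation, with intervals as the symbolic domain, is sketched below, applied to the sort of buggy program the speaker walks through next. All names are illustrative, and overflow handling is omitted.]

    #include <stdio.h>
    #include <limits.h>

    /* Shadow Metadata: an interval over-approximating a variable's value. */
    typedef struct { long lo, hi; } range_t;

    /* Calculus for x++: shift both bounds. */
    range_t range_inc(range_t r) { r.lo++; r.hi++; return r; }

    /* On the taken side of "if (x <= k)" or "if (x >= k)", intersect the
       interval with the branch predicate. */
    range_t clamp_le(range_t r, long k) { if (r.hi > k) r.hi = k; return r; }
    range_t clamp_ge(range_t r, long k) { if (r.lo < k) r.lo = k; return r; }

    /* Proof check at a dereference a[x], where a has n elements: flag a
       bug if any value in the interval escapes [0, n-1], even when the
       concrete value observed at runtime happens to be in bounds. */
    int may_violate(range_t r, long n) { return r.lo < 0 || r.hi > n - 1; }

    int main(void) {
        range_t x = { LONG_MIN, LONG_MAX };  /* x = read_input(): anything  */
        x = clamp_ge(clamp_le(x, 4), 0);     /* passed the 0 <= x <= 4 check */
        x = range_inc(x);                    /* x++  ->  1 <= x <= 5         */
        if (may_violate(x, 5))               /* a[x] on a 5-element array    */
            printf("possible out-of-bounds: [%ld, %ld]\n", x.lo, x.hi);
        return 0;
    }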
Here's another program which has a bug in it. We've got an array of five elements. We're going to get some input. We're going to check the value of the input to see if it's within bounds, then increment the value -- that's where the bug is, because the value may no longer be within the bounds of the array -- and then we have a potentially buggy operation. So we've run the program, and we get the following values: 2, which is legal and passes our test; the increment gives three; no bug. But now look at the parallel symbolic computation which occurs in tandem with the running program. When we get the value of X, we see that the range of X is any legal integer. When we hit the predicate, we know from the direction of the predicate what the range of X is: X is now greater than or equal to zero and less than or equal to four. And we can determine this by telling the direction of the branch, understanding the predicate that's associated with the branch, and then intersecting that predicate with this symbolic value to get the range. Then we increment X, and there's a calculus for manipulating all of the symbolic values, so that when I increment X, I simply increment, according to the calculus, the lower and upper bounds of the range. So now I know X is between one and five. Then I go to this operation and dereference the array. Then I run my proof engine, and I say: can this predicate produce a value which violates the type of that array? And yeah, it can -- the value of five. So with no active exploit, I find my bug. That's a very powerful technique. The way we implemented it is we took GCC, and we instrumented the code such that as it produced the individual operations, it also coupled them with the symbolic versions of the same low-level operations. We then run a test suite. We look at our error reports, and then we go back, fix the code, and iterate around the cycle. So now we're doing that security vulnerability analysis to try and get rid of our bugs. And in the world of security vulnerability analysis, when you read papers in that domain, every once in a while you see a paper that actually finds a really good bug. And our paper found what I consider two super-high-quality bugs. We found two bugs in OpenSSH that were fixed almost immediately. One was a buffer overflow attack. The other was one where you can use integer overflow to attack the system. But here's the downside of this analysis. Yes? >>: So this technique can find bugs [inaudible] if you actually reach the predicate that would create the bug [inaudible] switch to [inaudible]. >> Todd Austin: I can combine my predicates, at the -- I mean -- so I don't quite understand your point. >>: I think he's saying that you only find the bugs on branches that were followed during the program execution you've analyzed? >> Todd Austin: Yes. >>: And that's driven by the [inaudible]? Concrete values… >> Todd Austin: Correct. But let's say I cover all of my feasible paths. Will I find all the bugs? So -- good point. That's the challenge here: finding all the feasible paths. >>: But there are exponentially many paths. >> Todd Austin: There are many paths. I mean, nobody really knows how many feasible paths are in programs. They're exponential in the depth of the control-flow graph. There's a lot. So I need a highly scalable technique to implement this analysis. >>: What is your [inaudible]?
>> Todd Austin: I'll get to some of that. Yes? >>: Are you familiar with typestate checking? >> Todd Austin: Typestate checking? >>: Yes. [inaudible]. It's a static compiler that proves programs correct with respect to this kind of analysis. >> Todd Austin: Yes. Yes. Yeah. I'm familiar with that work, and like the PREfix work, and a variety of techniques. And those techniques are very similar to this, right? The difference is that I'm analyzing paths that are run in the program, and those techniques are essentially symbolically running the program. Yes? >>: No. They are static provers. They compile a program and prove it correct. >> Todd Austin: But they have to do some level of symbolic computation to figure out what paths are possible in the program. >>: No. They are path insensitive. >> Todd Austin: Well, if they are path insensitive, they're going to have tons of false positives. >>: A false positive means a program that is typestate-safe would be rejected as possibly unsafe, I'd say. >> Todd Austin: Okay. So we can talk more about that. But in general, static techniques cannot complete on nontrivial programs. Will you give me that one? >>: No. >>: I would give you that one if you admit that a dynamic can't either, right? I mean you can… >> Todd Austin: Exactly. A dynamic can't either. So what I'm talking about today is a scalable dynamic technique that can go further than any past dynamic technique. And with static techniques, we can do the same thing as well. Maybe we can someday meet in the middle, and find all the feasible paths in the program. So good comments, thank you very much. Yes? >>: So is the only domain you use intervals? [inaudible] >> Todd Austin: Yeah, primarily intervals, although we use a different style of analysis for strings. [inaudible] So the downside of this approach is it's really slow. These are some programs -- these are their original runtimes, their runtimes with full instrumentation, and this is how many times slower they run with that instrumentation load on them. And, you know, in the best case it's about 43% slower, and in the worst case about 200 times slower, so there's a significant payload in the background there to do that symbolic computation. It really limits the number of paths that you can analyze when the program's running hundreds of times slower. So the point of this is that this analysis is very effective, but it's extremely expensive. What I want is an analysis that I can hit all of my feasible paths with. So now let's take a look at the work that I've been doing recently. Yes? >>: Okay, so if it's so expensive, is it still a lot cheaper than doing a full static analysis that would, you know, find all possible paths? >> Todd Austin: Well, I'm unaware of a full static analysis that finds all possible paths that doesn't do so much abstraction that you either have a lot of false positives or miss certain kinds of bugs. And I'm unaware of any dynamic analysis that can hit all the feasible paths, because it's so expensive to do that kind of analysis. So what I'm going for is in the middle: a dynamic analysis that scales to many, many machines, so I can hit more feasible paths than anyone has in the past. >>: [inaudible] I mean expression generation [inaudible]. >> Todd Austin: Yes.
At the very end of the talk, I'm going to talk about some very new work that I've been working on, which is to try and take the set and the probability of feasible paths, and help static analysis drive better with that knowledge. Because I think ultimately the best solution is going to be partially dynamic, to find out what customers do and provide a lot of predicate information about feasible paths, and then use that in the static domain to try and find paths that are almost impossible to expose without really clever inputs. Yes? >>: I guess one thing that's unclear is the… >>: If he runs the static analysis and you, you know, you don't want to reject every program, and so you make some compromises. And so you're not going to find every bug, but you're going to find some bugs. And I guess the one question that comes to mind is: how much overlap? How many of the things that you've found here would have been found by a reasonable static analysis? Like a PREfix or whatever. Maybe not just that one, but there are these abstract [inaudible]. >> Todd Austin: Yeah. I don't know the answer to that question. We actually tried to answer that for Eric Larson's thesis, which was back in about 2005. And he tried to gather a lot of tools to do this sort of overlap coverage analysis. In the end these tools tend to be really fragile, and it takes a lot of deep knowledge to get different codes running on them. I think with tools like [inaudible] and such, we could probably do a better job of that today. Although now I'm sort of stuck over in this [inaudible]. >>: Let's say [inaudible] actually does all those things you say can be done, but it only does it on squeaky-clean languages, like none of the ones that we use. How pervasive a programming language, how much grungy code stuff can you handle? >> Todd Austin: Well, today I do the analyses at the instruction level. So I'm really looking at the lowest level of the machine. I'm looking at how information flows at that level. Yes? >>: Okay. >> Todd Austin: You could do this analysis at higher and higher levels, and as you have more type information, you have to check less, because the language itself provides you guarantees. Unless the language has bugs in it, in which case you don't have those guarantees. >>: Like C. >> Todd Austin: Exactly. >>: You do have to preserve type information like [inaudible] and stuff like that. So the language does have to provide enough information that you can check it. >>: We're all doing binaries, there is no like… >>: Well, and that's what I'm curious about, if you're doing [inaudible] binary… >> Todd Austin: Binaries -- if we have debug information and we have instrumentation in the [inaudible], we can rematerialize most of the type information. Some of the type information we generate on demand based on how you access the variable. Yes? >>: I'm trying to [inaudible] seems like maybe one [inaudible] is the actual checks that are being put in, if found what we were discussing… >>: We are using all these techniques. They are running on 100 machines as we speak in the [inaudible], but in addition to that we address the other hard problem, which is test generation -- how do you get, basically -- so that's why we generate [inaudible] but also we generate new tests. >>: Right.
>>: Also, another thing here: you can have false alarms, because your symbolic execution of a very hard program could basically, you could have [inaudible], so in addition you generate the test, then you run the program, and only then, if you find the bug, do you tell the developer, because of the [inaudible] false alarm. >> Todd Austin: There are a few cases of false alarms. >>: We're using these techniques quite effectively, and we have been trying to extend them and combine them for the next generation. So I mean, it's really, I mean it's very related. >> Todd Austin: This work was done in 2002, just to put it in context. I think I'm one of the earliest people to do this [inaudible] style of execution. I only present it here just to put context around the work that I'm doing today. But I know people have gone way past this stuff, and I'm sure people here… >>: [inaudible] so I'm sure that you see [inaudible] to extend this work. I mean, one is test generation in a closed lab [inaudible]; what about in the field, not guaranteed of doing so we have… >> Todd Austin: So let's see my proposal for going to the field, okay? I know I'm meeting with a bunch of you out there, and I definitely want to hear more about what's going on here. That's why I'm here. I want you guys to tell me what's crap about this and what's good. Because I knew that I would get that here. [laughter] >> Todd Austin: All right, so: Testudo. The approach is different from traditional heavyweight analysis techniques, where we take a program, send it into our instrumentation framework, get something that's fully instrumented, send it to the in-house servers, and run those analyses. It's going to take quite a while. We'll find some bugs, and we'll fix those bugs. With Testudo, we're going to take a program, we're going to instrument it, we're going to deploy it to all the customers, and we're going to use a control system to limit the amount of analysis that occurs to a set amount of CPU and memory overhead. So we have to devise an analysis that we can decompose sufficiently that we can throw away information, but still guarantee forward progress over time, while also limiting the amount of overhead. And I'll show you that for this particular input bounds analysis. Over time, running at virtually full speed, customers will run into bugs. The approach we've taken today is completely uncoordinated, completely random. If people stumble over bugs, they have the option of reporting them back, and then we can push out updates and fix the bugs across the whole customer base. And customers, of course, are never happy, but hopefully using this technique their frustration will start subsiding. So let's take a look at this. I want to present another piece of code that I'm going to analyze. It's going to read some external input and then just do some computation. But this time I'm going to present it as a data flow. And the reason I'm going to do that is because the way we're going to optimize these analyses is by recognizing that the vast majority of security vulnerability analyses are tracking data as it flows through your program, and by sampling paths on data flows, we can make forward progress while limiting the amount of work we do. So I read some data into X. I compute the value of Y. I compute the value of A. Then Z, which is Y plus A, okay. So note: this was my externally input data; these are all the particular operations I check.
But note that if I just follow one of these paths from the start to the end, and ignore everything else, I actually make forward progress on my analysis. And that's the decomposition mechanism I'm going to use. I'm going to manage my Metadata in a way where I throw away information if I have too much load on the system, and I make forward progress on at least one path every once in a while, limiting CPU and memory overheads so that users don't complain about our analyses. And then push that out as widely as possible. And then we're going to see how many machines I need to implement that. So, for example, doing a sampled data flow analysis: you know, if I analyze X and skip Y and A, I can no longer analyze Z, because I have no Metadata. I can't analyze this Z, and when I get to that Z I can analyze its Metadata if I choose to do so. The asterisk is showing me how far I got. I can run this again. If I get the same input to the same piece of code, I hit this data flow again; this time I go down this path. I analyze the A, I analyze the Z, decide not to analyze Y, decide not to analyze this Z. And then in the third pass, I do it again. Now, because I'm uncoordinated with this initial approach, I can get a lot of overlap here; I can analyze things a lot of times, over and over again. But over time I should get very good coverage on the paths that are being executed in the program. How do I limit the cost of storage? Well, I only need one Metadata value in the system to make forward progress on analysis -- one single Metadata value. So for example, if I have a structure, which I call the sample cache, which is tracking one particular variable in the system: if I'm tracking nothing here, I can track X, and I can then overwrite that with the Metadata for Y. I no longer have the Metadata for X, but I did analyze this node here, and now I have Metadata here. I decide not to analyze A. I replace the Metadata for Y with Z's. And now I have Metadata here, and now I can't analyze Y, because I no longer have Metadata for Y. I can't analyze this Z; I no longer have Metadata for X. But with a single location I've done one path, right? Because I just need one location to hit each one of these paths. Now, if I build this sample cache, and I randomly, non-deterministically replace entries in it, then with a population of users I can get coverage on all of the data paths. But it's got to be random, like different things every time, and nondeterministic, so that if two users run the program with the same inputs, they don't select the same paths. If I have more than one entry, I'll get better coverage on these, because I can store more paths at the same time. Each entry will allow me to track one path at a time. But if I get too much load on the system, I can always choose to invalidate entries out of my Metadata cache to reduce the amount of workload I have doing analysis. So I have a mechanism to throw stuff away and do less work, until I get down to one entry, and I'll keep that one entry around just to make sure that I make forward progress on my analyses. Yes? >>: So in your example, you've got these unique names for everything, right? You've got a Z there and something else. Don't you have to track the full path to know whether you've seen this particular Z before? >> Todd Austin: Well, let's say I hold Y here. If I have only one Metadata value, I know that I have reached this particular value on a path from some input that I declared to be interesting.
And as I see Y propagate to other values, I can randomly choose whether or not to take those new values or hold onto the value of Y. >>: But there might be more than one way to reach the Y equals X times 1024, and you want… >> Todd Austin: That's true. And that's another data flow. And as I see that other data flow, I can get coverage on that as well. >>: I guess what I'm confused about is you say you're keeping a single value there, but it seems to me like you have to [inaudible] path you're looking for is. And it's larger than a single [inaudible]. >> Todd Austin: No, I don't have any information about it. I'm just randomly selecting where to go next. And, you know, what this becomes is the classic statistical problem called the coupon collector's problem. With no state, but randomly selecting where to go next in a graph, I always get to all the leaf nodes eventually. It's not very efficient, and the approach I have here isn't particularly efficient. My students are working on better techniques, randomized algorithms that could do a better job of covering this, but I just want to make the point today that with one Metadata value I can make progress on coverage. All right. So the point here is: individual analyses are very cheap. I'm going to scale performance with many runs on many customer machines, and I'm going to increase the size of this cache to cover more flows at the same time. And invalidating that cache is a powerful mechanism to reduce the amount of analysis that I do, and to reduce the cost of analysis on any individual machine. Now, there are two implementations of Testudo. There's the published one, which is a hardware implementation -- it was in MICRO '08 -- and then we're publishing a paper next year on the software version. I'm going to talk a little bit about the software-only version, but first I just want to present the hardware approach to Testudo. I'm just walking down the pipeline, showing you what I need to implement [inaudible]. First, I need Metadata for register values. Anything that lives in a register, I want to be able to attach Metadata to. And we don't sample the data on registers; every register gets its own Metadata. And this information is just a pointer to kernel memory that tells us where the actual Metadata is stored. Because we don't really presuppose what the Metadata is in the system; we just track whether any particular data value has Metadata and what the pointer to that value is. In the execute stage, we need the ability to propagate Metadata and to remember how it was propagated, so that if we get a variable A added to Y, we need to be able to produce the Metadata for the result. Now, sometimes this is very simple. For example, if you're doing taint analysis: if one input is tainted, the output is tainted, and you're done. If you're doing something like symbolic analysis, you've got to go and compute the calculus of A + Y on this Metadata and what my new Metadata is. And we'll see that later in the pipeline we have a place where we can initiate a kernel call to actually compute that data. So we don't really presuppose what that analysis is. >>: So you're doing this at the instruction level, right? So you don't necessarily have a connection between a register value and a variable, right? So don't you have to look up the debug information every time you do that? >> Todd Austin: Yes. So, for example, when Metadata materializes is where we'll get most of that information.
So when we take the address of something, or when we create some new piece of storage, that's where we'll get the majority of that Metadata. And when the Metadata mixes within instructions, we've got these kernel routines that will decide how to put stuff together. And I'll invite you to look at the USENIX Security paper; you'll see that there's a big table that shows for every instruction what that [inaudible] is. In the [inaudible] we have a sample cache, which is simply a hardware cache that holds Metadata pointers. And they are associated with a particular physical address, so if there's some Metadata attached to a particular memory address, and we do a load or store, that'll materialize in the registers. And this sample cache is small. It's typically on the order of 128 or 256 entries, and it's randomly replaced. So we select the entry randomly and non-deterministically. When we're manipulating this, we have to have some source of true random information in the system, so that when we have separate runs on separate machines, we don't see the same updates in that cache, and we get good coverage on the data flows. Fortunately, in much hardware today there are excellent sources of random information; for example, many Intel processors have the ability to turn thermal noise into random numbers, and those are very useful in this approach. And then finally, at the end of the pipeline, when we retire the instruction, the instruction may have done something that is beyond the scope of what the pipeline can do. So we have this policy map, which basically allows us to say, for example: if you have two pieces of Metadata on this opcode, then I want a kernel interrupt that goes to this address, and that allows you to emulate more complex manipulation of Metadata as instructions arrive. As for software support for Testudo: first, we have an OS-level controller, which is going to non-deterministically limit new analyses and fan-out by watching the overheads in the system. If the overheads are low, we're going to try to increase flow selection, so that when we see new data created we start following it, and we're going to preserve fan-out, so that when we see new values coming out of a particular value, we create new Metadata for them. When we exceed the threshold, we're going to decrease flow selection -- it's less likely that a new flow will get analyzed -- and we're going to reduce fan-out by invalidating entries of the sample cache, until we get to the point where we only have one data flow, and we preserve that one data flow, even to the point of violating the constraints on CPU and memory overhead. And then once that's done, we'll wait until we get back below that max load and go back to deciding whether to increase or reduce the flows. In addition, there are special instructions in the architecture that let us mark things that should have Metadata initially. In our implementation those are in the device drivers: when stuff comes from the network or from the keyboard or from other external sources, it gets marked. All right. How do we evaluate this? We took Virtutech Simics and we ran a bunch of programs that had some exploits we could create in them, and we ran them on our simulator. The problem with our simulator is that it tends to be slow, and we wanted to get coverage over many, many, many thousands of runs.
So what we did is, as we executed these experiments, we wrote out the data flows that we saw -- just the data flows themselves -- and then brought them into a Monte Carlo simulator, which would implement many different analyses of those particular data flows for the sample cache. So one particular run produces a payload of data flows, and we do thousands and thousands of analyses in this Monte Carlo simulator for the sample cache, to see how we get coverage over time. How many runs do I need? For some programs I don't need a lot of runs to get full coverage on the data flows I saw -- on the order of hundreds of runs. And with a larger sample cache, 64 entries instead of 32, even fewer, because I can track more data flows at a time. For other programs I need more; like this SQL injection, where I need as many as 17,000 runs to fully cover the data flows that I saw in the original run of the program. This number right here is for 95% confidence that you've covered all of the data flow paths that you saw in the original run of the program. So if you run your experiment, you've got a 95% probability that this is the case. >>: So what is the [inaudible] why so many in the case of [inaudible] cases? >> Todd Austin: It has to do with the depth of the data flows, the bushiness of the data flows, the size of the sample cache, and the amount of analysis that you have to do -- the payload of analysis. If you have unlimited analysis and a huge sample cache, you can get very good coverage. But as you start to tighten the amount of stuff you can look at, tighten the amount of overhead that you can tolerate, you need more and more runs. Yes? >>: So on the initial run of PDF, how many [inaudible] did you run? >> Todd Austin: PDF -- we just ran one execution of the program with one exploit. >>: Okay. >> Todd Austin: So this is how many runs to get coverage on that particular run. >>: I see. Okay. But there could be others, as [inaudible]. >> Todd Austin: Many, many, many, many [inaudible], right? So I want to cover many, many, many paths. >>: So let's just try and get through what the explosion factor is [inaudible]. >> Todd Austin: Exactly. >>: Okay. Thank you. >> Todd Austin: So the leverage that I'm going to get in this system is, you know -- for example, Apache web servers start 570,000 times a second. So if I can put this technology into a widely used program, over time I'm going to get a huge amount of leverage. And over time I'm going to work to try and make this even more efficient. >>: So you have a false positive problem, though, right? >> Todd Austin: Not a very large false positive problem. >>: But I think in the end the question becomes: you're having hundreds of thousands of these data flows coming in, how do you prioritize? How do you know that something is really a good bug to go after, versus something like the other hundred million that you [inaudible] that are not… >> Todd Austin: So one thing we have is excellent information about how likely the path you're on is, which is a very powerful piece of information, and that gives us some priority information. Yes? >>: One of the fundamental assumptions here seems to be that control paths don't change before the exploit would happen, whereas a lot of [inaudible] -- we've got IE 9, and it still has gopher support in it, and it turns out that no one's used gopher in 15 years, and so I'm going to come up with some bug in the way [inaudible] gopher parses something.
And so in that case, just by sending gopher://, or whatever it is, the attacker has already taken you off any path that any non-adversarial users would actually take you on. >> Todd Austin: Excellent point. So this really only gives you good coverage on paths that customers run, or on widely exploited bugs. So you actually see good coverage on the bugs, the attacks themselves. But I've got a foot in that game as well, because I get excellent information about what customers do, and I can use that information with static analysis to figure out feasible paths that customers don't exercise. And I think ultimately I can leverage the information that I can gather in the field to really understand what the possible paths are that are not executed, which I think are another great source of bugs, as you point out. And then once you do that, you've covered all the paths, right? >>: [inaudible] I mean, wouldn't you say that this is not so much [inaudible] functionality of the bugs [inaudible] look up bugs and nobody else [inaudible] do you think that [inaudible] functionality of the bugs [inaudible]. >> Todd Austin: I guess I don't see the distinction. What's the distinction between vulnerability and functionality bugs? I don't understand the terminology, but I do understand that this technology is going to be limited to paths that get executed. >>: [inaudible] the super-duper, say, symbolic [inaudible] how the users at large are using these specific things, and a lot of them are ones that you can get leverage off of. But security may not be what you're after with that machinery, because many [inaudible] are never, never, never executed, but are still being shipped for all kinds of reasons, and they raise a signature -- hitting those once in a while actually can be rare in some cases. And I was going to ask you, do you have evidence of that? I mean, for instance, do you know of any study linking [inaudible]? And how many of them were in the frequently traversed paths versus [inaudible]? I mean, I do not know. >> Todd Austin: I don't know that either. And I think that that's an excellent question. >>: [inaudible] what we already have encapsulated in a test case, which already-existing technology would actually... >> Todd Austin: Well, that's true if your test coverage is equal to your customer coverage. >>: So this actually is, this is how you build a Testudo [inaudible] functionality? >>: [inaudible] there are so many different zillions of patterns that define a [inaudible], whatever you name it; I mean, it's very hard to test, to know exactly what's going to happen, and so this actually is going to… >> Todd Austin: You guys can answer this question, right? How many bugs do you fix in popular parts of the code, and how many bugs do you fix in the crufty code? You ought to have those stats, right? >>: But we have agreed to [inaudible] and so that's another story. >> Todd Austin: I want to hear about that. >>: [inaudible] >> Todd Austin: Pardon? >>: What he's saying is we know where the bugs in the popular code are. So it's very easy to fix -- well, it's not easy to fix those, but there's a long queue of those to get fixed. >> Todd Austin: So all of the new bugs you are seeing are in the crufty code? >>: No, no… >> Todd Austin: I'm seeing some no's and some yes's. [laughter].
>>: Well, part of this goes back to the distinction [inaudible] we're trying to find functionality bugs, and so a vulnerability [inaudible] computers usually running code [inaudible]; a functionality bug could be something as simple as a misspelling on this form. It has a bad customer impact because it makes us look like idiots, but you know it's not going to let an adversary do anything straight to the computer. >>: So it's important to understand a little bit about [inaudible]; it is basically an online crash analysis, so whenever something crashes it will send something back to Microsoft, if you say yes, okay. So what you see there is inputs that are actually going to cause the thing to crash. You're seeing things that aren't the ones that caused the crash but could; we're seeing the ones that really did. I think what you see from this, from the ones that actually caused it to crash, is a distribution, and so we can see the numbers just like you will see the numbers. And so this one seems to curl off and this one doesn't [inaudible], and that allows us to prioritize. But given that, we have real crashing things and we know how frequently they happen. We can't fix all of them. So we're already well aware -- here, we have 10,000; here's the top 100 that we're going to fix. I guess my question is, if we had this initial information to feed into that pool… >> Todd Austin: I see. I see your point. >>: Yeah, really, would it help… >> Todd Austin: That's an interesting thing to think about. That's a very good piece of advice. >>: He's got all the information about, you know, he's doing [inaudible] analysis, then he'll see inputs that you never saw in practice that are still causing [inaudible]. >>: Something we may want to think about [inaudible] streaming them back to the mothership over time [inaudible] build up a bug symbolic sketch [inaudible] over time [inaudible] and maybe that'll help you find… >> Todd Austin: That's something I'm definitely moving towards here. Okay. Good, thank you very much. Excellent feedback, excellent feedback -- that's why I'm here. All right. How much does it cost? The hardware for this isn't that expensive. We looked at the cost of 256-entry to 1K-entry caches relative to the size of an AMD Phenom or an UltraSPARC; it doesn't really hit cycle time that much, and it's fairly low as a percentage of area cost. I was at Intel Israel two weeks ago telling them about this technology. You know Intel; all of the action is in Israel. That's where they do the processors now. And you know, I was trying to push this technology, and somebody pulled me aside and said, you know, we're building this already, for software-based transactional [inaudible]. So there may be some synergies between that and this that will allow technologies like this to be rolled out [inaudible] pretty excited about that possibility. Let me talk to you a little bit about future work on this. We just recently got a paper into CGO 2011, which is a 100% software implementation of this technology. One of the challenges of seeing how well this works is you've got to really roll it out and see how well it works; it's hard to do with just Monte Carlo analysis. So what we built is a LAMP stack that uses this technology. So we've got Linux, we're connected into the LAMP stack through analysis-aware drivers that can mark data, and then this runs on top of the Xen hypervisor, which has a sample cache implemented with shadow paging.
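[Editor's sketch: a distilled version of the sample-cache policy described in the talk -- a small table of Metadata entries with random, non-deterministic replacement, plus invalidation under load down to a single entry, so that one data flow always makes forward progress. Sizes and names are illustrative, and a real implementation would seed from a hardware entropy source rather than rand().]

    #include <stdlib.h>

    #define MAX_ENTRIES 128            /* the talk mentions 128 or 256 entries */

    typedef struct {
        void *addr;                    /* address the Metadata shadows   */
        void *meta;                    /* pointer to the kernel Metadata */
        int   valid;
    } entry_t;

    static entry_t cache[MAX_ENTRIES];
    static int     live = MAX_ENTRIES; /* entries permitted at current load */

    /* Track a new flow by overwriting a randomly chosen victim entry;
       random, non-deterministic victim selection means two users running
       identical inputs still end up sampling different data flows. */
    void on_new_flow(void *addr, void *meta) {
        entry_t *e = &cache[rand() % live];
        e->addr = addr; e->meta = meta; e->valid = 1;
    }

    /* Under excessive CPU/memory overhead, shed analysis work by
       invalidating entries -- but never below one, which preserves
       forward progress on at least one data flow. */
    void shed_load(void) {
        if (live > 1)
            cache[--live].valid = 0;
    }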
So, as that suggests, I just use the virtual memory system to track my data. And it's a little more cumbersome, because I have to throw away whole pages of data when I want to throw away analysis, or pare down a page to a single value if it's my last one. My load controller will decide when to throw away data, and when not to initiate new analyses, to control my load. And then when I want to do analyses, I shift over to QEMU for demand-driven analysis, so I'm actually going to do interpretation under the kernel to figure out how to propagate that information through registers. The downside of this is, it's, you know, quite a bit more expensive, and I'm going to need more runs to do the same amount of coverage. The advantage is eventually I'll be able to deploy this. Another thing that I'm very interested in engaging on, and I'm currently starting work on this, is using the information that we can harvest in the field to find unlikely feasible paths using static analysis. Because I think I can gain a lot of very good information about how the program is used, and a lot of symbolic information, to help find feasible paths in the program. And then, generally, what I need to implement this technology -- and generally what you need to implement a variety of vulnerability analyses -- is efficient fine-grained memory protection. What we really need is to abandon 4K pages; I want to be able to mark bytes as code and data, and I want an efficient mechanism for that. So my student Joe Greathouse, at the end of his thesis, is really working on coming up with efficient fine-grained memory protection techniques. And there's just a variety of things you can implement with that. You can do garbage collection. You can do software-based transactional memory. You can do security analyses. You can do security attack prevention -- a variety of things. So I think there's a huge benefit in trying to revisit the virtual memory protection system and make it more fine-grained. Yes? >>: Yes, there was a recent Intel workshop where the discussion was around whether Intel should support some kind of [inaudible], and it was sort of more broad [inaudible] work. But the bottom line is, yeah, any interaction that you can have to give Intel reasons to do this would be great. I mean, I think the fact that they sponsored the workshops means that some believe internally that they need to be looking at [inaudible]. But it's been a hard sell. I mean, this same story was true 15 years ago, right? And they've heard it. It's not like they don't know that fine-grained memory [inaudible] mechanisms are going to give some [inaudible] software, but I think the economic argument around how it helps customers is still not there. >>: Well, it is true that all the benefits were there 15 years ago, but this is unfair and exaggerated [inaudible]. They no longer have anything better to do. [laughter] >>: But given that, they're still asking the question: who's going to buy this? I mean, fundamentally -- and in fact the discussion at this workshop was more on whether we could create a SKU of our processor that we sold for more money and targeted only at developers. In other words, because developers are the ones that really want this, and your average customer doesn't see the value. So how much more would a developer pay, or would Microsoft pay… >> Todd Austin: You know, you know Mark… >>: He is not a Microsoft employee… >> Todd Austin: I know who he is. I know who he is. >>: [inaudible multiple speakers] [laughter] >>: Just kidding. This is being recorded here, right?
[laughter] >>: Who cares about Intel, right? [inaudible] >> Todd Austin: I know, I know, I know. So when I presented this there, they felt uncomfortable: we don't think we'd put something to help find security bugs in our processor, because that would just imply that we have a lot of security bugs on the platforms that our processors are used in. Which seems like an odd thing to say. But from a marketing standpoint, you know, people might fear a system that has to fix bugs, right? So my idea is, what you should do is brand this as extra security, extra safety technology. >>: I think that's the fundamental issue: if you could show a benefit to an individual customer, that's really good. They'll put it in, and they could go to market with that. But what you're talking about are developer tools -- something that makes the software better, but only indirectly. It's not like if I pay more for my processor, only I get the benefit of this. Everyone gets it. >> Todd Austin: Here's the way you get the benefit to the customer: we've got great information on the likelihood of the path, once the bug -- the potential bug -- is found. And what you do is, if the likelihood drops below a certain level, you just say, hey, here's a $50-off coupon for your next Microsoft product if you hit send. And then it becomes a valuable [inaudible]. >>: So you're saying [inaudible multiple speakers]. >> Todd Austin: The problem for the attackers is they're just one, right? The attacker isn't the hive. They're just one person in the hive. So their probability of finding these is very, very [inaudible]. >>: [inaudible] talking about the software [inaudible] and what you think it could get down to if you really [inaudible]? >> Todd Austin: So I'm not going to go into details on this yet, but it's about three times as many runs to reach the same level of coverage. >>: So for me, as someone running on top of this, do I see a 2X slowdown or a 5X slowdown? >> Todd Austin: All of our experiments are limited to 5% memory and 5% CPU overhead. >>: Okay, great. Thank you. >> Todd Austin: So it's just a shaving. >>: Great. Thank you. >>: Okay, so over the last, I don't know, five, seven years or so, we've been seeing [inaudible] these low-level memory [inaudible] an increase in vulnerabilities like, well, [inaudible] problems now and so on and so forth, and it keeps evolving. So what are your thoughts on applying techniques like that [inaudible]? >> Todd Austin: This work is valuable anywhere you have data flow and you have invariants that you can either infer or that already exist. In my case they exist. Maybe you can infer those invariants; I know there's a lot of work in that domain. And overheads are a concern. But I think these kinds of sampling techniques can work on anything. And it's a dynamic analysis. >>: [inaudible] it is the same story, and actually in particular, the users [inaudible] they will go to the homepage, they will go to the Gmail homepage. They're not going to explore the preferences [inaudible]. It is the same story. >>: So can you say anything about the relationship between [inaudible] and this kind of stuff? [inaudible] security specifically, but just in a general way. So I guess the question is, could you implement much of what you are talking about with the same mechanism… >> Todd Austin: I'm not 100% familiar with lifeguards.
What is it -- give me just a… >>: So the high-level takeaway is that it's a hardware channel that allows information about things that are happening -- instructions, you know, things like which addresses are being accessed, etc. -- to be sent over to a separate core on the multi-core and be handled there, essentially. So it's, you know, a high-bandwidth way to collect data from the running processor… >> Todd Austin: So that would be a good mechanism to implement this kind of analysis, right? So in a sense, all those policy map calls could go over to another [inaudible]. Okay. And then lastly, I'm really interested in adapting this to other analysis techniques. The core of the Testudo work is not really security. We've applied it to security, but it's really data flow analysis. Currently my student Joe is applying this to finding race bugs in parallel programs, which is another very good data flow analysis that's super heavyweight. Yes? >>: So let's bring this full circle. [inaudible] So what are your thoughts on moving this into the data center [inaudible]? >> Todd Austin: I mean, it's even more ideal, right? Because you have lots of systems running your software many times, and in addition it's in your own domain, so you have more ability to provide the information that's necessary to improve the analysis and reduce its cost. For example, I can provide a lot of type information, a lot of invariants that I've collected over time, that I wouldn't push out to a customer because that information might be privileged or important intellectual property that I don't want to release; but since it's local, I can do even better. So I see this in the data center as being an even more powerful technology. All right, to conclude: if you want to beat the hackers, find the bugs they love and fix them before they do. Get rid of those zero-day exploits. I talked about three technologies. Metadata restores programmer intent, and we need a mechanism to manage it efficiently. Input bounds checking is an example of a security vulnerability analysis that finds those bugs before you release software; the advantage of this particular technology is that it finds exploits without an active attack. The problem with these technologies is that they're too expensive -- they really slow down programs. So Testudo is a technology that can roll these analyses out into the field and use the customer base as a massively parallel system to find these bugs, in an uncoordinated, random way. So thank you very much. By the way, I want to tell you what a Testudo is. This is a Testudo. It's a Roman legion formation, where they take each of their shields and lock them together to form one single protection for the entire mass. So that's why we call it Testudo. >>: [inaudible] [laughter] >> Todd Austin: Ah, yeah. Of course. These are your device drivers and your hypervisor, which are unprotected. Thank you very much. >>: Yes, so I just have a question. So you're doing a lot of buffer overflows and certain array out-of-bounds errors, but use-after-free is actually a major source of attacks. In fact, it's one of the things [inaudible] even if you completely remove all the buffer overflows, you still have a lot of vulnerabilities [inaudible]. >> Todd Austin: I mean, that would fit great into this framework. >>: My question is, can you do anything about that? >> Todd Austin: That's a data flow analysis right there. The Metadata I need is whether storage is alive or not.
Like some capability associated with the variable. Let's say I did it like we did when I worked on SafeC years ago. We generated capabilities for all heap storage. We attached them to the pointers, and we propagated those capabilities. And those capabilities were destroyed when the data was freed. It's a data flow analysis that I could sample in the system, to try and find out, when I dereference, whether that particular capability, that mark, still exists. That would fit into this style of analysis. Thank you very much. [applause]
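[Editor's note: a minimal sketch of the capability-style use-after-free check described in this closing exchange; the representation and names are illustrative, and per-capability storage management is omitted.]

    #include <stdlib.h>
    #include <assert.h>

    /* Metadata: a capability created at allocation and destroyed at free. */
    typedef struct { int alive; } cap_t;
    typedef struct { void *p; cap_t *cap; } checked_ptr_t;

    checked_ptr_t checked_malloc(size_t n) {
        checked_ptr_t cp = { malloc(n), malloc(sizeof(cap_t)) };
        cp.cap->alive = 1;             /* capability is born with the data */
        return cp;
    }

    void checked_free(checked_ptr_t cp) {
        cp.cap->alive = 0;             /* destroy the capability itself,
                                          not just the storage */
        free(cp.p);
    }

    /* At a (possibly sampled) dereference, check the capability still exists. */
    void *checked_deref(checked_ptr_t cp) {
        assert(cp.cap->alive && "use after free");
        return cp.p;
    }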