>> Weidong Cui: All right. It is my pleasure today to introduce Brendan Dolan-Gavitt, who is visiting us this summer from Georgia Tech, where he works with Wenke Lee. Weidong Cui and myself have been extremely pleased to mentor him in what I hope you'll agree is a nice piece of systems hacking in the service of catching evil behavior on phones in the wild. Thank you.
>> Brendan Dolan-Gavitt: Thanks. All right. Thank you for the excellent introduction. So, yes, we're going to be talking about monitoring untrusted modern applications with collective record and replay. The setting for this is the emergence of a new kind of model for distributing and using applications: the app store model. Windows Phone 7, iPhone, and Android all have a centralized marketplace where you can go and download applications, and they're assumed to come from known, trusted providers. And now some of the desktops are starting to get this too; OS X Lion shipped with an app store that has many of the same features. One of the interesting points about these is that apps come as complete packages. So in general, if you have two people running the same application, you can assume it's going to have the same code and data associated with it for both of them, which presents an opportunity for us, as you'll see. So, of course --
>>: [inaudible].
>> Brendan Dolan-Gavitt: Because if I've got Office 2010 and you've got Office 2010 64-bit, there's just a lot more variance between different people.
>>: [inaudible].
>> Brendan Dolan-Gavitt: Right. But it tends to be --
>>: [inaudible] and you don't.
>> Brendan Dolan-Gavitt: It's true, it's not completely uniform. But it's much more uniform than it is in the desktop space right now. So, of course, with modern apps you get modern malware. For example, on iPhone there was the case where it turned out that Pandora was sending people's private data off to their advertising networks. There has also been an upsurge in Android-based malware that can do things like recording conversations, the same sort of private data leakage, and so on. All of this is certainly something that we'd like to take care of: we want to protect people from this kind of mobile or modern malware. The current state of the art is a gated model where a bunch of developers submit their applications to one central point, and that central position decides whether or not to allow each one into the marketplace. Now, of course, there are a few problems with this. One is that this review process is currently very manual: someone actually sits down, clicks through the application, and tries to figure out if it's good or bad. Doing this costs a lot of money. Another issue is that even if you are very, very careful in how you review these applications, eventually you're going to let something through that turns out not to be legitimate, that turns out to be malicious. You get an angry pig instead of an angry bird. So because a strict focus on prevention is probably eventually going to fail and allow malware into the marketplace, we want some way of monitoring these applications in the wild. And there are other reasons aside from malicious code that you might want to monitor these applications in the wild.
For example, you might want to see if they're crashing too often, or you might want to detect that an app enables some behavior that's not necessarily malicious but that the platform provider doesn't want you to be able to do, like turning on a wireless access point on the phone. So our options right now divide into static analysis, which is useful in general but limited in the malicious case, where the developer can apply arbitrary obfuscations that make it difficult for static analysis to get any foothold. There's also dynamic analysis. There you can do online monitoring, where you do all of your security checks as the application runs; but these can be fairly resource-constrained devices, and the overhead of that can limit the sort of analysis you can do at runtime. You could instead record what's going on with these applications at runtime and then do offline analysis later on. This is actually being done right now: platforms like Windows Phone record certain amounts of information at runtime, for example crash dumps, and internal users at Microsoft who are dogfooding the OS are collecting information on battery life and things like that. So this is very useful information, but you have to know in advance what you're looking for, and, again, that limits what you can look at later on. It would be nice to be able to record everything, but of course that is somewhat expensive. The idea behind recording everything is an interesting one, though, so we should take a look at it a little more closely. One of the traditional approaches for doing this is called record and replay. This is where, when you run a program, you capture all of the nondeterministic inputs to the program, anything that's going to make it behave differently from run to run: this could be, for example, the date and time, user input, the state of the random number generator, and so on. The idea is that if you re-execute that application later on and feed it the same nondeterministic inputs you recorded earlier, it should behave in exactly the same way. But unfortunately this is still a little too expensive. There was a paper at ACSAC a couple of years ago called Paranoid Android, where they essentially did this and recorded a few applications, like the Web browser, the e-mail client, and so on, and they found that this imposed essentially a 30 percent overhead on battery usage. So the battery ran out 30 percent faster.
>>: At what level were they doing the logging?
>> Brendan Dolan-Gavitt: System call interface.
>>: I'm sorry?
>> Brendan Dolan-Gavitt: System call interface.
>>: Okay.
>> Brendan Dolan-Gavitt: So I'm already not exactly thrilled with the fact that my phone runs out of battery by the end of the day, or after two days; I really don't want it to run out any faster. So this is unattractive for users to be running all the time, and we'd really like to reduce the overhead here. In an ideal world, or at least an ideal sort of panopticon world, we'd be able to get full execution traces of every execution of every app every time, so that for any analysis we could think of, we could go back to these execution traces, reproduce them exactly, record any information we wanted, and do any sort of heavyweight analysis and so on.
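To make the record-and-replay mechanism concrete, here is a minimal sketch in C++ of the interposition idea just described. The single-input scope and flat binary log are simplifications of ours; a real recorder, such as Paranoid Android's, interposes on every nondeterministic source at the system call interface.

    #include <cstdio>
    #include <ctime>

    enum class Mode { Record, Replay };

    // One interposed nondeterministic input. In Record mode we ask the
    // OS for the time and append the answer to the log; in Replay mode
    // we return the logged answer instead, so the re-execution sees the
    // identical value and follows the identical path.
    std::time_t CurrentTime(Mode mode, std::FILE* log) {
        std::time_t t{};
        if (mode == Mode::Record) {
            t = std::time(nullptr);
            std::fwrite(&t, sizeof t, 1, log);
        } else {
            std::fread(&t, sizeof t, 1, log);
        }
        return t;
    }

User input, random numbers, and network reads are handled the same way: anything the program observed during recording never has to be re-observed at replay time.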
So what we really want to do is find some way to get the benefits of this kind of recording without the extra overhead. So -- yes?
>>: So previous work like CloneCloud [inaudible] on Android, I guess their goal is to do the analysis in an emulator or virtual machine in the cloud, so that before something bad happens on your device, the cloud can help you make a decision, so that the consumer, the people who are providing the execution trace, can also be protected. But it looks like in your case, I'm providing a trace to your machine for your analysis, but I won't get protected, right? Because you do offline analysis later on. So if my data is lost, then --
>> Brendan Dolan-Gavitt: It can be seen as complementary, right? Being able to do this sort of proactive analysis in the cloud is great. But we feel that unless you can actually see in-the-wild behavior, actual users interacting with these applications, you're not going to be able to expose the range of behaviors that are seen in the wild. There are a few examples later on, but consider something like a puzzle game where, once the user solves the puzzle, it unlocks malicious functionality and starts sending the data away. In that case, an automated analysis up in the cloud might not be able to solve that puzzle; there might be too many paths in the program to explore. But a user just playing around on their phone at the bus stop is going to expose that malicious functionality, and if we can record the sequence of inputs that led there, then we actually have a chance at seeing that malicious behavior, detecting it, and pulling the app.
>>: Let me add a little bit. This is crowdsourcing applied to the current AV model. If there are one million users running this, there are ten guys that get compromised first. Then the signal is sent back to the server, and if the feedback loop is fast enough, the one millionth man will still be protected. We're not trying to protect all one million people at every point in time, because we think that's not sustainable given all the overhead, all the performance limitations. But by leveraging the crowd, we can still protect the 900,000 people: instead of one million people with 15 percent overhead, 900,000 people with very low overhead. I think that's the way to look at this model and compare it with that one.
>>: All I wanted to add is, if it's [inaudible], at that time the user might not have the right bandwidth. So bandwidth will be very important, and also how much that bandwidth costs at the time. We consider those; I'm thinking this will be another perspective as well. And also, if you do the analysis later on, the bad app can be revoked from a licensing perspective. If the app can get revoked, that limits the damage to the phone.
>> Brendan Dolan-Gavitt: Okay. I think he was first.
>>: So you mentioned that, as opposed to manual review, which is very expensive, the record and playback functionality would help you catch this kind of behavior once you have a sufficient host population, or even once it's out in the wild. My concern is that you have to be able to identify what constitutes malicious behavior in a way you can pick up in an automated fashion. This assumes you know all the ways an app can behave maliciously, given that they're all written very differently.
>> Brendan Dolan-Gavitt: So that's actually one of the benefits of record and replay, because it doesn't assume that you know the malicious behaviors beforehand. Rather, if you discover that some kind of behavior is malicious later, you can go back and look at your saved traces and re-examine them for evidence of that behavior. Right? So maybe you didn't know that invoking this API with these arguments would jailbreak the phone. But if you discover a month or so down the line that it does, you can go back and look for applications that were using that API that way earlier. So in some ways we're trying to separate the question of what constitutes malicious behavior from the question of what data we can give you that will allow you to detect that malicious behavior.
>>: To follow up on that, I understand the crowdsourcing mechanism is great in that it gives you a huge amount of trace information. But collecting trace information, as she said, and sending such a huge amount of data up can be very expensive on a costly network. So are you limiting this to only free, Wi-Fi networks or something like that?
>> Brendan Dolan-Gavitt: So our approach here is actually --
>>: Storing the data as well.
>> Brendan Dolan-Gavitt: Our approach is actually going to be to try to really minimize both the amount of data and the runtime overhead on any individual phone. We're hoping that won't be an issue. We'll also do the usual trick of taking the logs and only uploading them when you're on a Wi-Fi network, say, so you don't eat into the user's 3G bandwidth. You had a question?
>>: At some point, are you going to talk about what happens if the malware writer is aware of the framework and tries to, for example, create polymorphic applications, whereby they essentially arrange that every individual user runs a different application, and therefore make it more difficult to merge this stuff? Because in the scenario you described, where maybe ten people get compromised and we get the idea and help everyone else out, if you can't do the merge in the first place to say these are similar traces, it seems like --
>> Brendan Dolan-Gavitt: Yeah, so for the current Windows Phone and similar models, it's not as much of an issue, because things like managed code and Silverlight don't actually allow you to generate any code at runtime. The code that you start with is the code you get, and you can't do self-modification or things like that. Going forward --
>>: [inaudible].
[MULTIPLE SPEAKERS]
>>: But you can use self-modifying code as a way to detect that this is malicious behavior. As more is added to the app environment, I would imagine they would allow you to do self-modifying --
>>: [inaudible].
>>: Self-modifying code.
>>: Code injection.
>>: But going back to that point, you're going to allow code --
>>: [inaudible] use the code, yes.
>>: You can write your own interpreter, right? You can write your own --
>>: And people have.
>>: So I think one of the things that we'll get to is the level at which we record things. We're not recording full instruction traces; we'll be recording things like touch events. The kind of polymorphism you'd have to pull off to defeat that is to go out and give every user a different UI. It's an excellent question how people will react, and we'll see what happens.
But I think it will be harder for them to react than in the traditional malware case, where it's trivial to get polymorphic shellcode that does exactly the same thing modulo some transformation.
>> Brendan Dolan-Gavitt: Good questions.
>>: How do you collate these traces across applications?
>>: We know it's from the same application because it will be signed by the same key.
>>: You're saying that can be signed --
>>: I understand your question, correct me if I'm wrong: you have the same application but different users. It looks like IE on the phone or something, and it takes a different path depending on the user. So we always know it's the same evil application --
[MULTIPLE SPEAKERS]
>>: Sorry?
>>: You don't need code imaging for that.
>>: That's correct.
>>: We're treating it as an input. The code is fixed; given different inputs, you get different behaviors.
>>: Sure. But the issue is that, depending on the explosion in the range of paths taken, that could make the analysis very hard.
>>: Yes.
>> Brendan Dolan-Gavitt: Okay. So, getting on to how we might think about reducing the overhead of this. One approach is to throw away the idea of exact, instruction-by-instruction reproduction of a given execution trace. Given the space of possible executions that are seen in the wild on clients, we can think about a neighborhood around it, which is the set of executions we want to be able to reproduce on the server. We certainly want that to include the executions seen in the wild, but it can also include some that weren't actually seen in the wild yet are still feasible, that could have happened. And of course this sits within the space of all executions of the application, which is a much larger space and much harder to explore. So our key idea is, again, to take advantage of the fact that many clients are all going to be running the same applications in the same way, and to distribute the logging work among them. This idea is conceptually similar to cooperative bug isolation: if you have lots of people running the same thing and you expect the same things to happen to them, you can record each event with some probability on each client, and then, in expectation over a large number of users, you see all the events that happened. Then, with each person uploading these small pieces, we try to recombine them later on the server.
>>: So a motivating example you used is malware that triggers its malicious behavior on very rare events, only when you solve this puzzle. But if you do this probabilistic logging, then you may be likely to miss those events.
>> Brendan Dolan-Gavitt: Right. The point of describing the puzzle scenario wasn't that the puzzle is so difficult that it's rarely going to be solved by any person; it's more that a puzzle is going to be difficult for an automated system to go through and solve. So if you have --
>>: They're rare events, right?
>> Brendan Dolan-Gavitt: So if you have something that's like two people -- yeah.
>>: Exactly. That's the trade-off.
>> Brendan Dolan-Gavitt: It is a trade-off.
>>: So how long you want to wait, how many devices are running, how often the event happens: put together, those decide it.
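As a back-of-the-envelope illustration of the trade-off being discussed (our arithmetic, not a figure from the talk): if each of n clients that observes a given event logs it independently with probability p, the event reaches the server at least once with probability

    1 - (1 - p)^n

so a tiny sampling rate like p = 0.001 still captures an event seen by 10,000 users with probability about 1 - e^(-10), better than 99.99 percent, while an event seen by only ten users is captured with probability of only about 1 percent.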
>>: Think back to the ideal world we talked about, where privacy wasn't an issue and [inaudible] wasn't an issue: you'd record the entire execution of every app on every phone everywhere. Now you can trade off and back off from that. You say, I want 0.001 percent battery overhead and no 3G network usage; we can give you that. That's what we're looking for. But the trade-off is you might miss some things that are very --
>>: So if you have a lot of people --
>>: Yes.
>>: Make that up.
>>: Yes. So this is better for the Angry Birds of the world than for my grand app.
>>: Yes.
>> Brendan Dolan-Gavitt: Right. And the other thing is you can think about different strategies for how you weight the probabilities. You could try to weight the probabilities so that events you haven't seen before are weighted more heavily than events you've seen very frequently, based on historical information from the server. So you might be able to push down these kinds of profiles for how you want clients to report.
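A sketch of that weighting idea follows; the event names, the probabilities, and the shape of the server-pushed profile are hypothetical placeholders of ours, not the system's actual policy.

    #include <cstdlib>
    #include <string>
    #include <unordered_map>

    // Hypothetical logging profile pushed down from the server: event
    // types the server has already seen many times get a low sampling
    // probability; types it has never seen default to always-log.
    static std::unordered_map<std::string, double> g_profile = {
        {"Touch",   0.01},  // extremely common, sample sparsely
        {"GpsFix",  0.10},
        {"HttpGet", 0.50},
    };

    // Decide, per event, whether this particular client records it.
    bool ShouldLog(const std::string& eventType) {
        auto it = g_profile.find(eventType);
        double p = (it == g_profile.end()) ? 1.0 : it->second;
        return static_cast<double>(std::rand()) / RAND_MAX < p;
    }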
So, just to give a quick example: say you've got some application, and it gets some private data about the user, then gets where they are and what the current date and time are, and if they're in a particular place, say San Francisco, and it's Friday the 13th, then it triggers this malicious behavior and sends the data out. If we can record, across a bunch of different devices, Alice's private data and Bob's private data, GPS data saying that someone is in San Francisco, and the date and time from two different phones on two different days, then we can recombine these and try to get replays of different scenarios: Alice, San Francisco, Friday the 6th; Bob, San Francisco, Friday the 6th; or maybe Alice, San Francisco, Friday the 13th, and that's when we actually get to see this malicious behavior in action. So that's the high-level goal. And, again, as we were just discussing, the idea is to be able to control these trade-offs. If you have some target performance goals that you want to hit, we can tune the logging down; it will take longer to see the full range of behaviors, or at least the full range of behaviors you're interested in, but at least you have that knob to turn.
>>: So going back to earlier, sorry: because of the way you replay, none of these users was actually affected by this, right? Because none of them actually saw all three conditions satisfied.
>> Brendan Dolan-Gavitt: Right. Potentially. So in fact this could enable a kind of proactive detection. It's partly random chance, but it works because we have this bubble around actual in-the-wild executions that we're trying to reproduce.
>>: I think it's important, though, that every such execution is absolutely feasible, right? This is not like static analysis with false positives. If we see an evil execution, we go to the app writer: your app is evil, here's the execution, here's why it's evil.
>>: The nodes, though: none of them may have actually seen this.
>>: That's correct.
>>: Yes.
>> Brendan Dolan-Gavitt: Okay. And so we've been talking a lot about different applications that you can build on top of -- yeah.
>>: What if the malicious behavior depends on concurrency issues? For example, I have a race violation, and I assume you're not [inaudible].
>> Brendan Dolan-Gavitt: Right; at the moment that's a fairly tough problem that we haven't tackled. It's the kind of thing that traditional record and replay has spent lots of time on, but it's not something we've yet considered in the collective case.
>>: On the other hand, concurrency just makes the execution space bigger. So with more executions you still have a chance to reproduce a concurrency issue.
>>: Yes.
>> Brendan Dolan-Gavitt: And I guess, again, there's a trade-off here: the more delicate the concurrency issue is, the less likely it is that you're going to be able to successfully reproduce it from what happened on actual people's phones. So there's a trade-off there, too. So, back to applications. We can try to do this kind of detection and reconstruction of malicious behavior. We can do things like taint analysis to detect the kind of privacy leaks that got Pandora in trouble. We can start trying to expand the space of executions that we see on the server side by mutating the inputs into things that no user saw, while keeping them constrained to things that might realistically have come from the environment: you're not going to have February 31st, but you might try other dates and so on, and thereby expand this neighborhood around the in-the-wild executions and discover hidden behaviors. You could also be a little smarter than pure fuzzing about that: you can try to solve for inputs that get you into new parts of the program using something like symbolic execution. This would be mixed symbolic and concrete execution, because we're starting from an initial set of inputs that came from real users. And the hope is that if you have executions close to what users are seeing in the wild, they'll be more relevant than ones just pulled out of the ether.
>>: I'm confused about the puzzle thing. Suppose to solve a puzzle, it's a simple puzzle --
>>: Paparazzi.
>>: Whatever, I don't know. Some puzzle, like a 15 puzzle where you move things around. If you just collect a little piece from me and a little piece from him, and I'm going to swipe right and you're going to swipe left, no amount of putting that together will solve the puzzle.
>> Brendan Dolan-Gavitt: Right. That's still something we have to work on and figure out: how we can put these pieces back together into something coherent. And maybe in some cases we won't be able to.
>>: You'd have to know how to solve the puzzle.
>>: Exactly.
>>: As we said, we haven't done this yet, but we have thought about, and I know G2 has thought about, [inaudible] sort of adaptively changing the amount of instrumentation we do depending on the application. So if it's a puzzle application and we know we need this much contiguous touch input to get through it, we can adaptively push that down. And we would see that, because we'd see: oh, we're stuck in this state of the puzzle and we'll never get past it.
>>: So that would adaptively feed back, and you'd know to collect longer sequences --
>>: In principle. We haven't done that yet.
>>: But also, following on this puzzle thing: the puzzle that the client has at that moment would not be the same as the puzzle that you pull down when you do the replay, right? So currently you're ignoring all these server inputs.
>> Brendan Dolan-Gavitt: That's not the case. Input from the network would also be considered an input, potentially, right? I think it's similar to the earlier question: you may have coherency issues, in that one user's input on this part of the program doesn't correspond to another user's input on this other part of the program. But things like pulling down the puzzle from the server would be recorded with some probability.
>>: That's a lot of information, if you record a lot of things from the servers.
>> Brendan Dolan-Gavitt: Again, potentially. But the hope is that it's going to be less than recording literally everything, and that you can still get enough without too much of a performance impact if you have a lot of users.
>>: Many mobile applications these days are cloud-based.
>> Brendan Dolan-Gavitt: Again, potentially, even for users interacting with cloud-based applications. So, okay, good example with that 15 puzzle. I guess this is --
>>: [inaudible].
>> Brendan Dolan-Gavitt: Have you seen this before? [laughter]
>>: I have a plan.
>>: He just walked by [inaudible].
>> Brendan Dolan-Gavitt: So you might have a 15 puzzle, which is a bit of a misnomer here because it looks like a nine puzzle to me, and if you solve it, it's going to turn your phone into an access point, and that's going to violate all sorts of agreements with cell phone providers; we don't want the app to be able to do that. This is the kind of thing we're hoping to be able to reconstruct. So, okay, this model where we want to probabilistically record and replay led us to some design principles. One is that recording needs to be as inexpensive as possible: in the time it takes to do the logging, in the amount of space the logs take up, and in the amount of extra battery you use by turning logging on. On the other hand, the replay side can be, not quite arbitrarily expensive, but much more expensive than recording: you're going to be recording on a dinky little phone and replaying potentially in a huge datacenter. So if there are trade-offs to be made where we have to make replay a little more expensive but it makes our recording a lot smaller, we should definitely take that trade-off. And finally, and this goes to the level of abstraction of the events you record, you need the ability to take events recorded on one device and events recorded on another device and recombine them in a meaningful way. Doing this at too low a level would make that very difficult: if you have, say, an mmap write to this device region at this time on one device versus another device, that's going to be very hard to interpret and reconstruct in a meaningful way.
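As one illustration of what a recombinable, device-independent event might look like, here is a sketch of ours; the field layout is hypothetical, not the system's actual log format.

    #include <cstdint>

    // Events are logged in device-independent terms: a touch is a
    // normalized screen position rather than a raw digitizer reading,
    // and ordering is a per-run logical sequence number rather than a
    // wall-clock timestamp. Records like this, taken from different
    // phones running the same app, can be meaningfully interleaved
    // on the server.
    struct LogEvent {
        uint32_t appId;       // which marketplace app produced the event
        uint32_t seqNo;       // logical position within this run
        uint16_t type;        // TOUCH, KEY, GPS, TIME, NET, ...
        float    x, y;        // for TOUCH: coordinates normalized to [0,1]
        uint8_t  payload[48]; // type-specific data (key code, lat/long, ...)
    };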
>>: All this stuff [inaudible] form factor, things like that. It may be that some phone has a display region where you touch it, but another phone physically does not have that region.
>> Brendan Dolan-Gavitt: Yeah. Or, for example, even in this picture, one of these phones has a keyboard and one does not. So that kind of input is something you have to consider when doing your record and replay.
>>: So do you only replay on identical devices?
>>: You can either do that, or you can try to move your recording to a level where those differences are abstracted away. You don't actually write one Silverlight application for the Samsung Focus, one for your HTC, one for your other manufacturer. The idea of these APIs is that they try to abstract that away; that's the point of having OSes and APIs, right? So this essentially gets to the question of where we want to instrument. We're looking specifically at Windows Phone here, and the software stack is actually pretty deep. Starting down at the bottom, we have the actual physical hardware, and the phone OS runs on top of that; that's essentially Windows CE. Not essentially: it actually is. On top of that you have your native libraries. These are the classic Win32 libraries that you're familiar with, but versions made for CE that run on ARM. A level above that, still keeping the focus on managed applications, we've got the runtime libraries for .NET and Silverlight and XNA. These are split into two pieces: they have a managed side, which is what applications see, and, for performance and because they need access to the OS, large portions of them are implemented in native code. So each one of them has this sort of split personality. And finally, way up at the top, you've got the actual third-party applications, which are the ones we're trying to monitor. There are trade-offs at each level. As I mentioned, if you start down at the hardware and try to do recording there, you can imagine taking the approach that's been taken in the security world, where you do record and replay by throwing a VM at it and doing all your recording at the VM layer: you record all inputs and outputs to the CPU, you virtualize the devices, and you can replay at that level. The problem, of course, is that this ends up being fairly device-specific, and it's very hard to compare events that happen between different phones. Going up, between the OS and the native libraries is where the Paranoid Android case sits; they did the recording at the system call level. We want to avoid that: while it's a small, nice, well-defined interface with a clear separation between kernel and user mode, it's not guaranteed to be stable. The one in Windows NT can change between releases, and so can Windows CE's. The way that's gotten around is that you have these user-level libraries that everyone is supposed to use, which define the stable Win32 API. You could also imagine trying to instrument between the managed code layer and the native code layer. This, again, is problematic, because it's undocumented, it's not exposed to developers, and it's not guaranteed to be stable. You also get some issues because there you're really mixing managed code objects and native code objects, and it can be unclear how you should reconstruct these objects when you're doing replay. So now let's talk about the two more viable possibilities. One that seems intuitively promising is moving to the highest semantic level possible, just below the application, where the application's interactions with the .NET runtime and the .NET runtime libraries are recorded and replayed. This seemed really cool. So we went ahead and took CCI Metadata, which is a very nice project that allows you to disassemble, modify, and rewrite .NET assemblies, and we instrumented the .NET standard library to do record and replay. Things did not quite go as we planned.
The biggest problem we encountered was that objects passed between the managed standard library and the application end up being shared state between the two layers, and you want a much cleaner separation between them. Because if you have an object that's passed back to the application, you now have to start recording not just the one object you passed back but potentially other objects that it references. So you have something that's intuitively simple, like a touch event at X, Y. You might think that's all you need to capture the semantics of that particular event. But when it actually reaches the application, events in .NET tell the receiver who the sender was. So now the sender is the phone application frame, and the frame might have references to other UI objects: the buttons, the text boxes, and so on that exist inside it. And so the application, by following references from the object that was given back to it, may end up accessing a fair bit of the application's state. Another issue is that these touch events may not be pure managed code; they may also have large native portions attached to them that you would potentially have to reproduce as well.
>>: So I am totally confused about why you have to log that entire graph. Because it seems to me that in a system where, let's say, you were instrumenting a single execution of a single program, all you would have to log is just the touch at X, Y, and the rest of the graph would be induced by the fact that you know what the program's data is, right?
>> Brendan Dolan-Gavitt: Right. So when I talk about a touch at X, Y here, I'm not talking about the theoretical event but the actual .NET object that is given to the application. And that .NET object, when it's given to the application, contains these references to other objects.
>>: I see. So if I heard that right, it's essentially a serialization problem: you're trying to record the event, but it turns out the event has some nasty object graph attached and is difficult to serialize.
>> Brendan Dolan-Gavitt: Right.
>>: And, in fact, as we will see, because we know it should be a touch at X, Y, we should be able to reproduce it, or induce it, from just the X, Y.
>> Brendan Dolan-Gavitt: This led us to move down --
>>: Did you instrument every store to that object, or every read from that object?
>> Brendan Dolan-Gavitt: That was the path we were going down, because, again, if you're replaying this from a log and the application wants some data that you reach from that object, you have to be able to provide that data. So this led us to move down a layer and instrument at the second of the two stable APIs in the system, which is the Win32 layer.
>>: This is [inaudible].
>> Brendan Dolan-Gavitt: Yes.
>>: Close second?
>> Brendan Dolan-Gavitt: This is specific to Windows Phone.
>>: [inaudible] instrument --
>> Brendan Dolan-Gavitt: Right. I mean, for example, on Linux the system call API is stable; they may add new calls, but the system call layer you have is in fact stable. So this is specific to Windows Phone. But in general you can expect that if the platform provider cares at all about backwards compatibility, there's going to be some stable API. Okay.
With this approach, the benefit, compared to going up to the managed code layer, is that the interface is a bit cleaner. It's a C API, and the amount of shared state between the two layers is much smaller. There's still some, so you have to reproduce side effects and things like that in some cases, but it's in general much simpler. While we're down at this level, I should mention some previous work: R2, a system by Guo et al. at OSDI 2008, from MSR Asia. It's very good work, but there are some differentiators between the designs of the two systems that reflect their different goals, given that we want to move to this collective scenario. R2 is per-application: you tell it the application and the interfaces you want it to interpose on, and it gives you a record-and-replay library for that. We instead want this to be generic: we instrument the whole platform, and then as you add new applications on top of it, they're recorded and replayed automatically. They're also very concerned with exact replay, because they're trying to reproduce concurrency-related issues, difficult bugs, whereas we're more concerned with monitoring and with the kind of approximate replay we talked about. And finally, their recordings are very instance-specific; they include details like the exact address given back by the memory allocator, and they try to make sure those stay stable across record and replay, whereas we're going for comparable recordings. As for current progress: this summer we focused on implementing traditional record and replay for Windows Phone 7. This turned out to be a fair engineering challenge, because it's a new platform and so on, and, of course, because we built one design and threw it away. We haven't yet extended it to the collective case, but we hope to be able to continue work on this project and actually move forward with it. Our next steps are to dogfood this on real phones, maybe give them out to some people at MSR; to support some more complex applications (I'll show you a demo of one of the things we have right now in a second); to extend this to the collective scenario; and, of course, to run experiments to verify that all of this actually works. Getting to the details of the implementation: essentially, you have an application, and it makes a call into a native library. We've placed hooks in the native library, so the call is redirected into our library, at which point we can record whatever we need to, return control to the native library, and pass the result back to the application, recording whatever information needs to be recorded along the way. When we get to replay, the application calls into the native library and is again redirected to our code, but now, instead of invoking the native library to get the results, we read the results from the log file and return them to the application. As for how we actually do the hooking, we make use of Detours. And I say "make use of," but in fact we had to port Detours over to Windows CE, and there were some interesting changes that had to be made, because CE is a different beast than Windows NT.
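The hook pattern just described looks roughly like the following minimal sketch, using the standard Detours API against one real Win32/CE function, GetLocalTime. The mode flag, log format, and single-function scope are hypothetical simplifications of ours, not the actual system.

    #include <windows.h>
    #include <detours.h>
    #include <cstdio>

    enum Mode { RECORD, REPLAY };
    static Mode  g_mode = RECORD;  // chosen per process when the library loads
    static FILE* g_log  = NULL;    // opened elsewhere (omitted here)

    // Pointer to the real function; Detours rewrites this to a trampoline.
    static VOID (WINAPI *Real_GetLocalTime)(LPSYSTEMTIME) = GetLocalTime;

    static VOID WINAPI Hooked_GetLocalTime(LPSYSTEMTIME st) {
        if (g_mode == RECORD) {
            Real_GetLocalTime(st);                     // let the OS answer
            fwrite(st, sizeof(SYSTEMTIME), 1, g_log);  // and log the answer
        } else {
            fread(st, sizeof(SYSTEMTIME), 1, g_log);   // replay the logged answer
        }
    }

    static void InstallHooks(void) {
        DetourTransactionBegin();
        DetourUpdateThread(GetCurrentThread());
        DetourAttach((PVOID*)&Real_GetLocalTime, (PVOID)Hooked_GetLocalTime);
        DetourTransactionCommit();
    }

Every nondeterministic Win32 call the application can reach gets a hook of this shape; purely deterministic helpers can be left unhooked, an optimization that comes up again near the end of the talk.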
As for those CE differences: in Windows NT, when you have a library that's shared across multiple applications and someone makes a change to that library, say by inserting a hook, Windows NT is kind enough to give you copy-on-write, so each process gets its own copy of the modified library pages. In Windows CE, when someone makes a write, that write is immediately visible across all processes in the system. They don't actually want you making changes to shared user libraries, so it's understandable that this doesn't work; but we needed it to work, so we had to modify the kernel to allow us to do it. Of course, now if the hook tries to call into our code and our code isn't loaded in the other process, we're going to get a crash. So we had to work around this by making sure that whenever some library we hook is loaded into a process, our recording library is also loaded into that process, and it then decides dynamically, depending on the process, whether it's going to record or replay that particular process. So let me stop briefly here for a demo. And that looks like me. Also looks like I have timed out. But -- that's exciting. Caps lock is on; that's the problem.
>>: Are you in caps lock?
>> Brendan Dolan-Gavitt: No, when you turn caps lock on, that happens. So... it's not letting me in. All right. Yeah, I don't know. Let me try just starting from -- I wonder if they realized I was leaving next week and just preemptively -- all right. Okay. So what we're going to do here is take a third-party application, in this case Draw Free, do some interactions with it, record them, and then try to replay them later. So I'm going to go in here: a small application that lets you set record and replay mode. Record and replay, it turns out, also works on the Settings application, and the last thing I did before this was set it to replay mode, so it replayed me setting it to replay mode. The system works a little too well. So anyway, it's now set to record mode, and we're going to start up Draw Free. Once it's loaded, we're going to make a little drawing. You can all stand in awe at my artistic abilities.
>>: Solve this by --
>> Brendan Dolan-Gavitt: Yeah. This is what you're getting for your money.
>>: This is based on an arbitrary third-party application?
>> Brendan Dolan-Gavitt: We downloaded this from the marketplace last night.
>>: There's no special filtering done?
>> Brendan Dolan-Gavitt: Nothing up my sleeve.
>>: [inaudible].
>> Brendan Dolan-Gavitt: Yep. So now I'm setting it back to replay, and going to go ahead and start the application up again. And if all goes according to plan, we should see it draw a smiley face.
>>: Very nice.
>> Brendan Dolan-Gavitt: All right.
>>: Another one.
>> Brendan Dolan-Gavitt: Yes. We also recently got recording and replaying of the HTTP layer working. So Weidong managed to log in better than I can.
>>: Have you tried this on the camera?
>> Brendan Dolan-Gavitt: I haven't tried it on the camera. That's a very good question. We ought to be able to replay it at the same level of abstraction, because there's presumably a native library in there that's giving you access to the camera, but we haven't looked at it yet. Hmmm, that raises a whole raft of privacy issues, doesn't it? [laughter] All right. So, starting this up; right now it's in neither mode. Hopefully I can actually go back. So when we start this little toy app, what it does is make an HTTP request out to a server and pull back the current date and time.
>>: [inaudible] packet.
>> Brendan Dolan-Gavitt: Yes, right. Our record and replay system is not so good that it can record events in the future. Future work.
>>: Right.
>> Brendan Dolan-Gavitt: So now we can go and set it over to record mode. And you should notice that the time is different, right? So --
>>: [inaudible].
>> Brendan Dolan-Gavitt: So it's now 14:49 and 27 seconds. Now we'll set this back to replay mode. And: 14:49 and 27 seconds. All right. That's the demos for today. And we can go back to --
>>: What happens when you switch applications from one to another; does that work?
>> Brendan Dolan-Gavitt: That doesn't work right now, essentially because when you switch away from one and come back to it, it detects the task host launching again and assumes it's a new application being recorded and replayed. So that's the kind of thing for which we would have to build in some support for save and restore. All right. So, to finish up, the overall goal of this project is to provide the usefulness of full trace recording while minimizing the overhead. Right now we've got an implementation of traditional record and replay for Windows Phone 7, and along the way we learned some interesting things that were somewhat unexpected. One is that doing the recording up at the application level doesn't seem to work out very well, at least with Windows Phone 7 and the .NET framework. The other is that your design goals really influence where you want to do your record and replay; for this application we ended up settling on the native interface, but we had to go through a few other designs before we got there. So that's it for me, and I'll take any more questions that we've got. [applause]
>> Brendan Dolan-Gavitt: Yeah.
>>: So the problems you ran into trying to do it at the application level were because there are potentially these paths to arbitrary objects, and you felt you needed to preserve all of those just in case. Did you look at whether applications tend to actually do that, or whether maybe all you really needed was the touch at X, Y [inaudible]?
>> Brendan Dolan-Gavitt: So this actually gets to a question that was asked earlier, which is what you do if the applications you're trying to monitor are aware they might be monitored. Even if in the benign case you can assume that no one is really going to look at that deep part of the object graph, if it's not there, then a malicious application could use that as a way of detecting it's being monitored and disable its malicious functionality. Yes?
>>: So possibly another way of reducing the amount of recording is to look at the program in advance and identify which inputs are hard to reproduce, and [inaudible] what's causing the usage --
>> Brendan Dolan-Gavitt: That's the idea of preprocessing it with some static analysis to figure out which bits you actually need to record. That's interesting.
>>: Are there native functions that are deterministic, so you don't even have to --
>>: Another way to reduce the recording overhead is to say these native library functions are deterministic.
>> Brendan Dolan-Gavitt: We actually ran into that when we were doing the HTTP recording and replaying.
There's one function, InternetCrackUrl, that takes in a URL and splits it into its different portions. And while it was used by the application, and we had tagged it as something we might have to support, we ended up not having to. At the moment, the only way we have of identifying such functions is manual, but it's possible some automated process could identify the ones that are deterministic. Another question? Yep?
>>: So I'm still a little confused about why you would have to record that entire object graph at all. It seems that the object graph a touch event will reach at any given point is induced by whatever the program state is. In other words, you've got your server over here, essentially collecting all these logs from the wild, and you're going to synthesize a couple of input traces and feed those traces to a real running version of the app on the server. So that server-side app boots up; it boots up with an object graph. Then you hand it a touch event, and whatever the state of that server application happens to be at that point, those will end up being the object graphs the event touches. So why do you have to record that, if you want to be consistent?
>> Brendan Dolan-Gavitt: If you have some extra information about the touch, about who the sender is, then you could try, at runtime, to say that this object already in the application's memory is the one that should be the logical sender. This does mean that you need some way of identifying an object as the same one over time, which isn't completely trivial.
>>: But what you said is actually what we are doing. If we do it at the native layer, all you need to do is send it a touch event at X, Y, and Silverlight and the application will construct the object graph. But if we sit on top of Silverlight, then we don't see the touch event at X, Y anymore. What we see is a big object passed from Silverlight to the application. And without analyzing the application, we don't know which references into that object graph the application would use, nor how to reproduce everything.
>>: I see. So is there any way -- I don't know this area very well -- is there a way to generate, say, a fake touch event where all you specify is the X and Y? Because if that exists, then --
>>: We could; there is a sort of [inaudible] test framework. However, touch is just one type of event. What we were trying to do in the original design was engineer an object-logging layer for Silverlight. So while you're exactly right that for touch we can imagine a better way of doing it, that would be a special case. And we found that, hey, we would have all these special cases, and we don't want to deal with all these special cases. It looked like it would be easier to move down a level to a really generic solution that wouldn't run into this problem.
>>: We don't want to put our --
>>: [inaudible] solutions, but that's the trade-off.
>>: Silverlight would be specific to Windows as well, right? So...
>>: Have you considered covert channels? Or are they nonessential?
>> Brendan Dolan-Gavitt: That's not something we have considered. It's a good question, though. If you could have -- I'm trying to think what a covert channel in this space would look like.
So now you're recording, and the covert channel would be something where maybe, based on -- actually, I'm not seeing it. What sort of covert channel were you thinking of?
>>: Go through one of four different URLs for the same site, depending on the two bits you want to ship out.
>>: He's talking about information leakage using a covert channel [inaudible].
>> Brendan Dolan-Gavitt: Oh, okay. I see what you mean.
>>: That's one of the challenges for our applications; that's for them to worry about.
>>: You can just delay your request, and the delay can encode whatever info --
>> Brendan Dolan-Gavitt: That gets more onto the side of detecting malicious behavior after the fact. If you want to think about it like this, we're assuming we have malicious behavior oracles that someone else is going to apply to the data we collect.
>>: You mentioned the privacy concern; do you have something else going on there?
>> Brendan Dolan-Gavitt: Yeah. So in some ways it's mitigated: by recording only little bits from individual people, it's better than doing full trace recording. That may not be something that users actually buy. So there are a few things. One is maybe make it something they can opt into, but then, of course, that can hurt our coverage, because we do need lots of people to opt in. The other is that there has been some work on essentially trying to mask the inputs in such a way that they still produce the same effect on the program but are not the original inputs. Right? So there's some work on --
>>: Program encryption.
>> Brendan Dolan-Gavitt: Though perhaps the program execution is itself the private information, potentially.
>>: A password, we don't save it. The credit card number, we don't save it; the credit card number, let's just ignore it.
>>: That plays into our trade-offs: here's some abstraction of the actual execution to protect people's privacy [inaudible].
>>: So, to the question about the camera I raised earlier, I imagine you would have something along the lines of the user choosing what types of events they'd be willing to record; for example, [inaudible] click events but not necessarily the camera. So maybe you'd have a broad spectrum of users [inaudible], say, more paranoid people.
>> Brendan Dolan-Gavitt: That is a good idea. There's always a danger in giving people lots of configuration options, because they may decide it's too much trouble to actually set them up.
>>: Especially when there's no immediate benefit to them.
>>: Yeah.
>> Brendan Dolan-Gavitt: Right. It's more of a social benefit, right? On the other hand, there is -- what's the application for Android that you were mentioning?
>>: Lookout.
>> Brendan Dolan-Gavitt: Apparently, again, it's security software that runs all the time, and many users have actually downloaded it and started using it. So some people do opt into this sort of thing.
>>: Lookout. An antivirus -- it scans all the apps running. It does the full functionality and sort of checks to see if [inaudible].
>>: Sends the information back.
>>: [inaudible].
>> Brendan Dolan-Gavitt: Yeah. So, you know, people aren't totally [inaudible]. Anyway, thank you.
>> Weidong Cui: Thank you.
[applause]