>> Weidong Cui: All right. It is my pleasure today to introduce Brendan
Dolan-Gavitt, who is visiting us this summer from Georgia Tech, where he works
with Wenke Lee. And Weidong Cui and myself have been extremely
pleased to mentor him in what I hope you'll agree is a nice piece of systems
hacking in support of catching evil behavior on phones in the wild. Thank
you.
>> Brendan Dolan-Gavitt: Thanks. All right. So thank you for the excellent
introduction. So, yes, we're going to be talking about monitoring untrusted
modern applications with collective record and replay.
So the setting for this is essentially the emergence of a new kind of model
for distributing and using applications, where you have sort of an app store model.
And so Windows Phone 7, iPhone, Android all have this sort of centralized
marketplace where you can go and download applications, and they're assumed
to come from sort of known, trusted providers.
And now even desktops are starting to get this, too: the OS X Lion release ships
with an app store that has many of the same features.
And so one of the interesting points about these is they come as sort of complete
packages. So in general if you have two people running the same application
you can assume that it's going to have the same sort of code and data
associated with it between two different people, which presents an opportunity for
us, you'll see.
So, of course --
>>: [inaudible].
>> Brendan Dolan-Gavitt: Because if I've got Office 2010 and you've got Office
2010 64-bit, there's just a lot more variance between different people.
>>: [inaudible].
>> Brendan Dolan-Gavitt: Right. But it tends to be --
>>: [inaudible] and you don't.
>> Brendan Dolan-Gavitt: It's true. It's not completely uniform. But it's much
more uniform than it is in the desktop space right now. So, of course, with
modern apps you get modern malware. For example, on iPhone there was this
case where it turned out that Pandora was sending people's private data off to
advertising networks. Also, there's been sort of an upsurge in Android-based
malware that can do things like recording conversations, the same sort of
private data leakage, and things like that.
So all of this is certainly something that we'd like to try and take care of. So we
want to protect people from this kind of mobile or modern malware.
And the current sort of state of the art is this kind of gated model where you have
a bunch of developers and they submit their applications to one central point.
And that central point makes the decision whether or not to allow
it into the marketplace.
Now, of course, there are a few problems with this. One is that this sort of review
process is currently very manual. You actually have someone sit down and
click through the application and try to figure out if it's good or bad. And so, of
course, doing this costs a lot of money.
Another issue is that, of course, even if you are very, very careful in how you
review these applications, eventually you're going to let something through that
turns out not to be legitimate and turns out to be malicious.
You get an Angry Pig instead of an Angry Bird. So because trying to focus
strictly on prevention is probably eventually going to fail and allow
malware into the marketplace, we want some way of being able to monitor these
applications in the wild. And there are other reasons aside from malicious
code that you might want to monitor these applications in the wild.
For example, you might want to see if they're crashing too often or you might
want to be able to detect that it enables some behavior that's not necessarily
malicious but the platform provider doesn't really want you to be able to do. Like
turning on a wireless access point on the phone.
So our options right now are sort of divided into static analysis, which is useful in
general but kind of limited in the malicious case, where the developer can
apply arbitrary obfuscations that make it difficult for static analysis to
get any sort of foothold. There's also dynamic analysis. There you can do
online monitoring, where you do all of your security checks as the
application runs.
But, of course, these can be fairly resource constrained devices, and so the
overhead of that can limit the sort of analysis that you can do at runtime.
So you could also instead record what's going on with these applications at
runtime and then essentially do off-line analysis later on.
And so this is actually being done right now, with platforms like Windows Phone
recording certain amounts of information at runtime -- for example, crash dumps --
and internal users at Microsoft who are dogfooding the OS are collecting
information on battery life and things like that.
So this is very useful information, but, of course, you have to sort of know in
advance what you're looking for. And, again, it limits what you can actually look
at later on.
So it would be nice to be able to record everything, but, of course, this is
somewhat expensive. Still, the idea of recording everything is an interesting
one, so we should take a look at it a little more closely.
So one of the traditional approaches for doing this is essentially called record and
replay. This is where, when you run a program, you take all of the
nondeterministic inputs to the program -- anything that's going to make it behave
differently from run to run. This could be, for example, the date and time, user
input, the state of the random number generator, and things like that. And then the
idea is that if you reexecute that application later on, and feed it those same
nondeterministic inputs you recorded earlier it should behave in the exact same
way. But unfortunately this is actually still a little bit too expensive. So there was
a paper at ACSAC a couple of years ago, I think, called Paranoid Android, where
they essentially did this and recorded a few applications like the Web browser,
the e-mail client and so on, and they found that this imposed essentially a
30 percent overhead on battery usage. So it ran out 30 percent faster.
>>: At what level were they doing the logging?
>> Brendan Dolan-Gavitt: System call interface.
>>: I'm sorry?
>> Brendan Dolan-Gavitt: System call interface.
>>: Okay.
>> Brendan Dolan-Gavitt: So this is the kind of thing where -- I'm already not
exactly thrilled with the fact that my phone runs out of battery by the end of the
day or after two days. I really don't want it to run out any faster. So this is kind of
unattractive for users to be running all the time.
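To make the record-and-replay idea concrete before going further, here is a minimal sketch of the mechanism. The mode switch, log format, and names below are illustrative assumptions for this writeup, not the API of any system described in the talk.

    // Minimal record/replay sketch (illustrative; not the actual system's API).
    // In Record mode, every nondeterministic value the app observes is appended
    // to a log. In Replay mode, the same call sites read the values back, so the
    // program sees exactly the inputs it saw during the original run.
    #include <cstdint>
    #include <ctime>
    #include <fstream>

    enum class Mode { Record, Replay };
    static Mode g_mode = Mode::Record;
    static std::fstream g_log("rr.log",
                              g_mode == Mode::Record
                                  ? (std::ios::out | std::ios::binary)
                                  : (std::ios::in | std::ios::binary));

    // Stand-in for any real source of nondeterminism: clock, RNG, user input.
    uint64_t ReadClock() { return static_cast<uint64_t>(std::time(nullptr)); }

    // Every nondeterministic read in the program goes through one shim like this.
    uint64_t GetNondeterministicValue(uint64_t (*realSource)()) {
        uint64_t v;
        if (g_mode == Mode::Record) {
            v = realSource();                                          // ask the real world
            g_log.write(reinterpret_cast<const char*>(&v), sizeof v);  // remember it
        } else {
            g_log.read(reinterpret_cast<char*>(&v), sizeof v);         // feed it back
        }
        return v;
    }

On replay, every call site gets back exactly the value it observed during recording, so as long as all sources of nondeterminism are captured, the execution unfolds identically -- which is also why the logging cost adds up.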
>> Brendan Dolan-Gavitt: So we'd really like to be able to reduce overhead here.
In an ideal world -- or at least an ideal sort of panopticon world -- we'd be able to
get full execution traces of every execution of every app every time. Then, for
any analysis we could think of, we could go back to these execution traces,
reproduce them exactly, record any information we wanted, do any sort of
heavyweight analysis, and so on.
So what we really want to do is find some way to get the benefits of this kind of
recording but without this extra overhead. So -- yes?
>>: So previous works like CloneCloud [inaudible] Android -- I guess their
goal is to do the analysis in an emulator or virtual machine in the cloud, so that
before that thing happens on your machine, your device, the clones can help you
make a decision, so that the consumer -- the people who are providing
this execution trace -- can also be protected. But it looks like in your
case, I'm providing the trace to your machine for your analysis, but I won't get
protected, right? Because you do off-line analysis later on. So if my data is lost,
then --
>> Brendan Dolan-Gavitt: It can be seen as kind of complementary, right?
Being able to do this sort of thing proactively in the cloud is great. But we feel
that unless you can actually see in-the-wild behavior -- actual users interacting
with these applications -- you're not going to be able to expose the range of
behaviors that are seen in the wild either.
And there are a few examples later on. But you can consider something like
a puzzle game where, once the user solves the puzzle, it unlocks this malicious
functionality and starts sending the data away.
In that case, you know, an automated analysis up in the cloud might not be able
to solve that puzzle. There might be too many paths in the program to explore.
But a user just playing around on their phone at the bus stop is going to expose
that malicious functionality and if we can record the sequence of inputs that led to
there, then we can actually have a chance at seeing that malicious behavior,
detecting it and pulling the app.
>>: Another thing -- let me add a little bit. It's like crowd sourcing in the
current AV model. If there are one million users running this, ten guys
get compromised first. Then the signal is sent back to the server, and if the
feedback loop is fast enough, the one millionth man will still be protected.
We're not trying to protect all one million people at every point in time, because
we think that's not sustainable given all the performance limitations. But by
leveraging the crowd, we can still protect the other 900,000 people -- instead of
protecting one million people at, say, 15 percent overhead, we protect 900,000
people with very low overhead. I think that's the way to look at this model in
comparison.
>>: And all I wanted to add is that at that time the user might not have the right
bandwidth. So bandwidth will be very important, and also how much the
bandwidth costs at that time. So we consider those; I think that's another
perspective as well. And also, if you do the analysis later on, the
bad app can be revoked from a licensing perspective. If the app can get revoked,
that limits the damage on the phone.
>> Brendan Dolan-Gavitt: Okay. I think he was first.
>>: So you mentioned that, as opposed to manual review, which is very
expensive, the record-and-playback functionality would help you catch this kind of
behavior once you have a sufficient host population, or let's say even once it's
out in the wild. My concern is that you have to be able to identify what constitutes
malicious behavior in a way that you can pick up in an automated fashion very
easily. This assumes you know all the ways that an app can behave maliciously,
given that they're all written very differently.
>> Brendan Dolan-Gavitt: So that's actually more of a benefit of record and
replay, because it doesn't assume that you know the malicious behaviors
beforehand. Rather, if you discover that some kind of behavior is malicious later,
you can go back and look at your saved traces and re-examine them for
evidence of this behavior. Right? So maybe you didn't know that invoking this
API with these arguments would jailbreak the phone. But if you discover a
month or so down the line that it does, you can go back and look for applications
that have been using that API that way earlier.
So in some ways we're sort of trying to separate out this idea of what constitutes
malicious behavior and what data we can give you that will allow you to detect
that malicious behavior.
>>: To follow up on that, I understand this mechanism is great in that it
gives you a huge amount of trace information. But collecting trace information, as
she said, and sending such a huge amount of data up can be very expensive on
a costly network.
So are you limiting this to only free -- like Wi-Fi networks or something like that?
>> Brendan Dolan-Gavitt: So our approach here is actually --
>>: Storing the data as well.
>> Brendan Dolan-Gavitt: Our approach is actually going to be to try to really
minimize both the amount of data and the runtime overhead that recording
imposes on any individual phone, so we're hoping that won't be an issue. We'll
also do the usual trick of taking the logs and only uploading them when you're on
a Wi-Fi network, say, so you don't eat into the 3G bandwidth.
You had a question?
>>: At some point are you going to talk about what happens if the malware writer
is aware of the framework and tries to, for example, create polymorphic
applications, where essentially every individual user runs a different application,
which therefore makes it more difficult to merge this stuff?
Because it seems like in the scenario that you just described, where maybe ten
people get compromised and we get an idea and help everyone else out -- if you
can't do the merge in the first place to say these are similar traces, it seems like --
>> Brendan Dolan-Gavitt: Yeah, so for the current Windows Phone and other
kind of models, it's not as much of an issue, because things like managed code
and Silverlight don't actually allow you to generate any code at runtime. So the
code that you start with is the code you get, and you can't do self-modification or
things like that.
Going forward --
>>: [inaudible].
[MULTIPLE SPEAKERS]
>>: But you can use self-modifying code as a way to detect that this is malicious
behavior. So as more gets added to the app environment, I would imagine they
would allow you to do self-modifying --
>>: [inaudible].
>>: Self-modifying code.
>>: Code injection [phonetic].
>>: But going back to that point, you're going to allow code --
>>: [inaudible] use the code, yes.
>>: You can write your own interpreter, right? You can write your own --
>>: And people have.
>>: So I think one of the things that we'll get to is the level at which we record
things. We're not recording full instruction traces; we'll be recording things like
touch events. The kind of polymorphism you'd have to pull off to evade that is to
go out and give each user a different UI. That's an excellent question
about how people will react, and we'll see how that happens. But I think it will be
harder for them to react than in the traditional malware case, where it's trivial to
get polymorphic shellcode that does exactly the same thing but under some
transformation.
>> Brendan Dolan-Gavitt: Good questions.
>>: How do you collate these per application?
>>: We know it's from the same application because it will be signed by the same
key; it's the same application throughout.
>>: You're saying that can be signed --
>>: I understand your question -- correct me if I'm wrong -- so you have the same
application but for different users. Like, it looks at the IMEI on the phone or
something and it takes a different path depending on that.
So we always know it's the same evil application --
[MULTIPLE SPEAKERS]
>>: Sorry?
>>: You don't need code merging for that.
>>: That's correct.
>>: We're treating it as an input. The device ID is just an input, and then you get
different behaviors.
>>: Sure. But the issue is that, depending on the explosion of the range of paths
taken, that could make the analysis very hard.
>>: Yes.
>> Brendan Dolan-Gavitt: Okay. So getting on to how we might want to
think about reducing the overhead of this. One approach might be to say
that we're going to throw away this idea of exact, instruction-by-instruction
reproduction of a given execution trace. So given this space of possible
executions that are seen in the wild on clients, we can think about a sort of
neighborhood around that, which is the set of executions that we want to be able
to reproduce on the server. We certainly want that to include the set of things
that are seen in the wild, but it can also include some things that weren't actually
seen in the wild but are still feasible -- that could have happened.
And then, of course, this is all within the space of total executions for an
application, which is a much larger space and much harder to explore.
So our key idea here, again, is that we're going to take advantage of the fact
that many clients are all going to be running the same applications in the same
way, and we're going to try and distribute the logging work among them. This
idea is conceptually similar to cooperative bug isolation: essentially, if
you have lots of people running the same thing and you think the same things
are going to happen to them, you can record each event with some
probability on each one, and then, in expectation over a large number of users,
you're going to see all the events that happened. And so then, with each person
uploading these small pieces, we can try to recombine them later on the
server.
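As a rough sketch of what that per-device sampling could look like (the uniform-sampling scheme and all names here are illustrative assumptions, not necessarily what the project implements):

    // Sketch of probabilistic (collective) event logging; illustrative names.
    // Each device logs a given event only with probability p, so per-device
    // overhead stays tiny, while across n devices running the same app the
    // server expects to collect roughly n * p copies of each common event.
    #include <random>
    #include <string>
    #include <vector>

    struct Event { std::string type; std::string payload; };

    static std::vector<Event> g_localLog;  // small; uploaded later, e.g. over Wi-Fi
    static std::mt19937 g_rng{std::random_device{}()};

    bool ShouldLog(double p) {
        return std::uniform_real_distribution<double>(0.0, 1.0)(g_rng) < p;
    }

    void OnEvent(const Event& e, double samplingRate) {
        if (ShouldLog(samplingRate))   // e.g. 0.01: log only 1% of events
            g_localLog.push_back(e);
    }

With a million users and a 1 percent sampling rate, an event on a commonly exercised path is still expected to be captured around 10,000 times, while any individual phone records almost nothing.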
>>: So a motivating example you used is malware that triggers its malicious
behavior on very rare events -- only when you solve this puzzle. But
if you do this probabilistic logging, then you're likely to miss
those events.
>> Brendan Dolan-Gavitt: Right. So the point of describing the puzzle
scenario wasn't that the puzzle is so difficult that it's rarely going to be solved by
any person, more that a puzzle is going to be difficult for an automated
system to go through and solve.
So if you have --
>>: They're rare events, right?
>> Brendan Dolan-Gavitt: So if you have something that's, like, two people -- yeah.
>>: Exactly. That's the trade off.
>> Brendan Dolan-Gavitt: It is a trade-off.
>>: So how long you want to wait, how many devices are running it, how
often the event happens -- put together, those will decide.
>>: So think back to the ideal world, where privacy wasn't an issue and
[inaudible] wasn't an issue, and you record the entire execution of every
phone, everywhere, in all apps. So now the question is, you can trade off and back
off. You say, I want .001 percent battery overhead and no 3G network usage -- we
can give you that. That's kind of what we're looking for. But the trade-off is you
might miss some things that are very --
>>: So if you have a lot of people --
>>: Yes.
>>: Make that up.
>>: Yes. So this is better for the Angry Birds of the world than for my grand app.
>>: Yes.
>> Brendan Dolan-Gavitt: Right. And the other thing is you can also think about
different strategies for how you weight the probabilities. Right? You could
try to weight the probabilities so that events you haven't seen before
are weighted more heavily than events you've seen very frequently,
based on historical information from the server. So you might be able to push
down these kinds of profiles for how you want clients to report.
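One way such a server-pushed profile might look, as a sketch under the same illustrative assumptions: the server ships down how often it has already collected each event type, and clients sample rare events more aggressively.

    // Sketch of rarity-weighted sampling (illustrative names). The server pushes
    // down historical counts per event type; events it has never seen are always
    // logged, while frequently seen events decay toward a small floor rate.
    #include <algorithm>
    #include <cstdint>
    #include <map>
    #include <string>

    static std::map<std::string, uint64_t> g_serverCounts;  // pushed from server

    double SampleProbability(const std::string& eventType, double floorRate) {
        uint64_t seen = g_serverCounts[eventType];  // 0 if never seen server-side
        return std::max(floorRate, 1.0 / (1.0 + static_cast<double>(seen)));
    }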
>> Brendan Dolan-Gavitt: So just to give a quick example. Say you've got some
sort of application, and it gets some private data about the user, and then gets
where they are and what the current date and time are. If they're in a particular
place, say San Francisco, and it's Friday the 13th, then it's going to trigger this
malicious behavior and send the data out.
So if we can record, on a bunch of different devices, Alice's private
data and Bob's private data, GPS data showing that someone's in San Francisco,
and the date and time from two different phones on two different days,
then we can recombine these and try to get replays of different scenarios:
Alice, San Francisco, Friday the 6th; Bob, San Francisco, Friday the 6th; or
maybe Alice, San Francisco, Friday the 13th -- and that's when we're actually going
to be able to see this malicious behavior in action.
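Server-side, the recombination in this example amounts to enumerating feasible combinations of the independently sampled fragments and handing each one to the replayer. A sketch, with made-up types and names:

    // Sketch of server-side recombination (illustrative types and names).
    // Input fragments recorded on different phones are recombined into candidate
    // replays, including combinations no single user ever saw, such as
    // Alice + San Francisco + Friday the 13th.
    #include <string>
    #include <vector>

    struct ReplayScenario { std::string privateData, gps, date; };

    std::vector<ReplayScenario> Recombine(const std::vector<std::string>& privateData,
                                          const std::vector<std::string>& gps,
                                          const std::vector<std::string>& dates) {
        std::vector<ReplayScenario> out;
        for (const auto& p : privateData)
            for (const auto& g : gps)
                for (const auto& d : dates)
                    out.push_back({p, g, d});  // each candidate goes to the replayer
        return out;
    }

Each candidate scenario would then be executed in the instrumented replayer in the datacenter, where heavyweight analysis is affordable.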
>> Brendan Dolan-Gavitt: So that's the high-level goal. And, again, as we were
just mentioning, the idea here is to be able to control these kinds of trade-offs.
If you have some target performance goals that you want to hit, we can tune the
logging down to meet them. It's going to take you longer to see the full range of
behaviors -- or at least the full range of behaviors you're interested in -- but at least
you have that knob that you can turn.
>>: So going back to earlier -- sorry -- because of the way you want to replay, none
of these users was actually affected by this, right? Because none of
them actually saw all three conditions being satisfied.
>> Brendan Dolan-Gavitt: Right. Potentially. So in fact this could enable this
kind of proactive detection. It's sort of random chance, but it's because we have
this bubble around actual in-the-wild executions that we're trying to
reproduce.
>>: I think it's important, though, that every execution is absolutely feasible, right?
This is not like static analysis with false positives. We see an evil execution, we go to
the app writer: your app is evil, here's the execution, here's why it's evil.
>>: The nodes -- none of them may have actually seen this.
>>: That's correct.
>>: Yes.
>> Brendan Dolan-Gavitt: Okay. And so we've been talking a lot about sort of
different applications that you can now build on top of -- yeah.
>>: So what if this malicious behavior depends on concurrency issues? For
example, I have a data race, and I assume you're not [inaudible].
>> Brendan Dolan-Gavitt: Right, at the moment that's a fairly tough problem that
we haven't tackled, essentially. It's the kind of thing that traditional record and
replay has spent lots of time on, but it's not something that we've necessarily
considered in the collective case.
>>: On the other hand, concurrency just makes the execution space bigger. So
with more executions you still have a chance to reproduce a concurrency issue
in something.
>>: Yes.
>> Brendan Dolan-Gavitt: And I guess, again, there's sort of a trade-off here.
The more delicate the concurrency issue is, the less likely it is that
you're going to be able to successfully trigger it on actual
people's phones, right? So there's a trade-off there, too.
So back to applications. You know, we can again try to do this kind of
detection and reconstruction of malicious behavior. We can try to do things
like taint analysis to detect the kind of privacy leaks that got Pandora in
trouble. We can start doing things like trying to expand the space of executions
that we see on the server side by actually mutating the inputs to things that no
user saw -- trying, of course, to keep them constrained to things that
might realistically have come from the environment, so you're not going to have
February 31st, but you might try other dates and so on -- and try to
expand this neighborhood around the in-the-wild executions and
discover these kinds of hidden behaviors. You could also be a little smarter
about that than just pure fuzzing: you can actually try to solve for inputs that are
going to get you into new parts of the program using something like symbolic
execution. This would be mixed symbolic and concrete execution, because of
course we're starting from an initial set of inputs that came from real users. And
the hope is that if you have executions close to what users are seeing in the wild,
they'll be more relevant than ones that are just pulled out of the ether.
>>: I'm confused about the puzzle thing. Suppose to solve a puzzle -- it's a simple
puzzle.
>>: Paparazzi.
>>: Whatever, I don't know. Some puzzle, like the 15 puzzle, where you move
things around. If you just collect a little piece from me and a little piece from him --
I'm going to swipe right and you're going to swipe left -- no amount of putting that
together will solve the puzzle.
>> Brendan Dolan-Gavitt: Right. That's still something that we have to work on
and address and figure out -- you know, how we can put these pieces back
together into something that's coherent. And maybe in some cases we won't be
able to.
>>: You'd have to know how to solve the puzzle.
>>: Exactly.
>>: As we described, we haven't done this yet. We have thought about -- and I
know G2 has thought about [inaudible] -- sort of adaptively changing the amount of
instrumentation we do depending on the application. So if it's a puzzle
application and we know we need this much contiguous touch input to get through
it, we can adaptively push that down. And we would see that, because we'd see,
oh, we're in this state of the puzzle and we'll never get there.
>>: That would adaptively feed back, and you'd know to collect a longer sequence --
>>: In principle. We haven't done that yet.
>>: But also, following on this puzzle thing: the puzzle that the client has at
that moment would not be the same as the puzzle that you pull down
when you do the replay, right? So currently you're ignoring all these server inputs.
>> Brendan Dolan-Gavitt: That's not the case. Input from the network would also
be considered an input, potentially, right? So the idea here, I think, is similar to
the earlier question: you may have coherency issues, in that one user's input on this
part of the program doesn't correspond to another user's input on this other part of
the program.
But things like, you know, pulling down the puzzle from the server would be
recorded with some probability.
>>: Isn't that a lot of information, if you record a lot of things from the servers?
>> Brendan Dolan-Gavitt: Again, eventually --
>>: Mobile applications -- they're increasingly cloud-based.
>> Brendan Dolan-Gavitt: Again, potentially. But the hope is that it's going to be
less than recording literally everything, and that you can still get enough without
having too much of a performance impact, if you have a lot of users -- even for
interacting, cloud-based applications. So, okay, that 15 puzzle is a good example.
I guess this is --
>>: [inaudible].
>> Brendan Dolan-Gavitt: Have you seen this before? [laughter].
>>: I have a plan.
>>: He just walked by [inaudible].
>> Brendan Dolan-Gavitt: So you might have a 15 puzzle -- which is a bit of a
misnomer here, because it looks like a nine puzzle to me -- and if you solve it,
it's going to turn your phone into an access point, and that's going to violate all
sorts of agreements with cell phone providers, and we don't want the app to be
able to do that.
Now, this is the kind of thing we're hoping to be able to reconstruct. So, okay,
this sort of model, where we want to probabilistically record and
replay, led us to some design principles. One is that recording needs to be as
inexpensive as possible -- in terms of the time it takes to do the logging, the
amount of space that the logs take up, and the amount of extra battery that you
use up by turning on this logging.
On the other hand, the replay side can be -- not quite arbitrarily expensive, but it
can be much more expensive than recording. You're going to be recording on a
dinky little phone and replaying potentially in a huge datacenter. So if there are
trade-offs to be made where, oh, you know, we have to sacrifice and make
replaying a little bit more expensive but it's going to make our recording a lot
smaller, then we should definitely take that trade-off.
And finally -- this goes to the level of abstraction of the events you're going to
record -- you need the ability to take events recorded on one
device and events recorded on another device and be able to recombine them in
a meaningful way. Doing this at too low a level, for example, would make that
very difficult. You know, if you have, say, an mmap write to this device region at
this time on one device versus another device, that's going to be very hard to
interpret and reconstruct in a meaningful way.
>>: All this stuff [inaudible] form factor, things like that. It may be that some
phone has a display region where you touch it, but another phone physically does
not have that there.
>> Brendan Dolan-Gavitt: Yeah. For example, even in this picture, one of
these phones has a keyboard and one does not. So that kind of input is something
that you want to consider when doing your record and replay.
>>: Do you only replay on the same devices, for mobile?
>>: You can either do that, or you can try and move your recording to a level
where those differences are abstracted away. You don't actually write one
Silverlight application for the Samsung Focus, one for your HTC, one for your
other manufacturer. The idea of these APIs is that they try to abstract that away --
that's the point of having OSs and APIs, right?
So this essentially gets to the question of where we want to instrument. We're
looking specifically at Windows Phone here, and the software stack is
actually pretty deep. Starting down at the bottom, we have the actual physical
hardware, and then the phone OS runs on top of that. That's essentially Windows
CE -- not essentially, it actually is. And then on top of that you have your native
libraries. You know, these are the classic Win32 libraries that you're familiar
with, but versions that are made for CE and run on ARM.
A level above that, still keeping the focus on managed
applications, we've got the runtime libraries for .NET and Silverlight and XNA.
These are split into two pieces: they have a managed side, which is what
applications talk to, and, for performance and because they need access to
the OS, large portions of them are implemented in native code. So you have this
sort of split personality for each one of them. And finally, way up at the top, you've
got the actual third-party applications, which are the ones we're trying to monitor.
So there are trade-offs at each level. As I mentioned, if you start down at the
hardware and try to do the recording there, you can imagine taking the
approach that's been taken in the security world, where you do record and replay
by throwing a VM at it and doing all your recording at the layer of the VM. You
just record all inputs and outputs to the CPU, you virtualize the devices, and
you can replay at that level. The problem, of course, is that this is going to end
up being fairly device-specific, and it's very hard to compare events that happen
between different phones.
Going up, between the OS and the native libraries: this is what the Paranoid
Android case was, where they did the recording at the system call level. We
want to avoid that, because while it's a nice, small, well-defined interface with a
clear separation of state between kernel and user mode, it's not guaranteed to be
stable. The one in Windows NT can change between releases, and so can
Windows CE's. And the way that this is gotten around is that you have these
user-level libraries that everyone is supposed to be using, and those define the
stable Win32 API.
So you could also imagine trying to instrument between the managed code layer
and the native code layer. This, again, is kind of problematic, because again it's
undocumented, it's not exposed to developers, and it's not guaranteed to be stable.
And you also get some issues because here you're really mixing managed-code
objects and native-code objects, and it can be kind of unclear how you should
reconstruct these objects when you're doing replay.
So now let's talk about two more viable possibilities. One that
seems really intuitively promising is moving to the highest semantic level
possible, just below the application, where the application's interactions
with the .NET runtime and the .NET runtime libraries are recorded and
replayed.
And this seemed really cool. So we actually went ahead and took CCI Metadata,
which is a very nice project that allows you to essentially disassemble, make
modifications to, and rewrite .NET assemblies, and we instrumented the .NET
standard library to do record and replay.
Things did not quite go as we planned. The biggest problem that we
encountered was that objects that are passed between the managed standard
library and the application end up being shared state between the two
layers, and you want a cleaner separation between the two. Because if
you have an object that's passed back to the application, you now also have to
start recording not just the one object you passed back but also, potentially, other
objects that it references.
So you have something that's intuitively simple, like a touch event at X, Y, right?
You might think that this is all you need to capture the semantics of that
particular event. But when it actually reaches the application -- events in .NET
tell the receiver who the sender was. So now the sender is the
phone application frame. And the phone application frame might have
references to other UI objects: the buttons, the text boxes, and so on that exist
inside it. And so the application, by following references from the object that was
given back to it, may end up accessing a fair bit of the application's
state. Another issue is that these touch events may not be pure managed code.
They may also have large native portions attached to them that you
would potentially have to reproduce as well.
>>: So I am totally confused about why you have to log that entire graph.
Because it seems to me -- let's say you were just going to instrument a whole
execution run, not doing this whole collective thing, just a single execution of a
single program -- it seems like all you would have to log is just the touch at X and
Y, and the rest of the graph would be induced by the fact that you know what the
program's state is, right?
>> Brendan Dolan-Gavitt: Right. So when I'm talking about a touch at X, Y here,
I'm not talking about the theoretical event, but the actual .NET object
that is given to the application. And that .NET object, when it's given to the
application, contains these references to other objects.
>>: I see. So if I heard it right, it's essentially a serialization issue: you're trying
to record the event, but it turns out this event has some nasty object
graph and is a little difficult to capture.
>> Brendan Dolan-Gavitt: Right.
>>: And, in fact, as you will see, because we know it should be an X, Y, we should
be able to reconstruct it -- induce it -- from the X, Y.
>> Brendan Dolan-Gavitt: This led us to move down --
>>: Did you instrument every store to that object, or every read from that object?
>> Brendan Dolan-Gavitt: That was kind of the path that we were going down,
because, again, if you're replaying this from a log and the application may want
some data that you reach from that object, you have to be able to provide that
data. So this led us to move down a layer and essentially instrument at the
second of the two stable APIs in the system, which is the Win32 layer.
>>: This is [inaudible].
>> Brendan Dolan-Gavitt: Yes.
>>: Close second?
>> Brendan Dolan-Gavitt: This is specific to Windows Phone.
>>: [inaudible] instrument --
>> Brendan Dolan-Gavitt: Right. I mean, for example, on Linux the system call
API is stable. They may add new ones later, but the system call layer you have is
in fact stable. So this is specific to Windows Phone. But in general, you can
expect that if the platform provider cares at all about backwards compatibility,
there's going to be some stable API.
Okay. So with this approach, the benefit, as compared to going up to
managed code, is that the interface is a bit cleaner. It's a C API, and the amount of
shared state between the two layers is much smaller. There's still some, so
you have to reproduce side effects and things like that in some cases, but it's in
general much simpler. Down at this level I should mention some
previous work: R2, which was a system by Guo et al. at OSDI 2008.
They're from MSR Asia, and it's very good work. But there are some
differentiators between the designs of the two systems that reflect the fact that
they have different goals -- we want to move to this collective scenario.
So R2 is sort of per-application: you tell it the application and the interfaces you
want it to interpose on, and it gives you a record-and-replay library for that.
Whereas we want this to be really generic: we instrument the whole
platform, and then as you add new applications on top of it, they're
recorded and replayed and supported.
They're also very concerned with exact replay, because they're trying to
reproduce these kinds of concurrency-related issues and difficult bugs, whereas
we're more concerned with monitoring and trying to get the kind of
approximate replay we talked about.
And finally, their recordings are very instance-specific. They include details like the
exact address given back by the memory allocator, and they try to make sure those
stay stable across record and replay and so on.
Whereas we're really trying to go for comparable recordings. As for current
progress: this summer we focused on implementing traditional
record and replay for Windows Phone 7. This turned out to be a fair amount of
engineering challenge, because this is a new platform and so on -- and, of
course, because we built one design and threw it away.
So we haven't yet extended it to the collective case, but we hope to be able to
continue work on this project and actually move forward with it. Our next
steps are to actually dogfood this on real phones, maybe give them out to some
people at MSR; to support some more complex applications -- I'll show you a demo
of one of the things we have right now in a second; to extend this to the
collective scenario; and, of course, to run experiments to verify that all this does
actually work.
So, getting to the details of the implementation. Essentially the
way it works is that you have an application, and it makes a call into a native
library. We've placed hooks in the native library, so the call is going to
redirect into our library, at which point we can invoke the original native code,
record whatever information needs to be recorded, and pass the result back to
the application.
Once we get to replay, when the application calls into the native library, it's again
going to redirect to our code, but now, instead of invoking the native library
to get the results, we're going to read the results from the log file and return them
back to the application. As for how we actually do that, we make use of
Detours to do the actual hooking. And I say "make use of", but in fact we had to
port Detours over to Windows CE, and there were some interesting changes that
had to be made, because CE is a different beast than Windows NT. So, for
example, in Windows NT, when you have a library that's shared across multiple
applications and someone goes and makes a change to that library, say by
inserting a hook, Windows NT is kind enough to do copy-on-write and give
you separate copies of that library in the different process address spaces.
In Windows CE, when someone makes a write, that write is suddenly visible
to all processes across the system. And because they don't actually want
you to make changes to shared user libraries, we had to modify the kernel to
allow us to do that.
So it's understandable that this doesn't work by default, but we needed it to work.
And, of course, now if we have this hook and it tries to call into our code and our
code isn't loaded in the other process, we're going to get a crash. So essentially
we had to work around this by making sure that whenever some library we were
going to hook was loaded into a process, our recording library was also loaded
into that process, and then it would decide dynamically, depending on which
process it was in, whether it was going to be recording or replaying that particular
process.
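As an illustration of the hooking mechanics, here is a sketch of a single record/replay hook written against the desktop Detours API, with GetSystemTime as the example target. The actual system uses the Windows CE port just described, and the mode plumbing here is an assumption for illustration.

    // Sketch of one Detours-style record/replay hook (desktop Detours API shown;
    // the talk's system uses a Windows CE port). Record mode calls the real
    // function and logs its result; replay mode returns the logged result.
    #include <windows.h>
    #include <detours.h>
    #include <cstdio>

    enum class Mode { Record, Replay };
    static Mode g_mode = Mode::Record;
    static FILE* g_log;  // opened "wb" for recording, "rb" for replaying

    static VOID (WINAPI* TrueGetSystemTime)(LPSYSTEMTIME) = GetSystemTime;

    VOID WINAPI HookedGetSystemTime(LPSYSTEMTIME st) {
        if (g_mode == Mode::Record) {
            TrueGetSystemTime(st);                     // let the real call happen
            fwrite(st, sizeof(SYSTEMTIME), 1, g_log);  // remember what it returned
        } else {
            fread(st, sizeof(SYSTEMTIME), 1, g_log);   // feed back the recorded value
        }
    }

    void InstallHook() {
        DetourTransactionBegin();
        DetourUpdateThread(GetCurrentThread());
        DetourAttach(&(PVOID&)TrueGetSystemTime, HookedGetSystemTime);
        DetourTransactionCommit();
    }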
>> Brendan Dolan-Gavitt: So let me stop briefly here for a demo. And that looks
like me. Also looks like I have timed out. But -- that's exciting. Caps lock is on.
That's the problem.
>>: Are you in caps lock?
>> Brendan Dolan-Gavitt: No, when you turn caps lock on, that happens. So....
it's not letting me in. All right. Yeah, I don't know. Let me try it with just starting
from scratch -- you wonder if they realized I was leaving next week and just
preemptively -- all right. So, okay. What we're going to do here is take a
third-party application, in this case Draw Free, do some interactions with it and
record them, and then try and replay them later on. So I'm going to go in here.
This is a small application that lets you toggle record and replay. Record and
replay, it turns out, also works on the settings application, and the last thing I did
before this was set it to replay mode, so it replayed me setting it to replay mode.
The system works a little too well.
So anyway it's now set to record mode. And so we're going to start up Draw
Free. And so once it's loaded we're going to take it, make a little drawing. You
can all stand in awe at my artistic abilities.
>>: Solve this by --
>> Brendan Dolan-Gavitt: Yeah. This is what you're getting for your money.
>>: This is based on an arbitrary third-party application?
>> Brendan Dolan-Gavitt: We downloaded this from the marketplace last night.
>>: There's no special filtering done?
>> Brendan Dolan-Gavitt: Nothing up my sleeve.
>>: [inaudible].
>> Brendan Dolan-Gavitt: Yep. So now setting it back to replay. Going to go
ahead and start the application up again. And if all goes according to plan, we
should see it draw a smiley face.
>>: Very nice.
>> Brendan Dolan-Gavitt: All right.
>>: Another one.
>> Brendan Dolan-Gavitt: Yes, we can also -- we recently got recording and
replaying of the HTTP layer working. So Weidong managed to log in better than I
can.
>>: Have you tried this on a camera?
>> Brendan Dolan-Gavitt: I haven't tried it on a camera. That's a very good
question. We ought to be able to replay it at the same kind of level of abstraction,
because there's presumably a native library in there that's giving you access to the
camera. But we haven't looked at it yet. Hmmm, that raises a whole raft of
privacy issues, doesn't it? [laughter] All right. So, starting this up -- right
now it's in neither mode. Hopefully I can actually go back. So when we start
this little toy app, what it does is go and make an HTTP request out to a
server and pull back the current date and time.
>>: [inaudible] packet.
>> Brendan Dolan-Gavitt: Yes, right. So our record and replay system is not so
good that it can record events in the future. Future work.
>>: Right.
>> Brendan Dolan-Gavitt: So now we can go and set it over to record mode.
And you should notice that the time is different, right? So --
>>: [inaudible].
>> Brendan Dolan-Gavitt: So it's now 14:49 and 27 seconds. Now we'll set
this back to replay mode. And -- 14:49 and 27 seconds. All right. So those are the
demos for today. And we can go back to --
>>: What happens when you switch applications, between one and another -- does
that work?
>> Brendan Dolan-Gavitt: That doesn't work right now, essentially because when
you tombstone one application and come back to the other, it detects the task host
launching again and assumes that it's a new application being recorded and
replayed.
So that's the kind of thing where we would essentially have to build in some
support for save and restore. All right. So to finish up: the overall goal
of this project is really to provide the usefulness of full trace recording while
minimizing the overhead. Right now we've got an
implementation of traditional record and replay for Windows Phone 7. Along the
way we learned some interesting things that were somewhat unexpected. One is
that this idea of doing the recording up at the application level doesn't seem to
work out very well, at least with Windows Phone 7 and the .NET framework.
The other is that our design goals really influence where we want to do our
record and replay; for this application we ended up settling on the native
interface, but we had to go through a few other designs before we got there.
So that's it for me, and I'll take any more questions that we've got.
[applause].
>> Brendan Dolan-Gavitt: Yeah.
>>: So the problems you ran into trying to do it at the application level were
because potentially there are these paths to arbitrary state, and you felt that you
needed to preserve all of those just in case. Did you look at whether people
actually tend to do that? Or maybe it's the case that all you really needed was the
touch X, Y [inaudible].
>> Brendan Dolan-Gavitt: So this actually kind of gets to a question that was
asked earlier, which is: what do you do if the applications you're trying to monitor
are aware they might be monitored? Even if in the benign case you can
assume that maybe no one's really going to go and look at that deep part of the
object graph, if it's not there, then a malicious application could use that as a way
of detecting that it's being monitored and disable its malicious functionality. Yes?
>>: So possibly another way of reducing the amount of recording you have to
do is to look at the program and identify which inputs are hard to reverse, that
just [inaudible] --
>> Brendan Dolan-Gavitt: That's the idea of preprocessing it with some sort of
static analysis to really figure out what the bits you want to record are. So that's
interesting.
>>: Are there native functions that are deterministic, so you don't even have to --
>>: Another way to reduce the recording overhead is to say these native libraries
are deterministic.
>> Brendan Dolan-Gavitt: We actually ran into that when we were doing the HTTP
recording and replaying. There's one function, InternetCrackUrl, that takes in a URL
and splits it up into its different portions. And while it was used by the
application, and we had tagged it as something that we might have to
support, we ended up not having to. At the moment, you know, the only way we
have of identifying such functions is manually, but it's possible there's some
automated process that might be able to identify the ones that are deterministic.
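A sketch of that manual approach (function and table names assumed for illustration): keep a list of hooked functions known to be deterministic and skip logging for them, since replay can simply re-execute the original code.

    // Sketch: skip logging for deterministic functions (illustrative names).
    // A deterministic function's output depends only on its arguments, so replay
    // can just call the original code again instead of reading a logged result.
    #include <set>
    #include <string>

    static const std::set<std::string> kDeterministicApis = {
        "InternetCrackUrlW",  // pure parsing: splits a URL into its components
    };

    bool NeedsLogEntry(const std::string& apiName) {
        return kDeterministicApis.count(apiName) == 0;  // only nondeterministic calls log
    }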
>> Brendan Dolan-Gavitt: Another question? Yep?
>>: So I'm still a little bit confused about why you would have to record that entire
object graph at all. It seems that the object graph for any given touch
event -- the object graph that that event will touch -- will be induced by whatever
the program state is. So, in other words -- if I'm correctly understanding what
you're doing -- you've got the server over here. You're essentially collecting all
those logs from the wild, and you're going to synthesize a couple of state traces
and basically feed those traces to this real running version of the app on the
server. So that server-side app boots up. It boots up with an object graph. Then
you hand it the touch event, and whatever happens to be the state of that server
application at that point -- well, those will end up being the object graphs that it
touches. So why do you have to record that if you want to be consistent?
>> Brendan Dolan-Gavitt: If you have some extra information about the touch --
who the sender is -- then you could try to go through at runtime and say, this
object that's already in the application's memory is the one that should be the
logical sender.
This also means that you need some way of identifying an object as the same
one over time, which isn't completely trivial.
>>: But what you said is actually what we are doing. If we do it at the native layer,
all you need to do is send in a touch event at X, Y, and Silverlight and the
application will construct the object graph. But if we sit on top of Silverlight, then
we don't see the touch event at X, Y anymore. What we see is a big object passed
from Silverlight to the application. And for that object and its graph, without
analyzing the application we don't know what references the application would
use, nor how to give it everything.
>>: I see. So is there any way -- is there [inaudible] -- I don't know this very well. Is
there a way to generate, let's say, a fake touch event where all you would
specify is the X and Y? Because it seems, if that exists, that --
>>: We could. There is a sort of a [inaudible] test framework. [inaudible]
However, touch is just one type of event. What we were trying to do in the original
design was engineer an object logging layer for Silverlight.
So while you're exactly right that for touch we can imagine a
better way of doing it, that would be a special case. And we found that, hey, we
would have all these special cases, and we don't want to deal with all these special
cases. It looked like it would just be easier to move down to a level with a really
generic solution that wouldn't run into this problem.
>>: We don't want to put our --
>>: [inaudible] solutions, but that's the trade-off.
>>: Silverlight would be specific to Windows as well, right? So...
>>: Have you considered covert channels? Or are they nonessential?
>> Brendan Dolan-Gavitt: That's not something that we have considered. It's
a good question, though. I'm trying to think what a covert
channel in this space would look like. So now you're recording, and the covert
channel would be something where maybe, based on -- actually, I'm not seeing it.
What sort of covert channel were you thinking of?
>>: Go through one of four different URLs for the same site, depending on the two
bits you want to ship out.
>>: He's talking about information leakage using a covert channel
[inaudible].
>> Brendan Dolan-Gavitt: Oh, okay. I see what you mean.
>>: That's one of the challenges for those applications. That's for them to worry
about.
>>: You can just delay your request, and the delay can carry whatever
info --
>> Brendan Dolan-Gavitt: That sort of gets more to the side of detecting
malicious behavior after the fact. So if you want to think about it like this, we're
sort of assuming we have malicious behavior oracles that someone else is going
to apply to the data that we collect.
>>: You mentioned the privacy concern, do you have something else going on?
>> Brendan Dolan-Gavitt: Yeah. So in some ways it's mitigated, in that
recording only little bits from individual people is better than doing full
trace recording. That may not be something that users actually buy, though. So
there are a few things. One is maybe to make it something that they can opt into,
but then, of course, this can hurt our coverage, because we do need lots of people
to opt in.
The other is that there has been some work on essentially trying to mask the
inputs in such a way that they still produce the same effect on the program but
they're not the original input. Right?
So there's some work on --
>>: Program encryption.
>> Brendan Dolan-Gavitt: So perhaps the program execution is itself potentially
the private information.
>>: The password, we don't save it. The credit card number, we don't save it.
The credit card number -- let's just ignore it.
>>: That plays into our trade-offs saying here's some abstraction of the actual
execution to protect people's privacy [inaudible].
>>: So, to follow up on the question about the camera I asked earlier: I imagine
you would have something along the lines of the user choosing what types of
events they'd be willing to record -- for example, [inaudible] click events but not
necessarily the camera recording. So maybe you'd have a broad spectrum of users,
[inaudible] say, more paranoid people.
>> Brendan Dolan-Gavitt: That is a good idea. There's always a danger of
giving people lots of configuration options because they may decide it's too much
trouble to actually set them up.
>>: Especially when there's no immediate benefit to them.
>>: Yeah.
>> Brendan Dolan-Gavitt: Right. It's more of a -- it's sort of a social benefit,
right? On the other hand, there is -- what's the application for Android that you
were mentioning?
>>: Lookout.
>> Brendan Dolan-Gavitt: Apparently -- again, it's security software that runs
all the time -- it has had many users actually download and start using it. So some
people do opt into this kind of thing.
>>: Lookout. An antivirus -- so it scans all the apps running. It does the full
functionality and sort of checks to see if [inaudible].
>>: Send the information back.
>>: [inaudible].
>> Brendan Dolan-Gavitt: Yeah. So, you know, people aren't totally [inaudible].
Anyway, thank you.
>> Weidong Cui: Thank you.
[applause]