>> Tom Zimmerman: So thank you all for coming. It's my pleasure to welcome back
Andreas Zeller. He actually visited us many times. He spent time here as visiting
faculty I think two or three times now. And today he's here only for one day.
But to give you some background in case you don't know Andreas, he's a professor at
Saarland University in Germany. And he's also an ACM Fellow. He organized many
conferences. He's very well known and respected in the field.
And today he's going to talk to us about his latest research on app store mining.
>> Andreas Zeller: Thank you very much, Tom. And, well, we don't have the PA on,
but I hope you can hear me already. There are not that many in the room today, so it's
going to be maybe a bit more interactive -- feel free to interrupt any time you like.
So this is our first investigation into a field called app store mining, where we are
looking into applications as you find them in common stores and archives, such as the
Google Play Store, which holds all the Android applications.
And we are looking at a very general question here, and that is whether an application
does what it claims to do, which, of course, is one of the very fundamental questions of
computer science.
And, yes, we definitely had some fun doing that. And let me tell you a bit about what
app behavior means here, what app description means, and what the checking process
is.
Let me motivate you with a simple app as we found it in the Google Play Store. This is
London Restaurants. London Restaurants, with a screenshot. And this is the full
description as you see it as a regular user when you want to download it. It's free and
offers a lot.
It gives you a full list of restaurants and pubs in London together with the type of food.
And you can search for a specific kind of food near you and whatnot. So it offers all the
usual stuff for classical travel and food applications.
The interesting thing about this app is that on top of what it claims to do, which it
actually delivers, it also gives you a couple of extras. In particular, this application
accesses your account information, including your mobile phone number and your
device ID, and sends all of this out to some unknown server. And, of course, none of
this behavior is made explicit anywhere in the description.
You could think that an application that takes your mobile phone number and sends it
around for arbitrary usages is malicious in the first place, but that's actually not true,
because what makes an application malicious very much depends on the context.
There are applications, say, like WhatsApp, for instance, which is a very popular --
perhaps the most popular -- instant messaging application on Android and on mobile
phones, which happily does this as well.
In order to function, the very first time you start this WhatsApp application, it actually
reads your entire phone book and sends it to the WhatsApp server in order to check
whether any one of your contacts with that phone number is already registered with the
WhatsApp service.
So the simple fact that an application sends out private information does not make it
malicious in the first place. The question is more whether the behavior comes as a
surprise to the regular user or whether it does not.
So we could say: if an application does something covert, something that happens
somewhere in the background and in particular is unadvertised, this is what we're
looking for, because this is what would surprise us as regular users.
And this is something which is particularly true for this London Restaurants application,
because London Restaurants is a travel application, travel and mapping. And while a
travel and map application would, for instance, happily access your current location --
that's pretty normal for mapping applications -- accessing your mobile phone number is
not part of what a travel application actually has to do. And, therefore, accessing such
account information and sending it around is abnormal in this particular category, which
is different from the messaging category.
So what we do in this work is as follows: We have set up a tool chain which is named
CHABADA, which stands for Checking App Behavior Against Descriptions of Apps.
CHABADA is also a French word. Anyone French here? It's also a French word
denoting a specific rhythm in jazz.
I think there's a song -- chabada, chabada, chabada -- this is famous French film music
from the '60s, A Man and a Woman, '66, I think, and this is where this specific rhythm
comes from. So if you're interested in French jazz, you'll recognize CHABADA.
And the last time I gave this talk, there were two people from France who noticed that
and who immediately burst into laughter. So apparently it strikes a chord. Oh, okay.
Good. We've had enough puns now. Good.
So this is what we're doing in CHABADA. We are collecting a large number of
applications. We are identifying the topics that these applications deal with in their
descriptions. Then we cluster them according to those topics.
And then, within each cluster -- once we have found a number of applications that all,
say, belong to travel and mapping themes -- we look at the APIs these individual apps
are accessing, APIs as proxies for their behavior. And from that we look for outliers: if
there is any application in that cluster which accesses different APIs than the other apps
in that cluster, then we want to identify it as an outlier.
So this is what I'm going to show you today in this talk, including the outlier detection.
>>: I have a clarification question about the clustering.
>> Andreas Zeller: Yes?
>>: Can I think of each topic as an adjective, any number of which can be applied to a
particular app?
>> Andreas Zeller: I'll come to that in a second. I'll identify the individual steps, and
then I'll be happy to get back to your question.
Let's go back and check the individual steps in here.
The first thing we did -- well, in order to do that, you need a collection of apps. So we
spent a total of four months running a script that would download apps from the Google
Play Store. We checked all the 30 categories in the Google Play Store and would
download the top 150 apps as well as their metadata, which is the description, the
rating, user comments and all of that. This is something we did last winter and last
spring.
We only looked at free applications, for the simple reason that my university cannot
afford to pay for plenty of expensive apps, so it's just the top free apps.
This gave us a collection of 32,000 apps. You actually have to run this script for quite
some time, because if you download more than a few dozen applications a day,
Google's going to block you, because they would reasonably assume that you're not a
regular customer.
And we actually made a data package which contains all the metadata of all these
apps. And this is available on the website. I'll come back to that at the end of my talk.
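The crawl described above can be sketched as a rate-limited loop (a toy sketch only: the `fetch_metadata` function is a hypothetical stand-in, since the store exposes no public download API, and the pacing numbers are illustrative):

```python
import time

def fetch_metadata(app_id):
    """Hypothetical fetcher -- stands in for whatever HTTP call
    retrieves an app's store page; the real endpoint is not public."""
    return {"id": app_id, "description": "..."}

def crawl(app_ids, per_batch=30, pause=5.0):
    """Download metadata slowly enough to stay under a daily cap."""
    results = []
    for i, app_id in enumerate(app_ids):
        if i > 0 and i % per_batch == 0:
            time.sleep(pause)  # in a real crawl: wait until the next day
        results.append(fetch_metadata(app_id))
    return results

apps = crawl(["com.example.a", "com.example.b"], per_batch=1, pause=0.01)
print(len(apps))  # → 2
```

Spread over four months, a pace of a few dozen apps per day is what yields a collection of this size without being blocked.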
So this was our collection of applications, and this is what we wanted to analyze.
These are the apps from which we wanted to learn which descriptions normally
correspond to which behavior.
Next thing is, we get all these natural language descriptions of applications. So what
you need to do next is some natural language pre-processing on this. For instance, you
have to get rid of stop words. These are words that commonly occur in the English
language, such as "you," for instance, which you don't care for. And you also have to
normalize the individual words -- for instance, turn plurals into singulars and likewise --
such that the words from different descriptions actually overlap.
So this is what this looks like. All these common stop words would be removed. And
this then gives you a vector of words that describes what the application is doing.
And these are the words within each description that characterize the behavior of the
application.
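The preprocessing just described can be sketched in a few lines (a toy sketch: the stop-word list is tiny and the plural-stripper is deliberately crude -- a real pipeline would use a full stop-word list and a proper stemmer such as Porter's):

```python
# Minimal stop-word list for illustration only.
STOP_WORDS = {"you", "the", "a", "an", "and", "of", "to", "in", "for", "with", "is", "it"}

def normalize(word):
    # Very crude normalizer: lowercase, strip punctuation, drop plural "s".
    w = word.lower().strip(".,!?")
    return w[:-1] if w.endswith("s") and len(w) > 3 else w

def preprocess(description):
    words = [normalize(w) for w in description.split()]
    return [w for w in words if w and w not in STOP_WORDS]

print(preprocess("A full list of restaurants and pubs in London"))
# → ['full', 'list', 'restaurant', 'pub', 'london']
```

The resulting word vectors are what the topic analysis operates on.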
On these sets of stemmed words and descriptions, we would first eliminate all the
applications whose descriptions were, well, close to being empty, which actually
eliminated a third of all applications, because there was literally close to no description
at all in there -- which is something I find surprising. But apparently people download
these apps even though they have no description.
And then we used a standard technique for topic analysis which is called Latent
Dirichlet Allocation, LDA, which takes these words and groups them into topics that
frequently occur together. So you take the words that frequently occur together and
you apply LDA to put them into 30 groups.
And this is a standard technique. For all the natural language processing, we didn't
invent anything new, because, well, we're not exactly natural language processing
scientists, so we stick to the standard techniques -- as we always do when we are
venturing into different areas.
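A minimal sketch of this LDA step, assuming scikit-learn is available (the corpus and the number of topics here are toy values, not the actual CHABADA setup of 30 topics over ~32,000 descriptions):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus of already-preprocessed descriptions (illustrative only).
docs = [
    "restaurant pub food london travel map",
    "map travel location route city",
    "poker casino coin slot money",
    "money finance market currency coin",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)           # word-count matrix, one row per app

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)     # rows: apps, cols: topic weights
print(doc_topics.shape)  # → (4, 2)
```

Each row is a probability distribution over topics, so one app can address several topics at once -- which is exactly why the clustering step afterwards works on topic weights rather than on a single category label.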
These are the individual topics that we came up with. So we had 30 topics overall. We
gave them individual names, but these are the most representative words in there. So
here's galaxi, nexu -- you can identify the stemmed brand names in here -- device,
screen, effect, install, customize. So there's a cluster of topics that all relate to
personalization.
Game, video, page, cheat, link, and so on -- this is for games and cheat sheets. Slot
machine, money, poker, currency, market, casino, coin, finance -- this is all related to
money. And so on.
So we had a total of 30 topics, from top to bottom, that all relate to various aspects an
individual application would address.
Yes?
>>: I have a question about how these apps are categorized in the app store?
>> Andreas Zeller: Yes.
>>: So what I'm wondering is whether there are categories like this already assigned to
these applications?
>> Andreas Zeller: Yes.
>>: I suspect there are --
>> Andreas Zeller: Yes. The application --
>>: [inaudible].
>> Andreas Zeller: The applications actually already come with categories, that's right.
There's a librarian who does that. But there are two things. For one thing, we would
find clusters that you would not normally find in a store. For instance, we had an ads
topic in there -- we had applications that, from their description, make clear that they
have plenty of ads. That's clear because we have words like notification, policy, share,
push, advertise. All this is in there.
But in a regular store, I very much doubt that you would find a category of ads, because
normally this would imply that you, as a user, would specifically search for that
category, which normally you don't. At least I'm not aware of this.
So we actually needed something that would abstract away from the categories shown
in a store. The second reason is that the categories as we found them in the store did
not suit our purpose very well. But I'll come to this later in the evaluation. Yes?
>>: The better known an app is, the fewer words in its description?
>> Andreas Zeller: Could be, yes. If you have something like Angry Birds, we can
immediately identify what Angry Birds is about. And Angry Birds is in there, too.
What's happening with the screen here? Oh, here we go. Touch screen to make
[inaudible].
>>: More generally, maybe texts --
>> Andreas Zeller: Yes.
>>: -- don't have enough of the context information to identify what an app is supposed
to do.
>> Andreas Zeller: Yes, that's right. In the longer run, you can also imagine that if you
have some sort of brand name like Angry Birds, for instance, you could actually, say,
Bing the whole thing and then check for reviews and check for additional information
about what would be commonly associated with that.
>>: If you go to the Google Play Store --
>> Andreas Zeller: Yes.
>>: And you click on an app, you see similar apps?
>> Andreas Zeller: Uh-huh.
>>: I've seen Google does some kind of --
>> Andreas Zeller: Yes. This --
>>: -- analysis to cluster --
>> Andreas Zeller: Yes. Google can do that because they know which applications are
downloaded by individual users. And so they can recommend individual apps to you
based on the download history.
They would tell you: people who downloaded this application also downloaded these
other applications. This information we do not have. All we had was the description;
we would not know the individual purchase history of individual customers.
>>: But I can look at the Google Play website right now and see that here is a similar
app. So they're all kind of similar in terms of functionality?
>> Andreas Zeller: I very much assume that Google employs a number of librarians
who actually take care of provisioning this store and organizing it. And given the results
we have, it is not always clear that Google does that -- I'll come to that in a second. But
in principle, yes, you would assume that a well-kept app store would have a number of
people who actually take care of filling the shelves and organizing things.
But what we found in the Google Play Store was that Google apparently does not care
so much about what they're actually offering in there.
Okay. So every application could address multiple topics. For the London Restaurants
application, this would end up in navigation and travel -- this was the first topic. It's also
about food and recipes. And it's about travel.
So this is what we get now: for every application, we have a set of topics that this
application addresses.
Next thing in here is, we would cluster these applications according to the topics they
deal with, because applications would deal with multiple topics. And the idea here is
that we wanted to have groups of applications that are similar according to their
descriptions. And for this we used a standard technique, K-means, to identify such
clusters. And for K-means you need a number K for the number of clusters. And there
again, we used a standard technique, the so-called silhouette method, to identify the
best number of clusters.
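This clustering step can be sketched with scikit-learn, picking K by the silhouette score (toy data: three synthetic groups of topic-weight vectors; the real feature vectors were topic weights over the full app collection):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy topic-weight vectors: three well-separated groups of 20 apps each.
rng = np.random.default_rng(0)
apps = np.vstack([rng.normal(0, 0.05, (20, 3)) + center
                  for center in ([1, 0, 0], [0, 1, 0], [0, 0, 1])])

# Try several K and keep the one with the best silhouette score.
best_k, best_score = None, -1.0
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(apps)
    score = silhouette_score(apps, labels)
    if score > best_score:
        best_k, best_score = k, score

print(best_k)  # → 3
```

On the real data, this procedure is what yielded the 32 clusters mentioned below.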
And again, we have 32 clusters in here. The clusters are very much in line with the
individual topics. So if your most dominant topic, for instance, was puzzle and card
games, we would have an appropriate cluster for that. We'd have a cluster of memory
puzzle applications, one for music and music videos.
Religious wallpapers was a cluster which I was not aware of, but apparently there are
hundreds of applications that do nothing but display your favorite religious symbol
somewhere or customize your screen to reflect it. So all of this is in here.
We also have adult photo and adult wallpapers, for those of you who are interested.
I'm happy to report that this was joint work with Alessandra Gorla, Ilaria Tavecchia and
Florian Gross.
Alessandra and Ilaria were in charge of looking into all the individual applications and
checking for misclassifications. And these two were the ones who checked out all the
individual adult applications. We had plenty of laughter from their room. And we, as
men, wouldn't want to know anything about what they saw.
So, yes, you get the full breadth of human activities reflected in these applications.
Okay. So these are now 32 clusters. And in each of these clusters we would have a
set of applications. For instance, the personalization cluster: these are the individual
words as they occurred in the descriptions in the personalization cluster. The
personalization cluster is about themes and launchers and lockers and choosing and
design and whatnot. Okay?
And this is very different from, say, the travel cluster. The travel cluster would be about
maps and information and searching and travel, of course. And this is how you can
see, already from the word clouds, that the applications in each of these clusters are
very different from each other.
So now comes the next part. For the applications in each cluster, we wanted to identify
outliers with respect to behavior. And as a proxy for the behavior, we simply looked at
static API usage, which is fortunately not difficult, because for each APK -- this is the
binary form of Android applications -- you can use a simple static analysis to check
which external APIs are being used.
However, we did not look into all APIs that were being used, say all the string API calls
and whatnot. This is not very interesting. We only considered those APIs which are
sensitive, which means that they are governed by a specific Android permission,
meaning that if the application wants to use one of these sensitive APIs, it specifically
has to declare so in its manifest.
So if an application wants to access the location, it has to ask for a specific permission.
And this would also, in principle, be presented to the user when the user wants to install
the whole thing, so the user has to explicitly grant the permission to have the app use
his or her current location.
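Once the binary manifest has been decoded (e.g. with a tool like apktool or aapt -- real APKs store AndroidManifest.xml in a binary format), reading the declared permissions is a simple XML query. A stdlib sketch, with an illustrative manifest fragment:

```python
import xml.etree.ElementTree as ET

# A decoded AndroidManifest.xml fragment (illustrative permissions).
manifest = """<manifest xmlns:android="http://schemas.android.com/apk/res/android">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.ACCESS_FINE_LOCATION"/>
  <uses-permission android:name="android.permission.GET_ACCOUNTS"/>
</manifest>"""

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

def declared_permissions(xml_text):
    """Collect the android:name attribute of every <uses-permission>."""
    root = ET.fromstring(xml_text)
    return [el.get(ANDROID_NS + "name")
            for el in root.findall("uses-permission")]

perms = declared_permissions(manifest)
print(perms)
```

These declared permissions are what ties the statically observed API references to the "sensitive" subset considered here.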
So for London Restaurants, the APIs being used look like this. These are all the
sensitive APIs that are being used. And from the sensitive APIs you can already see,
well, it accesses web pages, it accesses servers by URL, it uses the notification
manager, it checks whether WiFi is on, whether the network is connected.
So all of these are sensitive APIs. And these sensitive APIs are governed by a set of
permissions. These are the appropriate permissions, and you can see them in here. It
needs to access the WiFi state and the network state. It needs to access the current
phone state. And among others, it also needs to get access to your account. Well,
that's what it claims it needs.
And if you now look at all the APIs that are being used in the travel cluster, or
specifically all the permissions that are being used in here, you see that these are also
very characteristic for the individual apps.
So all the applications in the travel cluster typically need to access your current location
-- your precise current location, which is obvious -- and they also need access to the
Internet. This is what makes a travel application.
Whereas the personalization cluster has means to access the external storage. It
works over the network. It is able to turn individual components on and off. This is
what makes it personalization. And so you can see that not only do these clusters very
much differ by their descriptions, but they also very much differ by the APIs and the
permissions that they commonly need.
So this is what we have now: for each application in our cluster, we can check the
individual APIs that are being used. And now comes the next part: we check for
outliers within each of these clusters.
So, again, here's the travel cluster with all the permissions of the APIs used, and this is
generalized over all applications in the cluster. And now let's take a look at London
Restaurants, one of the applications in the cluster. You can see that there definitely is a
difference in here, because London Restaurants uses APIs which are related to the
GET_ACCOUNTS permission -- APIs which are not used by any other application in the
travel cluster. Which means that this is an outlier, and these three API calls which are
in bold in here are precisely those API calls which are unusual for that cluster.
So the question is: how can we identify such outliers automatically? For this purpose
we used a so-called one-class support vector machine, a variant of the well-known
support vector machines, which are commonly used for classifying. But in this case we
use it for a different purpose, namely for identifying outliers.
For each APK in the cluster, we put in a vector of the sensitive APIs and the number of
call sites for each API.
And what the OC-SVM does is, like any SVM, it creates an N-dimensional plane that
tries to get as near to the individual applications as possible. And then you can identify
the distance between that plane and the individual applications. And from that distance
you can identify, well, here's an application which is far away from the norm, which is
different in its features and, therefore, is a likely outlier.
And this is also precisely what the OC-SVM identifies in here. London Restaurants
would be properly identified as an outlier: through its usage of APIs, it simply differs
from the norm.
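This outlier-detection step can be sketched with scikit-learn's OneClassSVM (toy call-site counts; the columns and numbers are illustrative, not the real feature set):

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Rows: apps in one cluster; columns: call-site counts of sensitive APIs.
cluster = np.array([
    [5, 2, 0],   # typical travel apps: location + internet, no account APIs
    [4, 3, 0],
    [6, 2, 0],
    [5, 1, 0],
    [5, 2, 4],   # one app also calls account-related APIs
])

ocsvm = OneClassSVM(nu=0.2, kernel="rbf", gamma="scale").fit(cluster)
scores = ocsvm.decision_function(cluster)  # lower = farther from the norm

outlier = int(np.argmin(scores))
print(outlier)  # → 4, the app with the unusual API usage
```

Ranking apps by this decision score is what produces the ranked outlier list mentioned next.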
So this is what CHABADA gets you in the end -- all these steps are fully automatic, I
should say. In the end, it gives you a ranked list of outliers with respect to the API
usage within each of these clusters.
So the question is: does this make any sense? What do the outliers that you actually
get there look like?
This was fun work, because you get to see a lot of apps this way. So what we did was
a simple thing. In each of our 32 clusters we identified the top five outliers, and then we
looked at them. We checked what they would actually do and how this would
correspond to the description. And I can tell you right away that 26 percent of all these
outliers showed behavior that, A, used sensitive APIs; B, was not advertised; and, C,
acted against the interest of its users -- such as, for instance, accessing sensitive
information and propagating it without this being advertised in any place.
>>: How did you know it was doing this? Do you use the app or do --
>> Andreas Zeller: We used the app. We started the app. We ran the app. We used
the standard debugging techniques in order to identify which information was accessed
and where it went. The good thing was that these applications were not malware per
se, so they didn't use any sophisticated obfuscation techniques to disguise their
behavior. So far so good.
So by using standard techniques -- a debugger and checking which APIs are being
used -- we could very quickly find out what they're actually doing.
>>: So I might be getting ahead of you, but I'm curious what the other 75 percent --
>> Andreas Zeller: Yeah. I'll be getting to that in a second.
>>: So related to -- so [inaudible] this question. A lot of these apps are interactive
apps. I was wondering how you obtain these traces automatically?
>> Andreas Zeller: We wouldn't -- so only for the evaluation, this was manual work. So
we actually interacted with them, and we checked whether the outlier description made
any sense.
>>: But you have to [inaudible] application that you are trying to classify based on their
run time traces, right?
>> Andreas Zeller: No. So far, what we did is, A, all automatic; and, B, we are
checking API usage statically, so all of this does not require execution.
But, in fact, we assume that in the long run, simply counting APIs will not suffice. So
we're looking into more fine-grained stuff. And I'll come to that in a second.
And you also had a question? Or let's just proceed a moment here.
>>: So you don't have the source code, but --
>> Andreas Zeller: We don't have the source code, no.
>>: Also statically how do you make [inaudible] using APIs?
>> Andreas Zeller: We simply know that it declares that it has a dynamic link to the API,
so that's what it is. Number two, it requires the proper permission to use that very API.
So the API usage is not there by accident, because otherwise it wouldn't have to
declare the sensitive permission, too. But, of course, we don't know whether the API is
actually ever being called.
>>: So this is about declarations and --
>> Andreas Zeller: Yup. You can think of a binary which is dynamically linked, and you
have an external reference to some API. And this is precisely what we're identifying
there.
>>: So I'm confused on what the [inaudible] vector machine was. Because you said a
number of calls --
>> Andreas Zeller: Yes.
>>: Okay. So what is that number? If you don't have access to the source to see how
many calls to the API are made, how many --
>> Andreas Zeller: We know the number of call sites because you can do a simple
binary analysis to identify the number of calls -- the number of jumps. You look at the
bytecode, yes -- straightforward. Nothing more complicated than that.
And if you ask me whether the number of call sites is needed or not, I would love to tell
you, but I don't know. We started with the number of call sites, and that's what we had
in the beginning.
>>: So it's sort of [inaudible] -- anything like any API call you use, you have to tell us
about that?
>> Andreas Zeller: No. But if you want to access a sensitive API, you have to declare
this in the manifest by requesting the appropriate permission.
>>: Just click okay, I guess?
>> Andreas Zeller: Users generally, as far as I see, click okay even when an app asks
for the most outrageous behavior. We have that experience with malware. There's an
Angry Birds game. You start it. It asks: we want the ability to send SMS. And you
wonder, why do you need to send SMS? Because that's how we finance ourselves.
So if you like the game, after a couple of games you can send an SMS to some
premium number. This is why we need it. Only for that, of course.
What the application really does is start churning out premium SMS the very moment
you start it. So -- yeah. Well, it's not always easy.
Okay. Let's come to our stuff, what we found in there. First and foremost among the
outliers, we found plenty of applications that use ad frameworks. There's AppLovin and
Airpush. Many of these free apps finance themselves through ads. So far so good.
What's interesting, though, is that these ad frameworks give you, as a developer, a
higher percentage of revenue if you allow your app to access more sensitive
information. Meaning that the more sensitive information your app accesses, the more
money you get from your ads.
>>: That's crazy.
>> Andreas Zeller: I know. That's --
>>: [inaudible] more effectively?
>> Andreas Zeller: That's simply because they gain more information about you, which
is more marketable. If you just send an ad to some anonymous user, yeah, you get a
small share. But if they know where you are, if they know who you are, if they know
your mobile phone number such that they can text you messages or whatnot, yes, you
get a higher amount of money. That's the sad reality of ads. That's what we found in
here.
We also had strange behavior in there. We had the UNO game, for instance. What
was that -- the UNO game requires access to your microphone and explicitly asks so.
But there's no feature in the game which actually does that. But maybe this is targeted
at some head of state. We don't know.
You install UNO on your -- your --
>>: There is a part of that game, at least the physical one, where you actually are
supposed to yell out UNO.
>> Andreas Zeller: Yeah. That's right. That's right. That's perfectly fine.
[brief talking over].
>> Andreas Zeller: I have that in my notes. One second. Here we go. Wonderful.
What else do we have? WICKED -- WICKED is the official application for the -- no,
sorry, I was confused. WICKED is the application for the musical. This is the one that
accesses your microphone. UNO needs your location -- possibly to figure out which
side of the chair you're sitting on. I'm not sure about that.
Yahoo Mail -- this is Yahoo Mail -- is also able to send text messages. It asks for text
messages and apparently it can send them. But, of course, this is not advertised
anywhere.
>>: Some of the mail providers use text messages for verification.
>> Andreas Zeller: Yes. That's right. But then at some point during your user
interaction, or in the description, you would have some notice that tells you that this is
actually the case. Nothing of that was found in these apps.
>>: Did it actually send SMSes when you were using it? You said you ran these apps --
>> Andreas Zeller: No.
>>: You didn't actually --
>> Andreas Zeller: We didn't see any. Why somebody would explicitly request the
permission to send out SMS messages -- I have no idea why this is in there. Okay?
We also had applications that didn't fall into the right clusters. SoundCloud is an
example. This is an application where you can actually record audio and send it
around. And this ended up in a sharing cluster. However, recording audio is
uncommon for sharing applications. This actually means that SoundCloud is a mix
somewhere between audio and sharing -- but, well, it had to fit into one of our clusters.
So there's obviously room for improvement.
We also had an application called Hamster Life, which ended up in religious wallpapers
and promptly became an outlier in there. We have no idea why this is the case.
And here's my favorite example. This is Mr. Will's Stud Poker. This ended up in the
games and money cluster. And it was the only application in that whole cluster that
would not display ads. And since it would not display ads, it was immediately identified
as an outlier. So being an outlier does not mean an app is good or bad; it simply
means it's uncommon. It shows uncommon behavior. And, well, if it does not show
ads, then this is uncommon behavior.
>>: Okay. Remind me what the features are for the SVM here. Is it just the APIs --
>> Andreas Zeller: Just the APIs. The set of APIs, yes. And this Stud Poker game does not access the Internet, but that's what you normally need for ads --
>>: [inaudible] but it's the fact that --
>> Andreas Zeller: Exactly. Exactly. Yeah.
>>: Okay.
>> Andreas Zeller: That's the thing. So, yes, you can also have outliers in these clusters which are simply uncommon behavior. But in this case it's actually one good behavior and a whole set of bad behaviors in there.
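The API-set features just discussed can be sketched as a binary vector over a fixed list of sensitive APIs. This is only an illustration; the API list is abbreviated and the app in the example is hypothetical:

```python
# Each app is represented purely by the set of (sensitive) APIs it uses,
# encoded as a 0/1 feature vector. API names abbreviated for illustration.

SENSITIVE_APIS = [
    "java.net.URL.openConnection",        # Internet access
    "android.location.LocationManager",   # location
    "android.media.MediaRecorder",        # microphone
    "android.telephony.SmsManager",       # send SMS
]

def api_vector(used_apis):
    """Encode the set of APIs an app uses as a 0/1 feature vector."""
    return [1 if api in used_apis else 0 for api in SENSITIVE_APIS]

# A hypothetical mail app that also sends SMS, though its description never says so:
print(api_vector({"java.net.URL.openConnection",
                  "android.telephony.SmsManager"}))  # -> [1, 0, 0, 1]
```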
And speaking of bad behaviors, we also did a second evaluation, and that's even more fun. We took a set of known malicious Android applications and checked how these applications would end up -- whether they would end up as outliers or not.
So for this, what we did was we trained the OC-SVM, now as a classifier, on 90% of the benign apps which we downloaded from the store. And then we had 173 known malware apps, we used the OC-SVM as a classifier, and we wanted to see how the benign apps as well as our known malware apps would be classified. And this is what we got out here. All these, of course, are average numbers over multiple runs and multiple splits. From our malicious apps, the vast majority was also predicted as malicious; a smaller part was predicted as benign. And of our benign apps, the large majority was also properly predicted as benign. In percentages: 84% of our benign apps were correctly predicted as such, and 56% of our malicious apps were also properly predicted as such.
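The evaluation setup described here can be sketched with scikit-learn's one-class SVM. The data below is synthetic and stands in for the real binary API-usage vectors; the percentages it prints are not the paper's numbers:

```python
# Train a one-class SVM on most of the "benign" apps, then see how held-out
# benign apps and "malware" apps are classified. Data is synthetic.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Toy assumption: benign apps rarely touch sensitive APIs; malware uses many.
benign = (rng.random((200, 8)) < 0.1).astype(float)
malware = (rng.random((30, 8)) < 0.7).astype(float)

train, held_out = benign[:180], benign[180:]      # "90%" for training
clf = OneClassSVM(nu=0.1, gamma=0.5).fit(train)

# predict() returns +1 for inliers ("looks benign") and -1 for outliers.
benign_ok = float((clf.predict(held_out) == 1).mean())
malware_caught = float((clf.predict(malware) == -1).mean())
print(f"held-out benign predicted benign: {benign_ok:.0%}")
print(f"malware predicted as outlier:     {malware_caught:.0%}")
```

Averaging over multiple random splits, as the talk mentions, would just wrap this in a loop over different train/held-out partitions.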
And, of course, if you do have a malware scanner on your Android device, you will get much better results. But such a malware scanner actually checks against patterns of malware that are already known. All right? So it has a history that it can check against. Whereas we can take an arbitrary app which nobody has ever seen before, plus its description and its APIs, and find out that what it claims to do does not correspond to what it actually does in terms of API usage.
So as a sole detector of malware, this would not be enough. But if you combined this with patterns of known malware, if you bring these together, this can be an excellent combination: checking new apps which nobody knows anything about, just checking what they claim to do against what they actually do. Here again, this is the correct classification with clusters. We also checked the individual components of our CHABADA approach. So this is without clusters. You see that without clusters, the detection rate of malicious apps immediately goes down. So all of this clustering according to descriptions actually is needed.
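The description-clustering step that this ablation refers to can be sketched roughly as follows: LDA turns each description into a topic distribution, and apps are then clustered by those topics. The tiny corpus here is invented; the real pipeline ran over thousands of Play Store descriptions:

```python
# Cluster apps by LDA topics extracted from their descriptions (toy corpus).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

descriptions = [
    "find restaurants and pubs near your location",
    "city guide with restaurants bars and maps",
    "record audio and share your sound with friends",
    "share music tracks and record your voice",
]

counts = CountVectorizer(stop_words="english").fit_transform(descriptions)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)            # one topic mixture per app
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(doc_topics)
print(clusters)                                   # cluster label for each app
```

Within each such cluster, the outlier detection from before is then run on the API vectors.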
And, as you asked beforehand, the Google Play Store also has categories, that's right. So we also checked against the categories given by the Google Play Store, and you can see that our clusters are better than what we find in the Google Play Store. Also, we require less, because we don't actually need any such categorization in the first place.
So this is what we got in our second evaluation. We can improve even malware detection, which is good. We're better than the given categories. And we don't require any prior knowledge about what makes malware or not.
>>: I just have a quick question about the -- this evaluation.
>> Andreas Zeller: Yes?
>>: Where did you get the 175 malware apps?
>> Andreas Zeller: This comes from a standard paper. I think the name is "Dissecting Android Malware" -- I think this was one or two years ago -- where researchers brought together a collection of malware which they made available for download, which was the precise set which we used there.
However, we only looked at those applications for which we could actually find a description -- including copies of Angry Birds, where we actually looked up the official description from the Play Store for Angry Birds to associate it with the name. So this relates to the question you had earlier. You find something that's called Angry Birds APK, okay, so we associated this with the description of Angry Birds from the Google Play Store and used this to classify the malware. Okay?
And then what you would find is: if this APK were nothing like Angry Birds, nothing at all, did not even look like a game at all, it would immediately be flagged as an outlier. But even if it fell into the correct cluster based on its description, then, again, we would identify that it does a couple of things differently from the other games, and again it would be flagged as an outlier.
And I'm not saying that this is going to be the ultimate tool in malware detection. It won't be. But if you have something that's totally new, and you have nothing but the description and its behavior, this can help you a lot in identifying the worst offenders in the first place.
Okay. A bit of current work -- and we're running out of time. As I said, we're not experts in natural language processing; one can do much better than that. One can make use of ontologies. One can make use of -- what's the plural of thesaurus? Thesauri? Synonym dictionaries; okay, that may be easier. Those are things we can do.
We can also go and create explicit mappings between topics and behavior, figuring out: here's a topic, and these are the APIs -- the combinations of APIs -- that will be associated with that topic.
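A toy version of such an explicit topic-to-API mapping could look like this; the topic names and API groups below are invented for illustration:

```python
# For each description topic, list the API groups one would expect,
# and flag anything else as a surprise. All names are illustrative.

EXPECTED = {
    "travel": {"internet", "location"},
    "wallpaper": {"internet"},
    "messaging": {"internet", "contacts", "send-sms"},
}

def surprises(topic, used_api_groups):
    """Return the API groups not explained by the app's description topic."""
    return sorted(set(used_api_groups) - EXPECTED.get(topic, set()))

# A "wallpaper" app that also reads contacts and sends SMS stands out:
print(surprises("wallpaper", {"internet", "contacts", "send-sms"}))
# -> ['contacts', 'send-sms']
```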
We can look for much better features of individual applications. Information flow would be our candidate; we're investing heavily in that. And also checking whether any of this behavior is actually authorized by the user -- something like a dialog that pops up asking: in order to continue playing, please allow us to send this text message to this premium number, which is going to charge you this much; are you okay with that, yes or no? At the point when you say yes, and all the information is there, it's perfectly legitimate. But this is also something we don't see at this point. Okay? This is part of looking into the information flow.
The problem with these later points is, as you already pointed out, that if you want to do all of this by static analysis, you run into problems, because the apps -- including some of the apps we found, and definitely malware -- have plenty of features that make static analysis very hard. I mean, as soon as you set up an interpreter of your own, for a language of your own or even a subset of a language of your own, your static analysis essentially stops working. You also have instances of code that get side-loaded, that get created on the fly, or that are just plainly obfuscated. So all of this will make your life hard if you do static and symbolic analysis, and in practice this means that -- oh, yes?
>>: One thing that I thought you were going to mention on the slide is, what about a sound recorder application --
>> Andreas Zeller: Yes?
>>: What about a sound recorder application that also spies on you --
>> Andreas Zeller: Yes?
>>: -- so it has a genuine purpose for using an API, but it also has a --
>> Andreas Zeller: Yes.
>>: [inaudible].
>> Andreas Zeller: Well, the thing is, if the sound recorder does what it says -- it says, I'm recording sound, only install this on your own device, never install this on anyone else's device -- we can only tell whether it does what it claims to do, and that's it. But --
>>: [inaudible].
>>: [inaudible].
>> Andreas Zeller: We can only tell you beforehand: this is what it claims to do, and it does this, or it doesn't. But the use you make of it is not something we can check afterwards. Yes?
>>: What do you mean by remote code?
>> Andreas Zeller: Remote code? Remote code simply means there's some sort of behavior which happens on a server which you don't know. You invoke some service on some server, the result comes back, and you have no idea what's happened there.
>>: So much of your analysis is based not necessarily on what's in the bytecode but on what's in the manifest.
>> Andreas Zeller: Yes.
>>: Because the manifest tells you what capabilities the application --
>> Andreas Zeller: Requests, you mean?
>>: Requests. And if it doesn't ask for that capability, if it's not able to call those certain APIs --
>> Andreas Zeller: That's right, yes.
>>: -- are they able to call those APIs using any of these tricks, or does Google somehow prevent this, because they do dynamic monitoring to make sure that --
>> Andreas Zeller: Google does some dynamic monitoring, yes. They have an application in place which is called Bouncer, which actually clicks randomly around in the application in order to see what it's dynamically doing.
But if you wanted to set up any covert behavior in your app, the simplest thing to do is to check the network you're in, and if your network is Google's network, you behave just perfectly. And as soon as you leave the Google network, that's when you would start your bad behavior.
>>: So --
[brief talking over].
>> Andreas Zeller: Huh?
>>: So an app could use those sensitive APIs --
>> Andreas Zeller: Yes.
>>: -- without having asked for permission?
>> Andreas Zeller: No. No. That's not true. They have to ask for that. But what you would normally find is that they come up with some legitimate use for these permissions. If you wanted an application to spy on you, for instance -- say, using a sound recorder -- you would claim that your application has this cool feature of voice commands, okay? New in this release: control your character by voice command. And, of course, then you need to enable the microphone.
And the thing is, this only happens once, during installation. There's no dynamic check afterwards. With iOS it's different. There, the very first time you access an API, that's when you'll be asked whether you want to use the microphone. And then this would be a bit different, because the application would ask you in context, okay? So you would go into your settings and say, enable voice commands, and then you would get a dialog from the operating system asking whether you want to enable the application to access your microphone, which means only at this point would it actually be able to do that.
And in Google's case this is different, because you give all these permissions at the moment you install, and you may even forget about them afterwards. So not all is well in these stores. And the fun part is, all these mobile devices have all these wonderful sensors and all this sensitive information -- they will soon be able to access your heart rate and your pulse and your blood pressure and whatnot.
>>: Your fingerprint.
>> Andreas Zeller: Your fingerprint, yes, that's true. And all that. Yes, so there are risks involved with that.
>>: So one question. Sometimes an application in a particular domain may use some sensor [inaudible] whatever, and that's actually something that is able to differentiate that app. And so, like, UNO maybe uses the location because it's like, oh, find people really close to you that want to play UNO or something like that.
>>: When you looked at your top five outliers, did you --
>> Andreas Zeller: Yes.
>>: Did you check back with the actual app descriptions, not the LDA summarizations, the actual app descriptions --
>> Andreas Zeller: We checked with the --
>>: -- to see if it was a --
>> Andreas Zeller: So UNO was one of the outliers where we found that it can access your location. But we would not be able to tell whether this was against your interest or not. So those 26 percent were only cases where we found a difference that was explicitly against your own interest. There's a second category, I think another 14 percent or so, where we found, well, it accesses your data, but we're not able to tell whether this is a feature or something that is directed against you. Okay?
And the third category was simply false positives, which is -- well, something that's unavoidable with this sort of analysis.
I'd just like to come to a close. Figuring out the relationship between what's happening on the screen and what's happening in the code is becoming increasingly difficult with static analysis alone. So what we're doing here is investing heavily in test case generation, which allows us to interact with the application on this level, then check what happens on the code level, and then use this whole thing to guide the test generation.
And in the interest of time: what we're using here is a technique called search-based testing, which uses evolutionary algorithms that start with a population of random inputs and evolve these inputs, mutating them towards specific fitness goals.
And even if you have a very simple way to measure the fitness, these algorithms will be effective. Which means that you don't have to deploy your full-fledged symbolic analysis on [inaudible] code; all it takes, for instance, could be a simple measurement of which code is being executed and which is not.
And we are looking into various fitness goals here. Branch coverage. API coverage, meaning that we can guide test generation toward specific APIs -- for instance, sensitive APIs. We can guide test generation toward specific information flows, which is very interesting. And all of these fitness functions are very easy to set up. I mean, how long does it take? It takes about two to three weeks to set up an individual fitness function, and then you will have your generic test generator adapted to the specific runtime feature that you want to look at.
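The search-based idea described above -- a population of random inputs evolved toward a cheap fitness signal -- can be sketched in a few lines. The "app under test" here is a stand-in: a single hypothetical input value that reaches a target branch, with distance to that branch as the fitness:

```python
# Minimal search-based test generation sketch: mutate the fittest inputs
# until one of them reaches a target branch. Everything here is a toy
# stand-in for a real app and a real coverage measurement.
import random

random.seed(1)
TARGET = 987  # hypothetical input value that triggers the "sensitive API" branch

def fitness(x):
    """Cheap fitness signal: how close this input is to taking the branch."""
    return -abs(x - TARGET)

def evolve(pop_size=20, generations=200):
    population = [random.randint(0, 10_000) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]                      # keep fittest half
        mutants = [max(0, e + random.randint(-50, 50)) for e in elite]
        population = elite + mutants
        if fitness(population[0]) == 0:                          # branch reached
            break
    return max(population, key=fitness)

best = evolve()
print(best)  # converges on (or very near) the target input
```

Swapping in branch coverage, API coverage, or information-flow reachability as the fitness function changes only `fitness`, which matches the "two to three weeks per fitness function" point above.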
And this is a demo we had this year at the [inaudible] fair. This is our robot, which happily clicks along on an Android tablet. So we were able, on this date, to identify individual interaction elements, to click on them, and to guide the entire process.
At the point where we showed the whole thing, we only had a handful of applications on which this would work reliably. But by the end of this year, we're going to expand this to the full roster of applications we have right now in our set. And then we will be able to look at all these dynamic features too.
By the way, at the booth, plenty of people came along and asked: hey, you have a robot, that's cool -- but can't you do this in software as well?
And we said, yes, we can do this in software. It's 10 to a hundred times faster. We can run it a hundred times in parallel. And we can put it all into a black box, which I could put up here, and it would be happily humming along. But it turns out that a black box humming along doesn't make a good demo. So this robot was a nice --
>>: [inaudible]. [laughter].
>> Andreas Zeller: Sure. Sure. I should just set up a cooperation with this Chinese university where they have plenty of cheap labor.
Okay. And this is the overall theme of what we're looking into. For hundreds of thousands of applications -- possibly the totality of all applications there are -- we want to be able to gather their descriptions, their metadata, but also gather static features and dynamic features, as many as we can. And then find out what's common in there and what's uncommon in there.
And in particular, we learn the common behavior such that we can more easily identify what would be a surprise for future users. And given that we now have these wonderfully populated stores -- hint: Microsoft also has them, I think -- and given that so far, when we in research mined individual projects, we looked at the version history, which is fine, which is very detailed, but we only had access to, overall, a handful or two handfuls of projects. This is fine because we get all the detailed development information. But with this you get an entirely new axis, where you can look at hundreds of thousands of applications and see what's common in these applications: not so much the development history, but how they behave and what users can reasonably expect from them.
And with that, I'd like to close. I've shown you our CHABADA approach, which identifies outliers based on descriptions and APIs -- and identifies such outliers reliably -- plenty of key findings, and a full data set: a 200-megabyte Excel file to download with all of the metadata in there.
That's it. Thank you for your attention.
[applause].
>> Andreas Zeller: I'm happy to take more questions.
>>: Even though an application that mostly makes benign use of your personal data could occasionally misuse it, right? I mean, your approach will not be able to [inaudible] --
>> Andreas Zeller: Yes.
>>: Do you think there is a need for application developers to actually prove that they're making only benign use of one's personal data before they can publish their apps --
>> Andreas Zeller: Well, they --
>>: [inaudible].
>> Andreas Zeller: The thing is that it won't be easy to tell what is benign and what is not benign. This is going to vary tremendously between users, also between social contexts. I remember when Apple came up with the App Store: in Germany there was lots of outrage, because Apple would censor any application that would show a nipple, for instance, which in Europe is not that much of an issue. Of course it is in the US. And the question is, if an application does that, would it be considered malicious, would it be considered benign?
This very much depends on your cultural context. So --
>>: [inaudible].
>> Andreas Zeller: So --
>>: [inaudible] sending your e-mail to an advertiser, for instance --
>> Andreas Zeller: Yes.
>>: -- where the app should declare that to the --
>> Andreas Zeller: Yes. So normally I would assume that there would be
-- so what I'd like to have is that for every piece of sensitive information an application accesses, there should be some explicit declaration and expression of what it does and why it does that. And you can think of this as a simple form of consumer protection. Okay? So that's something I'd like to see in our app stores.
I actually don't care that much whether an application is benign or not, because this is very hard to tell. I just want to make sure -- in particular for free applications, okay, where it's obvious that they have to come up with some revenue model in order to finance the development -- that it is evident to users what these applications do and the risks that come with it.
And this is something we don't see in the Google store at all. There is essentially no regulation at all. And I'm also not sure what consumer protection agencies think of what's being offered there. Right now it's the Wild West; that's all we see. And from what I see, many of the outliers that we have identified have been taken down by Google in the meantime -- not because of our work, but actually because users complained. That's a fine policy.
But I would very much prefer if such actions took place before users start to complain about the damage that's being done.
So that's what I'm aiming for: not so much coming up with better definitions of benign and malicious, but protecting users against the surprises they can find in these stores.
And if you protect them against surprises, well, as a side effect you also get rid of many of the malware problems, which is a good thing.
>>: I mean, there's a lot here, but I have a very hard time understanding what is actually actionable in what you've built. Because there is a lot here, right? And, you know, there is a lot of malware detection, especially for Android.
But in some sense it's like analyzing [inaudible] applications, you know. Any time you look at one, you find something.
>> Andreas Zeller: Yes.
>>: So the bar is at this point fairly high.
>> Andreas Zeller: Yes.
>>: And it's not exactly malware --
>> Andreas Zeller: No.
>>: I think the hard problem that is fundamentally unsolved, and that [inaudible] tries to address, is understanding intent. Which is to say, if there's a disclosure, for instance, of some sort, is that intentional or is it not? And ultimately your point --
>> Andreas Zeller: Yes.
>>: -- cases where it's not so obvious. Now, how to present the purpose of a particular disclosure to the user, when usable security has no [inaudible] -- every possible paper I've read about it has all the negative results.
>> Andreas Zeller: Yes. I --
>>: So, I mean, frankly, Android's attempt to have these permissions has made it so it's impossible for these [inaudible] --
>> Andreas Zeller: Yes. No. What --
>>: So, I mean, there's a lot here. And certainly, you know, outlier detection is, you know, profitable in a number of ways as [inaudible].
>> Andreas Zeller: Yes.
>>: But it's not like, you know, given the false positive rates in a sense. It's not a tool in
a sense, right? You don't really present the [inaudible]. So what is actionable, I guess?
>> Andreas Zeller: So for me there are two things in it. First, the approach is actionable in the sense that if you do have an application which you know nothing about, you can check it; you can find out that it does things that are unusual in the first place. Which may not be enough to kick it out of some app store instantly, but certainly it is something which should prioritize your scrutiny efforts on this one, okay? If you say that --
>>: [inaudible].
>> Andreas Zeller: If you have yet another wallpaper app and it does nothing unusual, okay, just let it through. So you can use it for triaging.
The second thing is -- and when you're talking about usable security -- I think that if you indeed go and present users these sets of permissions and what they imply, or worse, a description of information flow within the application -- here's the source, here's the sink, here's another source, here's another sink, and here are 134 possible mappings; which ones of these are okay and which are not? -- that's totally out of the question.
I mean, even we as experts would have a hard time understanding that. And what I think -- and we're not there yet -- is that if you can leverage the common knowledge about applications and the expectations that are encoded in there, then you can say: here's Angry Birds 2, it's the same as Angry Birds, and it's the same as any action game you've ever seen as far as security is concerned. So this is safe. That would be a thing.
So rather than coming up with the entire set of things, if you can simply point out the differences -- how this particular app is different from everything you've seen so far, the things that make it unusual -- and if you have a way to catch this, a way to grasp it, then I think there's a chance to make usable security much more incremental than it is today.
>>: [inaudible] filter, a solution for some sort of [inaudible], but as far as exposing this to the end user -- the end users think that more permissions is better. Because --
>> Andreas Zeller: It can do more with more permissions. Yes.
>>: So you say that there's Angry Birds 2 that [inaudible].
>> Andreas Zeller: Yes.
>>: [inaudible].
>> Andreas Zeller: Yes.
>>: [inaudible] wonderful than the previous one. [laughter].
>>: What happens in these app stores when they deny the permission --
>> Andreas Zeller: They won't get installed.
>>: [inaudible] some operation.
>> Andreas Zeller: Then they're not installed.
>>: They don't install them.
>> Andreas Zeller: It's all or nothing.
>>: [inaudible] installation time.
>> Andreas Zeller: Yes, only at installation time. You're obviously an iPhone user,
right?
>>: Yeah. Yeah. [laughter].
>> Andreas Zeller: It's different. It's very different for iPhones. Yes.
>>: We have recommendations on the whole permissions model, right. When I'm asked --
>> Andreas Zeller: Oh, that's right. That's right.
>>: I mean, it's so easy to be [inaudible].
>> Andreas Zeller: Yes. So this is our very first attempt to extract such common behavior from applications. I don't think it will be a solution to the malware problem or to privacy issues in [inaudible], but I think we're going in a good direction here.
>>: [inaudible] problem in that malware, as it's termed, is slightly misused in this [inaudible] privacy violations.
>> Andreas Zeller: Yes.
>>: And what is privacy-violating is unclear, because [inaudible] fundamentally unclear, right, and so this is kind of a bit of [inaudible] fashion.
If people knew that it -- then you can game it by --
>> Andreas Zeller: Oh, yes.
>>: By generating, loading --
>> Andreas Zeller: Yes.
>>: With certain phrases.
>> Andreas Zeller: You could --
>>: So that you're not an outlier anymore.
[brief talking over].
>>: Yeah, that's right.
>> Andreas Zeller: But then this is well known: if you advertise your house for sale, for instance, and you say it's in a district of the town which is known for its vibrant night life, whatever, okay -- you can find a euphemism for whatever is going on.
>>: Basically if the outlier is getting somebody's address --
>> Andreas Zeller: Yes.
>>: -- with these phrases, then what you can do is generate a thousand apps --
>> Andreas Zeller: Oh, yes, yes.
>>: -- packed with those phrases --
>> Andreas Zeller: Yes, of course.
>>: With that call. So even though they're not actually downloaded, it now makes it no
longer an outlier.
>> Andreas Zeller: That's right. Once you have such a system in place, you may be able to game it.
>>: Yeah.
>> Andreas Zeller: But then what you would do next is: you wouldn't do this for the totality of all apps that are offered, but only for apps that are actually being used. And then coming up with a thousand apps that exist only to generate new common behavior wouldn't help you.
But, of course, with malware this is always an arms race at some point. There can always be something new. There can always be something different. There you go.
>> Tom Zimmerman: Thanks.
>> Andreas Zeller: Thanks.
[applause].