>> Tom Zimmermann: So thank you all for coming. It's my pleasure to welcome back Andreas Zeller. He has visited us many times; he spent time here as visiting faculty two or three times now, and today he's here for only one day. To give you some background in case you don't know Andreas: he's a professor at Saarland University in Germany, he's an ACM Fellow, he has organized many conferences, and he's very well known and respected in the field. Today he's going to talk to us about his latest research on app stores.

>> Andreas Zeller: Thank you very much, Tom. Well, we don't have the PA on, but I hope you can hear me already. There are not that many people in the room today, so it's going to be a bit more interactive; feel free to interrupt any time you like. This is our first investigation into a field called app store mining, where we look into applications as you find them in common stores and archives, such as the Google Play Store, which holds all the Android applications. We are looking at a very general question here: whether an application does what it claims to do, which is [inaudible] one of the very fundamental questions of computer science. And, yes, we definitely had some fun doing that. Let me tell you a bit about what app behavior means here, what app description means, and what the checking process is. Let me motivate you with a simple app as we found it in the Google Play Store. This is London Restaurants, shown here with a screenshot, and this is the full description as a regular user sees it when downloading the app. It's free, and it offers a lot: it gives you a full list of restaurants and pubs in London together with the type of food, and you can search for a specific kind of food near you, and so on.
So it offers all the usual features of a classical travel-and-food application. The interesting thing about this app is that on top of what it claims to do, which it actually delivers, it also gives you a couple of extras. In particular, this application accesses your account information, including your mobile phone number and your device ID, and sends all of this out to some unknown server. And, of course, none of this behavior is made explicit anywhere in the description. You could think that an application that takes your mobile phone number and sends it around for arbitrary use is malicious in the first place, but that's not necessarily true, because what makes an application malicious very much depends on the context. There are applications like WhatsApp, for instance, which is perhaps the most popular instant messaging application on Android and on mobile phones, that happily do the same thing. In order to function, the very first time you start WhatsApp, it actually reads your entire phone book and sends it to the WhatsApp server in order to check whether any of your contacts with that phone number is already registered with the WhatsApp service. So the simple fact that an application sends out private information does not make it malicious by itself. The question is rather whether the behavior comes as a surprise to the regular user or not. We could say: if an application does something covert, something which happens in the background and in particular which is unadvertised, this is what we're looking for, and this is what would surprise us as regular users.
This is particularly true for the London Restaurants application, because London Restaurants is a travel application, travel and mapping. A travel-and-mapping application would, for instance, happily access your current location; that's pretty normal for mapping applications. But accessing your mobile phone number is not part of what a travel application has to do. Therefore, accessing such account information and sending it around is abnormal in this particular category, which is different from, say, the messaging category. So what we do in this work is as follows: we have set up a tool chain named CHABADA, which stands for CHecking App Behavior Against Descriptions of Apps. CHABADA is also a French word -- anyone French here? -- denoting a specific rhythm in jazz. I think there's a song, chabada, chabada, chabada -- this is the famous French film music from "A Man and a Woman," 1966, I think, and this is where this specific rhythm comes from. So if you're interested in French jazz, you'll recognize CHABADA. The last time I gave this talk, there were two people from France who noticed that and immediately burst into laughter; so apparently it strikes a chord. Oh, okay. Good. We've had enough puns now. So this is what we're doing in CHABADA. We collect a large number of applications, we identify the topics that these applications deal with in their descriptions, and then we cluster them according to those topics.
Within each cluster -- once we have found a number of applications that all belong to, say, travel and mapping themes -- we then look at the APIs these individual apps are accessing, APIs as proxies for their behavior. And from that we look for outliers: if there is any application in that cluster which accesses different APIs than the other apps in that cluster, we want to identify it as an outlier. This is what I'm going to show you today in this talk, including the outlier detection.

>>: I have a clarification question about the clustering.

>> Andreas Zeller: Yes?

>>: Can I think of each topic as an adjective, any number of which can be applied to a particular app?

>> Andreas Zeller: I'll come to that in a second. I'll describe the individual steps, and then I'll be happy to get back to your question. Let's go through the individual steps here. The first thing you need is a collection of apps, which is what we built. We spent a total of four months running a script that would download apps from the Google Play Store. We checked all 30 categories in the Google Play Store and downloaded the top 150 apps in each, as well as their metadata -- the description, the rating, user comments and so on -- which is something we did last winter and last spring. We only looked at free applications, for the simple reason that my university cannot afford to pay for plenty of expensive apps; so it's just the top free apps. This gave us a collection of 32,000 apps. You actually have to run this script for quite some time, because if you download more than a few dozen applications a day, Google is going to block you, because they would reasonably assume that you're not a regular customer.
We also made a data package which contains all the metadata of all these apps, and this is available on the website; I'll come back to that at the end of my talk. So this was our collection of applications, and these are the apps from which we wanted to learn which descriptions normally correspond to which behavior. Next: we get all these natural language descriptions of applications, so we need to do some natural language pre-processing on them. For instance, you have to get rid of stop words -- words that commonly occur in English, such as "you" -- and you also have to normalize the individual words, for instance turn plurals into singulars and the like, such that you get a better vector overlap. So this is what this looks like: all the common stop words are removed, and this gives you a vector of words that describes what the application is doing. These are the words, within each description, that characterize the behavior of the application. On these sets of stemmed words, we first eliminated all the applications whose descriptions were close to being empty, which actually eliminated a third of all applications -- they had literally close to no description at all -- which is something I find surprising. Apparently people download these apps even though they have no description. Then we used a standard technique for topic analysis called Latent Dirichlet Allocation, LDA, which takes these words and groups them into topics of words that frequently occur together.
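As a rough illustration of this pre-processing step, here is a minimal sketch in Python. The stop-word list and the suffix-stripping rule below are simplified assumptions for illustration; a real pipeline would use a full stop-word list and a proper stemmer such as Porter's.

```python
# Sketch of description pre-processing: stop-word removal plus naive
# normalization (plural -> singular). Both the STOP_WORDS set and the
# trailing-'s' rule are simplified stand-ins for a real NLP pipeline.

STOP_WORDS = {"a", "an", "the", "you", "your", "for", "and", "of",
              "to", "in", "it", "is", "with", "that", "this", "can"}

def normalize(word: str) -> str:
    """Very naive singularization: strip one trailing 's'
    (a real pipeline would use a proper stemmer)."""
    w = word.lower().strip(".,!?:;()\"'")
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]
    return w

def preprocess(description: str) -> list[str]:
    """Turn a raw app description into a bag of content words."""
    words = (normalize(w) for w in description.split())
    return [w for w in words if w and w not in STOP_WORDS]

print(preprocess("A full list of restaurants and pubs in London"))
# -> ['full', 'list', 'restaurant', 'pub', 'london']
```

The resulting word vectors are what the topic analysis operates on.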
You take the words that frequently occur together and apply LDA to put them into 30 groups. This is a standard technique; for all the natural language processing we didn't invent anything new, because we're not exactly natural language processing scientists, so we stick to standard techniques, as we always do when we venture into different areas. These are the individual topics we came up with. We had 30 topics overall; we gave them individual names, and these are the most representative words for each. Here is "galaxi", "nexu" -- you can identify the stemmed brand names in here -- "device", "screen", "effect", "install", "customize": there's a topic that relates to personalization. "Game", "video", "page", "cheat", "link", and so on: this is for games and cheat sheets. "Slot machine", "money", "poker", "currency", "market", "casino", "coin", "finance": this is all related to money. And so on. So we had a total of 30 topics, from top to bottom, that relate to the various aspects an individual application would address. Yes?

>>: I have a question about how these apps are categorized in the app store. So I'm wondering whether there are categories like this already assigned to these applications?

>> Andreas Zeller: Yes.

>>: I suspect there are --

>> Andreas Zeller: The application --

>>: [inaudible].

>> Andreas Zeller: The applications actually already come with categories, that's right; there's a librarian who does that. But there are two things. For one thing, we would find clusters that you would not normally find in a store. For instance, we had an "ads" topic in there: applications whose description makes clear that they contain plenty of ads.
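The topic-analysis step can be sketched with scikit-learn's LDA implementation. The toy descriptions and the two-topic setting below are assumptions for illustration only; the actual study ran LDA with 30 topics over the descriptions of roughly 32,000 apps.

```python
# Sketch of the topic-analysis step using Latent Dirichlet Allocation
# (LDA). Toy data: four tiny "descriptions", two topics instead of 30.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

descriptions = [
    "restaurants pubs london travel map food search",
    "travel map location search city guide",
    "poker casino slot coins money chips",
    "casino slot machine coins poker money",
]

# Bag-of-words counts (stop words already removed in these toy inputs)
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(descriptions)

# LDA groups words that frequently co-occur into topics
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # app -> topic-weight vector

# Each app gets a probability distribution over topics; an app can
# thus address several topics at once, as in the talk.
for desc, weights in zip(descriptions, doc_topics):
    print(desc.split()[0], [round(w, 2) for w in weights])
```

Each row of `doc_topics` sums to one, so every app is described by its mix of topics rather than a single category.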
That's clear, because we have words like "notification", "policy", "share", "push", "advertise" -- all of this is in there. But in a regular store, I very much doubt that you would find a category "ads", because that would imply that you, as a user, would specifically search for that category, which normally you don't. No, I'm not too aware of this. So we needed something that abstracts away from the categories shown in a store. The second reason is that the categories as we found them in the store did not suit our purpose very well; but I'll come to this later in the evaluation. Yes?

>>: The better known an app is, the fewer words in its description?

>> Andreas Zeller: Could be, yes. If you have something like Angry Birds, we can immediately identify what Angry Birds is about. And Angry Birds is in there, too. What's happening with the screen here? Oh, here we go. "Touch screen to make [inaudible]."

>>: More generally, maybe the texts don't have enough context information to identify what an app is supposed to do.

>> Andreas Zeller: Yes, that's right. In the longer run, if you have brand names like Angry Birds, you could actually Bing the whole thing and then check reviews and additional information about what is commonly associated with that name.

>>: If you go to the Google Play Store and you click on an app, you see similar apps. I've seen Google does some kind of analysis to cluster --

>> Andreas Zeller: Yes. Google can do that, because they know which applications are downloaded by individual users, and so they can recommend individual apps to you based on the download history.
They would tell you: people who downloaded this application also downloaded these other applications. This information we do not have. All we had was the description; we would not know the purchase history of individual customers.

>>: But I can look at the Google Play website right now and see that here is a similar app. So they're all kind of similar in terms of functionality?

>> Andreas Zeller: I very much assume that Google employs a number of librarians who take care of provisioning this store and organizing it. Although, given the results we have, it's not always clear that Google does that; I'll come to that in a second. In principle, yes, you would assume that a well-kept app store has a number of people who take care of filling the shelves and organizing things. But what we found in the Google Play Store suggests that Google apparently does not care so much about what is actually offered in there. Okay. Every application can address multiple topics. For the London Restaurants application, this ends up as "navigation and travel" -- this was the first topic -- plus "food and recipes", plus "travel". So this is what we get: for every application, a set of topics that this application addresses. The next thing is that we cluster these applications according to the topics they deal with, because applications deal with multiple topics. The idea is that we want groups of applications that are similar according to their descriptions. For this we used a standard technique, K-means, to identify such clusters. And for K-means you need a number K, the number of clusters.
There again, we used a standard technique, the so-called silhouette method, to identify the best number of clusters. We ended up with 32 clusters, and the clusters are very much in line with the individual topics. So if your most dominant topic was, say, puzzle and card games, we would have an appropriate cluster for that. We'd have a cluster of memory puzzle applications, one for music and music videos. "Religious wallpapers" was a cluster which I was not aware of, but apparently there are hundreds of applications that do nothing but display your favorite religious symbol or customize your screen to reflect it. We also have "adult photo" and "adult wallpapers", for those of you who are interested. I'm happy to report that this was joint work with Alessandra Gorla, Ilaria Tavecchia and Florian Gross. Alessandra and Ilaria were in charge of looking into all the individual applications and checking for misclassifications, and the two of them were the ones who checked out all the individual adult applications. We heard plenty of laughter from their room, and we, as men, wouldn't want to know anything about what they saw. So, yes, you get the full breadth of human activities reflected in these applications. Okay. So these are 32 clusters, and in each of these clusters we have a set of applications. For instance, the personalization cluster: these are the individual words as they occurred in the descriptions of that cluster. The personalization cluster is about themes and launchers and lockers and design and so on. And this is very different from, say, the travel cluster.
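The cluster-count selection just described can be sketched as follows, assuming scikit-learn. The six toy topic-weight vectors are invented for illustration; the real study clustered tens of thousands of apps into 32 clusters.

```python
# Sketch of the clustering step: K-means over per-app topic weights,
# with the silhouette score used to pick the number of clusters K.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Each row: one app's weights over (here) 3 topics,
# e.g. [travel, personalization, money]
topic_vectors = np.array([
    [0.9, 0.1, 0.0],   # travel apps
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0],   # personalization apps
    [0.0, 0.8, 0.2],
    [0.0, 0.1, 0.9],   # money/casino apps
    [0.1, 0.0, 0.9],
])

# Try several values of K and keep the one with the best silhouette
best_k, best_score = None, -1.0
for k in range(2, 5):
    labels = KMeans(n_clusters=k, n_init=10,
                    random_state=0).fit_predict(topic_vectors)
    score = silhouette_score(topic_vectors, labels)
    if score > best_score:
        best_k, best_score = k, score

print("best number of clusters:", best_k)
```

On these three clearly separated groups, the silhouette criterion selects K = 3; the same mechanism, scaled up, yields the 32 clusters from the talk.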
The travel cluster is about maps and information and searching and, of course, travel. You can see already from the word clouds that the applications in each of these clusters are very different from each other. Now comes the next part. For the applications in each cluster, we want to identify outliers with respect to behavior. As a proxy for behavior, we simply looked at static API usage, which is fortunately not difficult, because for each APK -- the binary form of an Android application -- you can use a simple static analysis to check which external APIs are being used. However, we did not look at all APIs being used, say, all the string API calls and whatnot; that's not very interesting. We only considered those APIs which are sensitive, meaning they are governed by a specific Android permission: if an application wants to use one of these sensitive APIs, it specifically has to declare so in its manifest. So if an application wants to access the location, it has to ask for a specific permission. This is also, in principle, presented to the user when the user installs the app, so the user has to explicitly grant the permission before the app can use his or her current location. For London Restaurants, the APIs being used look like this. From the sensitive APIs you can already see that it accesses web pages, it accesses servers by URL, it uses the notification manager, it checks whether WiFi is on and whether the network is connected. All of these are sensitive APIs, and they are governed by a set of permissions, shown here: it needs to access the WiFi state and the network state.
It needs to access the current phone state, and, among others, it also needs access to your account -- well, that's what it claims it needs. If you now look at all the permissions used across the travel cluster, you see that these are very characteristic for the individual apps. All the applications in the travel cluster typically need to access your precise current location, which is obvious, and they also need access to the Internet; this is what makes a travel application. Whereas the personalization cluster has means to access external storage, works over the network, and is able to turn individual components on and off; this is what makes personalization. So you can see that these clusters not only differ by their descriptions, they also differ by the APIs and the permissions that they commonly need. This is what we have now: for each application in a cluster, we can check the individual APIs being used. And now comes the next part: we check for outliers within each of these clusters. Again, here's the travel cluster with all the permissions of the APIs used, generalized over all applications in the cluster. Now let's take a look at London Restaurants, one of the applications in that cluster. You can see that there is definitely a difference: London Restaurants uses APIs related to the GET_ACCOUNTS permission, APIs which are not used by any other application in the travel cluster. This means it is an outlier, and the three API calls shown in bold are precisely those which are unusual for that cluster.
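The notion of "APIs unusual for the cluster" can be sketched as a simple set difference per app. All names below -- the apps, the APIs, and the API-to-permission map -- are hypothetical placeholders for illustration, not the real Android permission map.

```python
# Sketch: flag sensitive APIs an app uses that no other app in its
# cluster uses, and report the governing permission. All entries are
# illustrative stand-ins for the real cluster data.
SENSITIVE_API_PERMISSION = {
    "WifiManager.isWifiEnabled": "ACCESS_WIFI_STATE",
    "ConnectivityManager.getActiveNetworkInfo": "ACCESS_NETWORK_STATE",
    "LocationManager.getLastKnownLocation": "ACCESS_FINE_LOCATION",
    "AccountManager.getAccounts": "GET_ACCOUNTS",
}

# Sensitive APIs statically found in each app of a toy "travel" cluster
cluster_apis = {
    "CityGuide": {"LocationManager.getLastKnownLocation",
                  "ConnectivityManager.getActiveNetworkInfo"},
    "TripMapper": {"LocationManager.getLastKnownLocation",
                   "WifiManager.isWifiEnabled",
                   "ConnectivityManager.getActiveNetworkInfo"},
    "LondonRestaurants": {"LocationManager.getLastKnownLocation",
                          "WifiManager.isWifiEnabled",
                          "AccountManager.getAccounts"},
}

# APIs used by the *other* apps in the cluster form the cluster's norm
flagged = {}
for app, apis in cluster_apis.items():
    others = set().union(*(a for name, a in cluster_apis.items()
                           if name != app))
    flagged[app] = apis - others

for app, unusual in flagged.items():
    for api in sorted(unusual):
        print(f"{app}: unusual API {api} "
              f"(permission {SENSITIVE_API_PERMISSION[api]})")
```

Only the account-reading app is flagged, mirroring how the GET_ACCOUNTS usage of London Restaurants stood out in the real travel cluster.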
So the question is: how can we identify such outliers automatically? For this purpose we used a so-called one-class support vector machine (OC-SVM), a variant of the well-known support vector machines commonly used for classification. In this case, we use it for a different purpose, namely identifying outliers. For each APK in the cluster, we put in a vector of the sensitive APIs and the number of call sites for each API. What the OC-SVM does is, like any SVM, create an N-dimensional hyperplane that tries to get as near to the individual applications as possible. Then you can measure the distance between that hyperplane and each application, and from that distance you can identify an application which is far away from the norm -- which is different in its features and is therefore a likely outlier. And this is precisely what the OC-SVM identifies here: London Restaurants is properly identified as an outlier, because by its usage of APIs it simply differs from the norm. This is what CHABADA gives you in the end -- all these steps are fully automatic, I should say: a ranked list of outliers with respect to API usage within each of these clusters. So the question is: does this make any sense? What do the outliers that you actually get look like? This was fun work, because you get to see a lot of apps this way. What we did was a simple thing: in each of our 32 clusters we identified the top five outliers, and then we looked at them.
We checked what they would actually do and how that corresponded to the description. I can tell you right away that 26 percent of all these outliers showed behavior that (a) used sensitive APIs, (b) was not advertised, and (c) acted against the interest of its users -- for instance, accessing sensitive information and propagating it without this being advertised in any place.

>>: How did you know it was doing this? Did you use the app, or --

>> Andreas Zeller: We used the app. We started the app, we ran the app, and we used standard debugging techniques to identify which information was accessed and where it went. The good thing was that these applications were not malware per se, so they didn't use any sophisticated obfuscation techniques to disguise their behavior. So far so good. By using standard techniques -- a debugger, and checking which APIs are being used -- we could very quickly find out what they were actually doing.

>>: I might be getting ahead of you, but I'm curious what the other 75 percent --

>> Andreas Zeller: I'll be getting to that in a second.

>>: Related to [inaudible] question: a lot of these apps are interactive apps. I was wondering how you get these traces automatically?

>> Andreas Zeller: Not automatically -- for the evaluation, this was manual work. We actually interacted with the apps, and we checked whether the outlier classification made any sense.

>>: But you have to [inaudible] the applications that you are trying to classify based on their runtime traces, right?

>> Andreas Zeller: No. So far, what we did is (a) all automatic, and (b) we check API usage statically, so none of this requires execution. But, in fact, we assume that in the long run a simple inspection of API usage will not suffice.
So we're looking into more fine-grained analyses, and I'll come to that in a second. You also had a question? Let's just proceed for a moment.

>>: So you don't have the source code, but --

>> Andreas Zeller: We don't have the source code, no.

>>: Then statically, how do you know [inaudible] using APIs?

>> Andreas Zeller: We simply know, number one, that it declares a dynamic link to the API, and, number two, that it requires the proper permission to use that very API. So the API usage is not there by accident, because otherwise the app wouldn't have to declare the sensitive permission. But, of course, we don't know whether the API is actually ever called.

>>: So this is about declarations and --

>> Andreas Zeller: Yup. Think of a binary which is dynamically linked and has an external reference to some API. That is precisely what we're identifying.

>>: I'm confused about what went into the support vector machine, because you said a number of calls. What is that number? If you don't have access to the source to see how many calls to the API are made --

>> Andreas Zeller: We know the number of call sites, because you can do a simple binary analysis to identify the number of calls -- the number of jumps -- in the bytecode. Nothing more complicated than that. And if you ask me whether the number of call sites is needed or not, I would love to tell you, but I don't know; we started with the number of call sites, and that's what we had from the beginning.

>>: Sort of [inaudible] anything like: any API call you use, you have to tell us about it?

>> Andreas Zeller: No. But if you want to access a sensitive API, you have to declare this in the manifest by requesting the appropriate permission.

>>: And users just click okay, I guess?
>> Andreas Zeller: Users, as far as I see, generally click okay even when being asked for the most outrageous behavior. We have that experience with malware. There's an Angry Birds game: you start it, and it says, we want the ability to send SMS. And you wonder: why do you need to send SMS? Because that's how we finance ourselves -- if you like the game, after a couple of rounds you can send an SMS to some premium number. That's why we need it; only for that, of course. What the application really does is start churning out premium SMS the very moment you start it. So, yeah, it's not always easy. Okay, let's come to what we found in there. First and foremost: outliers. We found plenty of applications that use ad frameworks -- there's AppLovin and Airpush. Many of these free apps finance themselves through ads; so far so good. What's interesting, though, is that these ad frameworks give you, as a developer, a higher percentage of revenue if you allow your app to access more sensitive information. Meaning: the more sensitive information your app accesses, the more money you get from your ads.

>>: That's crazy.

>> Andreas Zeller: I know.

>>: [inaudible] more effectively?

>> Andreas Zeller: That's simply because they gain more information about you, which is more marketable. If you just send an ad to some anonymous user, you get a small share. But if they know where you are, who you are, and what your mobile phone number is, such that they can text you messages, you get a higher amount of money. That's the sad reality of the ads business. That's what we found in here.
We also had strange behavior in there. We had the UNO game, for instance, which requires access to your microphone and expressly asks for it. There's no feature in the game which actually uses that. But maybe it's targeted at some head of state; we don't know. You install UNO on your --

>>: There is a part of that game, at least the physical one, where you actually are supposed to yell out "UNO."

>> Andreas Zeller: Yeah, that's right. That would be perfectly fine. [brief talking over]

>> Andreas Zeller: I have that in my notes. One second. Here we go. Wonderful. What else do we have? WICKED -- no, sorry, I was confused. WICKED is the application for the musical; this is the one that accesses your microphone. UNO needs your location -- possibly to figure out which side of the chair you're sitting on; I'm not sure about that. Yahoo Mail is also able to send text messages: it asks for the SMS permission, and apparently it can send them. But, of course, this is not advertised anywhere.

>>: Some of the mail providers use text messages as verification.

>> Andreas Zeller: Yes, that's right. But then at some point during the user interaction, or in the description, you would have some notice that tells you that this is the case. Nothing of that sort was found in these apps.

>>: Did it actually send SMSes when you were using it? You said you ran these apps --

>> Andreas Zeller: No, we didn't see that. Why it explicitly requests the permission to send out SMS messages, I have no idea. Okay? We also had applications that didn't fall into the right clusters. SoundCloud is an example.
This is an application where you can actually record audio and send it around, and it ended up in a sharing cluster. However, recording audio is uncommon for sharing applications. This actually means that SoundCloud is a mix somewhere between audio and sharing -- but it had to fit into one of our clusters, so there's obviously room for improvement. We also had an application called Hamster Life, which ended up in religious wallpapers and promptly [inaudible] became an outlier in there. We have no idea why this is the case. And here's my favorite example: Mr. Will's Stud Poker. This ended up in the games-and-money cluster, and it was the only application in that whole cluster that would not display ads. Since it would not display ads, it was immediately identified as an outlier. So being an outlier does not mean an app is good or bad; it simply means it shows uncommon behavior. And if it does not show ads, then in this cluster that is uncommon behavior.

>>: Okay. Remind me what the features are for the SVM here. Is it just the APIs?

>> Andreas Zeller: Just the set of APIs, yes. And this Stud Poker game does not access the Internet, which is what you normally need for an application to display ads.

>>: [inaudible] but it's the fact that --

>> Andreas Zeller: Exactly. Yeah.

>>: Okay.

>> Andreas Zeller: That's the thing. So, yes, you can also have outliers in these clusters which simply show uncommon behavior; in this case it's actually one good behavior among a whole set of bad behaviors. And speaking of bad behaviors, we also did a second evaluation, and that's even more fun. We took a set of known malicious Android applications and checked how these applications would end up -- whether they would be flagged as outliers or not.
So for this, what we did was we trained the OC-SVM as a classifier on 90% of the benign apps which we downloaded from the store. And then we had 173 known malware apps, and we used the OC-SVM as a classifier to see how the remaining benign apps as well as our known malware apps would be classified. And this is what we got out here. So all these, of course, are average numbers over multiple runs and multiple splits. So from our malicious apps -- from our malicious apps, the vast majority was also predicted as malicious. A smaller part was predicted as benign. And of our benign apps, the large majority was also properly predicted as benign. In percentages: 84% of our benign apps were correctly predicted as such, and 56% of our malicious apps were also properly predicted as such. And, of course, if you have a malware scanner on your Android device, you will get much better results. But then such a malware scanner actually checks against patterns of malware that are already known. All right? So it has a history that it can check against. Whereas we can take an arbitrary app which nobody has ever seen before, and the description and these APIs, and find out that what it claims to do does not correspond to what it actually does in terms of API usage. So as a sole detector of malware, this would not be enough. But if you combined this, say, with patterns of known malware, if you bring this together, this can be an excellent combination. Checking against new apps which nobody knows anything about, just checking what they claim to do and what they actually do. Here again, this is the correct classification with clusters. We also checked the individual components of our CHABADA approach. So this is without clusters.
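[Editor's note: the evaluation just described -- train on 90% of the benign apps, then classify held-out benign apps and known malware -- can be sketched as follows. A simple centroid-distance threshold stands in for the One-Class SVM from the talk, and all apps here are synthetic; the numbers will not match the 84%/56% results above.]

```python
# Sketch of the train/test evaluation with a one-class model over API vectors.
# A centroid-distance threshold is a hypothetical stand-in for the OC-SVM;
# benign and malware apps are generated synthetically.
import random

random.seed(1)
N_APIS = 12

def synth_app(extra_apis=()):
    """Random binary API-usage vector; "malware" also touches extra APIs."""
    vec = [1 if random.random() < 0.3 else 0 for _ in range(N_APIS)]
    for i in extra_apis:
        vec[i] = 1
    return vec

benign = [synth_app() for _ in range(100)]
malware = [synth_app(extra_apis=(9, 10, 11)) for _ in range(20)]
train, held_out = benign[:90], benign[90:]

# "Train": remember the typical benign API profile of the cluster.
centroid = [sum(col) / len(train) for col in zip(*train)]

def distance(vec):
    """Squared distance from the benign centroid."""
    return sum((v - c) ** 2 for v, c in zip(vec, centroid))

# Flag apps farther from the benign profile than 90% of the training apps.
threshold = sorted(distance(v) for v in train)[int(0.9 * len(train))]

benign_ok = sum(distance(v) <= threshold for v in held_out) / len(held_out)
malware_caught = sum(distance(v) > threshold for v in malware) / len(malware)
```

The same split-and-classify shape applies whether the one-class model is this toy centroid or a real OC-SVM.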
You see that without clusters the detection rate of malicious apps immediately goes down. So all of this clustering according to descriptions actually is needed. And also, you asked beforehand -- the Google Play Store also has categories, that's right, which is what we checked too. We checked against the categories given by the Google Play Store, and you can see that our clusters also are better than what we find in the Google Play Store. Also we require less, because, well, we don't actually need any such categorization in the first place. So this is what we got in our second -- in our second evaluation. So we can even improve malware detection, which is good. We're better than the given categories. And we don't require any prior knowledge about what makes malware or not. >>: I just have a quick question about this evaluation. >> Andreas Zeller: Yes? >>: Where did you get the 175 malware apps? >> Andreas Zeller: This comes from a standard paper. I think the name is "Dissecting Android Malware" -- I think this was one or two years ago -- where researchers brought together a collection of malware which they made available for download, which was the precise set we used in there. However, we only looked at those malware -- those applications for which we could actually find a description. Including copies of Angry Birds, where we actually looked up the official description from the Play Store for Angry Birds to associate this with a name. So this was relating to the question you had earlier. So you find something that's called Angry Birds APK, okay, and so we associated this with the description of Angry Birds from the Google Play Store and used this -- and used this for classifying the malware. Okay?
And then what you would find is -- so if this APK were nothing like Angry Birds, nothing at all, would not even look like a game at all, it would immediately be flagged as an outlier; but even if it would fall into the correct cluster based on its description, then, again, we would identify -- well, it does a couple of things differently from the other games, and again it would be flagged as an outlier. And I'm not saying that this is going to be the ultimate tool in malware detection. It won't be. But if you have something that's totally new and nothing else but the description and its behavior, this can help -- this can help you a lot in identifying the worst offenders in the first place. Okay. A bit of current work. And we're running out of time. As I said, we're not experts in natural language processing. We can do much better than that. One can make use of ontologies. One can make use of -- what's the plural of thesaurus? Thesauri? Synonym dictionaries. Okay. That may be easier. Those are things we can do. We can also go and make the implicit mappings between topics and behavior explicit, figuring out: here's a topic, and these are the APIs -- the combinations of APIs -- that will be associated with that topic. We can look for much better features of individual -- of individual applications. Information flow would be our candidate -- we're investing heavily into that. And also checking whether any of this behavior is actually -- is actually authorized by the user. Something like a dialog that pops up asking: in order to continue playing, please allow us to send these text messages to this premium number, which is going to charge you by so much -- are you okay with that, yes, no? At the point when you say yes and all the information is there, it's perfectly legitimate at some point. But this is also something we don't see at this point. Okay?
This is part of -- this is part of looking into the information flow. The problem with these later points is, as you already pointed out, if you want to do all of this by static analysis, you run into problems, because apps -- including some of the apps we found, and definitely malware -- have plenty of features that make static analysis very hard. I mean, as soon as you set up an interpreter of your own, for a language of your own or even a subset of a language of your own, your static analysis essentially stops working. You also have -- you also have instances of code that get side-loaded, that get created on the fly, or that are just plainly obfuscated. So all of this will make your life hard if you do static and symbolic analysis, and in practice this means that -- oh, yes? >>: One thing that I thought you were going to mention on the slide is, what about a sound recorder application -- >> Andreas Zeller: Yes? >>: What about a sound recorder application that also spies on you -- >> Andreas Zeller: Yes? >>: -- so it has a genuine purpose for using an API, but it also has a -- >> Andreas Zeller: Yes. >>: [inaudible]. >> Andreas Zeller: Well, the thing is, if the sound recorder does what it says -- it says, I'm recording sound, and only install this on your own device, never install this on someone else's device -- we can only tell whether it does what it claims to do, and that's it. But -- >>: [inaudible]. >>: [inaudible]. >> Andreas Zeller: We can only tell you beforehand: this is what it claims to do, and it does this, or it won't. But the use you make of it is not something we can check afterwards. Yes? >>: What do you make of remote code? >> Andreas Zeller: Remote code? Remote code simply means there's some sort of behavior which happens on the server which you don't know. You invoke some service on some server.
The result comes back and you have no idea what's happening, what's happened there. >>: So much of your analysis is based not necessarily on what's in the bytecode but on what's in the manifest. >> Andreas Zeller: Yes. >>: Because the manifest tells you what capabilities the application -- >> Andreas Zeller: Requests, you mean? >>: Requests. And if it doesn't ask for that capability, it's not able to call those certain APIs -- >> Andreas Zeller: That's right, yes. >>: -- are they able to call those APIs using any of these tricks, or does Google somehow prevent this because they do dynamic monitoring to make sure that -- >> Andreas Zeller: Google does some dynamic monitoring, yes. They have an application in place which is called Bouncer, which actually clicks randomly around on the application in order to see what it's dynamically doing. But if you wanted to set up any covert behavior in your app, the simplest thing to do is to check the network you're in, and if your network is Google's network, you behave just perfectly. And as soon as you leave the Google network, that's when you can -- that's when you would start your bad behavior. >>: So -- [brief talking over]. >> Andreas Zeller: Huh? >>: So an app could use those sensitive APIs -- >> Andreas Zeller: Yes. >>: -- without having asked for permission? >> Andreas Zeller: No. No. No. That's not true. They have to ask for that. But what you would normally find is they would come up with some legitimate use for requesting these permissions. If you wanted an application to spy on you, for instance, okay, say using a sound recorder, you would claim that your application has this cool feature of voice commands, okay? Say -- and now, new in this release, control your character by voice command. And, of course, then you need to enable the microphone. And the thing is, this only happens once, during installation. So there's no dynamic check afterwards.
With iOS there's a difference. There, the very first time you access an API, that's when you'll be asked whether you want to use -- whether you want to use the microphone. And then this would be a bit different, because the application would ask you in context, okay? So you would say in your settings, enable voice commands, and then you would get a dialog from the operating system asking whether you want the application to -- whether you want to enable the application to access your microphone, which means only at this point would it actually be able to do that. And on Android this is different, because -- well, you give all these permissions at the moment you install the app. And you may even forget about them afterwards. So not all is well in -- not all is well in these categories. And the fun part is, all these mobile devices, with all these wonderful sensors and all this sensitive information -- they will soon be able to access your heart rate and your pulse and your blood pressure and whatnot. >>: Your fingerprint. >> Andreas Zeller: Your fingerprint, yes, that's true. And all that. Yes, so there are risks involved with that. >>: So one question. Sometimes an application in a particular domain may use some sensor [inaudible] whatever, and that's actually something that is able to differentiate that app. And so, like, UNO maybe uses the location because it's like, oh, find people really close to you that want to play UNO or something like that. >> Andreas Zeller: Yes. >>: When you looked at your top five outliers, did you -- >> Andreas Zeller: Yes. >>: Did you check back with the actual app descriptions, not the LDA summarizations, the actual app descriptions -- >> Andreas Zeller: We checked with the -- >>: -- to see if it was a -- >> Andreas Zeller: We checked with the -- so UNO was -- UNO was one of the outliers where we found that it can access your location.
But we would not be able to tell whether this was against your interest or not. So these 26 percent was only stuff where we found the difference explicitly against your own interest. There's a second category, I think another 14 percent or so, where we found, well, it accesses your data, but we're not able to tell whether this is a feature or something that is directed against you. Okay? The third category was simply false positives, which is -- well, something that's unavoidable with this sort of analysis. I'd just like to come to a close. So figuring out the relationship between what's happening on the screen and what's happening in the code is becoming increasingly difficult with static analysis alone. So what we're doing here is we are investing heavily in test case generation, which allows us to interact with the application on this level and then check what happens on the -- and then check what happens on the code level, and then use this whole thing to guide our -- to guide the test generation. And in the interest of time: what we're using here is a technique called search-based testing, which uses evolutionary algorithms that start with a population of random inputs and evolve these inputs, mutating them towards specific fitness goals. And even if you have a very simple way to measure the fitness, these algorithms will be effective. Which means that you don't have to deploy your full-fledged symbolic analysis on [inaudible] code; all it takes, for instance, could be a simple measurement of which code is being executed and which is not. And we are looking into various fitness goals in here. Branch coverage. API coverage, meaning that we can guide test generation toward specific APIs -- for instance, sensitive APIs.
We can guide test generation toward specific information flows, which is very interesting. And all of these fitness functions are very easy to set up. I mean, how long does it take? It takes about two to three weeks to set up an individual fitness function, and then you will have your generic test generator adapted to the specific runtime feature that you want to look at. And this is a demo we had this year at the [inaudible] fair. This is our robot which happily clicks along on an Android tablet. So we are well able, and we were able at that date, to identify individual interaction elements, to click on them, and to guide the entire process. At the point where we showed the whole thing, we only had a handful of applications on which this would work reliably. But by the end of this year, we're going to expand this to the full roster of applications we're having right now in our set. And then we will be able to look at all these dynamic features too. By the way, at the booth, plenty of people came along and asked, hey, you have a robot, that's cool -- but can't you do this in software as well? And we said, yes, we can do this in software. It's 10 to a hundred times faster. We can do a hundred runs in parallel. And we can put it all into a black box which I will put up -- which I will put up here and it will be happily humming along. But it turns out that a black box that's humming along doesn't make a good demo, so this robot was -- this robot was a nice -- >>: [inaudible]. [laughter]. >> Andreas Zeller: Sure. Sure. I should just set up a cooperation with this Chinese university where they have plenty of cheap labor. Okay. And this is the overall theme of what we're looking into. So: for hundreds of thousands of applications -- possibly the totality of all applications there are.
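[Editor's note: the search-based testing loop described earlier -- a population of random inputs evolved toward a simple fitness goal such as API coverage -- can be sketched as follows. The app model, event names, and API names are hypothetical stand-ins; a real setup would drive an actual Android app and observe which APIs execute.]

```python
# Minimal sketch of search-based test generation: evolve UI event sequences
# toward a cheap fitness goal, here the number of distinct (made-up) APIs
# the toy app under test touches.
import random

random.seed(0)

EVENTS = ["tap_ok", "tap_menu", "type_text", "tap_record", "tap_share"]

def run_app(sequence):
    """Toy app model: returns the set of APIs exercised by an event sequence."""
    apis = set()
    for event in sequence:
        if event == "tap_record":
            apis.add("android.media.AudioRecord")
        elif event == "tap_share":
            apis.add("java.net.Socket")
        elif event == "type_text":
            apis.add("android.widget.EditText")
    return apis

def fitness(sequence):
    """A very simple fitness measurement: distinct APIs covered."""
    return len(run_app(sequence))

def mutate(sequence):
    """Replace one random event in the sequence."""
    seq = list(sequence)
    seq[random.randrange(len(seq))] = random.choice(EVENTS)
    return seq

def evolve(pop_size=20, length=5, generations=30):
    population = [[random.choice(EVENTS) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[:pop_size // 2]  # keep the fittest half
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(pop_size - len(survivors))]
    return max(population, key=fitness)

best = evolve()
print(fitness(best))  # covers most or all of the three API-triggering events
```

Swapping in a different fitness function (branch coverage, reaching a sensitive API, triggering an information flow) retargets the same generator, which matches the two-to-three-weeks-per-fitness-function claim above.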
We want to be able to gather their descriptions, their metadata, but also gather static features and dynamic features, as many as we can. And then find out what's common in there, what's uncommon in there. And in particular, learn the common behavior such that we can more easily identify what would be a surprise for future users. And given that we now have these wonderfully populated stores -- hint, Microsoft also has them, I think -- and given that, so far, when we in research mined individual projects, we looked at the version history, which is fine, which is very detailed, but we only had access to, overall, a handful, two handfuls of projects. This is fine because we get all the detailed development information. But with this you get an entirely new axis where you can look at hundreds of thousands of applications and see what's common in these applications. Not so much the development history, but how they behave and what users can reasonably expect from them. And with that, I'd like to close. I've shown you our CHABADA approach, which identifies outliers based on descriptions and APIs, identifies such outliers reliably, plenty of key findings, and a full data set -- a 200-megabyte Excel file to download with all of the metadata in there. That's it. Thank you for your attention. [applause]. >> Andreas Zeller: I'm happy to take more questions. >>: Even though an application that mostly makes benign use of your personal data could occasionally misuse it, right -- I mean, your approach will not be able to [inaudible] approach -- >> Andreas Zeller: Yes. >>: Do you think there is a need for application developers to actually prove that they're making only benign use of one's personal data before they can publish their apps -- >> Andreas Zeller: Well, they -- >>: [inaudible]. >> Andreas Zeller: The thing is that it won't be easy to tell what is benign and what is not benign.
This is going to vary tremendously between users, also between social contexts. I remember when Apple came up with the App Store in Germany, there was lots of outrage because Apple would censor any application that would show a nipple, for instance, which in Europe is not that much of an issue. Of course it is in the US. And so the question is, if an application does that, would it be considered malicious, would it be considered benign? This very much depends on your cultural context. So -- >>: [inaudible]. >> Andreas Zeller: So -- >>: [inaudible] sending your e-mail to an advertiser, for instance -- >> Andreas Zeller: Yes. >>: -- where the app should declare that to the -- >> Andreas Zeller: Yes. So normally I would assume that there would be -- so what I'd like to have is that for every piece of sensitive information an application accesses, there should be some explicit declaration and expression of what it does and why it does that, and you can think of this as a simple form of consumer protection. Okay? So that's something I'd like to see in our app stores. I actually don't care that much whether an application is benign or not, because this is -- this is very hard to tell. I just want to make sure that if an application -- well, in particular for free applications, okay, where it's obvious that they have to come up with some revenue model in order to finance the development -- it has to be evident to users what these applications do and the risks that come with it. And this is something we don't see in the Google store at all. There is essentially no regulation at all. And I'm also not sure what consumer protection agencies think of what's being offered there. Right now it's the Wild West. That's all we see.
And from what I see, many of the outliers that we identified have been taken down by Google in the meantime. Not because of our work but actually because users complained. That's a fine policy. But I would very much prefer if such actions took place before users start to complain about the damage that's being done. So that's what I'm aiming for. Not so much coming up with better definitions of benign and malicious, but protecting users against surprises they can find in these stores. And if you protect them against surprises, well, as a side effect you also get rid of many of the malware problems, which is a good thing. >>: I mean, there's a lot here, but I have a very hard time sort of understanding what is actually actionable in what you have built. Because there is a lot here, right? And, you know, there is a lot of malicious malware detection, especially for Android. But in some sense it's like analyzing [inaudible] applications, you know. Any time you look at one you find something. >> Andreas Zeller: Yes. >>: So the bar is at this point fairly high. >> Andreas Zeller: Yes. >>: And it's not exactly malware -- >> Andreas Zeller: No. >>: I think the hard problem that is fundamentally unsolved, which [inaudible] tries to address, is understanding intent, which is to say, if there's a disclosure, for instance, of some sort, is that intentional, or is it not? And ultimately your point -- >> Andreas Zeller: Yes. >>: -- cases where it's not so obvious. Now, how to present the purpose of a particular disclosure to the user, when usable security has no [inaudible] -- every possible paper I've read about it has all the negative results. >> Andreas Zeller: Yes. I -- >>: So, I mean, frankly, Android's attempt to have these permissions has made it so it's impossible for these [inaudible] -- >> Andreas Zeller: Yes. No. What -- >>: So, I mean, there's a lot here.
And certainly, you know, outlier detection is, you know, profitable in a number of ways as [inaudible]. >> Andreas Zeller: Yes. >>: But it's not like, you know -- given the false positive rates, in a sense, it's not a tool, right? You don't really present the [inaudible]. So what is actionable, I guess? >> Andreas Zeller: So for me there are two things in it. First, the approach is actionable in the sense that if you do have -- if you do have an application which you know nothing about, you can check it, you can find out that it does things that are unusual in the first place, which may not be enough to kick it out of some app store instantly. But certainly it is something which should prioritize your scrutiny efforts on this one, okay? If you say that -- >>: [inaudible]. >> Andreas Zeller: If you have yet another wallpaper app and it does nothing unusual, okay, just let it through. So you can use it for triaging. The second thing is -- and when you're talking about usable security -- I think that if you indeed go and present users these sets of permissions and what they imply, or worse, a description of information flow within applications -- here's the source, here's the sink, here's another source, here's another sink, and here are 134 possible mappings; which ones of these are okay and which are not? -- that's totally out of the question. I mean, even we as experts would have a hard time understanding that. And what I think is -- and we're not there yet -- but if you can leverage the common knowledge about applications and the expectations that are encoded in there, then you can say: here's Angry Birds 2, it's the same as Angry Birds, and it's the same as any action game you've ever seen as far as security is concerned. So this is safe. That would be a thing.
So if you can, rather than coming up with the entirety of things, simply point out the differences -- how this particular app is different from everything you've seen so far, or what it enables, the things that make it unusual -- and if you have a way to catch this, if you have a way to grasp it, then I think there's a chance to make usable security much more incremental than it is today. >>: [inaudible] filter, a solution for some sort of [inaudible], but as far as exposing this to the end user, end users think that more permissions is better. Because -- >> Andreas Zeller: They can do more with more permissions. Yes. >>: So you say that there's Angry Birds 2 that [inaudible]. >> Andreas Zeller: Yes. >>: [inaudible]. >> Andreas Zeller: Yes. >>: [inaudible] wonderful than the previous one. [laughter]. >>: What happens in these app stores when they deny the permission -- >> Andreas Zeller: They won't get installed. >>: [inaudible] some operation. >> Andreas Zeller: Then they're not installed. >>: They don't install them. >> Andreas Zeller: It's -- it's all or nothing. It's an all-or-nothing model. >>: [inaudible] installation time. >> Andreas Zeller: Yes, only at installation time. You're obviously an iPhone user, right? >>: Yeah. Yeah. [laughter]. >> Andreas Zeller: Different. Things are very different for iPhones. Yes. >>: We have recommendations on the whole permissions model, right. When I'm asked -- >> Andreas Zeller: Oh, that's right. That's right. >>: I mean, it's so easy to be [inaudible]. >> Andreas Zeller: Yes. So this is our very first attempt to extract such common behavior from applications. I don't think it will be a solution to the malware problem or to privacy issues in [inaudible], but I think -- but I think we're going in a good direction here.
>>: [inaudible] problem in that malware, as it's termed, is slightly misused in this [inaudible] privacy violations. >> Andreas Zeller: Yes. >>: And what privacy-violating is, is unclear, because [inaudible] fundamentally unclear, right, and so this is kind of a bit of [inaudible] fashion. If people knew that -- then you can game it by -- >> Andreas Zeller: Oh, yes. >>: By generating, loading -- >> Andreas Zeller: Yes. >>: With certain phrases. >> Andreas Zeller: You could -- you could -- >>: So that you're -- so that you're not an outlier anymore. [brief talking over]. >>: Yeah, that's right. >> Andreas Zeller: But then this is well known: if you advertise your house for sale, for instance, and you say this is in a district of the town which is known for its vibrant nightlife, whatever, okay -- you can find a euphemism for whatever is going on. >>: Basically, if the outlier is getting somebody's address -- >> Andreas Zeller: Yes. >>: -- and that's tied to these phrases, then what you can do is generate a thousand apps -- >> Andreas Zeller: Oh, yes, yes. >>: -- packed with those phrases -- >> Andreas Zeller: Yes, of course. >>: -- with that call. So even though they're not actually downloaded, it now makes it no longer an outlier. >> Andreas Zeller: That's -- that's right. Once you have such a system in place, you may be able to game it. >>: Yeah. >> Andreas Zeller: But then what you do next is, you wouldn't do this for the totality of all apps that are offered; you would do this only for apps that are actually being used. And then coming up with a thousand apps that are there only for generating new common behavior wouldn't help you. But, of course, in malware this is always an arms race at some point. There can always be something new. It can always be something different. There you go. >> Tom Zimmerman: Thanks. >> Andreas Zeller: Thanks. [applause].