>> Ben Livshits: Okay. Let's get started. I hope that we are on the video. I'm Ben Livshits. We have a visitor today, Nick Nikiforakis, from KU Leuven at the moment, but soon to be at UC Santa Barbara as far as I know. He's going to talk to us about web fingerprinting. Thank you.
>> Nick Nikiforakis: I don't know about the last part. Thank you very much for being here, and thank you, Ben, for inviting me. So the title is Everything You Always Wanted to Know about Web-Based Device Fingerprinting but Were Afraid to Ask. I'm not being entirely honest, because there are a lot of things you would like to know that I will not cover because we don't know them yet, but I'll try to give you a good introduction. I'm a postdoc researcher at KU Leuven. I work mainly in web security and privacy, and I have done a few heap overflows in the past. What I like to do is look at the web as clusters of services and identify the ecosystems of these clusters. In there I enumerate the players and the interactions between the players, and I search for systematic problems within the services. This fingerprinting work is also a type of ecosystem work, so you can keep this in mind. You may have seen a similar slide in the past. This is an article from the New York Times, and the content that is highlighted in red is all third-party content, so it's not coming from the servers of the New York Times. It's coming from various advertising servers and social networking servers, and every time that you ask for content from a third-party server, the server has the ability to send cookies to you. These are called third-party cookies because it's not the New York Times that gave them to you, but a website or a banner within the New York Times.
When you go to another website that is affiliated with the same advertising agencies, you send back the cookies they gave you, and then they create browsing profiles of you, so they know that you went to the New York Times and to another 1000 websites, and they can use this information to sell, to do better targeting of ads. This is not exactly new, but it's still very interesting because there are a lot of companies that people don't know about that are very, very popular in terms of being included as third parties across the web. We did a study last year for CCS on who includes JavaScript from whom, and we found, for example, that in the top ten there are at least three companies that are providers of JavaScript libraries. These guys give some functionality to website administrators, like analytics, but in return they get data. There's quantserve, scorecardresearch and addthis, and chances are that if you're not doing some sort of privacy-related research then you don't know them, simply because people don't have a first-party relationship with any of these three sites. This, as I said, is not really news. Today I want to talk to you and convince you that tracking involves more than just third-party cookies. For the purposes of this talk, fingerprinting is the ability to tell users apart based on their browsing environments without using any extra stateful identifiers. No Flash cookies, no ETags, no cache tricks, nothing like that. I'm going to present to you some of the results from a thorough study of current fingerprinting practices on the web, and I also hope to convince you that it's really, really hard today to hide the true nature of your browser, and show how this relates to fingerprinting. Today, people know more about cookies than they used to know. According to a 2011 study from comScore, a third of users delete their first-party and third-party cookies within a month of them being set.
That's a problem because, as an advertiser, if you rely on cookies to track people around, then once a month you lose your browsing profile for that user. Also, you have this increased interest in self-help extensions in browsers: you have Ghostery and Lightbeam, and I will show in the next slide what they are and what they do. You also have the private mode of browsers, which users can go in and out of so that cookies and other data from certain websites are not kept on their machines. This is, I hope, a relevant site for you, the Redmond Reporter, and here you can see Ghostery, which is this plug-in, this extension rather, that has this long list of third-party trackers, and it tells you what it finds on each page. Here you can see, for instance, that Ghostery found ten trackers on the Redmond Reporter. You can do various things from here, like check out what specific scripts it found from these guys and also blacklist them if you want to. Then you have Lightbeam, which is from Mozilla, and Lightbeam does something different. It tries to connect different parties in terms of tracking, so you see here, for instance, you have the Redmond Reporter and you have this other Greek news website, and they are connected through Facebook, so when you go to these sites Facebook will be included in both, and Facebook will know that you as an individual went to the Redmond Reporter and then to the Greek news website. Here is also CNN, and you see a common connection, so as you browse the web these connections increase, and essentially if you leave this on for a while you see that websites are much more connected to each other than you would like to think. This is all about cookies, but what if today I could tell you that interested parties could track users without the need for cookies or any other stateful client-side identifiers?
As a bonus, this is hidden from users, so there is no dialog where you can inspect all the various cookies that are given to you and maybe delete them. It's also hard to avoid or opt out of, so you cannot just click on something and say I don't want third-party cookies anymore, because that's not relevant. So this is web-based device fingerprinting, and it first gained popularity in 2010 through Peter Eckersley. He wrote a paper about how unique your browser is, and there he showed that certain attributes of your browsing environment can be combined in order to track you. He said that essentially, if you combine them the right way, they create a kind of unique fingerprint for you. How does this work? So Eckersley said, you come to my website and I first ask your browser a couple of questions, like who are you? Are you Firefox? Which version are you? Which platform do you run on? Then, if you have JavaScript enabled, and you do know that almost all users do, Eckersley would keep on asking a few questions through JavaScript, so he would, for example, say what is the width and the height of your monitor? Then he would ask what time zone you are currently located in, also accessible through JavaScript. Then he would ask for a list of plug-ins from your system, like Adobe Reader, your Java applets, maybe your Flash plug-in, and then, if Java or Flash were installed, he would actually get a list of the fonts on your system, because Java and Flash have this API that they provide to applications. You can ask for the fonts that are installed on the user's system. Then he would also have some supercookie checks, like a cookie set through global storage or a local storage cookie. He found out that, of the half-million users that participated in this study, among the ones that had Java or Flash enabled, that is, the ones that he could get fonts from, 94.2 percent were uniquely identifiable.
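The idea of combining attributes into one identifier can be sketched as follows. This is a minimal illustration, not Panopticlick's actual code: the attribute names, the sample values, and the choice of SHA-256 over a canonical JSON serialization are all assumptions made for the example.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Return a stable hex digest for a dict of browser attributes."""
    # Sort keys so the same attributes always serialize to the same string.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values, chosen only for illustration.
visitor = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:25.0) Gecko/20100101 Firefox/25.0",
    "screen": [1920, 1080, 24],
    "timezone_offset_minutes": -60,
    "plugins": ["Adobe Reader", "Java Applet Plug-in", "Shockwave Flash"],
    "fonts": ["Arial", "DejaVu Sans", "Liberation Serif"],
}
same_visitor_later = dict(visitor)  # nothing changed, cookies or not
other_visitor = dict(visitor, fonts=["Arial", "Helvetica"])
```

The same attributes give the same digest on every visit, with nothing stored on the client, while a single differing font list gives a different digest.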
Essentially, for 94.2 percent of them, no other user had all these attributes the same. He also showed that you could use simple heuristic algorithms to track changes in a fingerprint. If your user agent changes but your fonts and other things don't, then maybe I can infer that it's still you with an updated browser rather than a new person with a new fingerprint. Yes?
>>: Do you believe that number, that 94.4 percent number?
>> Nick Nikiforakis: I don't know. I'm just using related work to position myself here. What I can show you in connection to that is that Panopticlick is still available online, so you can try it on your own machines. I tried it this morning with my own, and it said that my browser fingerprint appears to be unique among what is now 3.6 million users. If you look at what goes into my fingerprint, you can see here, for instance, my user agent, which is this one, and which is not really that unique, so one in 310 users shares the identical user agent; the same goes for your accept headers. But then this long list of things here, this is the list of plug-ins on my machine. Not extensions, plug-ins. I didn't choose to install any of these. And here you can see that this is a ridiculously high amount of entropy, and essentially no other user has everything identical. These are the names of the plug-ins, their versions and a human-readable description, everything concatenated in one string. You get a similar result for fonts. You see here that only one in 600,000 users has exactly the same fonts as I have. That's what Eckersley said back in 2010, and that was very interesting. What do you use fingerprinting for? The first and obvious thing is ads. There are no cookies for you to delete.
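The "one in N users" figures above translate into bits of identifying information as log2(N). The sketch below shows that arithmetic, plus a deliberately simplified version of the "same browser after an update" heuristic; the function names and the `max_changed` tolerance are assumptions for illustration and are not Eckersley's actual algorithm.

```python
import math


def surprisal_bits(one_in_n: float) -> float:
    """Bits of identifying information when 1 in N users share a value."""
    return math.log2(one_in_n)


def likely_same_browser(old: dict, new: dict, max_changed: int = 1) -> bool:
    """Guess that two fingerprints belong to one browser if few attributes differ."""
    keys = set(old) | set(new)
    changed = sum(1 for key in keys if old.get(key) != new.get(key))
    return changed <= max_changed


user_agent_bits = surprisal_bits(310)      # user agent shared by 1 in 310
font_list_bits = surprisal_bits(600_000)   # font list shared by 1 in 600,000

before = {"user_agent": "Firefox/24.0", "fonts": ("Arial", "Ubuntu"), "tz": -60}
after_update = dict(before, user_agent="Firefox/25.0")
stranger = {"user_agent": "Chrome/30.0", "fonts": ("Helvetica",), "tz": 480}
```

A user agent shared by one in 310 users carries roughly 8.3 bits, while the font list shared by one in 600,000 carries roughly 19.2 bits, which is why fonts and plug-ins dominate the fingerprint.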
There is no box for you to check to say I don't want third-party cookies in my browser, so now I can just take your browsing profile and, instead of connecting it to a cookie, connect it to your fingerprint, so I can maintain the list of websites you have visited regardless of what you delete client-side. And I can do the same even if you enter private mode, because your fingerprint does not change when you enter private mode; nothing is different. Everything is still there in the same way. The second thing, and this is the more positive way of looking at things, is that you can use it for anti-fraud. Your bank has been tracking you for a year and they know that you log in from a Linux machine, they know you use Firefox and they know you log in during the morning, for instance; they can add timestamp information. If suddenly you log in at night from a Mac from, let's say, Indonesia, then something may be wrong. Then they would say, please verify that it's you and not someone who has stolen your credentials. Then we found that some companies use fingerprinting for paywalling. There are websites, for example news sites, where you can read ten articles for free, but if you would like to read more you have to pay some sort of subscription. If they did that tracking with cookies, then you would just delete your cookies and read another ten, and you would go on and on. However, they could use fingerprinting to do the paywalling, so that there is nothing for you to delete once you are done reading your ten articles, because the fingerprint is part of who you are. Finally, this happened in the summer, there was this attack against an outdated Tor browser. There was a Firefox vulnerability, and people analyzed the payload and saw that it essentially fingerprinted a bit of the user's system and sent that fingerprint to a remote server.
The most plausible theory at this point is that it was the Feds doing this, trying to identify which users of the Tor network were visiting certain shady or non-shady websites. So there are a lot of interesting intrusive and less intrusive uses of fingerprinting. In 2012, which is when we started doing this research, what we knew was what Eckersley had said, and we also knew that there were some companies that were quite vocal about offering fingerprinting as a service. What we wanted to find out is how these companies are doing it. Are they relabeling Panopticlick as their own product, or are they adding something more to it? The next question is, could they do more? Could they, if necessary, fingerprint you more than they do today? Then we wanted to find out the user base, so which websites are buying services from these fingerprinting-as-a-service companies. And the last question was, how are users trying to hide? What do people do in order to protect themselves against fingerprinting, and is it working for them? This talk is essentially two papers in one. If you want to know more, the first one is Cookieless Monster, published in Security and Privacy this year, and the second one is FPDetective, published in CCS this year. So that's that. We started our work by analyzing the code of these three vocal companies that said, we offer fingerprinting and you can use it for all these cool things. This is what we did. We first found the domains that they used to serve the fingerprinting scripts. Most of them advertise the fact that you can adopt fingerprinting in a very straightforward way. You just dump a bit of JavaScript code in your page and now you are fingerprinting your clients. Then we found some websites that use them, we extracted the code and we isolated it from the code of the website.
We de-obfuscated and analyzed the code of these services, and then we compared the code to each other and created a taxonomy to find out where every company stands. Of course, companies are not eager to share their fingerprinting code with us, so for most of them this is the actual code that we had to look at manually. The results are that we were able to create this taxonomy, as I said, and compare the companies to each other, and there were quite some interesting findings. The first one was that collectively Panopticlick was fully covered. Usually the industry is a bit behind academia. In this case they were really up-to-date. What Eckersley had said in 2010, they were offering in their fingerprinting services. The taxonomy that we came up with is split into five levels. You start from fingerprinting browser customizations, then fingerprinting browser-level user configuration, fingerprinting your browser family and version, fingerprinting your operating system and applications, and finally fingerprinting your hardware and network. These things here are all new things over what Eckersley was doing in 2010. For instance, Internet Explorer does not share its plug-ins, so there is no navigator.plugins property in JavaScript that one can use to read all of the plug-ins. So what they were doing is they had this very long list of class identifiers and they were just enumerating them one by one. Do you have that? Do you have that? Do you have that? In order to get, you know, a partial list of the class IDs that were installed on that browser. Then, in the browser-level user configuration, we saw that one company was tracking the do not track choice, so that's a bit interesting. They were reading whether you wanted to be tracked or not, and they were adding it as part of your fingerprint.
A company was also reading math constants from your JavaScript engine and incorporating those math constants into your fingerprint, and we assume that they were doing this in an effort to separate JavaScript engines from each other, so if you are different in the floating point digits of something, then I may be able to identify the browser that houses this JavaScript engine. Then we also found, interestingly, that they were fingerprinting the Windows registry and TCP/IP parameters. The question is how they do that, because JavaScript cannot look into your Windows registry nor into your TCP/IP parameters; just stick with me and I'll tell you. So, the nontrivial extras. The first thing that we found is non-plug-in font detection. If you remember, Eckersley had to rely on Flash or Java in order to get a list of fonts from your machine. However, one company was doing the following, all of this, of course, in an invisible iframe. It was creating these long strings, for instance, I do not need Flash. They were setting the Arial typeface and measuring the box around it, the width and the height of the box, which essentially means the width and the height of the text. Once they get this number, they have this long list of fonts and they keep on doing the same operation. Because of stylistic differences in each font family, the same string at the same font size will add up to a different height and a different width than Arial. Arial is used by your browser as a fallback font, so if I ask for a fancy font and you don't have it, Arial will be used to display the text. Every time that a font measurement was different than Arial, it meant that the font was present and thus it was used to render the text on the screen. By doing this for 200, 300, 400 fonts, they could get a list of fonts through JavaScript, through what is essentially a side-channel attack in JavaScript, without needing Flash or Java.
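The logic of the font side channel described above can be sketched outside the browser. This is a simulation under stated assumptions: `measure` stands in for rendering text in a hidden element and reading its width and height, and `fake_measure`, the `INSTALLED` table, and the probe string are hypothetical names and values invented for the example.

```python
from typing import Callable, Iterable, List, Tuple

Size = Tuple[int, int]  # (width, height) of the rendered text box


def detect_fonts(
    candidates: Iterable[str],
    measure: Callable[[str, str], Size],
    fallback: str = "Arial",
    probe: str = "mmmmmmmmmmlli",  # hypothetical probe string
) -> List[str]:
    """Report candidate fonts whose rendering differs from the fallback font."""
    baseline = measure(probe, fallback)
    return [
        font
        for font in candidates
        if font != fallback and measure(probe, font) != baseline
    ]


# Simulated machine: per-character width of each installed font (made up).
INSTALLED = {"Arial": 7, "Comic Sans MS": 8, "Ubuntu": 6}


def fake_measure(text: str, font: str) -> Size:
    """Render with the requested font if installed, otherwise fall back to Arial."""
    char_width = INSTALLED.get(font, INSTALLED["Arial"])
    return (len(text) * char_width, 2 * char_width)


found = detect_fonts(["Comic Sans MS", "Ubuntu", "Wingdings", "Papyrus"], fake_measure)
```

One limitation of this approach is visible in the sketch: an installed font whose metrics happen to match the fallback exactly would go undetected.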
The second thing that we found, and this is essentially how they access your registry, is that two companies have native fingerprinting plug-ins. When we were looking at the code, we saw that when they are checking for the plug-ins installed, they have this specific check for whether a specific plug-in is present, and if it is, it is loaded and handed control. We were able to isolate these plug-ins and analyze them using [indiscernible], and we saw that they were essentially plug-ins that existed on the user's system for the sole purpose of fingerprinting even better. These are not extensions; these are plug-ins, and they run with the same privileges as the browser process itself, so the plug-in could look into your registry. They were reading things like the installation date of your Windows, your device drivers, your IP address and your hostname. We were also able to find that these native fingerprinting plug-ins were usually bundled with something that you downloaded, and silently installed in the background, essentially bundled with casino applications and maybe Second Life type applications. The third thing is the interesting fingerprint delivery mechanisms, so how do you offer fingerprinting as a service? We saw essentially two different modes. The first one was that the remote code was brought in from the fingerprinter. It fingerprinted the user and then it added this fingerprint to the DOM of the first-party page. So, for instance, on IMVU, while the user is on the login page about to type in his username and password, he is being fingerprinted, and the fingerprint is added as a hidden element in the login form, and once you click submit you are sending your username, your password and your fingerprint to IMVU. The second mode was that the first-party site was saying, fingerprint the user.
Here's the session identifier. Then the fingerprinting service was fingerprinting the user, but it wasn't sending the fingerprint to the first-party site. It was sending it to itself, and then, as we understand it, the first-party site can later use the session identifier on the server side to ask, what do you know about the user with that session identifier, and get back information about that user. The final thing I will talk to you about is proxy detection. There's this interesting thing. What we saw is the following. We saw that, I think it was two companies, they were loading JavaScript and Flash. They were creating these long random strings and exchanging them between each other, and then they were just sending these strings to the fingerprinting server. What happens in JavaScript is, if you have, for example, an HTTP proxy, your request will go through the proxy. If this is the generated token, which is exchanged through [indiscernible] Flash, it will go through the proxy server, so when the request reaches the fingerprinting server it will come from the IP address of the proxy server. However, Flash has the ability to open direct connections, ignoring your browser-level proxies, so you have another token that was sent directly to the fingerprinting server. Now the fingerprinting server sees two requests coming in from two different IP addresses with the same long alphanumerical token. Now you can say, okay, this is actually the same user. He's coming from two different IP addresses because he's using a proxy. This is his normal IP address and this is his proxy. You can incorporate this information in the fingerprint. If we move to adoption, this is how we looked for sites using them. We crawled the top 10,000 sites quite shallowly, searching for inclusions from these three fingerprinting providers.
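The proxy-detection correlation described above comes down to grouping server-side requests by token and comparing source addresses. The sketch below is an assumed reconstruction of that logic only; the tuple layout, the channel labels, and the function name are hypothetical, and the IP addresses come from reserved documentation ranges.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Request = Tuple[str, str, str]  # (token, channel, source_ip)


def find_proxy_users(requests: Iterable[Request]) -> Dict[str, Dict[str, str]]:
    """Map each token seen from two different IPs to its proxy and direct address."""
    seen: Dict[str, Dict[str, str]] = defaultdict(dict)
    for token, channel, ip in requests:
        seen[token][channel] = ip

    result = {}
    for token, by_channel in seen.items():
        js_ip = by_channel.get("javascript")  # travels through the browser's proxy
        flash_ip = by_channel.get("flash")    # direct socket, bypasses the proxy
        if js_ip and flash_ip and js_ip != flash_ip:
            result[token] = {"proxy_ip": js_ip, "direct_ip": flash_ip}
    return result


log = [
    ("tokA", "javascript", "203.0.113.9"),   # arrives via an HTTP proxy
    ("tokA", "flash", "198.51.100.23"),      # same token, direct connection
    ("tokB", "javascript", "192.0.2.50"),    # no proxy: both channels agree
    ("tokB", "flash", "192.0.2.50"),
]
proxied = find_proxy_users(log)
```

Only `tokA` is reported, because its two channels arrived from different addresses; the direct address can then be added to that user's fingerprint.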
We found at the time 40 sites that were using them, and the categories were across the board, but porn and dating sites were most prominent. We sent e-mails to these people asking, why are you using fingerprinting? Only one dating site replied. For the rest we just took our best guess. For porn sites our theory is that they are trying to detect the use of shared credentials, so you have a user. Yes?
>>: When you say most prominent, are we talking about a percentage?
>> Nick Nikiforakis: Yes. From the categories, the two top categories were porn and dating sites.
>>: That would cover what, 50 percent, 80 percent, what?
>> Nick Nikiforakis: You mean, cover the total of the dating sites present in the 10,000 or the total of the 10,000?
>>: The total of the 10,000.
>> Nick Nikiforakis: It's 40 sites in total, so that would be the percentage of these 40. Yeah. For porn sites, they are trying to identify shared credentials. They don't want one user buying a subscription and sharing it with ten friends. They want one user per subscription, so they're using fingerprinting to protect themselves. For dating sites, one site replied to us, and they said that they don't want people to have multiple profiles because then they game the system. They want one user to have one dating profile, so they use fingerprinting, again, to identify multiple users, well, a single user hiding behind multiple usernames. At the time, Skype.com was the highest-ranking website fingerprinting its users when you try to log in. Yes?
>>: Did you ask the question of Skype and did they respond?
>> Nick Nikiforakis: Yes, no.
>>: Are they still using it? You said at the time.
>> Nick Nikiforakis: This was done in 2012. I think they are.
>>: Can you check and let us know and then maybe we can follow up?
>> Nick Nikiforakis: Sure.
To their credit, they are only using it on their login page, so if you go to Skype.com you will not see anything, but if you try to log in, that's when the JavaScript library is loaded and it's fingerprinting you.
>>: Right. The login page is now a Microsoft account because it got rid of the Skype account.
>> Nick Nikiforakis: Okay. I'll be happy to check it afterwards if you want.
>>: What do you think, Javier?
>>: Yes.
>> Nick Nikiforakis: Okay. That was for popular sites. Then, since we did this work with UCSB, we had access to the domains from Wepawet, and Wepawet is a high-interaction honeypot that is used to find JavaScript malware in webpages. We just searched for the domains that, when analyzed, loaded code from these fingerprinting providers. We found 3800 domains. The first thing we saw when we got the categories is that they span a lot of categories, so we had shopping sites that use fingerprinting, travel, internet services, business and economy. The second thing, which was very interesting, was that the two top categories were malicious sites and spam sites. Malicious sites and spam sites, when analyzed by Wepawet, were loading JavaScript from these fingerprinting providers. What makes this a bit more interesting is that these three companies don't offer free previews of their services, and you can't go online and just grab fingerprinting services. You have to meet with a sales representative. The question now is the following. Are malicious sites and spam sites just grabbing the JavaScript libraries, which they can't really use because they're not paying for the services, and just doing it as a smoke screen for something? Or is it something else? We just left it at that. I don't know of any one of the three companies that will answer this question. That was the first paper. Then the question was the following. We know how these three companies work. We analyzed them. We did the taxonomy, but are there more?
Are there more that were less vocal, that were not included in our surveys? And the question is how do you find more. If you were to do some sort of dynamic analysis, let's say you look at things loaded, how do you separate, for instance, a fingerprinting script from a generic analytics script? If I have Google Analytics or StatCounter or something else, it will also look at whether I have Flash. It will also see my screen size, but how do you tell one from the other? Our conclusion was that we can actually look at fonts. According to Eckersley's experiment, fonts were the second-most revealing attribute of your fingerprint. We said that if a script crosses this threshold and starts asking questions about the fonts on your machine, then we treat it as a fingerprinting script. Our paper and our tool that is available online is essentially FPDetective, the fingerprinting-sensitive crawler, so you just… Yes?
>>: Is there a good reason why some sites might want to know what fonts you have?
>> Nick Nikiforakis: For example, I may want to display some text to you, maybe using three or four font faces, but why would I ask for 400?
>>: Sure. But you did say you were, you didn't say your threshold was [indiscernible]
>> Nick Nikiforakis: This was an exploratory study, so we did not know what sort of threshold to set. So what we did is we just recorded everyone that asked for fonts, and then we manually analyzed them and said, okay, these guys are using the fonts, for example, in order to show text on the screen, so we removed them. It's like a helper tool to identify more fingerprinters rather than an autonomous system that will tell you for sure that this guy right there is fingerprinting you.
>>: Okay. So you are not just making this stuff up.
>> Nick Nikiforakis: No.
>>: I have a follow-up question. What's the adverse effect of blocking all of those probing requests?
I assume that [indiscernible] servers may want to find out plug-ins and lots of stuff, but let's say that you've blocked all those requests. Probably it should still be okay.
>> Nick Nikiforakis: This is one of the research questions we're having to ask. The answer I don't know. Well, I was saving this for the end, but I can tell you now. If I constantly reject your scripts, is it possible that I actually make myself fingerprintable in the long run? There's that guy from Microsoft Research, every single day, with that browser, who is not loading our JavaScript files. And then I am again singled out, simply because I'm not loading scripts, and this is just an open question right now. I don't have a good answer to that. For the detection of font snooping, we just used our results from Cookieless Monster and said, how can we detect font probing? Okay. They have two ways of grabbing fonts. They have the JavaScript-based way, measuring the width and the height, so let's modify the browser and add code so that when something is playing with the width and the height of your elements, we record that. Then for Flash we just had a proxy that was grabbing all the SWF files. It was decompiling them and just searching for the API calls in Flash that give back a list of fonts. So those were the old results, and with FPDetective we actually saw 145 fingerprinting sites in the Alexa top 10,000. Another experiment we did is that we tried one run with no do not track header and another one with do not track equals one, and we got the identical result, so we were fingerprinted 100 percent of the time in both cases. At this point do not track does not matter in terms of fingerprinting. Companies like BlueCava are actually quite vocal about that. They say, we use fingerprinting for ads and we use fingerprinting for fraud detection. If you're a bad guy, you can't just say do not track equals one; we cannot honor that, because then we would just stop tracking you.
We will fingerprint you, but we will not use your fingerprint for ads. That's a promise that's very hard to verify because it's on the server side. You're being fingerprinted identically on the client side. So that's that.
>>: You can look at the ads that you are getting…
>> Nick Nikiforakis: Yeah, so that would be a side effect, sort of. That's an interesting question.
>>: That's what [indiscernible]
>>: [indiscernible] study.
>>: At CMU [indiscernible]
>> Nick Nikiforakis: Okay. So for fingerprinting or in general for third-party tracking?
>>: In general.
>>: In deciding [indiscernible]
>> Nick Nikiforakis: I would like to get a pointer.
>>: He was at the workshop, the web security and privacy workshop.
>> Nick Nikiforakis: Okay. I'll check it out. The status at this point is that fingerprinting is out there. There's quite a number of new techniques over Panopticlick. We know that large and popular sites are using them, not too many, but they are still getting millions of visitors per day since they are in the Alexa top 10,000. So the second question is, could they be doing more? If they wanted to fingerprint a bit better, could they check more things? The question essentially boils down to how your browser internals relate to your browser's identity. And I think I'll only go over one thing. We decided to do some fingerprinting of our own; that's what we did. We focused on the two special JavaScript objects that have historically attracted the most fingerprinting efforts: the navigator object, which has all the information about your browser, like your plug-ins, your [indiscernible], your platform and so on, and the screen object, which has information about the width and height of your screen and the depth of colors. What we did is, we are everyday guys, so we just performed a series of everyday operations on these objects. We tried to add properties to these special JavaScript objects.
We tried to remove properties and we also tried to modify properties. These are special objects because they are created by the browser. They are not created by [indiscernible] program, so these are better candidates to reveal browser-dependent behavior. One of the things we discovered, probably my favorite one, is that the natural ordering of properties can give away a browser family and occasionally a browser version. If you go to Chrome and say, please just list the properties of the navigator object, you get geolocation, onLine, cookieEnabled and so on. If you ask the same question of Firefox, you get a different ordering here. Regardless of who the browser pretends to be, if I get a different result through the ordering, then I know who the browser really is. Here in Internet Explorer it started kind of similarly to Firefox but then deviated fast into its own ordering, and that's a bit of an underspecified thing, the ordering of elements in a list in BOM objects. There are also more things, but I'll just skip them for today. So the question now is, you know, there is more that fingerprinters can do. How do users react? How are people told to react today, and how do they react themselves? We looked at a few things, but I'll just talk to you about user agent spoofing browser extensions. We found previous research, as well as underground hacking guides, that were telling people, just install this user agent spoofer and change your browser user agent, and then people will think that you are many machines behind a single IP address. We just went to each extension market and searched for user agent spoofing, and we found 11 add-ons, extensions, in total, with more than 800,000 users, now more than a million. The question is how do they stand up against fingerprinting.
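The property-ordering observation above can be turned into a small check: compare the order in which a browser enumerates `navigator` properties against known per-family orderings, and flag a mismatch with the claimed browser. In this sketch only the Chrome-like prefix comes from the talk; the Firefox-like ordering, the family labels, and the function names are placeholders invented for illustration.

```python
from typing import Dict, Optional, Sequence, Tuple

# Leading navigator properties per browser family. The "chrome-like" prefix
# follows the talk; the "firefox-like" prefix is a made-up placeholder.
KNOWN_ORDERINGS: Dict[str, Tuple[str, ...]] = {
    "chrome-like": ("geolocation", "onLine", "cookieEnabled"),
    "firefox-like": ("appCodeName", "appName", "appVersion"),
}


def identify_family(observed: Sequence[str]) -> Optional[str]:
    """Return the family whose known ordering is a prefix of the observed one."""
    for family, order in KNOWN_ORDERINGS.items():
        if tuple(observed[: len(order)]) == order:
            return family
    return None


def is_spoofing(claimed_family: str, observed: Sequence[str]) -> bool:
    """True when the enumeration order contradicts the claimed browser family."""
    detected = identify_family(observed)
    return detected is not None and detected != claimed_family


observed_order = ["geolocation", "onLine", "cookieEnabled", "vendor"]
```

Because the ordering comes from the browser's internals rather than from any string a spoofing extension rewrites, a browser claiming to be one family while enumerating like another is exposed.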
And we tried to be fair, so we didn't use any of our newly discovered techniques. We just installed them and looked at the navigator and the screen object before and after the extensions were active. What we found out was that unfortunately all of them had one or more problems. The first one is that they had incomplete coverage of the navigator object, so an extension would change the user agent but it wouldn't change, for example, the platform, so now I was Internet Explorer running on Linux. There were also some that were randomizing attributes, and these allowed for impossible configurations. You had your browser pretending to be an iPhone but having the screen size of a desktop computer. And finally, some were forgetting that the user agent is communicated both through your JavaScript environment and through your HTTP header, so they were changing one but forgetting the other. That is straightforward to reveal. You may think that we tried and we failed, so we're back to where we started, but that's not true. Since I am Greek I like using Greek words, and we classify this as an iatrogenic problem. An iatrogenic disease is a disease that was caused during examination or treatment by a doctor. So you have users that installed these extensions in order to hide but they actually become more visible than before, and you can think of this sort of in a bad way. If we assume this big box here to be the entire fingerprintable surface of a browser, with all the quirks and the ordering of properties and everything, you then have the extensions, let's say extensions A, B and C, that go in. At first they all change the navigator user agent, but then each extension author tries to do something more. One would also try to change the platform. The other would say I'll also change the screen size to match.
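The three classes of problems listed above (incomplete coverage, impossible configurations, and a mismatch between the JavaScript environment and the HTTP header) lend themselves to simple consistency checks. The following is a minimal sketch, not the checks used in the study: the `Observation` fields, the substring tests and the 1000-pixel threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    http_user_agent: str  # User-Agent request header seen by the server
    js_user_agent: str    # navigator.userAgent as reported by a script
    js_platform: str      # navigator.platform as reported by a script
    screen_width: int     # screen.width as reported by a script


def inconsistencies(obs: Observation) -> List[str]:
    """Return human-readable reasons why an observation looks spoofed."""
    problems = []
    # The extension changed one channel but forgot the other.
    if obs.http_user_agent != obs.js_user_agent:
        problems.append("HTTP header and navigator.userAgent disagree")
    platform = obs.js_platform.lower()
    # Incomplete coverage: user agent spoofed, platform left untouched.
    if "MSIE" in obs.js_user_agent and platform.startswith("linux"):
        problems.append("Internet Explorer user agent on a Linux platform")
    # Impossible configuration: phone user agent, desktop-sized screen.
    # The 1000-pixel threshold is an arbitrary illustrative cutoff.
    if "iPhone" in obs.js_user_agent and obs.screen_width > 1000:
        problems.append("iPhone user agent with a desktop-sized screen")
    return problems
```

Any non-empty result marks the visitor as someone running a spoofing tool, which is itself a distinguishing feature.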
But the thing is that what is not covered, compared to what they cover, is huge. So the first thing you can do is use all of this uncovered part here to find out what the real browser is. So you are back to where you started. But then you can also check for any of these areas that are only covered by a single extension, or by a small combination of them, and now you have reduced your anonymity set from 100 million users of Firefox to 3000 users of that specific extension. Essentially you reveal extensions through their side effects, because extensions are not as enumerable as plug-ins are in browsers. What we've done so far is we've been detecting fingerprinters and we've been raising awareness both for ourselves and for people in general. We haven't yet tried to see how you would go about stopping it. One of the things that we talked about earlier is what happens if we block fingerprints today? Does that make it worse, and can we track that over time? And then, you know, there are some things we were discussing with Ben. For example, if we're going to start modifying the browser, should we all pretend to be the same, like the Tor browser does or like these browser extensions were trying to do, or should we rather be a different user every single time, so that I am different from the last time he saw me and I'll be different the next time, so you still cannot correlate? These are open questions and I would love to discuss them in person with you if you want. To conclude, fingerprinting is a real problem and browsers are these complex beasts where you can't just go in, change the attributes and say now I'm done, let's tackle another problem. The current spoofing extensions should not be used for privacy reasons. There are some sites, some old sites, that say I need Internet Explorer 6 to run.
If you have to go to these websites then you may use them, but you shouldn't have this thing enabled all the time in your browser because it makes you more visible than before. This next one is not really a scientific statement. It's more what I think at this point, which is that long-term solutions will most likely not be technical ones. We may have to say that we'll do our best to identify fingerprinting and we'll do our best to combat it, but then the site has to tell you, in the same way that they do today for cookies, we are using fingerprinting on this site. Are you okay? And then you as a user can choose whether you are okay with that or if you would rather go to a different website. That's all. And thank you for your attention. I'd be happy to take any questions. [applause]. >> Ben Livshits: We have a few minutes for questions. >>: So do you feel like this has gotten, as someone who is in the field I've heard of browser fingerprinting for a while. In my opinion, I don't feel like it's really gotten as much common media attention even though it is prevalent and easily identifiable et cetera et cetera. What is it that you think will bring this to public attention the way Do Not Track became a thing, and any of the others, you know? >> Nick Nikiforakis: I think third-party cookies and Do Not Track got public attention because they were bigger than fingerprinting and still are. You have all these self-help tools for third-party cookies, but you don't have any self-help today for fingerprinting per se. >>: They do, but they don't work. >> Nick Nikiforakis: Yes, any good ones. I think that the second paper that I mentioned, FPDetective, was actually picked up by a lot of outlets and sort of everyone talked about it, but the more they talked about it, the less correct it became, so at the end you had "websites that invisibly track you even though you don't want them to." That has nothing to do with fingerprinting.
So I guess it's a matter of presenting it to the world in a concise way and essentially showing, okay, now you clicked "I will not accept third-party cookies," but what's left? >>: I would agree that the easier you can make this for people to understand, the more attention it will get. However, I think for people to really start paying attention to this you need, well maybe not you, but someone needs to concentrate on the [indiscernible]. And that's why I'm trying to find some of the graphs of who is actually using this tool so interestingly. Insofar as you can show that these tools are being used for some nefarious purpose, you will start getting the right attention. >> Nick Nikiforakis: The thing is that people are not very vocal about it. I sent around 40 emails at the time and we got like two responses. One saying I cannot tell you anything, and the other one from the dating site telling me about the Sybil attacks. The thing is that there's a server-side component to all of this and we do not know how it works. You have the JavaScript. It runs. It generates a fingerprint. This thing is actually encrypted, so even if you get it directly from the JavaScript library you cannot look into it as a first-party website. You still have to give it to the fingerprinting service and say, what do you know about that user? Then they give you back information about them, and some of them claim that they have this threat score where they correlate the same fingerprint across many websites and they have a reputation database and they say that guy, you know, he is not trustworthy, so you should not do business with him. But all of that is something we don't know about, and I cannot go online and subscribe to these things. I have to go through sales representatives and pretend to be a company and, well, we haven't done that yet. I don't know if we'll succeed in doing that, but it's much harder than for other services. Yeah?
>>: Fingerprinting versus just cookie usage, do you think fingerprinting is worse than third-party cookies? Because the end result is that these companies are tracking users. They're building profiles. What do you think, comparing the two? I mean third-party cookies, or cookies in general, are used by almost 100 percent of websites out there. >> Nick Nikiforakis: There are multiple layers to this. The thing is that if I give you a unique identifier and you bring it back to me, I know for a fact that it's you. I don't have to correlate things and, if I see two similar fingerprints, decide whether it's two users or one. However, fingerprinting has some interesting properties. For example, it works when you switch from the private mode into the non-private mode of your browser. Or for example, if Flash is used, and I guess that's probably similar to Flash cookies, you have fingerprinting across browsers because it goes through Flash, which is shared by all of the browsers on your system. If you wanted to tackle the problem today I would say just try to stop or limit third-party cookies, but fingerprinting is currently not legislated in any way, so it doesn't appear anywhere, and people are now starting to think about whether it should be. As long as you have a lot of legal attention on cookies, there is this sort of window for people to use fingerprinting to get the same results and not have to expose themselves by saying we use something that tracks you. >>: If you view fingerprinting as just a way to do cookies without cookies and you see it as an attempt to circumvent legal restrictions, then you could classify this as malware. And if you classify this as malware it seems like, among the classes of malware out there, fingerprinting is quite easy to fingerprint. There's a very clearly defined subset of things they do and there's only so many tools for doing that.
So if you wanted to fingerprint the fingerprinters, is it that hard or is it something that we could just do? >> Nick Nikiforakis: We are fingerprinting the fingerprinters in the sense that we know who they are at this point, and I know from analyzing the code that they're now minifying their JavaScript but they are not per se trying to hide more than that. The thing is that a lot of the companies come out and say, look, you can use fingerprinting for detecting fraud, and I think, this is not my area, but there are certain legal requirements for banks to have some sort of anti-fraud measures to protect their customers, and fingerprinting could fit that bill. It's not as clear-cut as saying that's malware and that's done, and we just consider it malware. For instance, the native fingerprinting libraries that we isolated for the two companies, we submitted them to VirusTotal and we got 0 out of 42 engines flagging these as malware. That was the real deal. It was loaded into your browser for the sole reason of fingerprinting you better, but none of the companies is saying this is adware or it's something bad. They were all just reporting it as clean. I think it's complicated to just call them malware and just… >>: Who do you talk to for legal on this? >> Nick Nikiforakis: I just read around I guess and… >>: Have you talked to [indiscernible] >> Nick Nikiforakis: To who? >>: Ben Edelman, he's a Harvard Business School professor. He made a career as a grad student exposing adware and he has both computer science skills and he's got a law degree and an economics degree, so he knows exactly how to attack these guys. >> Nick Nikiforakis: Okay. I'd be happy to talk with him. That sounds good. >>: Okay. You want to be talking to him. I'll hook you up. >> Nick Nikiforakis: Okay. Thank you. >>: Do you have any information about mobile devices versus non-mobile devices?
I would think that mobile devices would probably be less fingerprintable. >> Nick Nikiforakis: We don't have solid results, but what we do know is that users customize their mobile browsers much less than they customize their desktop browsers. For instance, I talked to a guy from ING two weeks ago. They use fingerprinting as part of protecting their users, and they told me that fingerprinting does not work for mobile devices. They asked, do you have any good tricks that we can use? I said no [laughter]. At this point I think it's sort of an open question. It's probably true that users customize less, but I wonder if mobile browsers have special APIs that connect, for example, to the Android operating system and that could also be used for fingerprinting. >>: Would fingerprinting be more effective with HTML5 and stuff? >> Nick Nikiforakis: There is some research. For example, people have looked at the canvas element of HTML5, where they just write some text in it and read it back out as an image, and they showed that there is a difference when you do this operation on a Linux machine with Chrome, for example, or on a Windows machine with Chrome. But the results are a bit like, there's a difference, and we're not sure how to quantify it yet. It could be used, but that's it right now. From the code that I analyzed, there was no evidence that they were using that. >>: So the porn sites, I'm assuming they're doing it because people go into incognito mode, which doesn't allow for the dropping of cookies, so they do it to track users? >> Nick Nikiforakis: I think that's probably one reason. Our own theory is that it's for the detection of shared credentials, so that you don't buy one subscription and share it with 100 people. They detect whether the same user name and password combination is connected to more than one real browsing environment. >>: Do you see [indiscernible] sites using fingerprinting [indiscernible] >> Nick Nikiforakis: Right. We haven't checked that one.
The thing is that a general problem that we have in crawling is that it's really hard to go past the authentication wall. You will check all of the pages that are publicly accessible, but once things are behind a login form you have a problem, and I would suspect that maybe for Hulu and for Netflix fingerprinting starts right after you log in. >>: That would be interesting because I know that HBO has said that they know that people share credentials. They watch HBO shows but HBO is okay with it because people have to subscribe anyway; they're delegating to the end person, like the cable subscriber, even though the subscriber's friends and family might be watching content also. And I think Netflix has about a 50 sign-in limit or something, only 50 different users can sign in with the same ID at the same time. I don't know how they detect that. >>: It's an interesting hypothesis in terms of [indiscernible] friends you can get credentials versus being able to detect. You need people because for privacy-sensitive things like porn people are going into incognito or other methods of managing their footprints. >> Nick Nikiforakis: My thesis here is that fingerprinting happens on the client side, so if you're this sophisticated attacker, you can claim that the script ran and you can send back a fingerprint, and you can construct it to have exactly what you want in it, and when they open it up on the server side they will say okay, this thing ran and it's a user located in that place in the world, but you can falsify all of these. It's not easy to do, but you can. Normal people can't. They don't know how to use search, much less actually modify JavaScript libraries. So I think that today, even though the argument is that fingerprinting can be used a lot for fraud detection, if someone wants to get through he will easily be able to get through.
Fingerprinting seems to me to be working much more for the large part of the population that will just browse websites. They will get fingerprinted and then they will get ads based on their fingerprint. Yeah? >>: Do you see a trend in what they've been doing? Has it been going up, or how fast do you think? >> Nick Nikiforakis: We don't know. We do plan to use the FPDetective tool to do a longitudinal study where we track it for a year and try to see whether companies are changing or whether more sites are coming in. I have this sort of gut feeling that it's growing, but I cannot give you numbers right now. >>: Have you reached out to the extension makers to share your findings? What would their reaction be? >> Nick Nikiforakis: No, I haven't, so I cannot guess their reaction. >>: [indiscernible] >> Nick Nikiforakis: The thing is that we do not know what all of these people are claiming. What we do know is that some of them are saying, oh, privacy, privacy, use our extension, but there are other guys that just offer an extension that changes your user agent. It could be that they are just trying to offer a service to people who try to access specific websites that check the user agent and say I will not work unless you have that. But we did find both academic research and underground guides suggesting the use of these, and the large numbers show that even if you download and install it just to access that one website, this thing can still be used against you on other websites that fingerprint you. >> Ben Livshits: Are there further questions? Okay. Let's thank the speaker. >> Nick Nikiforakis: Thank you. [applause]