>> Ben Livshits: Okay. Let's get started. I hope that we are on the video. I'm Ben Livshits. We
have a visitor today, Nick Nikiforakis from KU Leuven at the moment, but soon to be at UC
Santa Barbara as far as I know. He's going to talk to us about web fingerprinting. Thank you.
>> Nick Nikiforakis: I don't know about the last part. Thank you very much for being here, and
thank you Ben for inviting me. So the title is Everything You Always Wanted to Know about
Web-Based Device Fingerprinting but Were Afraid to Ask. I'm not very honest because there
are a lot of things you would like to know and I will not cover because we don't know yet, but
I'll try to give you a good introduction. I'm a postdoc researcher at KU Leuven. I work mainly in
web security and privacy and I have done a few heap overflows in the past. What I like to do in
the web is I like to look at the web as clusters of services and identify the ecosystems of these
clusters. In there I enumerate the players, the interactions between the players and search for
systematic problems within the services. This fingerprinting work is also a type of ecosystem
work, so you can keep this in mind. You may have seen a similar slide in the past. This is an
article from the New York Times and the content that is highlighted with red is all third-party
content, so it's not coming from the servers of the New York Times. It's coming from various
advertising servers and social networking servers and every time that you ask for content from
a third-party server, the server has the ability to send cookies to you. These are called third-party cookies because it's not the New York Times that gave them to you, but a website or a
banner within New York Times. When you go to another website that is affiliated with the
same advertising agencies then you will send back the cookies they gave you and then they
create browsing profiles of you, so they know that you went to the New York Times and to
another 1000 websites and they can use this information to sell, to do better targeting of ads.
This is not exactly new, but it's still very interesting because there's a lot of companies that
people don't know about that are very, very popular in terms of being included as third parties
on the web.
We did a study last year for CCS on who includes JavaScript from whom and we found, for
example, in the top ten there are at least three companies that are providers of JavaScript
libraries. These guys they give some functionality to website administrators like analytics, but
in return they get data. There's quantserve, scorecardresearch and addthis and chances are if
you're not doing some sort of privacy related research then you don't know them, because
simply people, they don't have a first party relationship with any of these three sites. This, as I
said, is not really news. Today I want to talk to you and convince you that tracking involves
more than just third-party cookies. For the purposes of this talk, fingerprinting is the ability to
tell users apart based on the browsing environments without using any extra stateful
identifiers. No Flash cookies, no ETags, no cache tricks, nothing like that. I'm going to present to
you some of the results from a thorough study of current fingerprinting practices on the web,
and I also hope to convince you that it's really, really hard today to hide the true nature of your
browser and how this relates to fingerprinting. Today, people know more about cookies than
they used to know. According to a 2011 study from comScore, a third of users delete their
first-party and third-party cookies within a month of them being set. That's a problem because
as an advertiser if you rely on cookies to track people around, then you sort of, once a month
you lose your browsing profile for that user. Also, you have this increased interest in self-help
extensions in browsers and you have Ghostery and Lightbeam and I will show in the next slide
what they are and what they do. You also have the private mode of browsers, which users can go
in and out of so that the cookies and other data that certain websites give them are not
kept on their machines. This is I hope a
relevant site for you, Redmond reporter.com and here you can see Ghostery, which is this plug
in, this extension rather that has this long list of third-party trackers and it tells you what it finds
on each page. Here you can see for instance that Ghostery found ten trackers on the Redmond
reporter. You can do various things from here like check out what specific scripts they found
from these guys and also blacklist them if you want to. Then you have Lightbeam which is from
Mozilla and Lightbeam does something different. It tries to connect different parties in terms
of tracking, so you see here for instance you have the Redmond reporter.com and you have this
other Greek news website and they are connected through Facebook, so when you go to the
sites Facebook will be included in both and Facebook will know that you as an individual went
to the Redmond Reporter and then to the Greek news website. Here is also CNN and you see a
common connection, so as you browse the web, these connections increase and essentially if
you leave this on for a while you see that websites are much more connected to each other
than you would like to think. This is all about cookies, but what if today I could tell you that
interested parties could track users without the need of cookies or any other stateful client-side
identifiers? As a bonus, this is hidden from users, so there is no dialog where you can inspect
the various cookies that are given to you and maybe delete them. It's also hard to avoid it or
opt out, so you cannot just click on something and say I don't want third-party cookies anymore
because that's not relevant. So this is web-based device fingerprinting and this gained
popularity first in 2010 from Peter Eckersley. He wrote a paper about how unique is your
browser, and there he showed that certain attributes of your browsing environment can be
combined in order to track you. He said that essentially if you combine them the right way,
they create a kind of unique fingerprint for you. How does this work? So Eckersley said you
come to my website and then I first ask your browser a couple of questions, like who are you?
Are you Firefox? Which version are you? Which platform do you run on? Then if you have
JavaScript enabled and you do know that almost all users do, then Eckersley would keep on
asking a few questions through JavaScript, so he would, for example say what is the width and
the height of your monitor? Then he would ask what is the time zone that you are currently
located in, also accessible through JavaScript. Then he would ask for a list of plug-ins from your
system, like the Adobe Reader, your Java applets, maybe your Flash plug-in, and then if Java or
Flash were installed, then he would actually get a list of the fonts on your system because Java
and Flash actually have the ability, they have this API that they provide to the applications. You
can ask for the fonts that are installed on the user’s system. Then he would also have some
super cookie checks like cookie set for global storage or a local storage cookie. He found out
that in the half-million users that participated in this study, from the ones that have Java or
Flash enabled, or the ones that he could get fonts from, 94.2 percent were uniquely identifiable.
Essentially, for 94.2 percent of them, no other user had all these attributes the same. He also showed that you
could use simple heuristic algorithms to track local changes in fingerprint. If your user agent
changes but your fonts don't and other things don't then maybe I can infer that it's still you
with an updated browser rather than a new person with a new fingerprint. Yes?
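The idea described in this turn, combining browser attributes into one identifier and then using a simple heuristic to follow a browser whose attributes change slightly, can be sketched roughly as follows. This is only an illustration under stated assumptions: the attribute names, the SHA-256 hash, and the "at most one attribute changed" rule are all hypothetical stand-ins, since the talk only says that "simple heuristic algorithms" were used.

```python
import hashlib

# Illustrative attribute names; the real Panopticlick field names may differ.
ATTRIBUTES = ("user_agent", "screen", "timezone", "plugins", "fonts")


def fingerprint(attrs):
    """Join the attribute values in a fixed order and hash the result."""
    joined = "\x1f".join(str(attrs.get(name, "")) for name in ATTRIBUTES)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def probably_same_browser(old, new, max_changed=1):
    """Hypothetical heuristic: treat two attribute sets as the same browser
    if at most `max_changed` attributes differ (e.g. only the user agent
    changed after a browser update)."""
    changed = sum(1 for name in ATTRIBUTES if old.get(name) != new.get(name))
    return changed <= max_changed


# Made-up sample values, for illustration only.
before = {
    "user_agent": "Firefox/24.0",
    "screen": "1920x1080",
    "timezone": "UTC+1",
    "plugins": "Flash 11.9;Adobe Reader 11",
    "fonts": "Arial;Calibri;Comic Sans MS",
}
after_update = dict(before, user_agent="Firefox/25.0")

# The exact hash changes after the update, but the heuristic still links the two.
print(fingerprint(before) == fingerprint(after_update))
print(probably_same_browser(before, after_update))
```

The point of the sketch is that an exact-match hash alone is brittle, which is why a tolerance-based comparison of the underlying attributes is needed to keep following a browser across updates.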
>>: Do you believe that number, that 94.2 percent number?
>> Nick Nikiforakis: I don't know. I'm just using related work to position myself here. What I
can show you in connection to that is that Panopticlick is still available online so you can try it
on your own machines, and I tried here this morning with my own and it said that my browser
fingerprint appears to be unique among the now 3.6 million users tested. If you kind of look at what
goes into my fingerprint, you can see here, for instance, my user agent, which is this one, which
it's not really that unique, so one in 310 users share the identical user agent, the same for your
accept headers. But then this long list of things here, these are the list of plug-ins on my
machine, not extensions, plug-ins. I didn't choose to install any of these. And here you can see
that this is like a ridiculously high amount of entropy and essentially no other user has
everything identical. These are the names of plug-ins, their versions and a human readable
description, everything concatenated in one stream. You get a similar result for fonts. You see
here that only one in 600,000 users has exactly the same set of fonts as I have. That's what
Eckersley said back in 2010 and that was very interesting. What can you use, what do you use
fingerprinting for? The first and obvious thing is ads. There are no cookies for you to delete.
There is no check for you to say I don't want third party cookies in my browser, so now I can
just connect your browsing profile and instead of connecting to a cookie, I can just connect it to
your fingerprint, so I can maintain the list of websites you have visited regardless of what you
delete client-side. And I can do the same even if you enter in your private mode because your
fingerprint does not change when you enter the private mode because nothing is different.
Everything is still there in the same way. The second thing, and this is the more positive way of
looking at things, is that you can use it for anti-fraud. Your bank is tracking you for a year and
they know that you log in from a Linux machine and they know you use Firefox and they know
you log in during the morning, for instance; they can add timestamp information. If suddenly
you log in at night from a Mac from let's say Indonesia, then something may be wrong. Then
they would say, please verify that it's you and it's not someone who has stolen your credentials.
Then we found that some companies use fingerprinting for paywalling. There are websites, for
example news sites, where you can read ten articles for free, but if you would like to read
more you have to pay some sort of subscription. If they would do that tracking with cookies,
then you would just delete your cookies and you would read another ten and you would go on
and on. However, they could use fingerprinting to do the paywalling, so that there is nothing
for you to delete once you are done reading your ten articles because the fingerprint is part
of who you are. Finally, this happened in the summer, there was this attack against an
outdated Tor browser that used a Firefox vulnerability, and the people who analyzed the
payload saw that it essentially fingerprinted a bit of the user’s system and sent that
fingerprint to a remote server. The most plausible theory at this point is that it was the Feds
who were doing this, trying to identify which users from the Tor network are visiting
certain shady websites or non-shady. There's a lot of interesting intrusive and less intrusive
uses of fingerprinting. In 2012, which is when we started doing this research, what we knew was
what Eckersley had said, and we also knew that there were some companies that were
quite vocal that they were offering fingerprinting as a service. What we wanted to find out is
how are these companies doing it? So are they relabeling Panopticlick as their own product or
are they adding something more to it? The question is then could they do more, could they, if
necessary, fingerprint you more than they do today? Then we wanted to find out
the user base, so which websites are buying services from these fingerprinting-as-a-service
companies, and then the last question was how are users trying to hide: what do
people do in order to protect themselves against fingerprinting, and is it working for them?
This talk is essentially two papers in one. If you want to know more this is the first one,
Cookieless Monster published in Security and Privacy this year and the second one, FPDetective
published in CCS this year. So that's that. We started our work by analyzing the code of these
three vocal companies that said that we offer fingerprinting and you can use it for all cool
things. This is what we did. We first found the domains that they used to serve the
fingerprinting scripts. Essentially, most of them they advertise the fact that you can adopt
fingerprinting in a very straightforward way. You just dump a bit of JavaScript code in your
page and now you are fingerprinting your clients. Then we found some websites that use them,
we extracted the code; we isolated it from the code of the website. We de-obfuscated and
analyzed the code of these services and then we compared the code to each other and we
created some sort of taxonomy to find out where every company stands. Of course, companies
are not eager to share their fingerprinting code with us, so for most of them this is actually the real
part of the code that we had to look at manually. The results are that we were able to create
this taxonomy as I said, compare the companies to each other and there were quite some
interesting findings. The first one was that collectively Panopticlick was fully covered. Usually
the industry is a bit behind academia. In this case they were really up-to-date. What
Eckersley had said in 2010 they were offering in their fingerprinting services. The classification,
the taxonomy that we drew up, is split up into five levels. You can start from fingerprinting
things in your browser customizations, fingerprinting features at the browser level user
configuration, fingerprinting your browser family and version, fingerprinting your operating
system and applications and finally fingerprinting your hardware and network. Here you see
these things here are all new things over what Eckersley was doing in 2010. For instance,
Internet Explorer does not share its plug-ins, so there is no navigator.plugins
property in JavaScript that one can use to read all of the plug-ins. So what they were doing is
they had this very long list of class identifiers and they were just enumerating them
one by one. Do you have that? Do you have that? Do you have that? In order to get, you know,
a partial list of the class IDs that were installed on that browser. Then in the browser-level user
configuration we saw that they were actually, one company was tracking the Do Not Track
choice, so that's a bit interesting. They were reading whether you wanted to be
tracked or not and they were adding it as part of your fingerprint. A company was also reading
math constants from your JavaScript engine and was incorporating those math constants into
your fingerprint and we assume that they are doing this in an effort to separate JavaScript
engines from each other, so if you are different in the floating points of something, then I may
be able to identify that browser that houses this JavaScript engine. Then we also found,
interestingly, that they were fingerprinting the Windows registry and TCP/IP parameters. The
question is how do they do that because JavaScript cannot look into your Windows registry nor
into your TCP/IP parameters and just stick with me and I'll tell you. So, of the nontrivial extras that
we found, the first was non-plug-in font detection. If you remember
Eckersley, he had to rely on Flash or Java in order to get a list of fonts from your machine.
However, one company was doing the following, all of this, of course, in an invisible iframe. It
was creating these long strings, for instance, "I do not need Flash". They were setting the Arial
typeface and they were measuring the box around it, the width and the height of the box, which
essentially means the width and the height of the text. Once they get this number, then they
have this long list of fonts and they keep on doing the same operation. Because of stylistic
differences in each font family, the same string at the same font size will add up to a
different height and a different width than Arial. Arial is used by your browser as a fallback
font, so if I ask for a fancy font and you don't have it, Arial will be used to display the text.
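The measurement comparison being described here can be sketched as below. This is a simulation, not the company's code: in a browser the `measure` step would be done in JavaScript by rendering the test string in a hidden element and reading its width and height, whereas here `make_fake_renderer` is a hypothetical stand-in that returns made-up sizes so the logic can run anywhere.

```python
def detect_fonts(measure, candidates, fallback="Arial"):
    """Return the candidate fonts whose rendered size differs from the
    fallback font's size, which suggests the font is really installed."""
    baseline = measure(fallback)
    return [font for font in candidates if measure(font) != baseline]


def make_fake_renderer(installed_sizes, fallback_size):
    """Hypothetical stand-in for the browser: a missing font silently
    renders with the fallback font, so it reports the fallback's size."""
    def measure(font):
        return installed_sizes.get(font, fallback_size)
    return measure


# Made-up (width, height) values for the same test string at the same font size.
measure = make_fake_renderer(
    {"Arial": (310, 18), "Comic Sans MS": (342, 21), "Calibri": (298, 19)},
    fallback_size=(310, 18),
)
probe_list = ["Comic Sans MS", "Calibri", "Some Font That Is Not Installed"]
print(detect_fonts(measure, probe_list))  # ['Comic Sans MS', 'Calibri']
```

One limitation of this approach is that an installed font which happens to render the test string at exactly the fallback's width and height would be missed, which is presumably why long test strings are used.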
Every time that a font measurement was different than Arial, it meant that the font was
present and thus it was used to render the text on the screen. By doing this for like 200, 300,
400 fonts, they could get a list of fonts through JavaScript, essentially through a side-channel
attack, without needing Flash or Java. The second thing that we found and this is
essentially, you know, how they access your registry, is that we found for two companies that
they have native fingerprinting plug-ins. When we were looking at the code, we saw that
when they are checking for the plug-ins installed they have this specific check for whether a
specific plug-in is present, and if it is, it is loaded and handed control. We were able to isolate
these plug-ins and analyze them using [indiscernible] and we saw that they were essentially
plug-ins that were existing on the user system for the sole purpose of fingerprinting even
better. These are not extensions, so these are plug-ins and they run with the same privileges as
the browser process itself, so that plug-in could look into your registry and they were reading
things like your installation date of your Windows, your device drivers, your IP address and your
hostname, and we were able essentially to find that these native fingerprinting plug-ins were
usually bundled with something that you downloaded and silently installed in the background,
essentially bundled with casino applications and maybe Second Life type applications. The
third thing is that there are interesting fingerprint delivery mechanisms, so how do you offer
fingerprinting as a service? We saw essentially two different modes. The first one was that the
remote code was brought in from the fingerprinter. It fingerprinted the user and then it added
this fingerprint in the DOM of the first party page. So for instance, on IMVU when the user is
sort of waiting for the page to log in so that he can type in his username and password, he is
being fingerprinted and the fingerprint is added as a hidden element in the form, in the login
form and once you click submit you are sending your username and password and your
fingerprint to IMVU. The second mode was that the first party site was saying fingerprint the
user. Here's the session identifier, and then the fingerprinting service was fingerprinting the
user but it wasn't sending the fingerprint to the first party site. It was sending it to itself,
and then, as we understand it, using the session identifier the first party can later, on the server side,
say what do you know about the user with that session identifier and they will get back
information about that user. The final thing I will talk to you about is essentially proxy detection.
There's this interesting thing. The thing that we saw is the following. We saw that for, I think it
was two companies, they were loading JavaScript and Flash. They were creating these long
random strings and exchanging them between each other and then they were just sending
these strings to the fingerprinting server. What happens in JavaScript is, if you have, for example,
an HTTP proxy, your request will go through the proxy. If this is the generated token which is
exchanged through [indiscernible] Flash, it will go through the proxy server, so when the request
reaches the fingerprinting server it will come from the IP address of the proxy server. However,
Flash has the ability to open direct connections ignoring your browser level proxies, so you have
another token that was sent directly to the fingerprinting server. Now at the fingerprinting
server, you see two requests coming in from two different IP addresses with the same long
alphanumerical token. Now you can say okay. This is actually the same user. He's coming
from two different IP addresses because he's using a proxy. This is his normal IP address and
this is his proxy. You can incorporate this information in the fingerprint. If we move to
adoption, this is who is using them. We crawled the top 10,000 sites quite
shallowly and we searched for inclusions from these three fingerprinting providers. We
found at the time 40 sites that were using them and the categories were mostly -- they were
across all borders -- but porn and dating sites were most prominent. We sent e-mails to these
people saying why are you using fingerprinting. Only one dating site replied. For the rest we
just took our best guess. For porn sites our theory is that they are trying to detect the use of
shared credentials, so you have a user. Yes?
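Returning to the proxy-detection trick described earlier in this turn: on the server side it amounts to matching one token that arrives over two channels from two different IP addresses. A minimal sketch, assuming a hypothetical request log of `(token, channel, ip)` tuples; the channel labels and the function name are illustrative, and the IP addresses are documentation-range examples.

```python
from collections import defaultdict


def detect_proxy_users(requests):
    """Group requests by token; if the JavaScript request (which follows the
    browser's proxy settings) and the Flash request (which can bypass them)
    arrive from different IPs, report the proxy and the direct address."""
    ips_by_token = defaultdict(dict)
    for token, channel, ip in requests:
        ips_by_token[token][channel] = ip

    flagged = {}
    for token, ips in ips_by_token.items():
        via_browser = ips.get("javascript")
        direct = ips.get("flash")
        if via_browser and direct and via_browser != direct:
            flagged[token] = {"proxy_ip": via_browser, "real_ip": direct}
    return flagged


# Made-up log entries for illustration.
log = [
    ("tok-a1b2", "javascript", "203.0.113.7"),   # came through the proxy
    ("tok-a1b2", "flash", "198.51.100.23"),      # direct connection
    ("tok-c3d4", "javascript", "192.0.2.55"),    # no proxy: same IP twice
    ("tok-c3d4", "flash", "192.0.2.55"),
]
print(detect_proxy_users(log))
```

Only the first token is reported, because its two requests arrive from different addresses.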
>>: When you say most prominent, are we talking about a percentage?
>> Nick Nikiforakis: Yes. From the categories, the two top categories were porn and dating
sites.
>>: That would cover what, 50 percent, 80 percent, what?
>> Nick Nikiforakis: You mean, cover the total of the dating sites present in the 10,000 or the
total of the 10,000?
>>: The total of the 10,000.
>> Nick Nikiforakis: It's 40 sites in total, so that would be the percentage of these 40. Yeah.
For porn sites they are trying to identify shared credentials. They don't want one user buying a
subscription and sharing it with ten friends. They want one user per subscription, so they're
using fingerprinting to protect themselves. For dating sites, one site replied to us and they said
that they don't want people to have multiple profiles because then they game the system.
They want one user to have one dating profile, so they use fingerprinting, again, to identify
multiple users hiding, well, a single user hiding behind multiple usernames.
At the time Skype.com was the highest-ranking website fingerprinting its users when you try to
log in. Yes?
>>: Did you ask the question of Skype and did they respond?
>> Nick Nikiforakis: Yes, no.
>>: Are they still using? You said at the time.
>> Nick Nikiforakis: This was done in 2012. I think they are.
>>: Can you check and let us know and then maybe we can follow up?
>> Nick Nikiforakis: Sure. To their credit, they are only using it at their login page, so if you
were to go to Skype.com you will not see anything, but if you try to log in, that's when the
JavaScript library is loaded and it's fingerprinting you.
>>: Right. The login page is now a Microsoft account because it got rid of the Skype account.
>> Nick Nikiforakis: Okay. I'll be happy to check it if you want to afterwards.
>>: What do you think Javier?
>>: Yes.
>> Nick Nikiforakis: Okay. That was for popular sites. Then since we did this work with UCSB
we had access to the domains from Wepawet, and Wepawet is a high-interaction honeypot
that is used to find JavaScript malware in webpages. We just searched what are the domains
that when analyzed loaded code from these fingerprinting providers. We found 3800 domains.
The first thing we saw when we got the categories is that they span a lot of them, so we had
shopping sites that use fingerprinting, travel, internet services, business and economy. The
second thing that was very interesting was that the two top categories were malicious sites and
spam sites. Malicious sites and spam sites when analyzed by Wepawet were loading JavaScript
from these fingerprinting providers. What makes this a bit more interesting is that these three
companies don't offer free previews of their services and you can't go online and just grab
fingerprinting services. You have to meet with a sales representative. The question now is the
following. Are malicious sites and spam sites just grabbing the JavaScript libraries, which
they can't really use because they're not paying for the services, and just doing it as a smoke
screen for something? Or is it something else? We just left it at that. I don't know any one of
the three companies that will answer this question. That was the first paper. Then the
question was the following. We know how these three companies work. We analyzed them.
We did the taxonomy, but are there more? Are there more that were less vocal that were not
included in our surveys? And the question is how do you find more. If you were to do some
sort of dynamic analysis, let's say you look at things loaded, how do you separate, for instance,
a fingerprinting script from a generic analytics script? If I have Google Analytics or StatCounter
or something else, it will also look whether I have Flash. It will also see my screen size but how
do you tell one from the other? Our conclusion was we can actually look at fonts. According to
Eckersley’s experiment fonts were the second-most revealing thing checked, the second-most revealing attribute of your fingerprint. We said that if a script passes this threshold and it
goes and starts asking questions about the fonts on your machine, then we treat this as a
fingerprinting script. Our paper and our tool that is available online is essentially FPDetective,
the fingerprinting-sensitive crawler, so you just… Yes?
>>: Is there a good reason why some sites might want to know what fonts you have?
>> Nick Nikiforakis: For example, I may want to display some text to you, maybe using three or
four font faces, but why would I ask for 400?
>>: Sure. But you did say you were, you didn't say your threshold was [indiscernible]
>> Nick Nikiforakis: This was an exploratory study so we did not know what sort of threshold to
set. So what we did is we just recorded everyone that asked for fonts and then we manually
analyzed and said okay. These guys they're using the fonts, for example, in order to show text
on the screen so we remove them. It's like a helper tool to identify more fingerprinters rather
than an autonomous system that will tell you for sure that this guy right there is fingerprinting
you.
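As a rough illustration of this kind of helper tool, the sketch below counts how many distinct fonts each script asks about and lists the heavy probers for manual review. It is not FPDetective's code: the function name, the log format, and especially the threshold of 50 are hypothetical, since the study itself recorded every script that asked for fonts and did not fix a threshold.

```python
from collections import defaultdict


def scripts_to_review(font_probes, threshold=50):
    """Given (script_url, font_name) pairs, return the scripts that probed at
    least `threshold` distinct fonts, heaviest first. The result is a list of
    candidates for a human to inspect, not a verdict."""
    fonts_by_script = defaultdict(set)
    for script_url, font in font_probes:
        fonts_by_script[script_url].add(font)

    counts = {url: len(fonts) for url, fonts in fonts_by_script.items()}
    candidates = [url for url, count in counts.items() if count >= threshold]
    return sorted(candidates, key=lambda url: -counts[url])


# Made-up log: one script probing 400 fonts, one page using three font faces.
probes = [("https://fp.example/f.js", f"Font {i}") for i in range(400)]
probes += [("https://site.example/app.js", name)
           for name in ("Arial", "Georgia", "Verdana")]
print(scripts_to_review(probes))  # ['https://fp.example/f.js']
```

The design mirrors the distinction drawn in the answer above: a page that uses three or four font faces stays below any sensible cutoff, while a script asking about hundreds of fonts stands out.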
>>: Okay. So you are not just making this stuff up.
>> Nick Nikiforakis: No.
>>: I have a follow-up question. What's the adverse effect of blocking all of those probing
requests? I assume that [indiscernible] servers may want to find out plug-ins and lots of stuff,
but let's say that you've blocked all those requests. Probably it should still be okay.
>> Nick Nikiforakis: This is one of the recent questions we're having to ask. The answer I don't
know. Well, I was saving this for the end, but I can tell you now. If I constantly reject your
scripts, is it possible that I actually make myself fingerprintable in the long run? There's that
guy from Microsoft Research every single day with that browser who is not loading our
JavaScript files. And then I am again singled out simply because I'm not loading scripts, and
this is just an open question right now. I don't have a good answer to that. For the detection of
font snooping, using our results from Cookieless Monster, we just said how can we
detect fonts. Okay. They have two ways of grabbing them. They have the JavaScript-based
measuring the width and the height, so let's modify the browser and add code so that when
something is playing with the width and the height of your elements, let's record that. Then for
Flash we just had a proxy that was grabbing all the SWF files. It was decompiling them and it
was just searching for the API calls in Flash that give back a list of fonts. So these are the old
results and with FPDetective we saw actually 140 fingerprinting sites, 145 in the top Alexa
10,000. Another experiment we did is that we tried one run with no Do Not Track header and
another one with Do Not Track equals one and we got the identical result, so we were fingerprinted
100 percent in both cases. At this point Do Not Track does not matter in terms of fingerprinting.
Companies like BlueCava are actually quite vocal about that. They say we use
fingerprinting for ads and we use fingerprinting for fraud detection. If you're a bad guy you
can't just say Do Not Track equals one; we cannot honor that, because then we would
just stop tracking you. We will fingerprint you, but we will not use your fingerprint for ads.
That's a promise that's very hard to verify because it's on the server-side. You're being
fingerprinted identically at the client side. So that's that.
>>: You can look at the ads that you are getting…
>> Nick Nikiforakis: Yeah, so that would be a side effect sort of. That's an interesting question.
>>: That's what [indiscernible]
>>: [indiscernible] study.
>>: At CMU [indiscernible]
>> Nick Nikiforakis: Okay. So for fingerprinting or in general for third-party tracking?
>>: In general.
>>: In deciding [indiscernible]
>> Nick Nikiforakis: I would like to get a pointer.
>>: He was at the workshop, the web security and privacy workshop.
>> Nick Nikiforakis: Okay. I'll check it out. The status at this point is that fingerprinting is out
there. There's quite a number of new techniques over Panopticlick. We know that large and
popular sites are using them, not too many but they are still getting millions of visitors per day
since they are in the Alexa top 10,000. So the question is, the second one is could they be
doing more? If they wanted to fingerprint a bit better could they check more things? The
question essentially boils down for us to how your browser internals relate to your
browser's identity. And I think I'll only go over one thing. We decided to do some
fingerprinting of our own; that's what we did. We focused on two, on the two special
JavaScript objects that have historically attracted the most fingerprinting efforts. The navigator
object that has all the information about your browser like your plug-ins, your [indiscernible]
your platforms and so on and the screen that has information about the width and height of
your screen, the depth of colors and what we did is we just, we are every day guys so we just
performed a series of everyday operations on these objects. We tried to add properties to
these special JavaScript objects. We tried to remove properties and we also tried to modify
properties. These are special objects because they are created by the browser. They are not
created by [indiscernible] program, so these are better candidates to reveal browser dependent
behavior. One of the things, probably my favorite one that we discovered is that the natural
ordering of properties can give away a browser family and occasionally a browser version. If
you would go to Chrome and you would say please give me, just list the properties in the
navigator object, you would get navigator.geolocation, onLine, cookieEnabled and so on. If
you would ask the same question of Firefox you would get a different ordering here.
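The ordering check can be sketched as a lookup of the observed property order against known orderings. In the sketch below only the Chrome prefix comes from the talk; the `OtherBrowser` entry is a placeholder and not a real measurement, and the function names are hypothetical.

```python
# Orderings of the first few navigator properties per browser family.
KNOWN_ORDERINGS = {
    "Chrome": ("geolocation", "onLine", "cookieEnabled"),       # as read out in the talk
    "OtherBrowser": ("appCodeName", "appName", "appVersion"),   # placeholder, NOT measured
}


def family_from_ordering(observed):
    """Return the family whose known prefix matches the observed property
    order, or None if no known ordering matches."""
    for family, prefix in KNOWN_ORDERINGS.items():
        if tuple(observed[:len(prefix)]) == prefix:
            return family
    return None


def claims_false_identity(claimed_family, observed):
    """True if the ordering points to a different family than the one the
    browser claims to be (for example through its user agent string)."""
    actual = family_from_ordering(observed)
    return actual is not None and actual != claimed_family


observed = ["geolocation", "onLine", "cookieEnabled", "userAgent"]
print(family_from_ordering(observed))                     # Chrome
print(claims_false_identity("OtherBrowser", observed))    # True
```

Because the ordering is a side effect of how the browser builds the object, changing the reported user agent does not change it.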
Regardless of who the browser pretends to be, through the ordering, if I get a different result,
then I know who the browser really is. Here in Internet Explorer it started kind of similarly to
Firefox but then deviated fast into its own ordering and that's a bit of an underspecified thing,
the ordering of elements in a list in BOM objects. There are also more things but I'll just skip
them for today. So the question is now, you know, there is more that fingerprinting can do,
that fingerprinters can do. How do users react? How do people today, how are they told
to react and how do they react themselves? We looked at the few things that Dale just talked to
you about: user agent, user agent spoofing browser extensions. We found essentially previous
research as well as underground hacking guides that were telling just install this user agent
spoofer and change your browser user agent and then people will think that you are many
machines behind a single IP address. We just went to each market and we search for users of
spoofing and we found 11 add-ons, extensions in total with more than 800,000 users, now
more than a million. The question is how do they stand up against fingerprinting. And we try
to be fair, so we didn't use any of our newly found, of our newly discovered techniques. We
just sort of, we installed them and we looked at the navigator and the screen object and we
looked at these things before and after them being active. What we found out was that
unfortunately all of them had one or more problems. The first one is that they had
incomplete coverage of the navigator object, so one would change the user agent, but it wouldn't
change, for example, the platform, so now I was Internet Explorer running on Linux. Also there
were some that were randomizing attributes, and these allowed for impossible
configurations. You had your browser pretending to be an iPhone but having the screen size of a
desktop computer. And finally, some were forgetting that the user agent is communicated both
through your JavaScript environment as well as your HTTP header, so they were changing one
but they were forgetting the other. So that is straightforwardly revealable. The thing is that
you may think that we tried and we failed. We're back to where we started, but that's not true.
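
The three kinds of extension problems just listed (incomplete navigator coverage, impossible configurations, and a user agent that differs between the HTTP header and JavaScript) can be sketched as simple consistency checks. This is a minimal illustration, not the checks used in the study: the function name, the string tests, and the 1000-pixel threshold are all assumptions.

```python
from typing import Dict, List


def find_inconsistencies(http_user_agent: str,
                         navigator: Dict[str, str],
                         screen_width: int) -> List[str]:
    """Return human-readable reasons why a reported configuration looks spoofed.

    `navigator` holds values a client script read from the navigator object;
    the checks below are illustrative assumptions, not an exhaustive list.
    """
    problems: List[str] = []
    js_user_agent = navigator.get("userAgent", "")
    platform = navigator.get("platform", "")

    # The user agent travels in the HTTP header and in JavaScript.
    if js_user_agent != http_user_agent:
        problems.append("HTTP User-Agent header and navigator.userAgent differ")

    # Incomplete coverage: user agent changed, platform left untouched.
    if "MSIE" in js_user_agent and not platform.startswith("Win"):
        problems.append("Internet Explorer user agent on a non-Windows platform")

    # Impossible configuration: phone user agent with a desktop-sized screen.
    # The 1000-pixel threshold is an arbitrary assumption.
    if "iPhone" in js_user_agent and screen_width > 1000:
        problems.append("iPhone user agent with a desktop-sized screen")

    return problems
```

An empty list means none of these particular checks fired; it does not mean the configuration is genuine.
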
Since I am a Greek I like using Greek words and we classify this as an iatrogenic problem. And
iatrogenic means something, or rather an iatrogenic disease is a disease that was caused during
examination by a doctor or during treatment by a doctor. So you have users that installed
these in order to hide but they actually become more visible than before and you can think of
this sort of in a bad way. If we assume this big box here to be the entire fingerprintable
surface of a browser with all the quirks and the ordering of properties and everything,
you have then the extensions, let's say extension A, B and C that go in. At first they all change
the navigator user agent, but then each extension author tries to do something more.
One would also try to change the platform. The other would say I'll also change the screen size
to match. But the thing is that what they cover versus what is not covered is huge. So the first
thing you can do is you can definitely use all of this part here to find what is the real browser.
So you are back to where you started. But then you can also check for any of these areas that
are only covered by a single plug-in, or by a small combination of them, and now you have
reduced your anonymity set from 100 million users of Firefox to 3000 users of that specific
extension. Essentially you reveal extensions from their side effects because extensions are not
as enumerable as plug-ins are in browsers. What we've done so far is we've been detecting
fingerprinters and we've been raising awareness both to ourselves and to people in general.
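
The anonymity-set reduction mentioned a moment ago (100 million Firefox users down to 3000 users of one extension) can be put in terms of identifying bits. The sketch below simply takes the base-2 logarithm of the ratio, a common way to express such a reduction; the function name is hypothetical and the two figures are the round numbers from the talk.

```python
import math


def identifying_bits(population: int, anonymity_set: int) -> float:
    """Bits of identifying information gained by shrinking the crowd
    a user hides in from `population` to `anonymity_set`."""
    return math.log2(population / anonymity_set)


firefox_users = 100_000_000   # round figure quoted in the talk
extension_users = 3_000       # round figure quoted in the talk

bits = identifying_bits(firefox_users, extension_users)
print(f"Detecting the extension reveals about {bits:.0f} bits")
```

Under these numbers, detecting the extension alone is worth roughly 15 bits, before any other attribute is considered.
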
We haven't yet tried to see how would you go about stopping that. One of the things that we
know, that we talked about earlier is what happens if we block fingerprints today? Does that
make it worse and can we track that over time? And then, you know, some things we were
discussing with Ben, for example, if we're going to start modifying the browser, should we all
pretend to be the same, like Tor browser does or like these browser extensions are
trying to do, or should we rather be a different user every single time so that I am different but
I'm different from the last time he saw me and I'll be different the next time so you can still not
correlate. And these are open things and I would love to discuss these in person with you if you
want. To conclude, fingerprinting is a real problem and browsers are these complex beasts that
you can't just go in and change the attributes and say now I'm done. Let's tackle another
problem. Current browser extensions should not be used for privacy reasons, so there are
some sites that, some old sites that say I need Internet Explorer 6 to run. If you have to go to
these websites then you may use them but you shouldn't have this thing enabled all the time in
your browser because it makes you more visible than before. This is not really a scientific
statement. It's more like what I think at this point, is that long-term solutions will most likely
not be technical ones, so we may have to say that we'll do our best to identify fingerprinting.
We'll do our best to combat it, but then the site has to tell you in the same ways that they do
today for cookies, we are using fingerprinting on this site. Are you okay? And then you as a
user, you can choose whether you are okay with that or if you would rather go to a different
website. That's all. And thank you for your attention. I'd be happy to take any questions.
[applause].
>> Ben Livshits: We have a few minutes for questions.
>>: So do you feel like this has gotten -- as someone who is in the field I've heard of
browser fingerprinting for a while. In my opinion, I don't feel like it's really gotten as much
common media attention even though it is prevalent and easily identifiable et cetera et cetera.
What is it that you think will bring this to public attention, in the way that Do Not Track became a
thing, and any of the other, you know?
>> Nick Nikiforakis: I think you have the third-party cookies and Do Not Track that got public
attention because they were bigger than fingerprinting and still are. You have all these
self-help tools for third-party cookies, but you don't have any self-help today for fingerprinting
per se.
>>: They do, but they don't work.
>> Nick Nikiforakis: Yes, any good ones. So I think that actually for the second paper that I
mentioned, FPDetective, it was picked up by a lot of outlets and sort of everyone talked about it,
but the more they talked about it, the less correct it became, so at the end you had "websites
that invisibly track you even though you don't want them to." It has nothing to do with
fingerprinting. So I guess it's a matter of presenting it to the world in a concise way and
essentially showing, okay, now you clicked "I will not accept third-party cookies", but what's left?
>>: I would agree that the easier you can make this for people to understand the more
attention it will get. However, I think for people to really start paying attention to this, you
need, well maybe not you, but someone needs to concentrate on the [indiscernible]. And that's
why I'm trying to get some grasp of who is actually using these tools. Insofar
as you can show that these tools are being used for some nefarious purpose, then you will start
getting the right attention.
>> Nick Nikiforakis: The thing is that people are not very vocal about it. I sent around 40
e-mails at the time and we got like two responses. One saying I cannot tell you anything and the
other one from the dating site telling me about the Sybil attacks. The thing is that there's a
server side component to all of this that we do not know how it works, so you have the
JavaScript. It runs. It generates a fingerprint. This thing is actually encrypted, so even if you
get it directly from the JavaScript library you cannot look into it as a first party website. You
have to still give it to the fingerprinting service and say what do you know about that user.
Then they give you back information about them and some of them claim that they have this
threats corp where they correlate the same fingerprint across many websites and they have a
reputation database and they say that guy, you know, he has a low, he is not trustworthy, so
you should not do business with him. But all of that is a thing we don't know about and I
cannot go online and subscribe to these things. I have to go through sales representatives and
pretend to be a company and well, we haven't done that yet. I don't know if we'll succeed
in doing that, but it's much harder than for other services. Yeah?
>>: Fingerprinting versus just cookie usage, do you think fingerprinting is worse than
third-party cookies? Because the end result is that these companies are tracking users. They're
building profiles. What do you think about the two? I mean third-party cookies or cookies in
general are used by almost 100 percent of websites out there.
>> Nick Nikiforakis: There are like multiple layers to this. The thing is that if I give you a unique
identifier and you bring it back to me, I know for a fact that it's you. I don't have to
correlate things and, if I see two similar fingerprints, decide whether it's two users or one.
However, fingerprinting has some interesting properties. For example, it works when you
transition from the private mode into the non-private mode of your browser. Or for example, if
Flash is used, I guess that's probably similar with the Flash cookies that, you know, you have
fingerprinting across browsers because it goes through the Flash cookies that are shared by all
of your browsers in your system. Today if you would tackle the problem I would say just try to
stop or limit third-party cookies, but fingerprinting is currently not legislated in any way, so it
doesn't appear anywhere and people are now starting to think whether it should be. As long
as you have a lot of legal attention on the cookies, then there is this sort of window for people
to use fingerprinting to get the same results and not have to expose themselves with "we use
something that tracks you."
>>: If you view fingerprinting as just a way to do cookies without cookies and you see it as an
attempt to circumvent legal restrictions, then you could classify this as malware. And if you
classify this as malware it seems like among the classes of malware out there, fingerprinting is
quite easy to fingerprint. There's a very clearly defined subset of things they do and there's
only so many tools for doing that. So if you wanted to fingerprint the fingerprinters is it that
hard or is it something that we could just do?
>> Nick Nikiforakis: We are fingerprinting the fingerprinters in the sense that we know who
they are at this point, but I know from analyzing the code that now they're minifying their
JavaScript but they are not per se trying to hide more than that. The thing is that a lot of the
companies come out and say, look, you can use fingerprinting for
detecting fraud, and I think, this is not my area, but there are certain legal requirements for
banks to have some sort of anti-fraud measures to protect their customers and
fingerprinting could fit that bill. It's not as clear-cut as "that's malware and that's done, and
we just consider it as malware." For instance, the native fingerprinting libraries that we isolated
for the two companies, we submitted them to VirusTotal and we got 0 out of 42 engines flagging
these as malware. That was a real deal. It was loaded into your browser for the sole reason of
fingerprinting you better, but none of the companies is saying this is adware or it's something
bad. They were all just flagging it as great. I think it's complicated to just call them malware
and just…
>>: Who do you talk to for legal on this?
>> Nick Nikiforakis: I just read around I guess and…
>>: Have you talked to [indiscernible]
>> Nick Nikiforakis: To who?
>>: Ben Edelman, he's a Harvard Business School professor. He made a career as a grad
student exposing adware and he has both computer science skills and he's got a law degree and
an economics degree, so he knows exactly how to attack these guys.
>> Nick Nikiforakis: Okay. I'd be happy to talk with him. That sounds good.
>>: Okay. You want to be talking to him. I'll hook you up.
>> Nick Nikiforakis: Okay. Thank you.
>>: Do you have any information like about mobile devices versus nonmobile devices? I would
think that mobile devices would probably be less fingerprintable.
>> Nick Nikiforakis: We don't have solid results, but what we do know is that users
customize their mobile browsers much less than they customize their desktop browsers. For
instance, I talked to a guy from ING two weeks ago and they use fingerprinting as part of
protecting their users and they told me, "Fingerprinting does not work for mobile devices.
Do you have any good tricks that we can use?" I said no [laughter]. At this point I think it's sort
of like an open question. It's probably true that they customize less, but I wonder if they have
special APIs that connect, for example, to the Android operating system that could be used also
for fingerprinting.
>>: Would fingerprinting be more effective with HTML5 and stuff?
>> Nick Nikiforakis: There is some research, for example, people have looked at the canvas
element of HTML5 where they just write some text in it and they read it back out as an image
and they showed that there is a difference when you do this operation on a Linux machine with
Chrome, for example, or on a Windows machine with Chrome. But the results are a bit like
there's a difference. We're not sure how to quantify it yet. It could be used, but that's it right
now. There was no like from the code that I analyzed, there was no evidence that they were
using that.
>>: So the porn sites, I'm assuming they're doing it because people go into incognito mode
which doesn't allow for the dropping of cookies so they do it to track users?
>> Nick Nikiforakis: I think that's probably one reason. Our own theory is for the detection of
shared credentials so that you don't buy one subscription and share it with 100 people. They
detect whether the same user name and password combination is connected to more than one
real browsing environment.
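
The shared-credentials theory above can be sketched as counting distinct fingerprints per account. This is a guess at how such a check might look, not a description of what any site actually does: the function name, the tuple layout, and the default limit are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def accounts_over_limit(logins: Iterable[Tuple[str, str]],
                        max_environments: int = 1) -> Dict[str, int]:
    """Map each account seen from more than `max_environments` distinct
    fingerprints to the number of distinct fingerprints observed.

    `logins` is a sequence of (account, fingerprint) pairs; both values are
    treated as opaque strings.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for account, fingerprint in logins:
        seen[account].add(fingerprint)
    return {
        account: len(fingerprints)
        for account, fingerprints in seen.items()
        if len(fingerprints) > max_environments
    }
```

A real deployment would presumably need some tolerance, since one person may legitimately use several devices or browsers, which is why the limit is left as a parameter.
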
>>: Do you see [indiscernible] sites using fingerprinting [indiscernible]
>> Nick Nikiforakis: Right. We haven't checked that one. The thing is that a
general problem that we have in crawling is that it's really hard to go past the authentication
wall. You will check all of the pages that are publicly accessible, but once things are behind
a login form, it's a problem, and I would suspect that maybe for Hulu and for Netflix
fingerprinting starts maybe right after you log in.
>>: That would be interesting because I know that HBO has said that, they know that people
share credentials. They watch HBO shows but they're okay with it because people have to
subscribe anyway, they're delegating the end person like cable subscriber even though the
subscriber's friends and family might be watching content also. And I think Netflix has like
about a 50 sign-in limit or something, only 50 different users can sign in with the same ID at the
same time. I don't know how they detect that.
>>: It's an interesting hypothesis in terms of [indiscernible] friends you can get credentials
versus being able to detect. You need people because for privacy sensitive things like porn
people are going into incognito or other methods of managing their footprints.
>> Nick Nikiforakis: My thesis here is that fingerprinting happens on the client side so if you're
like this sophisticated attacker, you can claim that the script ran and you can send back a
fingerprint and you can construct this to have exactly what you want in it and when they open
it up on the server side they will say okay, this thing ran and it's a user located in that place in
the world, but you can falsify all of these. It's not easy to do it, but you can. Normal people
can't. They don't know how to use search, much less actually modify JavaScript libraries. So
I think that today even though the argument is that fingerprinting can be used a lot for fraud
detection, I think that if someone wants to go through he will easily be able to go through.
Fingerprinting seems to me to be working much more for the large part of the population that
will just browse websites. They will get fingerprinted and then they will get ads based on their
fingerprinting. Yeah?
>>: Do you see a trend in what they've been doing? Has it been going up or, how fast do you
think?
>> Nick Nikiforakis: We don't know. We do plan to use the FPDetective tool to do a
longitudinal study where we track it for a year and we try to see whether companies are
trying to change or whether more sites are coming in. I have this sort of gut feeling that it's
growing, but I cannot give you numbers right now.
>>: Have you reached out to the extension makers to share your findings? What would their
reaction be?
>> Nick Nikiforakis: No. I haven't, so I cannot guess their reaction.
>>: [indiscernible]
>> Nick Nikiforakis: The thing is that we do not know whether these people are making, what
we do know that some of them are making, are saying oh this privacy, this privacy, don't use
our extension, but there's other guys that they just offer an extension that changes your user
agent. It could be that they are just trying to offer services towards people who try to access
specific websites that check the user agent and say I will not work unless you have that. But we
did find both academic research and underground guides suggesting the use of these and the
large numbers show that even if you download it and install it to access that website,
this thing can still be used against you on other websites that fingerprint you.
>> Ben Livshits: Are there further questions? Okay. Let's thank the speaker.
>> Nick Nikiforakis: Thank you. [applause]