Detecting and Defending Against Third-Party Tracking on the Web
Franziska Roesner, Tadayoshi Kohno, David Wetherall
April 26, 2012
NSDI

Third-Party Web Tracking
[Diagram labels: Advertise.com, Tracker.com. (Hypothetical tracking relationships only.)]
Bigger browsing profiles = increased value for trackers = reduced privacy for users

Tracking is Complicated
• Much discussion of tracking, but limited understanding of how it actually works.
• Our goals:
  – Understand the tracking ecosystem.
    • How is tracking actually done in the wild?
    • What kinds of browsing profiles do trackers compile?
    • How effective are defenses available to users?
  – Address gaps with new defense (ShareMeNot).

Outline
• How Tracking Works
  – Tracking Mechanisms
  – Tracking Taxonomy
• Measurements
• Defenses

Mechanisms Required By Trackers
• Ability to store user identity in the browser
  – Browser cookies
  – HTML5 LocalStorage and Flash cookies (LSOs)
  – Not considering more exotic storage mechanisms or approximate fingerprinting
• Ability to communicate visited page and user identity back to tracker
  – Identity: Cookies attached to requests
  – Visited page: HTTP referrers
  – Both: scripts that embed information in URLs

Tracking: The Simple Version
• Within-Site: First-party cookies are used to track repeat visits to a site.
• Cross-Site: Third-party cookies are used by trackers included in other sites to create profiles.
[Diagram: http://site1.com and http://site2.com each embed an ad via <iframe src=tracker.com/ad.html>. The browser's cookie database holds tracker.com: id=789, and cookie: id=789 is attached to each ad request sent to tracker.com's processing engine.]
tracker.com logs:
  9:30am: user 789 visited site1.com
  9:31am: user 789 visited site2.com

Our Tracking Taxonomy
Name       Scope        User Visits Directly?  Overview
N/A        Within-Site  Yes                    Site does its own on-site analytics.
  Evolution: Embedding analytics libraries
Analytics  Within-Site  No                     Site uses third-party analytics engine (e.g., Google Analytics).
Vanilla    Cross-Site   No                     Site embeds third-party tracker that uses third-party storage (e.g., Doubleclick).
  Evolution: Third-party cookie blocking
Forced     Cross-Site   Yes (forced)           Site embeds third-party tracker that forces the user to visit directly (e.g., via popup).
Referred   Cross-Site   No                     Tracker relies on another cross-site tracker to leak unique identifier values.
Personal   Cross-Site   Yes                    Site embeds third-party tracker that the user otherwise visits directly (e.g., Facebook).

Quirks of Third-Party Cookie Blocking
• Option blocks the setting of third-party cookies: all browsers
• Option blocks the sending of third-party cookies: only Firefox
• Result: Once a third-party cookie is somehow set, it can be used (in most browsers).

Forced Tracking
[Diagram: http://site1.com embeds an ad via <iframe src=tracker.com/ad.html>. A popup visit to tracker.com stores tracker.com: id=321 in the cookie database. http://site2.com embeds the same iframe, and requests reach tracker.com's processing engine.]
tracker.com logs:
  1:30pm: site1.com: user 321
  1:31pm: site2.com: user 321
High-level point: On most browsers, if a tracker can ever set a cookie, third-party cookie blocking is rendered ineffective.

Our Tracking Taxonomy (revisited)
  Evolution: Complex ad networks

Referred Tracking
[Diagram: http://site1.com and http://site2.com each embed an ad via <iframe src=tracker.com/ad.html>. The cookie database holds tracker.com: id=522. The ad issues requests to othertracker.com that carry the identifier and the visited site:]
  othertracker.com/track?id=522&referrer=site1.com
  othertracker.com/track?id=522&referrer=site2.com
othertracker.com logs:
  2:34pm: site1.com: user 522
  2:35pm: site2.com: user 522
High-level point: One tracker with client-side state can enable tracking by partners without client-side state.

Our Tracking Taxonomy (revisited)
  Evolution: Social networks

Personal Tracking
• Just loading these buttons (not clicking on them) enables tracking.
• Users visit these sites directly.
• This tracking is often not anonymous (linked to accounts).

Our Tracking Taxonomy (revisited)
  Label added on this build of the slide: Anonymous

Outline
• How Tracking Works
  – Tracking mechanisms
  – Tracking taxonomy
• Measurements
• Defenses

Measurement Tool: TrackingTracker
• Firefox add-on
• Based on taxonomy of client-side mechanisms
• Crawls the web, automatically categorizing trackers
• Monitors:
  – Third-party requests
  – Cookies, HTML5 LocalStorage, Flash LSOs (considers state that changes across clean measurement runs)
  – Identifier leaks

Measurement Study
• 3 data sets
  – Alexa Top 500
    • 5 pages per domain: main page and up to 4 links
  – Alexa Non-Top 500
    • Sites ranked #501, #601, #701, etc.
    • 5 pages per domain: main page and up to 4 links
  – AOL search logs
    • 300 unique queries for 35 random users

Tracking Prevalence (Top 500)
• 524 unique trackers on 500 domains
• 457 domains (91%) embed at least one tracker. (97% of those include at least one cross-site tracker.)
• 50% of domains embed between 4 and 5 trackers.
• One domain includes 43 trackers.

[Figure: Top 20 Trackers on Top 500 Domains. Y-axis: Tracker Prevalence (# Domains). Legend: Within-Site, Cross-Site (Personal), Cross-Site (Anonymous). Tracker names and bar values are not recoverable from the extraction.]

[Figure: AOL Users: Profile Sizes, Top 20 Cross-Site Trackers. Each line represents one user.]
Doubleclick: Avg 39% (Max 66%)
Facebook: Avg 23% (Max 45%)
Google: Avg 21% (Max 61%)

LocalStorage and Flash Cookies
• Surprisingly little use of these mechanisms!
• Of 524 trackers on Alexa Top 500:
  – Only 5 set unique identifiers in LocalStorage
  – 35 set unique identifiers in Flash cookies
• Respawning:
  – LS → Cookie: 1 case; Cookie → LS: 3 cases
  – Flash → Cookie: 6 cases; Cookie → Flash: 7 cases

Outline
• How Tracking Works
  – Tracking mechanisms
  – Tracking taxonomy
• Measurements
• Defenses

Defenses to Reduce Tracking
• We explore several in the paper:
  – Third-party cookie blocking
  – Do Not Track header
  – Popup blocking
  – Clearing client-side state
  – Disabling JavaScript
  – Private browsing mode

[Figure from the defenses discussion: Top 20 Trackers on Top 500 Domains. Labels recoverable from the slide: "(no cookie blocking)", "Referred Tracker", "Forced Tracker". Legend: Within-Site, Cross-Site (Personal), Cross-Site (Anonymous). Tracker names and bar values are not recoverable from the extraction.]

Personal Tracking Revisited
• Most popular, based on measurements:
  – Facebook, Google, Twitter, AddThis, YouTube, LinkedIn, Digg, Stumbleupon
• Third-party cookie blocking is ineffective.
• Existing browser extension solutions remove the buttons (undesirable to some users).
• Can we reduce tracking but allow use?

ShareMeNot
http://sharemenot.cs.washington.edu
• A browser extension that protects against tracking from third-party social media buttons while still allowing them to be used.
• Firefox version: removes cookies from relevant requests until the user clicks the button.
  – Similar: Priv3 Firefox add-on
• Chrome version: replaces buttons with a local stand-in button until the user clicks.
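The Firefox behavior described above (withholding cookies from button requests until the user clicks) can be sketched as a header filter. This is a minimal illustration, not ShareMeNot's actual code: the host list `BUTTON_HOSTS`, the function names, and the `clicked_hosts` set are all hypothetical, and a real extension would hook the browser's request pipeline rather than operate on plain dictionaries.

```python
from urllib.parse import urlsplit

# Hypothetical list of social-button domains; the extension's real list
# is not given in the slides.
BUTTON_HOSTS = {"facebook.com", "twitter.com"}


def host_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def filter_request_headers(url, top_level_url, headers, clicked_hosts):
    """Return a copy of headers, minus Cookie for un-clicked button requests.

    A request counts as a button request when it goes to a listed domain
    while the user is on a page of a different site (third-party context).
    """
    host = urlsplit(url).hostname or ""
    top_host = urlsplit(top_level_url).hostname or ""
    for domain in BUTTON_HOSTS:
        is_button_request = host_matches(host, domain)
        is_third_party = not host_matches(top_host, domain)
        if is_button_request and is_third_party and domain not in clicked_hosts:
            # Withhold the identifying cookie until the user opts in by clicking.
            return {k: v for k, v in headers.items() if k.lower() != "cookie"}
    return dict(headers)
```

In this sketch, direct visits to a listed site keep their cookies, so the user stays logged in there; only embedded requests on other sites are stripped, and clicking a button adds its domain to `clicked_hosts` so later requests go through unchanged.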
Effectiveness of ShareMeNot (Top 500)
Tracker      Without ShareMeNot  With ShareMeNot
Facebook     154                 9
Google       149                 15
Twitter      93                  0
AddThis      34                  0
YouTube      30                  0
LinkedIn     22                  0
Digg         8                   0
Stumbleupon  6                   0

Summary
• Introduced taxonomy of tracking behavior for any client-side identifiers.
  – Analytics, Vanilla, Forced, Referred, Personal
• Studied tracking in the wild with browser measurements.
  – Revealed rich tracking ecosystem.
  – Results can assist informed broader discussions.
• Developed ShareMeNot, a new privacy-enhancing defense for personal tracking.
http://sharemenot.cs.washington.edu/
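The categories named in the summary can be read as a small decision procedure over the taxonomy table's columns (scope and whether the user visits the tracker directly), plus whether the tracker depends on an identifier leaked by another tracker. The sketch below encodes that reading; the class and field names are hypothetical, and assigning exactly one label per tracker is a simplification made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedTracker:
    # Hypothetical field names mirroring the taxonomy table's columns.
    cross_site: bool        # False means within-site scope
    visited_directly: str   # "no", "forced", or "yes"
    uses_leaked_id: bool    # relies on another tracker to leak an identifier


def classify(t: ObservedTracker) -> str:
    """Map observed properties to one taxonomy name."""
    if not t.cross_site:
        # Within-site: a directly visited site doing its own analytics is the
        # table's "N/A" row; an embedded third-party engine is Analytics.
        return "N/A" if t.visited_directly == "yes" else "Analytics"
    if t.visited_directly == "forced":
        return "Forced"      # e.g., the tracker was visited via a popup
    if t.visited_directly == "yes":
        return "Personal"    # e.g., a site the user otherwise visits directly
    # Cross-site and never visited directly.
    return "Referred" if t.uses_leaked_id else "Vanilla"
```

For example, `classify(ObservedTracker(cross_site=True, visited_directly="no", uses_leaked_id=True))` returns `"Referred"`, matching the table row for a tracker that relies on another cross-site tracker to leak unique identifier values.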