Advertising in Online Social Networks

How Does Ad Revenue Work?

Types of Ads
• Search Ads
• Contextual Ads

Learning User Profiles
• Trackers link user activities to form large user profiles
SIGCOMM 2013

Implications of Giving Users Control
• Pros: Personalization, Better Security, Revenue for Service
• Cons: Lack of Privacy

Issues with Ads
• Click-Spam
• Protecting User Privacy

Estimating Click-spam – Main Idea
• How many of the ad clicks come from non-spammers?
• Both non-spammers and spammers click ads; only a fraction of non-spammers buy
• Pass clicks through a black box that loses spammers and some non-spammers; some of the non-spammers that get through buy
• Equate the ratios of buyers to non-spammers with and without the black box

Dissecting the Black Box – Hurdles
• A hurdle adds an extra click required to view the ad site (e.g., "5 sec wait, Click to continue")
  – Spammers and non-spammers click on an ad
  – Some spammers and non-spammers see the content
  – A fraction of non-spammers buy
• Different hurdles have different hardness
• Send only a fraction of traffic through hurdles, to minimize impact on user experience
• A perfect hurdle would block all spam
  – In reality, some spammers get through (false negatives)
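The ratio-equating idea on the two slides above comes down to a few lines of arithmetic. This is a minimal sketch under stated assumptions, not the measurement code behind the slides. It assumes that every click clearing the hurdle comes from a non-spammer, that spammers never buy, and that non-spammers buy at the same rate with or without the hurdle. The function names and the hash-based routing of a fraction of clicks to the hurdle are hypothetical.

```python
import hashlib


def estimate_non_spam_clicks(control_buyers: int, hurdle_passed: int,
                             hurdle_buyers: int) -> float:
    """Estimate non-spammer clicks in the unfiltered (control) traffic.

    Equates buyers / non-spammers on both sides of the black box:
        hurdle_buyers / hurdle_passed == control_buyers / non_spam
    assuming every click that clears the hurdle is from a non-spammer.
    """
    if hurdle_passed <= 0 or hurdle_buyers <= 0:
        raise ValueError("hurdle traffic needs at least one passed click and one buyer")
    return control_buyers * hurdle_passed / hurdle_buyers


def estimate_spam_fraction(control_clicks: int, control_buyers: int,
                           hurdle_passed: int, hurdle_buyers: int) -> float:
    """Fraction of control clicks attributed to spammers (clamped at 0)."""
    non_spam = estimate_non_spam_clicks(control_buyers, hurdle_passed, hurdle_buyers)
    return max(0.0, 1.0 - non_spam / control_clicks)


def goes_through_hurdle(click_id: str, fraction: float) -> bool:
    """Deterministically send roughly `fraction` of clicks through the hurdle.

    Hashes the (hypothetical) click ID to a 64-bit bucket so the same click
    always lands in the same arm; comparing the int with a float is exact.
    """
    bucket = int.from_bytes(hashlib.sha256(click_id.encode()).digest()[:8], "big")
    return bucket < fraction * 2**64
```

Suppose 200 hurdle clicks get through and 10 of them buy, a 5% buy rate. Then 20 buyers among 1,000 unfiltered clicks imply about 20 × 200 / 10 = 400 non-spammer clicks, for an estimated spam fraction of 0.6. Keeping `fraction` small limits how many real users ever see the hurdle.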
Dissecting the Black Box – Bluff Ads [1]
• Junk ad text with normal keywords, same targeting
  – Normal users are unlikely to click
  – Spammers and curious users click on the ad, and some of them may see the content
• Maximum false-negative rate known for each hurdle
  – Can be subtracted out

[1] Fighting online click fraud using bluff ads, ACM SIGCOMM Computer Communication Review (CCR), 2010.
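The bluff-ad correction can be sketched in the same style. Assume a bluff ad attracts only spammers and a few curious users, and that those clicks face the same hurdle as real traffic. Then the share of bluff-ad clicks that pass is an upper bound on the hurdle's false-negative rate. The slides do not say exactly how this rate is "subtracted out". The hypothetical helpers below take one conservative route: they subtract the worst-case leakage and treat every hurdle click as possibly spam.

```python
def max_false_negative_rate(bluff_clicks: int, bluff_passed: int) -> float:
    """Upper bound on the fraction of spam clicks that slip through a hurdle.

    Bluff-ad clicks are assumed to come from spammers plus a few curious
    users, so their pass rate over-estimates the spammer pass rate.
    """
    if bluff_clicks <= 0:
        raise ValueError("need at least one bluff-ad click")
    return bluff_passed / bluff_clicks


def corrected_passed(hurdle_clicks: int, hurdle_passed: int, fn_max: float) -> float:
    """Remove worst-case leaked spam from the clicks that passed a hurdle.

    At most fn_max * hurdle_clicks of the passed clicks can be spammers
    (if every hurdle click were spam), so the result is a lower bound on
    non-spammer clicks that passed.
    """
    return max(0.0, hurdle_passed - fn_max * hurdle_clicks)
```

Say 10 of 500 bluff-ad clicks get past a hurdle. The false-negative rate is then at most 2%. If 230 of 1,000 real clicks sent through that hurdle pass, at least 230 - 0.02 × 1,000 = 210 of them come from non-spammers. Feeding the corrected count into the buyer ratio makes the spam estimate err on the high side.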
Uh-oh. How do we validate?
• No ground truth!
• Compare against search ads on Google and Bing

Results – Validation using search ads
[Figure: Valid traffic fraction (normalized). The y-axis "Fraction valid (norm.)" runs from 0 to 1.25. Panels cover the keywords celebrity, yoga, and lawn mower on Ad Networks A and B, comparing the ad network's estimate with our estimate.]
• Clicks charged are close to the estimated valid clicks
Protecting User Privacy

Trackers Link User Requests
• Multiple requests are linkable by remote trackers if they share the same identifiers.
[Figure: the user sends Req. 1 (128.208.7.x, header: cookie(…)) and Req. 2 (128.208.7.x, header: cookie(…)) to the same tracker]
• Important identifiers for Web tracking:
  – Application info. (cookie, JS localStorage, Flash)
  – IP address

Approach: Pseudonym Abstraction
• Pseudonym = a set of all identifying features that persist across an activity
• Allow a user to manage a large number of unlinkable pseudonyms
  – The user can choose which ones are used for which operations.
[Figure: Alice uses Pseudonym1 (Cookie1, IP1) for medical information and Pseudonym2 (Cookie2, IP2) for location-related activity (Alice's home); a tracker observes both]

How We Want to Use Pseudonyms
[Figure: 1. Application-Layer Design: the application and a policy engine map Alice's activities to Pseudonym1 (medical; Cookie1, IP1) or Pseudonym2 (location; Cookie2, IP2). 2. Network-Layer Design: the OS, DHCP, and routers provide the per-pseudonym IP addresses.]

Application-Layer Design
• Applications need to assign different pseudonyms to different activities.
  – How to use pseudonyms depends on the user and the application.
  – APIs are provided to define policies.
• Policy in Web browsing: a function of the request information and the state of the browser.
  – Window ID, tab ID, request ID, URL, whether the request is going to the first party, etc.

Sample Pseudonym Policies for the Web
[Figure: an article on politics at news.com (P1) embeds a facebook.com widget (P2); the user also visits facebook.com directly (P3)]
• Default: P1 = P2 = P3
• Per-Request: P1 != P2 != P3
• Per-First-Party: P1 = P2 != P3
  – Facebook cannot link the user's visit to news.com to the user's own facebook.com session.

Pseudonyms in Action
[Figure: the policy engine selects Pseudonym1 (Cookie1, IP1) or Pseudonym2 (Cookie2, IP2) per request; the OS, DHCP, and routers carry the corresponding IP address]

Network-Layer Design Considerations
1. Many IP addresses for an end-host
2. Proper mixing
3. Efficient routing
4. Easy revocation
5. Support for small networks
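The Web policies on the "Sample Pseudonym Policies" slide can be modeled in a few dozen lines. So can the tracker's ability to link requests that share an identifier. This is a hypothetical sketch, not the system's actual policy API. Request, Pseudonym, PolicyEngine, and the cookie/IP labels are illustrative names, and a pseudonym is reduced to one cookie jar plus one IP address.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Request:
    request_id: int
    first_party: str   # site shown in the tab's address bar
    host: str          # site the request is actually sent to


@dataclass(frozen=True)
class Pseudonym:
    cookie_jar: str    # stands in for cookies / localStorage / Flash state
    ip: str            # per-pseudonym IP address (hypothetical labels)


Policy = Callable[[Request], str]  # maps a request to a pseudonym key


def default_policy(req: Request) -> str:
    return "default"                    # one identity for everything


def per_request_policy(req: Request) -> str:
    return f"req-{req.request_id}"      # fresh identity for every request


def per_first_party_policy(req: Request) -> str:
    return f"fp-{req.first_party}"      # one identity per top-level site


class PolicyEngine:
    """Hypothetical engine that hands out one pseudonym per policy key."""

    def __init__(self, policy: Policy) -> None:
        self.policy = policy
        self.pseudonyms: Dict[str, Pseudonym] = {}

    def pseudonym_for(self, req: Request) -> Pseudonym:
        key = self.policy(req)
        if key not in self.pseudonyms:
            n = len(self.pseudonyms) + 1
            self.pseudonyms[key] = Pseudonym(f"cookie{n}", f"ip{n}")
        return self.pseudonyms[key]


def linkable(a: Pseudonym, b: Pseudonym) -> bool:
    """A tracker links two requests if they share any identifier."""
    return a.cookie_jar == b.cookie_jar or a.ip == b.ip


# The slide's scenario: P1 and P2 load under news.com, P3 is facebook.com itself.
REQUESTS: List[Request] = [
    Request(1, "news.com", "news.com"),          # P1: the politics article
    Request(2, "news.com", "facebook.com"),      # P2: embedded Facebook widget
    Request(3, "facebook.com", "facebook.com"),  # P3: visiting facebook.com directly
]


def assign(policy: Policy) -> List[Pseudonym]:
    engine = PolicyEngine(policy)
    return [engine.pseudonym_for(r) for r in REQUESTS]
```

Under `per_first_party_policy`, the article and its embedded widget share one pseudonym. The direct facebook.com visit gets another one with a different cookie jar and IP, so `linkable` returns False for P2 and P3.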
1) IPv6 Allows Many IPs per Host
• 128-bit IPv6 address
• Small networks get a /64 address space (2^64 ≈ 1.8 × 10^19 addresses)

2, 3) Symmetric Encryption for Mixing and Routing
• 128-bit IPv6 address = network prefix + the remaining bits
  – The network prefix routes the packet "to" the network
  – The remaining bits route the packet "within" the network; networks can use this part as they want
• Base address: Network Prefix | Subnet | Host | Pseudonym
  – Use symmetric-key encryption to turn Subnet | Host | Pseudonym into an Encrypted ID; decryption recovers the base address
• End-hosts know only encrypted IP addresses
• Routers use the base addresses to forward packets
  – By longest-prefix matching on subnet::host; thus, the size of the routing table does not change.

Routing Example
[Figure: a packet from the Internet addressed to Prefix::Encrypted ID reaches the ISP (Prefix::…), where the ID is decrypted to Sub::Host::Pseudo to forward the packet inside the network]

Determining Worthiness of a User
• Categorize the user
  – Infer intent from web-site visits
• Determine intent (willingness to buy)

Cost of Preserving Privacy
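The network-layer slides cover three pieces: a /64 per network, symmetric encryption of subnet::host::pseudonym, and longest-prefix routing on the decrypted base address. All three can be sketched with Python's standard `ipaddress`, `hmac`, and `hashlib` modules. The field widths are hypothetical. The slides do not name the symmetric cipher, so a small HMAC-based Feistel network stands in for it. It only illustrates that encryption is a keyed, invertible permutation of the 64-bit ID; it is not a vetted cipher.

```python
import hashlib
import hmac
import ipaddress
from typing import Dict, Optional

# Hypothetical layout of the low 64 bits (the slides give no field sizes):
# 16-bit subnet | 24-bit host | 24-bit pseudonym.
SUBNET_BITS, HOST_BITS, PSEUDO_BITS = 16, 24, 24
LOW64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def compose_id(subnet: int, host: int, pseudo: int) -> int:
    """Pack subnet::host::pseudonym into the 64-bit base ID."""
    if not (0 <= subnet < (1 << SUBNET_BITS) and 0 <= host < (1 << HOST_BITS)
            and 0 <= pseudo < (1 << PSEUDO_BITS)):
        raise ValueError("field out of range")
    return (subnet << (HOST_BITS + PSEUDO_BITS)) | (host << PSEUDO_BITS) | pseudo


def _round(key: bytes, i: int, half: int) -> int:
    """Feistel round function: HMAC-SHA256 of (round, half), truncated to 32 bits."""
    digest = hmac.new(key, bytes([i]) + half.to_bytes(4, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def encrypt_id(key: bytes, base_id: int, rounds: int = 4) -> int:
    """Toy keyed 64-bit permutation (Feistel) standing in for the real cipher."""
    left, right = base_id >> 32, base_id & MASK32
    for i in range(rounds):
        left, right = right, left ^ _round(key, i, right)
    return (left << 32) | right


def decrypt_id(key: bytes, enc_id: int, rounds: int = 4) -> int:
    """Undo encrypt_id by running the rounds in reverse order."""
    left, right = enc_id >> 32, enc_id & MASK32
    for i in reversed(range(rounds)):
        left, right = right ^ _round(key, i, left), left
    return (left << 32) | right


def public_address(key: bytes, prefix: ipaddress.IPv6Network,
                   subnet: int, host: int, pseudo: int) -> ipaddress.IPv6Address:
    """Address handed to the end-host: network prefix + encrypted ID."""
    enc = encrypt_id(key, compose_id(subnet, host, pseudo))
    return ipaddress.IPv6Address(int(prefix.network_address) | enc)


def route_entry(prefix: ipaddress.IPv6Network, subnet: int,
                host: Optional[int] = None) -> ipaddress.IPv6Network:
    """Routing-table entry for a whole subnet (/80) or a single host (/104)."""
    base = int(prefix.network_address)
    if host is None:
        return ipaddress.IPv6Network((base | compose_id(subnet, 0, 0), 64 + SUBNET_BITS))
    return ipaddress.IPv6Network(
        (base | compose_id(subnet, host, 0), 64 + SUBNET_BITS + HOST_BITS))


def forward(key: bytes, prefix: ipaddress.IPv6Network,
            table: Dict[ipaddress.IPv6Network, str],
            dst: ipaddress.IPv6Address) -> Optional[str]:
    """Decrypt the ID back to the base address, then longest-prefix match on it."""
    base = ipaddress.IPv6Address(
        int(prefix.network_address) | decrypt_id(key, int(dst) & LOW64))
    matches = [net for net in table if base in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


if __name__ == "__main__":
    demo = ipaddress.IPv6Network("2001:db8:0:1::/64")  # documentation prefix
    print(f"{demo.num_addresses:.2e}")  # 1.84e+19 addresses in a /64
```

The routing table holds only subnet and host entries, so giving a host more pseudonyms adds no routes. Two encrypted addresses for the same host look unrelated to an outside observer, yet both decrypt to base addresses that match the same /104 entry.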