Online Ads

Lecture 21: Privacy and Online Advertising
References
• Challenges in Measuring Online Advertising Systems, by Saikat Guha, Bin Cheng, and Paul Francis
• Serving Ads from localhost for Performance, Privacy, and Profit, by Saikat Guha, Alexey Reznichenko, Kevin Tang, Hamed Haddadi, and Paul Francis
Problem
• Online advertising funds many web services
– E.g., all the free stuff we get from Google
• Ad networks gather much user information
• How do they use the user information?
Goals
• Determining how well ad networks target users
Methodology
• Creating two clients representing two different user types
• Measuring the different ads each client sees
Challenges
• How to compare ads
• How to collect a representative snapshot of ads
• Quantifying the differences
• Avoiding measurement artifacts
Comparing ads is challenging
• Ads don’t have unique IDs
• A & B are semantically the same but have different text
• A & C are different but have the same display URL
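How do we decide that two ads are the same?
• Easy but illegal approach: comparing destination URLs (obtaining them requires clicking the ads, i.e., click fraud)
• Evaluate the other candidate definitions by their errors
– FP (false positive): flagged as equal but actually different
– FN (false negative): actually equal but not flagged
• The display URL has the fewest FNs → use the display URL to define ad equality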
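A minimal Python sketch of display-URL equality. The normalization rules (lowercasing, adding a scheme so the URL parses, dropping a leading `www.` and trailing slashes) and the `display_url`/`text` field names are assumptions for illustration, not the paper's exact procedure. Ads with different text but the same display URL collapse into one key, which is how this choice accepts a few false positives (like A & C) in exchange for few false negatives.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize_display_url(display_url: str) -> str:
    """Canonical key for a display URL (hypothetical normalization rules)."""
    url = display_url.strip().lower()
    if "://" not in url:
        url = "http://" + url  # display URLs are usually shown without a scheme
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host + parts.path.rstrip("/")


def group_ads_by_display_url(ads):
    """Treat ads whose display URLs normalize to the same key as one ad."""
    groups = defaultdict(list)
    for ad in ads:
        groups[normalize_display_url(ad["display_url"])].append(ad)
    return dict(groups)


ads = [
    {"display_url": "www.Example.com", "text": "Cheap flights"},
    {"display_url": "example.com/", "text": "Low-cost airfare"},
    {"display_url": "shoes.com", "text": "Running shoes"},
]
groups = group_ads_by_display_url(ads)
print(sorted(groups))  # ['example.com', 'shoes.com']
```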
Taking a Snapshot
• A user may be eligible for more ads than fit on any single page load
• How do we determine all the ads that may be shown to a user?
– Reload the page multiple times
– But too many reloads run into ad churn: old ads expire and new ads appear
Determining the # of reloads
• Reload every 5 seconds
• Repeat for 200 queries
• The curve of distinct ads vs. reloads becomes linear beyond 10 reloads
– Growth past that point is due to ad churn
• Use 10 reloads as the threshold
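The threshold choice can be sketched as follows: count the distinct ads seen after each reload, then find where every later reload adds roughly the same number of new ads (the linear, churn-dominated part of the curve). The `tol` parameter and the "constant increments" test are a hypothetical heuristic, and the counts in the usage line are made-up numbers, not the study's data.

```python
def cumulative_distinct(reloads):
    """Number of distinct ads seen after each reload (reloads: list of sets of ad keys)."""
    seen = set()
    counts = []
    for ads in reloads:
        seen |= ads
        counts.append(len(seen))
    return counts


def linear_regime_start(counts, tol=1):
    """Smallest number of reloads k after which every later reload adds roughly
    the same number of new ads (spread of per-reload increments <= tol).
    Hypothetical heuristic for locating where churn takes over."""
    if not counts:
        return 0
    increments = [counts[0]] + [b - a for a, b in zip(counts, counts[1:])]
    for k in range(len(increments)):
        tail = increments[k:]
        if max(tail) - min(tail) <= tol:
            return k
    return len(increments)  # not reached: a single-element tail always qualifies


counts = [8, 12, 15, 17, 18, 19, 20, 21, 22, 23]  # made-up curve for illustration
print(linear_regime_start(counts))  # 3
```

Requiring the whole tail to be flat, rather than a single small increment, avoids stopping early on one lucky reload that happened to show no new ads.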
Quantifying Change
• Metrics
– Jaccard index: $J(A,B) = \frac{|A \cap B|}{|A \cup B|}$
– Extended Jaccard index (cosine similarity)
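Both metrics in a short Python sketch. The set version treats each ad as simply present or absent; the weighted version takes a per-ad weight and, following the slide's parenthetical, is computed as cosine similarity. Representing snapshots as Python sets and dicts keyed by ad is an assumption for illustration.

```python
import math


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B| over sets of ad keys."""
    if not a and not b:
        return 1.0  # convention: two empty snapshots count as identical
    return len(a & b) / len(a | b)


def cosine_similarity(wa: dict, wb: dict) -> float:
    """Cosine similarity between two weighted ad vectors (ad key -> weight)."""
    dot = sum(w * wb.get(ad, 0.0) for ad, w in wa.items())
    norm_a = math.sqrt(sum(w * w for w in wa.values()))
    norm_b = math.sqrt(sum(w * w for w in wb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


print(jaccard({"x", "y", "z"}, {"y", "z", "w"}))  # 0.5
```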
Comparing Effectiveness
• Views: # of page reloads containing the ad
• Value: # of page reloads, scaled by the position of the ad
• Overlap: Jaccard index
Comparing Effectiveness
• The winner: weight each ad by log(views) or log(value)
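A sketch of computing views, value, and log weights from a set of reloads. Two details are assumptions, since the slide does not specify them: each appearance is scaled by 1/position (slot 1 = top), and the weight uses log(1 + x) rather than log(x) so an ad seen on a single reload keeps a nonzero weight. The resulting weights feed a cosine similarity, as in the extended Jaccard metric above.

```python
import math
from collections import Counter


def views_and_value(reloads):
    """reloads: list of page loads, each a list of ad keys in slot order (slot 1 = top).
    views: number of reloads containing the ad.
    value: the same count, but each reload contributes 1/position (hypothetical scaling)."""
    views, value = Counter(), Counter()
    for page in reloads:
        first_position = {}
        for position, ad in enumerate(page, start=1):
            first_position.setdefault(ad, position)  # count each ad once per reload
        for ad, position in first_position.items():
            views[ad] += 1
            value[ad] += 1.0 / position
    return views, value


def log_weights(counts):
    # log(1 + x) instead of log(x) so an ad seen once keeps a nonzero weight (our assumption).
    return {ad: math.log1p(c) for ad, c in counts.items()}


def cosine_similarity(wa, wb):
    dot = sum(w * wb.get(ad, 0.0) for ad, w in wa.items())
    norm = math.sqrt(sum(w * w for w in wa.values())) * math.sqrt(sum(w * w for w in wb.values()))
    return dot / norm if norm else 0.0


client_a = [["a", "b"], ["b", "a"], ["a"]]
client_b = [["a", "c"], ["c", "a"], ["a"]]
_, value_a = views_and_value(client_a)
_, value_b = views_and_value(client_b)
print(round(cosine_similarity(log_weights(value_a), log_weights(value_b)), 3))
```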
Avoiding artifacts
• Different system parameters may cause instances to be shown different ads
– Browsers use different DNS servers
– Browsers receive different cookies
– Use of an HTTP proxy
Analysis
• Configure two or more instances to differ by exactly one parameter
• Compare results for
– Search Ads
– Website Ads
– Online Social Network Ads
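The "differ by one parameter" design can be sketched as generating variants of a baseline configuration, each changing exactly one field. The `InstanceConfig` fields and default values are hypothetical placeholders (loosely based on the artifacts and locations mentioned in these slides), not the study's actual setup.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class InstanceConfig:
    # Hypothetical parameter set; defaults are placeholders, not the study's setup.
    location: str = "NY"
    cookies_enabled: bool = False
    dns_server: str = "resolver-1"
    http_proxy: Optional[str] = None


def single_parameter_variants(baseline, alternatives):
    """Yield (parameter, config) pairs that differ from `baseline` in exactly one field.
    alternatives maps a field name to candidate values for that field."""
    names = {f.name for f in fields(baseline)}
    for name, values in alternatives.items():
        if name not in names:
            raise ValueError(f"unknown parameter: {name}")
        for value in values:
            if value != getattr(baseline, name):
                yield name, replace(baseline, **{name: value})


baseline = InstanceConfig()
controls = [baseline, baseline]  # A and B: identical controls
for name, cfg in single_parameter_variants(baseline, {"location": ["NY", "SF", "Germany"]}):
    print(name, cfg.location)  # prints "location SF", then "location Germany"
```

Keeping DNS server, cookies, and proxy as explicit fields makes it harder for them to drift between instances unnoticed, which is the artifact problem listed above.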
Search Ads
• A, B: controls without cookies
• C, D: cookies enabled, each seeded with a different persona
• Issued 730 random product-related Google queries over 5 days
• No obvious behavioral targeting in search ads. Why?
– Keyword-based ad bidding
• Location targeting not studied
Website Ads
• Measured 15 websites that show Google ads
• A, B: controls in NY
• C: SF; D: Germany
• Location affects website ads
Website Ads
• A, B: control
• C: browses 3 of the 15 websites
• D and E: browse random websites and search Google for random websites
• No evidence that Google uses browsing behavior to pick ads
Online social network ads
• Set up three or more Facebook profiles
• A, B: identical controls
• C: differs from A by one profile parameter
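A sketch of how the control pair is used: the overlap between the two identical controls measures noise (such as ad churn), and a profile parameter is judged to matter only if the treatment's overlap with control A is noticeably lower. The `margin` threshold and the ad sets in the usage lines are hypothetical; the lecture does not give a cutoff.

```python
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def shows_targeting(control_a, control_b, treatment, margin=0.1):
    """Hypothetical test: the changed profile parameter matters if the treatment's
    ads overlap with control A noticeably less than the two identical controls
    overlap with each other (their overlap measures noise such as ad churn).
    Returns (flag, control overlap, treatment overlap)."""
    baseline = jaccard(control_a, control_b)
    treated = jaccard(control_a, treatment)
    return baseline - treated > margin, baseline, treated


a = {f"ad{i}" for i in range(10)}
b = (a - {"ad9"}) | {"ad10"}                                   # control overlap 9/11
c = {f"ad{i}" for i in range(5)} | {f"other{i}" for i in range(5)}  # treatment overlap 5/15
print(shows_targeting(a, b, c)[0])  # True
```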
Online social network ads
• Facebook uses all profile parameters to customize ads
• Age and gender are the two primary factors
• Diurnal patterns due to ad churn
– Should it increase or decrease?
• Education and relationship status matter less, except for engaged vs. non-engaged women
Checking Impact of Sexual Preference
• Six profiles with different sexual preferences
• Two males interested in females (male control)
• Two females interested in males (female control)
• One male interested in male
• One female interested in female
Ads differ by sexual preference
Other results
• Found neutral ads targeted exclusively to gay men
• Clicking would reveal a user’s sexual preference to the advertiser
• 66 ads were shown exclusively to gay men more than 50 times during the experiments
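The "shown exclusively, more than 50 times" criterion can be expressed as a small filter over per-profile impression counts. The profile names and counts below are made-up illustrations, not the experiment's data; only the strictly-greater-than-50 rule comes from the slide.

```python
from collections import Counter


def exclusive_ads(impressions, target, min_count=50):
    """impressions: profile name -> Counter(ad -> times shown).
    Return ads shown to `target` more than `min_count` times and never to any other profile."""
    shown_elsewhere = set()
    for profile, counts in impressions.items():
        if profile != target:
            shown_elsewhere.update(ad for ad, n in counts.items() if n > 0)
    return sorted(ad for ad, n in impressions[target].items()
                  if n > min_count and ad not in shown_elsewhere)


impressions = {  # made-up counts, for illustration only
    "male_seeking_male": Counter({"ad-1": 60, "ad-2": 70, "ad-3": 10}),
    "male_control_1": Counter({"ad-2": 5}),
    "female_control_1": Counter({"ad-4": 80}),
}
print(exclusive_ads(impressions, "male_seeking_male"))  # ['ad-1']
```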
Summary
• Search ads are so far largely keyword-based
• Website ads use location but probably not behavior
• Social network ads use all profile attributes to target users
Question: how can we design a privacy-preserving online advertising system?
Goals
• Support online advertising
– A good revenue source to fund online services
• Preserve user privacy
PrivAd
• Serves ads from a client running on the user’s own machine (localhost)
• Actors: user, publisher, advertiser, broker, and dealer
How it works
• Advertisers upload ads to the broker
• The user’s client subscribes with the broker to the set of ads matching the user’s profile
– The subscription message is encrypted with the broker’s public key and contains a symmetric key
• The broker sends the matching ads back to the client
– The ads are encrypted with that symmetric key
• The dealer relays the client’s messages to the broker, stripping the client’s identity (anonymization)
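A toy model of the subscription flow, showing what each party can observe. `Sealed` is NOT real encryption: it only records which key can open a message, so the sketch illustrates information flow, not security. Matching ads by overlapping interest sets, and all class, field, and key names, are hypothetical; a real implementation would use actual public-key and symmetric ciphers.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Sealed:
    """Toy ciphertext: models only WHO can read a message (whoever holds `lock`).
    Provides no real security."""
    lock: str
    payload: object

    def open(self, key: str):
        if key != self.lock:
            raise PermissionError("wrong key")
        return self.payload


class Broker:
    KEY = "broker-key"  # stands in for the broker's key pair (clients know the public half)

    def __init__(self, ads):
        self.ads = ads        # list of (ad_id, set of targeted interests)
        self.observed = []    # everything the broker learns

    def handle(self, request: Sealed) -> Sealed:
        interests, sym_key = request.open(self.KEY)
        self.observed.append(interests)  # profile data, but no client identity
        matching = [ad_id for ad_id, topics in self.ads if topics & interests]
        return Sealed(sym_key, matching)  # ads "encrypted" with the client's symmetric key


class Dealer:
    def __init__(self, broker: Broker):
        self.broker = broker
        self.observed = []    # everything the dealer learns

    def relay(self, client_id: str, request: Sealed) -> Sealed:
        self.observed.append(client_id)     # identity, but the request stays sealed
        return self.broker.handle(request)  # forwarded without the client's identity


class Client:
    def __init__(self, client_id: str, interests):
        self.client_id = client_id
        self.interests = frozenset(interests)

    def subscribe(self, dealer: Dealer):
        sym_key = secrets.token_hex(16)  # new symmetric key per subscription (assumption)
        request = Sealed(Broker.KEY, (self.interests, sym_key))
        reply = dealer.relay(self.client_id, request)
        return reply.open(sym_key)


broker = Broker([("ad-ski", {"skiing"}), ("ad-golf", {"golf"})])
dealer = Dealer(broker)
print(Client("alice", {"skiing", "cooking"}).subscribe(dealer))  # ['ad-ski']
print(dealer.observed, broker.observed)
```

The two `observed` lists make the split visible: the dealer sees who is subscribing but not what, and the broker sees what is requested but not by whom.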
Ad View/Click Reporting
• When a user views or clicks an ad, the client sends a view/click report containing the ad ID and publisher ID to the broker via the dealer
• The dealer attaches a unique report ID, removes the client’s identity information, and keeps a mapping from the report ID to the user’s identity
Click-fraud defense
• If the broker suspects click fraud, it sends the dealer the report IDs involved
• The dealer maps the report IDs back to the user
• If convinced, the dealer stops relaying ads to that user
• Open questions: how the broker detects click fraud, and what the punishment should be
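A sketch of the dealer's side of reporting and click-fraud handling. The report is passed as an opaque value standing for the report encrypted for the broker, so the dealer never reads the ad or publisher ID. The `min_reports` blocking rule is a hypothetical policy, since the lecture leaves detection and punishment open; the class and method names are likewise illustrative.

```python
import secrets
from collections import Counter


class Dealer:
    """Toy dealer for view/click reports and click-fraud complaints."""

    def __init__(self):
        self._owner = {}      # report ID -> client identity; stays at the dealer
        self.blocked = set()  # clients whose messages are no longer relayed

    def relay_report(self, client_id, sealed_report):
        # sealed_report stands for the (ad ID, publisher ID) report encrypted
        # for the broker; the dealer forwards it without looking inside.
        if client_id in self.blocked:
            return None
        report_id = secrets.token_hex(8)
        self._owner[report_id] = client_id
        return report_id, sealed_report  # what the broker receives: no client identity

    def handle_fraud_complaint(self, report_ids, min_reports=3):
        """Map flagged report IDs back to clients and block any client that owns
        at least `min_reports` of them (hypothetical policy)."""
        per_client = Counter(self._owner[r] for r in report_ids if r in self._owner)
        culprits = {client for client, n in per_client.items() if n >= min_reports}
        self.blocked |= culprits
        return culprits


dealer = Dealer()
flagged = [dealer.relay_report("mallory", "sealed-click")[0] for _ in range(5)]
flagged.append(dealer.relay_report("alice", "sealed-click")[0])
print(dealer.handle_fraud_complaint(flagged))  # {'mallory'}
print(dealer.relay_report("mallory", "sealed-click"))  # None
```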
Defining User Privacy
• Unlinkability
– No single player can link the user’s identity with any piece of the user’s profile
– No single player can link together more than a limited number of pieces of a given user’s personalization information
• The dealer learns that user A clicked on some ad
• The broker learns that someone clicked on ad X
• Not robust to collusion between the dealer and the broker
Scaling PrivAd
• Ad churn is significant
• 2 GB/month of compressed ad data
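A back-of-the-envelope conversion of the 2 GB/month figure into a sustained download rate, assuming 1 GB = 10^9 bytes, a 30-day month, and downloads spread evenly over time (all three are our assumptions):

```latex
\[
\frac{2 \times 10^{9}\ \text{B}}{30 \times 86\,400\ \text{s}}
  = \frac{2 \times 10^{9}}{2.592 \times 10^{6}}\ \text{B/s}
  \approx 772\ \text{B/s}
  \approx 6.2\ \text{kbit/s}
\]
```

Equivalently, roughly 67 MB per day.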
Discussion
• What challenges might PrivAd face in a practical deployment?