
Interoperable Private Attribution (IPA) A Non-technical introduction

A NON-TECHNICAL INTRODUCTION TO
Interoperable Private
Attribution (IPA)
Ben Savage (Meta), Erik Taubeneck (Meta),
Martin Thomson (Mozilla)
NON-TECHNICAL INTRODUCTION TO IPA
CONTENTS
Introduction
The current system
Comparing proposals
Innovative technologies
Explaining IPA in 6 steps
● A single trusted server
● Transforming the Global ID
● Two (semi) trusted servers
● Adding Differential Privacy
● Managing a privacy budget
● Extending the threat model
IPA use cases
This presentation complements our proposal document published here.
INTRODUCTION
Advertisers need accurate reporting about
how their ad campaigns are performing.
Currently, businesses use data about the
people who viewed their ads and bought their
products to determine ‘return on ad spend’.
But the ecosystem is moving towards more
privacy and less personal data sharing.
How can we provide companies with accurate
reporting while sharing less data?
Interoperable Private Attribution is a
proposed system that would enable accurate
ad measurement while ensuring user privacy.
THE CURRENT SYSTEM
Here’s how ad
measurement is done
today (status quo)
Matching global IDs
Using a Global ID to compare
impressions with purchases
In the current system, every user has a unique identifying
number. That identifying number is recorded every time they
click on an ad or make a purchase.
Ad-tech companies can see those identifying
numbers to determine how many people made a
purchase after seeing an ad.
In its present form, this system means sharing a
large volume of personal data with advertisers.
One current solution is to ask for consent.
However, asking for consent has
its challenges...
It is difficult for a cookie consent
prompt to explain the full context of
this decision for users.
Without fully understanding what they
are being asked, users might share more
data than they would like, or opt out due
to a lack of understanding.
People have consent fatigue from being
asked too frequently.
What are the other proposals for ad
measurement?
One existing tool is Apple’s SKAdNetwork (SKAN) which
measures whether ads for mobile apps lead to installations.
1. When a person clicks on an ad for a mobile app, SKAN makes a note on
   that person’s device.
2. When a person installs an app, SKAN checks to see whether they have
   previously clicked an ad for that app.
3. If there is a match, SKAN generates a ‘Postback’ report which it
   sends to Apple.
4. Apple conceals the identity of the user and sends this Postback report
   on to the ad seller and (optionally) the ad buyer as well.
Challenges with the SKAN model
Because SKAN sends out one report for each individual
conversion, the system risks revealing the identity of the buyer.
There are a number of mitigations in place to try to prevent this,
but they do not always work, and they reduce usability.
Similar challenges exist for other tools and
proposals like Apple’s “Private Click
Measurement” and Chrome’s “Attribution
Reporting API” proposal.
Timer
SKAN delays sending ‘Postback’ reports
by a random duration of between 24 and 48
hours. This means ad buyers can wait up to
two days for results on their ads,
which makes it hard to be responsive.
Limited campaign IDs
Including too much detailed information
in ‘Postback’ reports could identify
individual users, so SKAN limits the
number of ways ad buyers can break
down their ad campaign in reporting. This
makes it difficult to get detailed metrics,
and does not completely resolve the
privacy risk.
No cross-device counting
With approaches like SKAN, ad
impressions and ad conversions are
connected on the user’s device. This
means it is impossible to measure
cross-device conversions.
How is the IPA proposal different?
Instead of generating one report per
attributed conversion, IPA generates
aggregate reports for batches of events.
Instead of connecting ad impressions and ad
purchases on a user’s device, IPA makes these
connections within a Secure Multiparty
Computation (MPC).
At the core of IPA are two key ideas that
differ from previous approaches
NEW TECHNOLOGY DRIVING IPA
A. Match Keys
A secure identifier that can be set by
apps and websites people commonly
log in to across devices.
B. Matching in MPC
Matching of ad interactions and
conversions happens server-side,
within MPC, rather than on-device.
A. Match Keys
A democratised, write-only identifier that anyone
can set, but also anyone can benefit from.
Since only the browser/OS can read the match
key, and the actual value is never revealed to
anyone, it cannot be used for tracking or profiling.
It can only be used within a specific MPC for the
purpose of aggregate conversion measurement.
How match keys work
Apps with large reach may
choose to set a match key when
people log in to their products
(on both app and web).
If people sign in to the same
account across multiple devices,
the same match key can be set.
Any app or website can select a
list of match key providers they
want to use e.g.
[“facebook.com”, “google.com”,
“twitter.com”]
Encrypted impression and
conversion reports will use the
specified match keys (if they are
set on that device). Conversions
and impressions will match up in
the MPC if at least one match key is the same.
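The "at least one match key is the same" rule can be sketched as follows. This is only a minimal illustration: the function name `events_match`, the dictionaries and the key values are hypothetical, and the keys are shown in the clear purely for readability. In IPA the values stay encrypted and the comparison happens inside the MPC.

```python
# Hypothetical sketch: each event carries the match keys that were set on the
# device, keyed by provider. Real IPA reports are encrypted; plain strings are
# used here only to show the matching rule.

def events_match(impression_keys: dict, conversion_keys: dict) -> bool:
    """Return True if at least one provider's match key is the same in both events."""
    shared_providers = impression_keys.keys() & conversion_keys.keys()
    return any(impression_keys[p] == conversion_keys[p] for p in shared_providers)


# Impression seen in a phone app, conversion made in a laptop browser.
impression = {"facebook.com": "key-a1", "twitter.com": "key-t7"}
conversion = {"google.com": "key-g3", "twitter.com": "key-t7"}
other_person = {"google.com": "key-g9", "twitter.com": "key-t2"}

print(events_match(impression, conversion))    # True
print(events_match(impression, other_person))  # False
```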
How Match Keys improve on
existing solutions
Match keys vs IDFA
The ID For Advertising (IDFA) is a unique number for each iOS device.
IDFA: Readable (with user permission), and thus can be used to profile and track people.
Match keys: Never seen, so they cannot be used to profile and track.
IDFA: Apple sets the value.
Match keys: Can be set by any app or website.
IDFA: Device-scoped, so it can’t be used to measure cross-device purchases.
Match keys: Can be set to the same value across multiple devices.
Match keys vs third party cookies
Ad-tech companies set third-party cookies in web browsers to
track user behaviour, including ad impressions and purchases.
Third party cookies: Can be used for tracking and profiling people.
Match keys: Never seen, so they cannot be used to profile and track.
Third party cookies: Any company can set one, but only they can read it.
Match keys: Can be used, not read, by anyone.
B. Matching in MPC
Matching of ad impressions and conversions
happens server-side, within a Secure Multiparty
Computation (MPC).
The actual values of the match-keys are hidden
from the MPC itself.
This approach eliminates an entire category of
privacy risks that approaches like SKAN face.
It also enables cross-device conversion
attribution.
How matching in MPC works
Match keys are stored
privately by the browser /
mobile device. Apps and
websites cannot read the
value.
The browser / mobile device
encrypts information about
the impressions or
conversions, including the
match key. Apps and
websites have to send this
information to the MPC to
perform matching.
Within the MPC, match keys
are scrambled multiple
times, by multiple helper
nodes, while still encrypted.
After decryption, values
from the same person still
match up, but since the
values are scrambled their
identity is unknown.
How Matching in MPC
improves on existing solutions
Matching in MPC vs Status Quo
Status quo: ad-companies use unique global identifiers to match up
ad impressions and conversions on their own servers
Status quo: No artificial delays
Matching in MPC: Same
Status quo: No artificial limits on number of
campaigns
Matching in MPC: Same
Status quo: Ad companies can also use unique global
identifiers to track and profile people.
Matching in MPC: Match keys are never seen, so they
cannot be used to profile and track.
Matching in MPC vs On-device attribution
SKAN and other on-device approaches connect ad impressions and
clicks with conversions and generate “anonymous” reports
On-device attribution: Only possible to count
conversions that occur on the same device where
the ad was shown
Matching in MPC: Can be used to measure cross-device conversions
On-device attribution: Requires delays and
artificial limits on number of campaigns to try to
protect privacy
Matching in MPC: Improved privacy protection
without need for any delays or campaign limits
With IPA, businesses would see accurate ad
reporting without sharing personal data with
ad-tech companies or anyone else.
Here’s how...
In order to best explain
how IPA works we will
build up to it in 6 steps
Building up to IPA
STEP 1: A single trusted server
STEP 2: Transforming the Global ID
STEP 3: Two (semi) trusted servers
STEP 4: Adding Differential Privacy
STEP 5: Managing a privacy budget
STEP 6: Extending the threat model
How can we make sure
fewer companies have
access to our personal data?
A single trusted server
Using asymmetric encryption to
protect privacy
In this system, instead of sending personal data directly
to ad-tech companies, impression and conversion reports
with match keys are encrypted using asymmetric
encryption and sent to a trusted server.
The server decrypts the data and matches events up to
count how many times someone saw an ad and then
made a purchase. They share that count with the ad-tech
companies but keep the personal data secret.
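A minimal sketch of what the trusted server does after decryption is shown below. The report layout and the function name `count_attributed_conversions` are assumptions made for illustration, decryption is omitted, and the order of events in time is ignored to keep the example short.

```python
# Hypothetical sketch of the single trusted server. Decryption is omitted; the
# reports below stand for what the server sees after decrypting.

def count_attributed_conversions(impressions, conversions):
    """Count conversions whose match key also appears in an ad impression."""
    keys_that_saw_ad = {report["match_key"] for report in impressions}
    return sum(1 for report in conversions if report["match_key"] in keys_that_saw_ad)


impressions = [{"match_key": 101}, {"match_key": 102}, {"match_key": 103}]
conversions = [{"match_key": 102}, {"match_key": 103}, {"match_key": 999}]

# Only this aggregate count is shared with the ad-tech company.
print(count_attributed_conversions(impressions, conversions))  # 2
```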
Metaphorical representation
Asymmetric encryption is familiar to most of us. This is the system we
use when we send our credit card details to a website or use end-to-end
encrypted messaging apps like iMessage or WhatsApp.
Here’s how it works:
In order to send a secret message
through the mail your friend sends
you an open padlock.
When it’s time to send a message
to your friend, you place it in a box
and secure it with the padlock
supplied.
Only your friend has the key that
opens the padlock, so if the box is
intercepted it would be impossible
for others to open.
Your friend can send a padlock to
anyone around the world, but they
are the only one with the key.
An ad impression / conversion
event that has been encrypted
appears as undecipherable
ciphertext to ad tech companies.
Can we limit the trust required
by ensuring no-one sees our
personal data at all?
Transforming the Global ID
Blinding makes it possible for a server
to process the data without seeing the
identity of the user.
In this system, when the server receives the
encrypted data, they first apply a ‘blinding factor’,
changing the encrypted numbers.
BLIND
Now they decrypt the data - but it has already
been changed. So even once decrypted, the server
can't see the original match key.
DECRYPT
Events originating from the same person still have the
same value of the blinded match key, so it’s still possible
to match up ad impressions and purchases from the same
person, but the value of the blinded match key is unlinkable to that person’s identity.
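One way to picture "blind, then decrypt" is with a toy ElGamal-style scheme, sketched below. The scheme, the parameters `P` and `G`, and all function names are assumptions chosen for illustration; the source does not say which cryptosystem IPA uses, and these toy parameters are not secure.

```python
import hashlib
import math
import secrets

# Toy parameters, for illustration only (not a secure or official choice).
P = 2**127 - 1   # a Mersenne prime used as the modulus
G = 3            # base for the toy ElGamal-style scheme


def encode(match_key: str) -> int:
    """Map a match key to a number in [1, P - 1]."""
    digest = hashlib.sha256(match_key.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def keygen():
    secret = secrets.randbelow(P - 2) + 1
    return secret, pow(G, secret, P)


def encrypt(public: int, m: int):
    k = secrets.randbelow(P - 2) + 1
    return pow(G, k, P), (m * pow(public, k, P)) % P


def blind(ciphertext, factor: int):
    """Raise both parts to the blinding factor without decrypting."""
    c1, c2 = ciphertext
    return pow(c1, factor, P), pow(c2, factor, P)


def decrypt(secret: int, ciphertext) -> int:
    c1, c2 = ciphertext
    return (c2 * pow(c1, P - 1 - secret, P)) % P


def random_blinding_factor() -> int:
    # Coprime to P - 1 so that different match keys stay different.
    while True:
        r = secrets.randbelow(P - 2) + 1
        if math.gcd(r, P - 1) == 1:
            return r


secret, public = keygen()
r = random_blinding_factor()

impression = encrypt(public, encode("jane-match-key"))
purchase = encrypt(public, encode("jane-match-key"))
stranger = encrypt(public, encode("john-match-key"))

# The server blinds first and only then decrypts, so it sees m**r, never m.
blinded = [decrypt(secret, blind(c, r)) for c in (impression, purchase, stranger)]
print(blinded[0] == blinded[1])  # True: events from the same person still match
print(blinded[0] == blinded[2])  # False: different people stay different
```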
Metaphorical representation
Blinding encrypted data is a way for servers to alter user data so
that it is still useful, but can no longer identify people personally.
Here’s how it works
We have a batch of boxes, each with a dial
pointing to a number. That number
represents a match key.
We place the dials in boxes so the dial can
still be turned but the number is hidden.
This is a metaphor for encryption.
The boxes are sent to a trusted helper. The
helper chooses a random number, then
turns all the dials that number of ticks,
before passing them on to an ad company.
The ad-tech company is still able to
compare numbers to see which are the
same, without knowing what the original
values were.
Can we avoid having a single
trusted server?
Two (semi) trusted servers
With double encryption, two servers can process the
data without either seeing the identity of the user.
Instead of having one trusted server to decrypt the
data, we now have two. Before data leaves the user’s
device, it is encrypted towards both helper servers.
Metaphorically, this is like locking it with two
padlocks, one from each server.
The first server removes one layer of encryption, then
applies its 'blinding factor' to change the numbers
before sending them along to the second server.
The second server removes the second layer of
encryption and applies its own 'blinding factor'. Now the
data has been changed twice. Neither server knows both
“blinding factors” and neither server was ever able to see
the original match key.
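The two-server flow can be sketched with the same kind of toy ElGamal-style arithmetic. Everything here is an illustrative assumption (the `Helper` class, the joint public key, the parameters `P` and `G`); it is not the IPA specification and is not secure as written.

```python
import math
import secrets

P = 2**127 - 1  # toy prime modulus, illustrative only
G = 3


def random_blinding_factor() -> int:
    # Coprime to P - 1 so that different match keys stay different.
    while True:
        r = secrets.randbelow(P - 2) + 1
        if math.gcd(r, P - 1) == 1:
            return r


class Helper:
    """One helper server with its own decryption key and blinding factor."""

    def __init__(self):
        self.secret = secrets.randbelow(P - 2) + 1
        self.public = pow(G, self.secret, P)
        self.blinding_factor = random_blinding_factor()

    def strip_layer_and_blind(self, ciphertext):
        c1, c2 = ciphertext
        # Remove this helper's layer of encryption (its padlock) ...
        c2 = (c2 * pow(c1, P - 1 - self.secret, P)) % P
        # ... then apply this helper's blinding factor to what is left.
        return pow(c1, self.blinding_factor, P), pow(c2, self.blinding_factor, P)


def encrypt_towards_both(helper_1, helper_2, m):
    """Device-side encryption under both helpers' public keys (two padlocks)."""
    joint_public = (helper_1.public * helper_2.public) % P
    k = secrets.randbelow(P - 2) + 1
    return pow(G, k, P), (m * pow(joint_public, k, P)) % P


h1, h2 = Helper(), Helper()
jane, john = 123456789, 987654321  # stand-ins for encoded match keys


def process(m):
    after_first = h1.strip_layer_and_blind(encrypt_towards_both(h1, h2, m))
    _, doubly_blinded = h2.strip_layer_and_blind(after_first)
    return doubly_blinded


print(process(jane) == process(jane))  # True: same person still matches
print(process(jane) == process(john))  # False: different people stay different
```

In this sketch the final value is the match key raised to the product of both blinding factors, so recovering the original would need both helpers' secrets.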
Metaphorical representation
With double encryption, the data is encrypted
twice, and two servers are required to decrypt it.
Here’s how it works:
A message is locked in a box with 2
padlocks.
Now two people must collaborate
to unlock the box.
The first person unlocks their
padlock and then sends the box to
the person with the second key.
The second person uses their key
to unlock the second padlock.
The box is now open. Only the
second person is able to see what’s
in the box.
The system is now private.
But how do we defend
against attacks?
Adding Differential Privacy
In the IPA system, ad-tech vendors only see aggregate data about whole
user groups, not data about individuals. However, it’s still possible to find
out information about individuals if you ask for the data multiple times.
Imagine that an ad-tech vendor wants to know if a
particular user who saw an ad purchased that
product. They send a batch of 1000 “source events”
(i.e. ad impressions) and 20 “trigger events” (i.e. ad
conversions) to the IPA system. They receive back
the results: there were 6 ad conversions.
Now imagine that the ad-tech vendor removes just
one of those “source events” (i.e. the “ad impression”
that was shown to Jane Doe) and re-sends the data.
If the number of attributed events drops to “5”, the
vendor has just learned that Jane Doe made an ad
conversion. If the number is still “6”, they’ve learned that
Jane Doe did not make an ad conversion. Either way we
have a problem: we don’t want our system to reveal
information about individual people.
One solution is to intentionally add a small
amount of randomness to the results.
The IPA system will add or subtract a small amount from the correct
answer at random. If the correct answer was ‘6 ad conversions’, the
system might feed back any number from 4 to 8, with different results
each time.
This prevents the ad-tech vendor from identifying the
behaviour of a single individual by
comparing the results of two queries.
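Adding randomness to a count can be sketched in a few lines. Laplace noise is used here because it is a common choice in differential privacy; the source does not say which distribution IPA uses, so the distribution, the `scale` value and the function name `noisy_count` are all assumptions.

```python
import random


def noisy_count(true_count: int, scale: float, rng: random.Random) -> float:
    """Return the count plus Laplace noise with the given scale."""
    # The difference of two exponential samples follows a Laplace distribution.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise


rng = random.Random(0)  # seeded only so the example is repeatable

# Asking the same question three times gives three slightly different answers.
for _ in range(3):
    print(round(noisy_count(6, scale=1.0, rng=rng)))
```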
Metaphorical representation
Here’s another way to explain how
differential privacy protects from
attacks.
Imagine that a group of people step
on a big set of scales. The scales
read out the combined weight of the
entire group, but you don’t know
how much any individual weighs.
Now imagine we use the scales to
weigh almost the same group of
people, but one person stays off.
We can tell the weight of the
excluded person by looking at the
difference between the two
results.
To keep the weight of the
individuals secret, we can instruct
the scales to provide a slightly
incorrect answer each time you
use it. The scales will add or
subtract a few dozen pounds to
the result at random.
Now we can no longer be sure of
the exact weight of any individual
on the scales, but we still have a
good idea of the aggregate weight
of the whole group.
Can we make it impossible
for ad-tech vendors to game
the system?
Managing a privacy budget
If an ad-tech vendor is able to submit the same data for processing enough
times, gradually the randomness will average out.
Once you have enough queries the average will
slowly converge on the correct answer.
If an ad-tech vendor submits the data with ‘Jane Doe’
ten times, and the data without ‘Jane Doe’ ten times,
they can calculate the average of both sets of data.
If the average number of ad conversions they received
when they submitted the data with Jane Doe was 6, and
the average number of conversions without her was 5,
they can assume that Jane Doe purchased the product.
We can make it impossible for ad-tech vendors to game the
system this way by introducing a ‘privacy budget’.
This means that ad-tech vendors can decide how
many requests they want to make, but the more
requests they submit, the more random noise is
added to each result.
Metaphorical representation
Let’s see how this works with
the scales metaphor
Imagine that the set of scales has a
way of recognising whether
someone has stood on the scales
before.
The group decides in advance how
many times they will get on the
scales.
If you stand on the scales just once,
only a small amount of randomness
will be added to your results.
If you choose to weigh the group
more times, more randomness is
added to each result.
If you exceed the number of times
you agreed to stand on the scales,
you will get a result of ‘zero’ with
the same amount of randomness
added.
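The budget idea can be sketched as below. The class `PrivacyBudget`, the `total_epsilon` parameter, the even split across queries and the Laplace noise are assumptions for illustration; the source only states that more requests mean more noise, and that going over the agreed number returns zero plus noise.

```python
import random


class PrivacyBudget:
    """Splits a fixed budget evenly over a number of queries agreed in advance."""

    def __init__(self, total_epsilon: float, planned_queries: int, seed=None):
        self.remaining_queries = planned_queries
        # Spreading the same budget over more queries means more noise on each.
        self.noise_scale = planned_queries / total_epsilon
        self._rng = random.Random(seed)

    def query(self, true_count: float) -> float:
        rate = 1 / self.noise_scale
        noise = self._rng.expovariate(rate) - self._rng.expovariate(rate)
        if self.remaining_queries <= 0:
            # Budget spent: report zero with the same amount of noise.
            return 0 + noise
        self.remaining_queries -= 1
        return true_count + noise


one_query = PrivacyBudget(total_epsilon=1.0, planned_queries=1)
ten_queries = PrivacyBudget(total_epsilon=1.0, planned_queries=10)
print(one_query.noise_scale, ten_queries.noise_scale)  # 1.0 10.0
```

Under these assumptions, averaging ten answers no longer helps the vendor: each answer carries ten times the noise, so the average is noisier than a single query made with the whole budget.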
How can we privately
determine the value of ad
conversions?
Extending the threat model
Up until now we’ve only discussed counting events. Now let’s extend it to
support adding up purchase values. To do this we will have to add more
information to the reports.
When impression and conversion reports are generated, we can
include additional metadata within the encrypted report, such as
the conversion value.
After the matching stage, the metadata from matched
conversions can be aggregated to produce an output
report, like the sum of the conversion values.
If ad buyers and helper nodes can see individual sales values, they might
be able to identify customers. Here’s how IPA ensures individual
purchase values are never visible to anyone
Imagine John spends $188 on a product, and he’s
the only customer to spend that amount. When we
see $188 in the data, we know it refers to John,
even if we never see his match key.
We want to make sure the exact value of John’s
purchase is never visible to anyone. Before any data
leaves John’s device, we generate a random number
(A). Then we choose a second number (B) which
yields 188 when added to the first. (A + B = 188)
This happens for every purchase. The value of the
purchase is split into two numbers (A and B) that
combined together give you the correct value.
IPA ensures individual purchase values are
never visible to anyone
A batch of the first numbers (A) are
sent to the first helper node. The
helper node adds them together
into a full sum for that batch.
The B numbers are sent to the
second helper node, who adds
them into a full sum.
Each helper node sends their sum value
back to the ad buyer. When the ad buyer
combines the two sum values, they get
the correct value of all the ad purchases.
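The split-and-sum steps above can be sketched as follows. The arithmetic is done modulo `MODULUS`, which is an assumption added here so that each share on its own looks like a uniformly random number; the function name `split` and the purchase values other than 188 are also illustrative.

```python
import secrets

MODULUS = 2**32  # assumed share size; shares wrap around this value


def split(value: int):
    """Split a purchase value into two shares A and B with A + B = value (mod MODULUS)."""
    a = secrets.randbelow(MODULUS)
    b = (value - a) % MODULUS
    return a, b


purchases = [188, 42, 75]
shares = [split(v) for v in purchases]

# Each helper node only ever sees its own column of random-looking shares.
sum_a = sum(a for a, _ in shares) % MODULUS  # computed by helper node 1
sum_b = sum(b for _, b in shares) % MODULUS  # computed by helper node 2

# The ad buyer combines the two sums to get the total purchase value.
print((sum_a + sum_b) % MODULUS)  # 305
```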
No-one at any stage of the
process can see the value
of an individual purchase.
This is the basis of Interoperable Private Attribution.
[Diagram: the IPA pipeline run by Helper Node 1 and Helper Node 2, with stages labelled "Decrypt & blind", "Shuffle & swap", "Decrypt & blind", "Match" and "Sum of secret shares"]
In designing IPA, we set out to find a win-win-win
solution for cross platform conversion
measurement that met our goals across privacy,
utility, and competition.
Our privacy goal is to limit the total amount of information
IPA releases about an individual over a given period of time.
Our utility goal is to support all the major aggregate
conversion measurement use-cases.
Our competition goal is to ensure equal function for all
existing and new ad-tech players.
A NON-TECHNICAL INTRODUCTION TO
Interoperable Private
Attribution Use Cases
Interoperable Private Attribution (IPA) is a proposed
system that utilises privacy-enhancing technology to
make online ad measurement secure.
It would allow businesses to see the conversions from
ad impressions to purchases, without sharing the
personal data of customers.
The following are two key examples of how IPA can
improve on the current system.
IPA USE CASES
Use case 1
Cross Device Measurement
Looking at how the IPA proposal can allow accurate
measurement across multiple devices, while
preserving privacy.
Making sense of cross device
impressions
Today, many of us use multiple devices, like
phones, laptops or tablets. We see an ad in an
app on our phone, and then make a purchase
in our web browser on a laptop.
How can we connect purchases with ad
impressions if they happen on different devices?
This is difficult / impossible for most
ad tech providers today.
With IPA any app/website can choose to use
match keys set by a company like Google or
Facebook: services many people log in to
across multiple devices.
With IPA, we can level the playing field so that every
ad tech provider can get data on cross device
conversions, not just the large companies.
Use case 2
Cross Publisher Attribution
Looking at how the IPA proposal could potentially
enable cross-publisher attribution, while preserving
individual privacy.
As an ad buyer, you want to figure
out where to spend your ad money to
have the maximum impact.
Since many people will see your ads
across multiple apps and websites,
you want a system that allocates
credit in a sensible way, and doesn’t
double (or triple) count conversions.
Problem:
Customers may see multiple ads for a product
before they purchase it.
Jane sees an ad for a mobile wellness
app on Instagram.
Later she sees another ad for the app
while reading a newspaper article.
Jane does a Google search for the
app. The first result is a sponsored
search result. Jane clicks on it.
Which ad impression
should get the
credit?
Proposed solution:
Multi-Touch Attribution
With ‘multi-touch attribution’,
everyone gets a fraction of the credit.
If Jane saw an ad on Instagram, a newspaper article and Google
search, each of those services might get a third of the credit.
In the current system, this is hard to
impossible. But with IPA, it might be
feasible.
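The equal-split rule described above can be sketched as follows. The function name `equal_credit` is hypothetical, and an equal split is only one possible way to apportion credit.

```python
from fractions import Fraction


def equal_credit(touchpoints):
    """Give every touchpoint an equal fraction of one conversion."""
    share = Fraction(1, len(touchpoints))
    return {touchpoint: share for touchpoint in touchpoints}


credit = equal_credit(["Instagram", "newspaper article", "Google search"])
print(credit["Instagram"])  # 1/3
```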
Problem: Ad buyers see overlapping
reporting across ad platforms
When ad buyers spend across multiple ad platforms,
multiple companies may take credit
for the same conversion.
This makes it difficult to understand and
compare effectiveness of ad campaigns.
These numbers
don’t seem to
add up.
Proposed Solution: Ad Buyers get their own
“Privacy Budget” to spend as they wish
Gemma purchases ads on multiple
ad platforms. They forward her the
encrypted impression reports for
the ads she paid for.
Gemma has her own privacy budget - and she
decides how to spend it. She runs her own
queries (or pays an independent vendor to
help her).
Gemma can now see all her
results in one interface! No
more double counting!
Ok great, these
results actually
make sense
[Figure: example results of 12 (+/- 2), 6 (+/- 2) and 9 (+/- 2)]
With IPA, ad buyers can run their own queries to measure their ad
conversions. Because everything is connected there is no double
counting. Ad buyers can choose how to apportion credit in cases of
multiple touch points. This means one interface to get reporting
across all your channels and less need to trust the results an
ad platform tells you.
Could IPA support your advertising
use-cases?
Would you like to help us improve this
proposal?
Do you have any concerns or
questions we can address?
Get in touch to let us know by
participating in the conversation in
the W3C PATCG GitHub issue.