A NON-TECHNICAL INTRODUCTION TO
Interoperable Private Attribution (IPA)

Ben Savage (Meta), Erik Taubeneck (Meta), Martin Thomson (Mozilla)

CONTENTS

Introduction
The current system
Comparing proposals
Innovative technologies
Explaining IPA in 6 steps
● A single trusted server
● Transforming the Global ID
● Two (semi) trusted servers
● Adding Differential Privacy
● Managing a privacy budget
● Extending the threat model
IPA use cases

This presentation complements our proposal document published here.

INTRODUCTION

Advertisers need accurate reporting about how their ad campaigns are performing. Currently, businesses use data about the people who viewed their ads and bought their products to determine ‘return on ad spend’. But the ecosystem is moving towards more privacy and less personal data sharing.

How can we provide companies with accurate reporting while sharing less data? Interoperable Private Attribution is a proposed system that would enable accurate ad measurement while ensuring user privacy.

THE CURRENT SYSTEM

Here’s how ad measurement is done today (status quo)

Matching global IDs: using a Global ID to compare impressions with purchases

In the current system, every user has a unique identifying number (a Global ID). That identifying number is recorded every time they click on an ad or make a purchase.

Ad-tech companies can see those identifying numbers to determine how many people made a purchase after seeing an ad.

In its present form, this system means sharing a large volume of personal data with advertisers.

One current solution is to ask for consent. However, asking for consent has its challenges...

It is difficult for a cookie consent prompt to explain the full context of this decision for users.
Without fully understanding what they are being asked, users might share more data than they would like, or opt out due to a lack of understanding.

People have consent fatigue from being asked too frequently.

What are the other proposals for ad measurement?

One existing tool is Apple’s SKAdNetwork (SKAN), which measures whether ads for mobile apps lead to installations.

1. When a person clicks on an ad for a mobile app, SKAN makes a note on that person’s device.
2. When a person installs an app, SKAN checks to see whether they have previously clicked an ad for that app.
3. If there is a match, SKAN generates a ‘Postback’ report which it sends to Apple.
4. Apple conceals the identity of the user and sends this Postback report on to the ad seller and (optionally) the ad buyer as well.

Challenges with the SKAN model

Because SKAN sends out one report for each individual conversion, the system risks revealing the identity of the buyer. There are a number of mitigations in place to try to prevent this, but they do not always work, and they negatively impact usability.

Similar challenges exist for other tools and proposals like Apple’s “Private Click Measurement” and Chrome’s “Attribution Reporting API” proposal.

● Timer: SKAN delays sending ‘Postback’ reports by a random duration of between 24 and 48 hours. This means ad buyers wait at least a day, and up to two days, for results on their ads, which makes it hard to be responsive.
● Limited campaign IDs: Including too much detailed information in ‘Postback’ reports could identify individual users, so SKAN limits the number of ways ad buyers can break down their ad campaign in reporting. This makes it difficult to get detailed metrics, and does not completely resolve the privacy risk.
● No cross-device counting: With approaches like SKAN, ad impressions and ad conversions are connected on the user’s device. This means it is impossible to measure cross-device conversions.

How is the IPA proposal different?
Instead of generating one report per attributed conversion, IPA generates aggregate reports for batches of events.

Instead of connecting ad impressions and ad purchases on a user’s device, IPA makes these connections within a Secure Multiparty Computation (MPC).

NEW TECHNOLOGY DRIVING IPA

At the core of IPA are two key ideas that differ from previous approaches

A. Match Keys: a secure identifier that can be set by apps and websites people commonly log in to across devices.
B. Matching in MPC: matching of ad interactions and conversions happens server-side, within MPC, rather than on-device.

A. Match Keys

A democratised, write-only identifier that anyone can set, and that anyone can benefit from.

Since only the browser/OS can read the match key, and the actual value is never revealed to anyone, it cannot be used for tracking or profiling. It can only be used within a specific MPC for the purpose of aggregate conversion measurement.

How match keys work

Apps with large reach may choose to set a match key when people log in to their products (on both app and web). If people sign in to the same account across multiple devices, the same match key can be set.

Any app or website can select a list of match key providers they want to use, e.g. [“facebook.com”, “google.com”, “twitter.com”].

Encrypted impression and conversion reports will use the specified match keys (if they are set on that device). Conversions and impressions will match up in the MPC if at least one match key is the same.

How Match Keys improve on existing solutions

Match keys vs IDFA: The ID For Advertising (IDFA) is a unique number for each iOS device.

Match keys vs third-party cookies: Ad-tech companies set third-party cookies in web browsers to track user behaviour, including ad impressions and purchases.
IDFA compared with match keys:
● The IDFA is readable (with user permission), and thus can be used to profile and track people. Match keys are never seen, so they cannot be used to profile and track.
● Apple sets the value of the IDFA. Match keys can be set by any app or website.
● The IDFA is device-scoped, so it can’t be used to measure cross-device purchases. Match keys can be set to the same value across multiple devices.

Third-party cookies compared with match keys:
● Third-party cookies can be used for tracking and profiling people. Match keys are never seen, so they cannot be used to profile and track.
● Any company can set a third-party cookie, but only they can read it. Match keys can be used, not read, by anyone.

B. Matching in MPC

Matching of ad impressions and conversions happens server-side, within a Secure Multiparty Computation (MPC). The actual values of the match keys are hidden from the MPC itself.

This approach eliminates an entire category of privacy risks that approaches like SKAN face. It also enables cross-device conversion attribution.

How matching in MPC works

Match keys are stored privately by the browser / mobile device. Apps and websites cannot read the value.

The browser / mobile device encrypts information about the impressions or conversions, including the match key. Apps and websites have to send this information to the MPC to perform matching.

Within the MPC, match keys are scrambled multiple times, by multiple helper nodes, while still encrypted. After decryption, values from the same person still match up, but since the values are scrambled, the person’s identity is unknown.
How Matching in MPC improves on existing solutions

Matching in MPC vs Status Quo

In the status quo, ad companies use unique global identifiers to match up ad impressions and conversions on their own servers.
● Status quo: no artificial delays. Matching in MPC: same.
● Status quo: no artificial limits on number of campaigns. Matching in MPC: same.
● Status quo: ad companies can also use unique global identifiers to track and profile people. Matching in MPC: match keys are never seen, so they cannot be used to profile and track.

Matching in MPC vs On-device attribution

SKAN and other on-device approaches connect ad impressions and clicks with conversions and generate “anonymous” reports.
● On-device attribution: only possible to count conversions that occur on the same device where the ad was shown. Matching in MPC: can be used to measure cross-device conversions.
● On-device attribution: requires delays and artificial limits on number of campaigns to try to protect privacy. Matching in MPC: improved privacy protection without need for any delays or campaign limits.

With IPA, businesses would see accurate ad reporting without sharing personal data with ad-tech companies or anyone else. Here’s how...

BUILDING UP TO IPA

In order to best explain how IPA works, we will build up to it in 6 steps:

STEP 1: A single trusted server
STEP 2: Transforming the Global ID
STEP 3: Two (semi) trusted servers
STEP 4: Adding Differential Privacy
STEP 5: Managing a privacy budget
STEP 6: Extending the threat model

How can we make sure fewer companies have access to our personal data?

A single trusted server

Using asymmetric encryption to protect privacy

In this system, instead of sending personal data directly to ad-tech companies, impression and conversion reports with match keys are encrypted using asymmetric encryption and sent to a trusted server.
The server decrypts the data and matches events up to count how many times someone saw an ad and then made a purchase. It shares that count with the ad-tech companies but keeps the personal data secret.

Metaphorical representation

Asymmetric encryption is familiar to most of us. This is the system we use when we send our credit card details to a website or use end-to-end encrypted messaging apps like iMessage or WhatsApp. Here’s how it works:

● In order to send a secret message through the mail, your friend sends you an open padlock.
● When it’s time to send a message to your friend, you place it in a box and secure it with the padlock supplied.
● Only your friend has the key that opens the padlock, so if the box is intercepted it would be impossible for others to open.
● Your friend can send a padlock to anyone around the world, but they are the only one with the key.

An ad impression / conversion event that has been encrypted appears as undecipherable ciphertext to ad-tech companies.

Can we limit the trust required by ensuring no-one sees our personal data at all?

Transforming the Global ID

Blinding makes it possible for a server to process the data without seeing the identity of the user.

BLIND: In this system, when the server receives the encrypted data, it first applies a ‘blinding factor’, changing the encrypted numbers.

DECRYPT: Now it decrypts the data, but the data has already been changed. So even once decrypted, the server can't see the original match key.

Events originating from the same person still have the same value of the blinded match key, so it’s still possible to match up ad impressions and purchases from the same person, but the value of the blinded match key is unlinkable to that person’s identity.
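The matching property of blinding can be illustrated with a toy sketch in Python. It assumes match keys are strings that are hashed to a number and then raised to a secret power modulo a prime. The prime, the hashing step and every function name here are illustrative assumptions rather than part of the IPA proposal, and the encryption layer is left out entirely.

```python
import hashlib
import math
import secrets

# Toy parameter (an assumption for illustration): the Mersenne prime 2^127 - 1.
P = 2**127 - 1


def to_group_element(match_key: str) -> int:
    """Hash a match key to a number in the range 1..P-1."""
    digest = hashlib.sha256(match_key.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def random_blinding_factor() -> int:
    """Pick a secret exponent; gcd(r, P - 1) == 1 keeps distinct keys distinct."""
    while True:
        r = secrets.randbelow(P - 3) + 2  # uniform in 2..P-2
        if math.gcd(r, P - 1) == 1:
            return r


def blind(match_key: str, r: int) -> int:
    """Apply the blinding factor: the output hides the key but is repeatable."""
    return pow(to_group_element(match_key), r, P)


def count_matches(impressions, conversions, r):
    """Count conversions whose blinded key also appears among the impressions."""
    blinded_impressions = {blind(key, r) for key in impressions}
    return sum(1 for key in conversions if blind(key, r) in blinded_impressions)


r = random_blinding_factor()
# "bob" is the only person who both saw the ad and converted.
print(count_matches(["alice", "bob", "carol"], ["bob", "dave"], r))  # 1
```

Whoever compares the blinded values learns which events belong together, but without the secret factor `r` they have no way to map a blinded value back to a match key.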
Metaphorical representation

Blinding encrypted data is a way for servers to alter user data so that it is still useful, but can no longer identify people personally. Here’s how it works:

● We have a batch of dials, each pointing to a number. That number represents a match key.
● We place the dials in boxes so that each dial can still be turned but the number is hidden. This is a metaphor for encryption.
● The boxes are sent to a trusted helper. The helper chooses a random number, then turns all the dials that number of ticks, before passing them on to an ad-tech company.
● The ad-tech company is still able to compare numbers to see which are the same, without knowing what the original values were.

Can we avoid having a single trusted server?

Two (semi) trusted servers

With double encryption, two servers can process the data without either seeing the identity of the user.

Instead of having one trusted server to decrypt the data, we now have two. Before data leaves the user’s device, it is encrypted towards both helper servers. Metaphorically, this is like locking it with two padlocks, one from each server.

The first server removes one layer of encryption, then applies its 'blinding factor' to change the numbers before sending them along to the second server.

The second server removes the second layer of encryption and applies its own 'blinding factor'. Now the data has been changed twice. Neither server knows both 'blinding factors' and neither server was ever able to see the original match key.

Metaphorical representation

With double encryption, the data is encrypted twice, and two servers are required to decrypt it. Here’s how it works:

A message is locked in a box with 2 padlocks. Now two people must collaborate to unlock the box. The first person unlocks their padlock and then sends the box to the person with the second key.
The second person uses their key to unlock the second padlock. The box is now open. Only the second person is able to see what’s in the box.

The system is now private. But how do we defend against attacks?

Adding Differential Privacy

In the IPA system, ad-tech vendors only see aggregate data about whole user groups, not data about individuals. However, it’s still possible to find out information about individuals if you ask for the data multiple times.

Imagine that an ad-tech vendor wants to know if a particular user who saw an ad purchased that product. They send a batch of 1000 “source events” (i.e. ad impressions) and 20 “trigger events” (i.e. ad conversions) to the IPA system. They receive back the results: there were 6 ad conversions.

Now imagine that the ad-tech vendor removes just one of those “source events” (i.e. the “ad impression” that was shown to Jane Doe) and re-sends the data.

If the number of attributed events drops to “5”, the vendor has just learned that Jane Doe made an ad conversion. If the number is still “6”, they’ve learned that Jane Doe did not make an ad conversion. Either way we have a problem: we don’t want our system to reveal information about individual people.

One solution is to intentionally add a small amount of randomness to the results. The IPA system will add or subtract a small amount from the correct answer at random. If the correct answer was ‘6 ad conversions’, the system might return any number from 4 to 8, with different results each time.

This prevents the ad-tech vendor from identifying the behaviour of a single individual by comparing two query results in this way.

Metaphorical representation

Here’s another way to explain how differential privacy protects from attacks.

Imagine that a group of people step on a big set of scales. The scales read out the combined weight of the entire group, but you don’t know how much any individual weighs.
Now imagine we use the scales to weigh almost the same group of people, but one person stays off. We can tell the weight of the excluded person by looking at the difference between the two results.

To keep the weight of the individuals secret, we can instruct the scales to provide a slightly incorrect answer each time they are used. The scales will add or subtract a few dozen pounds to the result at random.

Now we can no longer be sure of the exact weight of any individual on the scales, but we still have a good idea of the aggregate weight of the whole group.

Can we make it impossible for ad-tech vendors to game the system?

Managing a privacy budget

If an ad-tech vendor is able to submit the same data for processing enough times, gradually the randomness will average out. With enough queries, the average will slowly converge on the correct answer.

If an ad-tech vendor submits the data with ‘Jane Doe’ ten times, and the data without ‘Jane Doe’ ten times, they can calculate the average of both sets of results.

If the average number of ad conversions they received when they submitted the data with Jane Doe was 6, and the average number of conversions without her was 5, they can assume that Jane Doe purchased the product.

We can make it impossible for ad-tech vendors to game the system this way by introducing a ‘privacy budget’. This means that ad-tech vendors can decide how many requests they want to make, but the more requests they submit, the more random noise is added to each result.

Metaphorical representation

Let’s see how this works with the scales metaphor.

Imagine that the set of scales has a way of recognising whether someone has stood on the scales before. The group decides in advance how many times they will get on the scales. If you stand on the scales just once, only a small amount of randomness will be added to your results.
If you choose to weigh the group more times, more randomness is added to each result. If you exceed the number of times you agreed to stand on the scales, you will get a result of ‘zero’ with the same amount of randomness added.

How can we privately determine the value of ad conversions?

Extending the threat model

Up until now we’ve only discussed counting events. Now let’s extend the system to support adding up purchase values. To do this we will have to add more information to the reports.

When impression and conversion reports are generated, we can include additional metadata within the encrypted report, such as the conversion value.

After the matching stage, the metadata from matched conversions can be aggregated to produce an output report, like the sum of the conversion values.

If ad buyers and helper nodes can see individual sales values, they might be able to identify customers.

Here’s how IPA ensures individual purchase values are never visible to anyone

Imagine John spends $188 on a product, and he’s the only customer to spend that amount. When we see $188 in the data, we know it refers to John, even if we never see his match key. We want to make sure the exact value of John’s purchase is never visible to anyone.

Before any data leaves John’s device, we generate a random number (A). Then we choose a second number (B) which yields 188 when added to the first (A + B = 188).

This happens for every purchase. The value of the purchase is split into two numbers (A and B) that, combined together, give you the correct value.

A batch of the first numbers (A) is sent to the first helper node. The helper node adds them together into a full sum for that batch.

The B numbers are sent to the second helper node, which adds them into a full sum.

Each helper node sends its sum value back to the ad buyer.
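The splitting and summing of purchase values can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the modulus, the function names and the sample purchase values (other than the $188 from the text) are hypothetical. Working modulo a fixed number is also an assumption, made so that each share on its own looks like a uniformly random number.

```python
import secrets

# Assumed toy modulus: all arithmetic wraps around at 2^32.
MODULUS = 2**32


def split_into_shares(value):
    """Split a purchase value into two shares A and B with A + B = value (mod MODULUS)."""
    a = secrets.randbelow(MODULUS)  # A is a random number
    b = (value - a) % MODULUS       # B is chosen so the two shares add up to the value
    return a, b


def helper_sum(shares):
    """What one helper node does: add up the shares it received."""
    return sum(shares) % MODULUS


def combine(total_a, total_b):
    """What the ad buyer does: add the two helper totals together."""
    return (total_a + total_b) % MODULUS


purchases = [188, 42, 70]  # hypothetical purchase values in one batch
shares = [split_into_shares(value) for value in purchases]

total_a = helper_sum(a for a, _ in shares)  # first helper sees only A shares
total_b = helper_sum(b for _, b in shares)  # second helper sees only B shares

print(combine(total_a, total_b))  # 300
```

Each helper only ever handles random-looking numbers, yet the two totals still add up to the true sum of the batch.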
When the ad buyer combines the two sum values, they get the correct total value of all the ad purchases.

No-one at any stage of the process can see the value of an individual purchase.

This is the basis of Interoperable Private Attribution.

[Diagram labels: Helper Node 1 (decrypt & blind); Helper Node 2 (decrypt & blind); shuffle & swap; match; sum of secret shares]

In designing IPA, we set out to find a win-win-win solution for cross-platform conversion measurement that met our goals across privacy, utility, and competition.

● Our privacy goal is to limit the total amount of information IPA releases about an individual over a given period of time.
● Our utility goal is to support all the major aggregate conversion measurement use cases.
● Our competition goal is to ensure equal function for all existing and new ad-tech players.

A NON-TECHNICAL INTRODUCTION TO
Interoperable Private Attribution
Use Cases

Interoperable Private Attribution (IPA) is a proposed system that utilises privacy-enhancing technology to make online ad measurement secure. It would allow businesses to see the conversions from ad impressions to purchases, without sharing the personal data of customers. The following are two key examples of how IPA can improve on the current system.

IPA USE CASES

Use case 1: Cross-Device Measurement

Looking at how the IPA proposal can allow accurate measurement across multiple devices, while preserving privacy.

Making sense of cross-device impressions

Today, many of us use multiple devices, like phones, laptops or tablets. We see an ad in an app on our phone, and then make a purchase in our web browser on a laptop.

How can we connect purchases with ad impressions if they happen on different devices? This is difficult or impossible for most ad-tech providers today.

With IPA, any app or website can choose to use a match key set by a company like Google or Facebook: services many people log in to across multiple devices.
With IPA, we can level the playing field so that every ad-tech provider can get data on cross-device conversions, not just the large companies.

Use case 2: Cross-Publisher Attribution

Looking at how the IPA proposal could potentially enable cross-publisher attribution, while preserving individual privacy.

As an ad buyer, you want to figure out where to spend your ad money to have the maximum impact. Since many people will see your ads across multiple apps and websites, you want a system that allocates credit in a sensible way, and doesn’t double (or triple) count conversions.

Problem: Customers may see multiple ads for a product before they purchase it.

● Jane sees an ad for a mobile wellness app on Instagram.
● Later she sees another ad for the app while reading a newspaper article.
● Jane does a Google search for the app. The first result is a sponsored search result. Jane clicks on it.

Which ad impression should get the credit?

Proposed solution: Multi-Touch Attribution

With ‘multi-touch attribution’, everyone gets a fraction of the credit. If Jane saw an ad on Instagram, in a newspaper article and in a Google search, each of those services might get a third of the credit.

In the current system, this is hard or impossible. But with IPA, it might be feasible.

Problem: Ad buyers see overlapping reporting across ad platforms

When ad buyers spend across multiple ad platforms, multiple companies may take credit for the same conversion. This makes it difficult to understand and compare the effectiveness of ad campaigns.

“These numbers don’t seem to add up.”

Proposed solution: Ad buyers get their own “Privacy Budget” to spend as they wish

Gemma purchases ads on multiple ad platforms. They forward her the encrypted impression reports for the ads she paid for. Gemma has her own privacy budget, and she decides how to spend it.
She runs her own queries (or pays an independent vendor to help her).

Gemma can now see all her results in one interface! No more double counting!

Example results: 12 (+/- 2), 6 (+/- 2), 9 (+/- 2)

“Ok great, these results actually make sense.”

With IPA, ad buyers can run their own queries to measure their ad conversions. Because everything is connected, there is no double counting. Ad buyers can choose how to apportion credit in cases of multiple touch points. This means one interface to get reporting across all your channels, and less need to trust the results an ad platform tells you.

Could IPA support your advertising use cases? Would you like to help us improve this proposal? Do you have any concerns or questions we can address?

Get in touch to let us know by participating in the conversation in the W3C PATCG GitHub issue.