
Markets for Database Privacy

Sara Krehbiel

Georgia Institute of Technology sarak@gatech.edu

August 8, 2014

Abstract

Database privacy has garnered a recent surge in interest from the theoretical computer science community following the seminal work of [DMNS06], which proposed the strong notion of differential privacy. Classical differentially private mechanisms produce a noisy statistic on an input database that is ε-differentially private for some input parameter ε. This work continues a new line of research seeking to develop mechanisms that endogenously produce a value of ε that depends on the privacy preferences of the data owners and analyst. For such mechanisms, I propose a new notion of endogenous differential privacy, which requires a mechanism to satisfy ε-differential privacy on neighboring databases only for the ε in the mechanism's privacy support of one of those databases. I develop a framework for markets that 1) determine ε using the privacy and accuracy preferences of a heterogeneous body of data owners and a single analyst, 2) compute a noisy statistic on the database, and 3) collect and distribute payments for the chosen level of ε. I present and analyze one simple class of markets for database privacy in this framework. These mechanisms directly elicit data owners' true preferences, they determine a level of ε that is efficient given data owner and analyst preferences, and they are endogenously differentially private with respect to data owners' privacy preferences as well as their private data.


1 Introduction

Database curators put various sanitation methods in place to ensure against leaking individuals' private data while permitting meaningful statistical analysis. A mechanism is said to be differentially private if no row of the database has too much effect on the output of the mechanism, where the magnitude of this effect is given by a privacy parameter ε ∈ ℝ⁺. A growing body of literature connects database privacy guarantees to data owners' utilities [GR11, DFI11, LR12, FL12, NST12, NOS12, CCK+13, Xia13, GL13, NVX14]. In several of these works, analysts pay data owners that participate in an ε-differentially private mechanism. Analysts prefer small payouts and a high value of ε (which permits more accurate statistical analysis), while data owners prefer large payouts and small ε (greater privacy). However, almost all of the existing literature treats ε as an input parameter of the mechanism, with the implicit assumption that it is determined exogenously prior to running the mechanism.

Different applications intuitively require different levels of privacy; medical records, for example, may require a higher standard of privacy than Netflix preferences. But what are the analytical attributes of an optimal privacy compromise, and how can it be implemented in the presence of conflicting interests between data owners and analysts? Attempts at legislation for privacy requirements often suffer from legislators being ill-informed of their theoretical or practical consequences. A database curator might set ε according to some expert discretion about the relative needs for privacy versus meaningful data analysis, but he may not have good information about the privacy preferences of the data owners or the value of accurate data for analysts.

Indeed, we would expect owners and analysts to inflate their stated respective needs for privacy and accurate data, as there is no downside to doing so if no monetary transfers are made. For a non-optimal value of ε , data owners may want to buy more privacy than the current level offered, or analysts may be willing to pay data owners to accept less privacy and in turn gain access to more accurate data.

As we argue below, existing frameworks for such monetary transfers do not provide a satisfactory solution to the market problem of finding an optimal level of privacy. Mechanism design and computational market equilibrium techniques have been applied to similar settings in which multiple parties’ competing preferences jointly determine socially efficient pricing and allocation of goods. However, their application to the privacy setting has not yet been fully investigated. This work bridges this gap between database privacy and market economics by 1) formalizing a better framework for mechanisms that make monetary transfers and provide corresponding privacy guarantees, and 2) adapting an existing mechanism for efficient allocation of public goods to this privacy setting, resulting in a class of mechanisms that are simultaneously incentive compatible, individually rational, accurate, and private with respect to private data and privacy valuations.

Prior Works.

As mentioned, a number of works explore the issue of modeling the utility of database privacy to data owners. A primary focus of many of these works is incentive compatibility , also referred to as truthfulness , which is the property that each data owner maximizes his utility by revealing his true private

data. Xiao [Xia13] provides a generic transformation of any incentive compatible mechanism into one that is

differentially private, but incentive compatibility breaks if privacy itself has a value. In the truthful and private

mechanisms of [NST12, NOS12, CCK+13], data owner utility depends on the output of the mechanism as

well as privacy, so the mechanism itself may provide value to the data owners. Other works assume that private data has already been collected, so mechanisms need not truthfully elicit private data. Ligett and

Roth [LR12] and Fleischer and Lyu [FL12] design survey mechanisms that incentivize enough voluntarily

participation to guarantee accurate differentially private analysis, despite correlations between data owners’ private data and their privacy valuations. Because of these correlations, their mechanisms are private with

respect to data owner valuations as well as private data. In [LR12], some data owners experience a net

negative utility, i.e., the mechanism is not individually rational

, while [FL12] relies on knowledge of a prior distribution of privacy valuations for each private data type. Nissim, Vadhan, and Xiao [NVX14] develop an

individually rational mechanism that truthfully elicits privacy valuations directly, but they only guarantee


a relaxed notion of privacy, motivated by a strong assumption about the specific relationship between data owners’ private data and their privacy valuations.

In all of the above mechanisms, ε must be chosen exogenously, before looking at the data owners’ private

data or privacy valuations. In contrast, Ghosh and Roth [GR11] propose a model, called the

insensitive value model , in which mechanisms solicit these privacy preferences, determine an appropriate value of ε , charge the analyst some payment in exchange for a noisy statistic on the data, and distribute this payment among data owners to compensate for their ε loss of privacy. Differential privacy is guaranteed at any level output by the mechanism, but privacy is with respect to the private data only, so information about the data may be leaked if data and valuations are correlated. They present two incentive compatible and individually rational mechanisms in this model that estimate a counting query. A mechanism that estimates a linear predictor

function in this model is given in [DFI11]. Ghosh and Ligett [GL13] propose an alternate model in which an

analyst proposes some differentially private computation designed so the privacy parameter ε decreases with the number of data owners that opt in. Differential privacy is only with respect to the private data and not with respect to data owners’ decisions to opt in or out.

Most closely related to this work, Ghosh and Roth [GR11] also propose the

sensitive value model , in which differential privacy is with respect to privacy valuations as well as private data. However, they

prove a strong impossibility result (Theorem 5.1, [GR11]), that no individually rational mechanism that is

non-trivially accurate can protect the privacy of both private data and privacy valuations while charging the analyst any finite quantity. The intuition for this result is that if a data owner can have arbitrarily high privacy valuation v i

, an individually rational mechanism must be prepared to charge the analyst an arbitrarily high payment for any fixed ε . Maintaining differential privacy with respect to these privacy valuations means this high lower bound on the analyst payment must hold even when v i is moderate.

Contributions.

Our first main contribution is a new definition of endogenous differential privacy (see Definition 3.1). We argue that the standard definition of differential privacy is inappropriate in the setting of

privacy markets that choose ε depending on their inputs, because it requires the outputs of the mechanism on any two neighboring databases to be ε close for the smallest ε the mechanism may output on any input database, even if both neighboring databases only yield much higher values of ε . Our new definition requires outputs on any reference and neighboring database to be only as close as the smallest ε supported by the mechanism running on the reference database .

In Section 3, we modify the sensitive value model of Ghosh and Roth [GR11], permitting us to model

data owners’ utilities with the view of privacy as a valuable public good that data owners must purchase. This

is in contrast to [GR11] and other works that view loss of privacy as a cost that must be compensated. We

note that any standard moneyless differentially private mechanism is trivially individually rational if its privacy provides data owners with nonnegative utility, and such a mechanism is accurate and differentially private in the standard sense for some exogenous ε. Strictly speaking, this circumvents the impossibility result in the

model of [GR11], albeit in a way that does not make any progress towards determining a sensible value of

ε endogenously. We further show that only trivial mechanisms are possible in this positive-value model. If we allow mechanisms to run a deficit at most half the time, we get a slightly weaker but still unsatisfying negative result in this model. Ultimately, our endogenous privacy definition (along with valuing privacy positively and permitting the mechanism to sometimes run a deficit) allows our framework to admit meaningful privacy

markets, circumventing the key negative result of [GR11].

Our second main contribution is a positive result in the form of a class of mechanisms that receive the private data, solicit privacy preferences from the data owners and analyst, choose a level of privacy consistent with these preferences, make monetary transfers, and get a noisy statistic by running a standard differentially private mechanism on the data at the previously determined level of privacy. While individual rationality is much easier to maintain with the view of privacy as a good, the new challenge is to construct a payment scheme that discourages data owners from understating their individual sensitivities, letting others pay for the


privacy enjoyed by all. The cumulative works of [Vic61, Cla71, Gro70, GL77] provide an elegant solution

to this “free-rider problem” as it exists more generally in neoclassical economies, achieving the market goal of a Pareto efficient

(see Definition 3.5) level of production while incentivizing consumers to report

their true preferences. The class of mechanisms presented in Section 4 adapts this solution to the free-rider

problem to our privacy market framework. A mechanism M in our class computes the level of privacy that maximizes the value of privacy to data owners less the cost to the analyst, and it charges data owners individual payments that align individual utility with social utility. It then runs some standard differentially private M_q to approximate the desired query on the database at the market-determined level of ε. We prove that our mechanisms are endogenously differentially private, incentive compatible, individually rational when the sum of data owner valuations is not too small relative to the analyst cost, Pareto efficient for appropriate choices of parameters, and have a balanced budget in expectation.

Future work.

We remark that our class of endogenously private mechanisms is a special case of the schema

for Pareto efficient allocation of goods described in [GL77]. Their framework allows for multiple public

goods with different production prices. This generality could be readily exploited to create markets for privacy with multiple analysts, possibly with different levels of ε for different databases or different queries.

2 Preliminaries

Let ℝ denote the set of real numbers and ℝ⁺ denote the nonnegative reals. Let x⁺ = max(0, x) for x ∈ ℝ.

Let Σ or Σ_i denote Σ_{i∈[n]}, and let Σ_{j≠i} denote Σ_{j∈[n]\{i}}. For any n-dimensional vector v, let v_i denote the i-th entry, and denote the vector without v_i as v_{−i}. Let v_{−i} ∥ v′_i denote v with v_i replaced with v′_i. For n-dimensional vectors v, v′, let v ∼ v′ denote that they differ on only one row, and we call such vectors neighboring.

The following is the standard definition of differential privacy [DMNS06] with respect to neighboring

databases differing in at most one row:

Definition 2.1. A mechanism M : D^n → R is (ε, δ)-differentially private if for all neighboring databases d, d′ ∈ D^n and for all subsets S ⊆ R,

Pr[M(d) ∈ S] ≤ exp(ε) · Pr[M(d′) ∈ S] + δ.

For any b ∈ ℝ⁺, let Lap(b) denote the real random variable with Pr[Lap(b) = x] = (1/(2b)) · exp(−|x|/b) for all x ∈ ℝ. We use a simple and powerful mechanism based on the Laplace distribution that approximates any real-valued query and ensures privacy by adding noise proportional to the sensitivity of the query:

Proposition 2.2 (Laplace mechanism [DMNS06]). For any query q : D^n → ℝ and Δ ∈ ℝ⁺ such that max_{d,d′∈D^n, d∼d′} |q(d) − q(d′)| ≤ Δ, the Laplace mechanism defined by M(d) = q(d) + Lap(Δ/ε) is (ε, 0)-differentially private.
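As a minimal illustration of Proposition 2.2, the Python sketch below adds Laplace noise scaled to a query's sensitivity; the counting query, the example database, and the parameter values are assumptions chosen for illustration, not part of the proposition.

```python
import numpy as np

def laplace_mechanism(d, query, sensitivity, epsilon, rng=None):
    """Release query(d) plus Laplace noise with scale sensitivity/epsilon.

    By Proposition 2.2, if |query(d) - query(d')| <= sensitivity for all
    neighboring d ~ d', the output is (epsilon, 0)-differentially private.
    """
    rng = np.random.default_rng() if rng is None else rng
    return query(d) + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Illustrative use: a counting query on a database of private bits has sensitivity 1.
d = np.array([0, 1, 1, 0, 1])
noisy_count = laplace_mechanism(d, query=np.sum, sensitivity=1.0, epsilon=0.5)
```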

The following well-known lemma establishes the differential privacy of the composition of multiple differentially private mechanisms:

Proposition 2.3 (Composition [DRV10]). Let ε, δ > 0. If M_1, ..., M_k are each (ε, δ)-differentially private mechanisms, then the mechanism M(d) = (M_1(d), ..., M_k(d)) releasing the concatenation of the outputs of each mechanism is (kε, kδ)-differentially private.


3 A Framework for Privacy Markets

[GR11] creates a model for privacy markets in which each data owner has a verifiable private bit and a privacy valuation indicating the costliness of a differentially private mechanism. A mechanism takes this data owner information along with some analyst preference information,¹ and then it outputs an estimate of the sum of the private bits, a charge to the analyst, and payments to each data owner. A given run of the mechanism has an underlying privacy guarantee for each data owner, which is formalized in the following subsection.

¹ In [GR11], the analyst submits a maximum budget B ∈ ℝ⁺ or minimum accuracy α ∈ [0, 1], but for simplicity of syntax, we consider only budget preferences.

Model 1 Model of [GR11]

1: Upon initialization, there exists a mechanism M : {0,1}^n × (ℝ⁺)^n × ℝ⁺ → ℝ⁺ × (ℝ⁺)^n × ℝ × ℝ^n and a family of cost functions c : ℝ⁺ × ℝ⁺ → ℝ⁺. Each data owner i ∈ [n] has some true privacy preference v*_i ∈ ℝ⁺ and a verifiable bit b_i ∈ {0, 1}, and an analyst has a budget preference B ∈ ℝ⁺.
2: M receives the verifiable data, and data owners and the analyst report their preferences to M.
3: M publishes statistic ŝ ∈ ℝ⁺ and privately outputs privacy guarantees ε_i ∈ ℝ⁺ for each i ∈ [n].
4: M makes monetary transfers P ∈ ℝ and p ∈ ℝ^n from and to the analyst and data owners, respectively.
5: Each data owner i ∈ [n] realizes utility p_i − c(v_i, ε_i).

The new model generalizes that of [GR11] by allowing data owners’ private data and the statistic released

to the analyst to be of general forms, and, more significantly, by allowing the mechanism’s privacy guarantee to be parametrized by some y of a general form Y . Privacy valuations and costs are from families of real functions on Y .

Model 2 New Model

1: Upon initialization, there exists a mechanism M : D^n × V^n × C → R × Y × ℝ × ℝ^n for general types D, R, Y and V, C ⊆ {Y → ℝ}. Each data owner i ∈ [n] has some true privacy preference v*_i ∈ V and verifiable data d_i ∈ D, and an analyst has a budget preference c ∈ C.
2: M receives the verifiable data, and data owners and the analyst report their preferences to M.
3: M publishes statistic ŝ ∈ R and privately outputs endogenous privacy parameter y ∈ Y.
4: M makes monetary transfers P ∈ ℝ and p ∈ ℝ^n to and from the analyst and data owners, respectively.
5: Each data owner i ∈ [n] realizes utility v_i(y) − p_i.

Interpretations of Y. To most directly compare the two models, we can restrict Y = (ℝ⁺)^n and interpret y_i = ε_i. The syntax of the new data owner utility functions, however, reflects a new view of privacy as a good with positive value, i.e., v_i(y) ≥ 0, rather than privacy loss as having negative value, i.e., −c(v*_i, ε_i) ≤ 0. Whereas in [GR11] the analyst must pay data owners for use of their private data, we can design the mechanism so that we can charge a data owner in exchange for some privacy guarantee implied by y. Note that positive values of P and p in Model 1 correspond to payments from the analyst to the data owners, while they correspond to payments to the analyst from the data owners in Model 2. By permitting both privacy utility and payments to data owners to be either positive or negative, we increase the class of mechanisms that can guarantee both privacy and nonnegative utility (see Section 3.3 and Theorem A.2).

For guidance, we should think of y as describing the type and magnitude of noise used by a mechanism that privately outputs y. Typically, the public description of the mechanism M determines the type of noise, and its inputs, namely privacy valuations v and analyst cost c, determine the magnitude. In the case of Laplace noise, magnitude roughly corresponds to some level of ε-differential privacy. However, [NOS12, CCK+13, NVX14] argue that ε alone can only provide an upper bound on true privacy cost of a mechanism. Such a guarantee may not provide a tight bound on the information leaked by the mechanism, for example, when output distributions of a mechanism on neighboring databases are only ε apart for an extremely unlikely set of events and closer otherwise, or when the analyzed upper bound on ε may itself be loose. By allowing y to be of a general form, our mechanisms can potentially release more specific information about their privacy policies that may allow tighter analysis of privacy loss. Furthermore, data owners' utilities for a mechanism outputting a general privacy parameter need not be limited to the mechanism's privacy properties. For example, y may include the public outputs of the mechanism, allowing the new framework to model outcome-dependent utility [NOS12, NST12, CCK+13, NVX14], which depends on differential privacy guarantees and public outputs. Of equal importance to designing a mechanism M in Model 2 is designing Y and V that both capture how data owners value the mechanism and allow provable privacy meaningfully related to the endogenous privacy parameter y output by M.

Notation. We let M(b, v) and M(d, v, c) denote the joint random variables representing the distribution of outputs of M in Model 1 running on inputs b ∈ {0,1}^n, v ∈ (ℝ⁺)^n (possibly with B ∈ ℝ⁺ fixed) and of M in Model 2 running on inputs d ∈ D^n, v ∈ V^n, c ∈ C, respectively. If unspecified, they denote the joint distributions of the public outputs (ŝ, P, p), respectively.

3.1 Endogenous Privacy

Next we motivate a new concept of privacy to be required by mechanisms in Model 2. We first describe

privacy in [GR11], and then we illustrate how the strength of this definition counterintuitively breaks the

relationship between the privacy guarantee output by a mechanism in Model 1 and a data owner’s utility for

that run of the mechanism.

Following Definition 2.1, [GR11] says that for any i ∈ [n] and ε_i ∈ ℝ⁺, a mechanism M in Model 1 (possibly with B fixed) is ε_i-differentially private for i if for all neighboring (b, v) ∼ (b′, v′) differing on row i and any event E ⊆ ℝ⁺ × ℝ,

Pr[M(b, v) ∈ E] ≤ exp(ε_i) · Pr[M(b′, v′) ∈ E].²

² Privacy in the insensitive value model of [GR11] does not consider how changes in privacy valuations affect a mechanism's outputs, requiring only Pr[M(b, v) ∈ E] ≤ exp(ε_i) · Pr[M(b′, v) ∈ E] for all v and b ∼ b′. Because we seek privacy with respect to privacy valuations as well as private data, our new privacy definition is motivated only by the stronger sensitive value model of [GR11].

Note that one can establish a privacy guarantee for a mechanism for fixed ε_i unrelated to the ε_i privately output by the mechanism. However, the proofs in [GR11] (see especially Theorem 5.1) indicate that a mechanism M in their framework must be ε_i-differentially private for i for any ε_i it privately outputs. In particular, M must be private at the level specified by the infimum ε_{i,inf} of its privacy support on any inputs.

We argue that with this privacy requirement, i should not receive utility p_i − c(v*_i, ε_i) as in [GR11] but should instead realize utility p_i − c(v*_i, ε_{i,inf}), where the payment is input-dependent but the realized privacy cost is not. In this case, any mechanism in Model 1 could equivalently be modified to always output ε_inf, making such a mechanism not endogenous at all. Because we want the utility of a particular run of the mechanism to depend on the ε determined by the inputs, we must endogenize the privacy guarantee itself.

The usual definition of differential privacy captures the idea that an individual cares about the difference between two output distributions: that of the mechanism run on the true database, and that of the mechanism run on the same database with his row changed. We argue that the requirement of privacy for all ε_i in the support of the mechanism goes far beyond the true concerns of a data owner. We propose the following new privacy definition, which endogenizes the privacy guarantee by requiring that the output distribution of the mechanism on any set of reference inputs is close to the output distribution of the mechanism on any neighboring set of inputs, where closeness is determined only by the ε supported by the mechanism running on the reference inputs. We simultaneously relax privacy in the usual way to permit additive slack δ. Using the syntax of our new model, we define privacy with respect to fixed functions ε, δ : Y → (ℝ⁺)^n that connect the mechanism's output to its provable privacy:

Definition 3.1 (Endogenous differential privacy). For fixed ε, δ : Y → (ℝ⁺)^n, a mechanism M in Model 2 is (ε, δ)-endogenously differentially private if for all i ∈ [n], neighboring (d, v) ∼ (d′, v′) differing on row i, c ∈ C, y in the privacy support of M(d, v, c), and event E ⊆ R × ℝ,

Pr[M(d, v, c) ∈ E] ≤ exp(ε_i(y)) · Pr[M(d′, v′, c) ∈ E] + δ_i(y), and
Pr[M(d′, v′, c) ∈ E] ≤ exp(ε_i(y)) · Pr[M(d, v, c) ∈ E] + δ_i(y).

Note that ( ε, δ ) -endogenous privacy for constant functions ε, δ coincides with the standard exogenous differential privacy setting where the mechanism does not choose its own privacy level at all; we seek mechanisms in which V captures data owners’ utilities for the mechanism’s output in Y , and the mechanism should be ( ε, δ ) -endogenously private for non-constant functions ε, δ of y .

Compare privacy in the two frameworks by restricting Y = (ℝ⁺)^n, D = {0, 1}, R = ℝ⁺, and C isomorphic to ℝ⁺ functioning as a budget constraint. Let ε be the identity function and let δ be the zero function. Then privacy in [GR11] of M with B fixed is equivalent to (ε, δ)-endogenous differential privacy where probability guarantees for each set of inputs d, v are over all y in the support of M(·, ·, B) rather than over y only in the support of M(d, v, B). Thus the new definition is strictly weaker than privacy in the corresponding model of [GR11], but it more accurately encompasses an individual's privacy concerns.

3.2 Other Properties: Incentive Compatibility, Individual Rationality, Balanced Budget, Statistical Accuracy, and Market Efficiency

The principal mechanism design goal of both [GR11] and the current work is to elicit data owners’ true

privacy valuations v i

. In order to reasonably assume that data owners report these private types truthfully, we would like our mechanisms to be incentive compatible , meaning that each data owner maximizes his expected utility by reporting his true type:

Definition 3.2 (Incentive compatibility). A mechanism M in Model 2 is incentive compatible if for any i ∈ [n], privacy valuation v*_i ∈ V, and any inputs d ∈ D^n, v_{−i} ∈ V^{n−1}, c ∈ C,

v*_i ∈ arg max_{v_i} E[v*_i(y) − p_i],

where (p_i, y) ← M(d, v_{−i} ∥ v_i, c) and the expectation is over the randomness of M.

Note that this definition is only consistent with utility models in which the value of privacy is tied to the privacy output of the mechanism rather than a potentially input-independent overall privacy property of the mechanism. The same is true for the definition of individual rationality , the property that no one is worse off having participated in the mechanism:

Definition 3.3 (Individual rationality). A mechanism M in Model 2 is individually rational if for any i ∈ [n] and inputs d ∈ D^n, v ∈ V^n, c ∈ C, E[v_i(y) − p_i] ≥ 0, where (p_i, y) ← M(d, v, c) and the expectation is over the randomness of M.


As in [GR11], we require incentive compatibility of privacy valuations but not of private data, assuming

instead that the true private data is already held somewhere. In this case it may not be possible for data owners to opt out of the privacy market even if they do not expect it to be individually rational for them. For this reason, we consider individual rationality to be a secondary fairness goal and permit mechanisms with qualified individual rationality. We refer to a mechanism as individually rational for a restricted set of inputs if every data owner’s expected utility is nonnegative for any inputs in this restricted set.

Mechanisms in [GR11] are required to be

budget-balanced, i.e., P ≥ Σ_i p_i for any run of the mechanism

so the mechanism always raises enough funds to pay the data owners. We show in Section 3.3 that this

requirement plays a crucial role in their key impossibility result. We therefore relax this requirement:

Definition 3.4 (Expected balanced budget). A mechanism M in Model 2 is budget-balanced in expectation if for any inputs d ∈ D^n, v ∈ V^n, c ∈ C, E[Σ p_i − P̂] ≥ 0, where (P̂, p) ← M(d, v, c) and the expectation is over the randomness of M.

A mechanism with a balanced budget in expectation can be thought of as having a cyclically balanced budget, in that its surpluses will offset its deficits over time across many runs. The left tail of Σ p_i − P̂ should be tightly bounded so the mechanism is unlikely to ever run a large deficit. The particular class of mechanisms we study will further ensure that the analyst is fairly compensated in expectation, i.e., E[P̂] = c(y), where the expectation is over the randomness of M conditioned on the event that y was privately output.

In [GR11], a mechanism releasing statistic ŝ ≈ Σ b_i is called α-accurate if Pr[|ŝ − Σ b_i| > αn] ≤ 1/3. The statistic ŝ ∈ R output by a mechanism in our framework is typically an approximation of some generally-typed query q : D^n → R, so the accuracy of ŝ should be query-specific. Standard differential privacy mechanisms for such queries may inform the accuracy goals of mechanisms in our framework.

The mechanisms discussed in [GR11] that protect the privacy of data owners' private bits but not of their privacy valuations seek to either minimize analyst payment subject to some minimum accuracy requirement or maximize accuracy subject to some maximum budget. Such mechanisms do not take into account an analyst's desired tradeoff between money and accuracy. Mechanisms in our new framework solicit some c ∈ C ⊆ {Y → ℝ} from the analyst that indirectly describes this tradeoff. A privacy level y ∈ Y typically corresponds to the noisiness of the statistic ŝ, so c(y) represents the cost to the analyst of ŝ generated by the mechanism running on y compared to the noiseless statistic. Given the preferences of data owners and the analyst, our mechanisms seek to find a Pareto efficient (or Pareto optimal) level of privacy, meaning one where no data owner can be made strictly better off without making another strictly worse off, subject to collecting enough total funds for the analyst.

V and C should be chosen so that for any v ∈ V n

, c ∈ C , there exists some Pareto efficient y ∈ Y .

Definition 3.5 (Pareto efficiency). Privacy level y ∈ Y is Pareto efficient for v ∈ V^n, c ∈ C if there exist payments p ∈ ℝ^n such that Σ p_i ≥ c(y), and for all y′ and p′ such that Σ p′_i ≥ c(y′) and v_i(y′) − p′_i > v_i(y) − p_i for some i ∈ [n], there exists some j ∈ [n] with v_j(y′) − p′_j < v_j(y) − p_j.

3.3 Lower Bounds in the Two Frameworks

After providing mechanisms that satisfy standard differential privacy with respect to data owners' private data but not their privacy valuations, Ghosh and Roth [GR11] illustrate why mechanisms that are differentially private in the standard sense, individually rational, budget-balanced, and non-trivially accurate³ are not possible in their framework when privacy of data owners' privacy valuations is required (Theorem A.1).

³ Note that all lower bounds stated formally in Appendix A and discussed informally here hold even for non-truthful mechanisms.


When privacy has positive value to data owners, individual rationality is easy to achieve with these other properties in mechanisms that make no monetary transfers. For example, any classically differentially private mechanism would have a trivially balanced budget and nonnegative value to data owners, and it would satisfy differential privacy and non-trivial accuracy according to some exogenously chosen ε . Yet without incentivizing truthful reporting of privacy valuations and tying ε to these inputs, it would be impossible

to make any non-trivial market efficiency claim such as Pareto efficiency. Unfortunately, Theorem A.2

shows that essentially only trivial moneyless mechanisms are possible when privacy has positive value if, as

in [GR11], we require the standard exogenous definition of privacy and a balanced budget.

Relaxing the balanced budget requirement weakens this negative result somewhat, but Theorem A.3

shows that standard privacy requires that a mechanism on any inputs must pay the analyst nothing almost half

the time. After proving this result in Appendix A, we show that endogenous differential privacy is enough to

circumvent this probabilistic lower bound. In the following section, we develop general mechanisms in our new framework that are simultaneously incentive compatible, endogenously differentially private for a Pareto

efficient level of privacy, individually rational for most inputs⁴, budget-balanced in expectation, and accurate.

⁴ By restricting the set of inputs to that for which our mechanisms are individually rational (see Lemma B.2), standard differential privacy still subjects us to only slightly weaker lower bound analogs of Theorems A.2 and A.3. See note at end of Appendix A.

4 Positive-Value Endogenous Privacy Markets

In this section, we present a class of mechanisms in Model 2 satisfying the previously discussed properties.

We first illustrate an incentive compatibility challenge that arises when the mechanism provides positive

privacy value to the data owners (in contrast to [GR11] and other works modeling disutility of privacy loss), and we describe an existing solution to it. In [GR11], highly sensitive data owners must be paid more to

compensate them for costlier privacy losses. To discourage data owners from overstating their costs to extract

greater payments, the [GR11] mechanisms simulate a second price auction for a given privacy guarantee.

Some affordable v_max, ε is determined, and all data owners with reported v_i < v_max receive p_i = c(v_max, ε) while the others get nothing and their data is not used in the summary statistic. The smallest valuation among the excluded data owners determines v_max to guarantee incentive compatibility. However, since the valuations are used to determine this cutoff, this solution approach cannot protect privacy of the valuations.

In contrast, our new framework allows us to consider a baseline where data is used in some way not associated with any differential privacy guarantee, so we can charge highly sensitive data owners more to offset the cost associated with strengthening the privacy guarantee to accommodate them. Interestingly, this significantly changes the incentives of the data owners. The new challenge is that data owners may try to avoid higher taxes by understating their sensitivity, letting others pay for the privacy enjoyed by all.⁵ This "free-rider problem" is solved in a much more general public goods setting in the cumulative works of [Vic61, Cla71, Gro70, GL77].

⁵ We focus on mechanisms providing a single guarantee for all data owners (rather than an n-dimensional Y where each v_i depends only on the i-th coordinate of y), noting that the Laplace mechanism and generalizations guarantee ε-differential privacy for all i by adding noise to the true query answer. Outputting a single noisy query answer while guaranteeing heterogeneous ε_i is an interesting question for future research.

In the setting studied by [Cla71, Gro70, GL77], consumers communicate to some central body, called the government, their valuation v_i(·) of a certain public good. The government chooses the level of public good that optimizes social utility, and it levies taxes designed to align individual consumers' utilities with social utility in order to avoid free-riding [Cla71]. Specifically, the government pays a producer c(y) to produce y ≥ 0 units of the good for the level y maximizing consumer surplus, Σ v_i(y) − c(y). Each consumer i receives utility v_i(y) for the public good and is charged the amount by which he diminishes others' surplus: c(y) − Σ_{j≠i} v_j(y) + max_{y_{−i}} ( Σ_{j≠i} v_j(y_{−i}) − ((n−1)/n)·c(y_{−i}) ). With these allocation and tax rules, consumers are incentivized to communicate their true preferences, and sufficient funds are raised to produce a Pareto efficient level of the public good.⁶

⁶ To verify incentive compatibility, note that max_{y_{−i}} ( Σ_{j≠i} v_j(y_{−i}) − ((n−1)/n)·c(y_{−i}) ) is independent of v_i, so to maximize his utility, i should report arg max_{v_i} v*_i(y(v_{−i} ∥ v_i, c)) − ( c(y(v_{−i} ∥ v_i, c)) − Σ_{j≠i} v_j(y(v_{−i} ∥ v_i, c)) ). Because y(v_{−i} ∥ v*_i) = arg max_y v*_i(y) + Σ_{j≠i} v_j(y) − c(y), this quantity is indeed maximized when v_i = v*_i.

A sufficient condition for Pareto efficiency is the Samuelson condition [Sam54], that the sum of the marginal benefit of a public good over all consumers equals its marginal cost. With this allocation rule, the level of y maximizing consumer surplus is y such that Σ_i (d/dy) v_i(y) = (d/dy) c(y). Assuming incentive compatibility, this is equivalent to the condition that the sum of the marginal benefit of y is equal to the marginal cost, so the allocation rule is Pareto efficient.

It can be easily verified that the sum of payments is at least c(y). Since it may be strictly greater, the payments collected are not guaranteed to be Pareto efficient. This is a problem addressed in [GL77] through different tax and allocation rules, but these modifications complicate the privacy utility model in our setting.
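To make the allocation and tax rules described above concrete, the following Python sketch maximizes consumer surplus over a grid and computes each consumer's charge; the grid search, the logarithmic example valuations, and the cost parameter are illustrative assumptions rather than part of [GL77].

```python
import numpy as np

def allocate_and_tax(valuations, c, grid):
    """Choose y maximizing sum_i v_i(y) - c*y and charge each consumer the amount
    by which his presence diminishes the others' surplus (as in [Cla71, GL77])."""
    def surplus(vals, y, cost_share):
        return sum(v(y) for v in vals) - cost_share * c * y

    # Level of the public good maximizing total consumer surplus (grid search stand-in).
    y = grid[np.argmax([surplus(valuations, g, 1.0) for g in grid])]

    n = len(valuations)
    taxes = []
    for i in range(n):
        others = valuations[:i] + valuations[i + 1:]
        # Surplus the others could secure without i, facing an (n-1)/n cost share.
        best_without_i = max(surplus(others, g, (n - 1) / n) for g in grid)
        # Tax: c(y) - sum_{j != i} v_j(y) + max_{y_{-i}} (...), as in the text above.
        taxes.append(c * y - sum(v(y) for v in others) + best_without_i)
    return y, taxes

# Illustrative example with logarithmic valuations v_i(y) = w_i * ln(y + 1).
vals = [lambda y, w=w: w * np.log(y + 1.0) for w in (2.0, 1.0, 3.0)]
y_star, taxes = allocate_and_tax(vals, c=1.0, grid=np.linspace(0.0, 20.0, 2001))
# Here y_star = (2 + 1 + 3)/1 - 1 = 5, and sum(taxes) >= c * y_star.
```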

4.1 A Class of Pareto Efficient Private Mechanisms

In our class of mechanisms, we leave the database d ∈ D^n and published statistic ŝ ∈ R of general types. We assume that the statistical analysis goal is characterized by some query q : D^n → R. We further assume the existence of some mechanism M_q : D^n → R, which is parametrized by ε_q, δ_q from some legal set of privacy parameters that includes arbitrarily small ε_q, δ_q > 0, and which is (ε_q, δ_q)-differentially private (in the standard sense) when instantiated on any ε_q, δ_q in the legal set. The mechanism should also guarantee M_q(d) ≈ q(d), where the approximation is a statistically meaningful metric on R with accuracy depending reasonably on the privacy parameters ε_q, δ_q.

We fix the set Y = ℝ⁺ of privacy parameters and the set C = {c(y) = cy : c ∈ ℝ⁺} of cost functions, following [Cla71, GL77]. We fix the set V = {v_i(y) = v_i ln(y + 1) : v_i ∈ ℝ⁺} of privacy valuations for simplicity, and we discuss this choice in the following section. We identify V and C with ℝ⁺ in the natural way. We also fix a neighborhood parameter Δ ∈ ℝ⁺ and noise function f : ℝ⁺ → ℝ⁺.

Mechanism 3 first truncates the privacy valuations and then computes the Pareto efficient privacy level y and incentive-compatible charges to the data owners, using the allocation and tax rules from [GL77]. The analyst payment P̂ includes noise that is a function of y, and the statistic ŝ is computed by M_q running with privacy parameters ε_q, δ_q that are some other functions of y.

Mechanism 3 Privacy Market

Inputs: Database d ∈ D^n, privacy valuations v ∈ (ℝ⁺)^n, and cost c ∈ ℝ⁺.
1: v̄_i ← min(v_i, c · Δ) for all i ∈ [n].
2: y ← (Σ_i v̄_i/c − 1)^+.
3: p_i ← cy − Σ_{j≠i} v̄_j ln(y + 1) + max_{y_{−i} ≥ 0} ( Σ_{j≠i} v̄_j ln(y_{−i} + 1) − ((n−1)/n)·cy_{−i} ) for all i ∈ [n].
4: P̂ ← c(y + γ) with γ drawn from Lap(f(y)).
5: ŝ ← M_q(d) with privacy parameters ε_q = Δ/f(y − Δ) and δ_q.
6: Publish ŝ, pay analyst P̂, collect p_i from each i ∈ [n], and privately output y.
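The sketch below is one possible Python rendering of Mechanism 3 for the logarithmic valuations fixed above, with M_q instantiated as a Laplace mechanism on a counting query; the query, its sensitivity, the parameter values, and the choice of M_q are assumptions made for illustration only.

```python
import numpy as np

def privacy_market(d, v, c, Delta, f, query, query_sensitivity, rng=None):
    """A sketch of Mechanism 3 with v_i(y) = v_i * ln(y + 1) and c(y) = c * y."""
    rng = np.random.default_rng() if rng is None else rng
    v = np.asarray(v, dtype=float)
    n = len(v)

    # Step 1: truncate valuations so that y has sensitivity Delta (Lemma 4.2).
    v_bar = np.minimum(v, c * Delta)

    # Step 2: consumer-surplus-maximizing privacy level y = (sum(v_bar)/c - 1)^+.
    y = max(v_bar.sum() / c - 1.0, 0.0)

    # Step 3: taxes aligning individual utility with social utility (as in [GL77]).
    p = np.empty(n)
    for i in range(n):
        others = v_bar.sum() - v_bar[i]
        # Closed-form maximizer of sum_{j != i} v_bar_j ln(t+1) - ((n-1)/n) c t over t >= 0.
        t = max(others * n / ((n - 1) * c) - 1.0, 0.0)
        best_without_i = others * np.log(t + 1.0) - (n - 1) / n * c * t
        p[i] = c * y - others * np.log(y + 1.0) + best_without_i

    # Step 4: noisy payment to the analyst, P_hat = c * (y + Lap(f(y))).
    P_hat = c * (y + rng.laplace(scale=f(y)))

    # Step 5: release the statistic via a standard DP mechanism M_q at eps_q = Delta / f(y - Delta);
    # here M_q is a Laplace mechanism (so delta_q = 0), an illustrative assumption.
    eps_q = Delta / f(y - Delta)
    s_hat = query(d) + rng.laplace(scale=query_sensitivity / eps_q)

    # Step 6: publish s_hat, pay the analyst P_hat, collect p_i, and privately output y.
    return s_hat, P_hat, p, y

# Illustrative run with the concrete choices suggested in Section 4.4.
n = 1000
Delta = np.log(n)
f = lambda x: np.sqrt(x + Delta)
rng = np.random.default_rng(0)
d = rng.integers(0, 2, size=n)        # private bits, analyzed with a counting query
v = rng.uniform(0.5, 2.0, size=n)     # reported privacy valuations
s_hat, P_hat, p, y = privacy_market(d, v, c=1.0, Delta=Delta, f=f,
                                    query=np.sum, query_sensitivity=1.0, rng=rng)
```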

In Section 4.2, we show that if privacy valuations are logarithmic, truncating privacy valuations as in Step 1 ensures that the y chosen in Step 2 has sensitivity Δ while preserving incentive compatibility and individual rationality on most inputs. Section 4.3 uses the sensitivity of y to argue the endogenous privacy of Step 4 in Lemma 4.3. Our main theorem below is an immediate corollary of these results.

Theorem 4.1. For any fixed M_q differentially private in the standard sense, Δ ∈ ℝ⁺, and increasing, differentiable, concave f : ℝ⁺ → ℝ⁺ with f′(0) ≤ 1, Mechanism 3 is (ε, δ)-endogenously differentially private for ε(y) = 3Δ/f(y − Δ) and δ(y) = δ_q + 1/exp(1/f′(y − Δ)). Mechanism 3 is incentive compatible with respect to privacy valuations and individually rational when c ≤ Σ v̄_i/e.

4.2 Sensitivity of y

After fixing c(y) = cy for c, y ∈ ℝ⁺ following [Cla71, GL77], the main goal is to release P̂ ≈ cy in an (ε, δ)-endogenously differentially private manner for some appropriate (positive and decreasing) functions ε(y) and δ(y). Classical differential privacy techniques would suggest first bounding the sensitivity of arg max_{y≥0} Σ v_i(y) − cy by some Δ, and then P̂ = c(y + Lap(Δ/ε)) is ε-differentially private for some fixed target ε. While our endogenous privacy parameter ε will be a function of y, we nonetheless first attempt to bound the sensitivity of y.

For consistency with the perspective of y as a public good, willingness-to-pay functions v_i(y) should be nonnegative, increasing, and concave. With c(y) = cy, the unique consumer surplus-maximizing level of privacy will have Σ (d/dy) v_i(y) = c unless y = 0. When v_i(y) = v_i ln(y + 1), this optimal level is given by y = (Σ v_i/c − 1)^+ as in Step 2. By first truncating the v_i to v̄_i = min(v_i, c · Δ), sensitivity of y is immediate from |(Σ v̄_i/c − 1)^+ − (Σ v̄′_i/c − 1)^+| ≤ |(v̄_i − v̄′_i)/c| ≤ Δ:

Lemma 4.2 (Sensitivity of y). For any c ∈ ℝ⁺ and neighboring v ∼ v′, we have:

|(Σ v̄_i/c − 1)^+ − (Σ v̄′_i/c − 1)^+| ≤ Δ.
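A quick numerical check of Lemma 4.2 under assumed inputs: because reported valuations are truncated at c·Δ, even an arbitrarily large unilateral change in one report moves y by at most Δ.

```python
import numpy as np

def y_level(v, c, Delta):
    """Surplus-maximizing privacy level for v_i(y) = v_i ln(y+1), after truncation."""
    v_bar = np.minimum(v, c * Delta)
    return max(v_bar.sum() / c - 1.0, 0.0)

rng = np.random.default_rng(0)
c, Delta = 2.0, 5.0
v = rng.exponential(3.0, size=50)     # assumed nonnegative valuations
v_neighbor = v.copy()
v_neighbor[7] = 1e9                   # one data owner reports an arbitrarily large valuation
gap = abs(y_level(v, c, Delta) - y_level(v_neighbor, c, Delta))
assert gap <= Delta + 1e-9            # Lemma 4.2: the sensitivity of y is at most Delta
```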

We may worry that bounding v i

will generate sample bias, as argued in [GR11]. Indeed, if data owners

have negative utilities for privacy loss, a mechanism operating on truncated costs will not be able to adequately compensate data owners, and if these privacy-sensitive data owners are able to opt out of the mechanism, this may bias the data. Alternatively, we may worry that truncation might break truthfulness. However,

Lemma B.1 shows that truncation preserves the incentive compatibility argument of [GL77], and Lemma B.2

shows that our positive-value mechanism is individually rational for all data owners as long as analyst cost is not too high relative to data owner valuations.

4.3 Heteroskedastic Noise for Endogenous Privacy

It remains to determine the functions ε(y) and δ(y) for which P̂ ≈ cy is endogenously differentially private, given that the optimal y has sensitivity Δ. The standard deviation of the Laplace noise added to y should be small enough that the analyst is not over- or under-compensated by too much, but it should be an appropriate function of y that permits provable privacy for ε(y) and δ(y) approaching zero as y approaches infinity. Our mechanism leaves this noise parameter as a general positive function f(y), and we derive functions ε(y) and δ(y) depending on Δ and f for which we can guarantee endogenous differential privacy in Lemma 4.3. Note that in our mechanism, noise is heteroskedastic in that the variance of the Laplace noise added to y is not uniform across all values of y. This deviates significantly from the usual privacy scenario and creates a new challenge in proving endogenous privacy.

Lemma 4.3 (Privately publishing y). Define ε(y) = 2Δ/f(y − Δ) and δ(y) = exp(−1/f′(y − Δ)) for increasing, differentiable, concave f : ℝ⁺ → ℝ⁺ with f′(0) ≤ 1, fixed Δ ∈ ℝ⁺, and any c ∈ ℝ⁺ and neighboring v ∼ v′. Let y = (Σ v̄_i/c − 1)^+ and y′ = (Σ v̄′_i/c − 1)^+, and let γ and γ′ denote random variables with distributions Lap(f(y)) and Lap(f(y′)), respectively. Then for any T ⊆ ℝ,

Pr[y + γ ∈ T] ≤ exp(ε(y)) · Pr[y′ + γ′ ∈ T] + δ(y), and   (4.1)
Pr[y′ + γ′ ∈ T] ≤ exp(ε(y)) · Pr[y + γ ∈ T] + δ(y).   (4.2)

Proof sketch, see Lemma B.3 for full proof. If we show (4.1) for y < y′ and then for y′ < y, then (4.2) follows by symmetry. First note that if y < y′, then Pr[y + γ = t] ≤ Pr[y′ + γ′ = t] for t sufficiently far from y, and otherwise their ratios differ maximally at t = y. Then it is enough to show Pr[y + γ = y]/Pr[y′ + γ′ = y] ≤ exp(ε(y)), which can be shown using the sensitivity bound on y with the concavity and other assumptions about f.

Now consider y′ < y. Since Pr[y + γ = t]/Pr[y′ + γ′ = t] grows with t, we will not be able to achieve (ε, 0)-differential privacy for any ε. Instead note that ∫_{t*}^∞ Pr[y + γ = t] dt = exp(−1/f′(y − Δ))/2 = δ(y)/2 for t* = y + f(y)/f′(y − Δ). Then we can bound Pr[y + γ = t]/Pr[y′ + γ′ = t] only for t ∈ [y − f(y)/f′(y − Δ), t*]. Since t* is the point in this range where the pdfs of y + γ and y′ + γ′ differ maximally, it is enough to show Pr[y + γ = t*]/Pr[y′ + γ′ = t*] ≤ exp(ε(y)). As in the first case, this bound is achieved using the sensitivity of y and the assumptions about f.

Note that y = (Σ v̄_i/c − 1)^+ is the unique y in the privacy support of M(d, v, c) for any inputs d, v, c. Therefore, Lemma 4.3 establishes endogenous differential privacy (Definition 3.1) of P̂ = c(y + Lap(f(y))) for the ε, δ in the lemma statement. Endogenous differential privacy of the overall mechanism (Theorem 4.1) is an immediate corollary assuming the differential privacy of M_q in the standard sense and using the general composition theorem for differential privacy.
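As an empirical sanity check on Lemma 4.3, the sketch below compares the densities of y + Lap(f(y)) and y′ + Lap(f(y′)) on a grid for one neighboring pair with |y − y′| = Δ and verifies that the additive slack needed beyond the multiplicative exp(ε(y)) factor is within δ(y); the choice of f, n, and y are illustrative assumptions, and the grid-based integration is approximate.

```python
import numpy as np

def lap_pdf(t, loc, scale):
    return np.exp(-np.abs(t - loc) / scale) / (2.0 * scale)

n = 1000
Delta = np.log(n)
f = lambda x: np.sqrt(x + Delta)              # increasing, concave, with f(-Delta) = 0
f_prime = lambda x: 0.5 / np.sqrt(x + Delta)

y, y_prime = 200.0, 200.0 + Delta             # an assumed neighboring pair, |y - y'| = Delta
eps = 2 * Delta / f(y - Delta)                # epsilon(y) from Lemma 4.3
delta = np.exp(-1.0 / f_prime(y - Delta))     # delta(y) from Lemma 4.3

t = np.linspace(y - 600.0, y + 800.0, 400_001)
dt = t[1] - t[0]
p = lap_pdf(t, y, f(y))                       # density of y  + Lap(f(y))
q = lap_pdf(t, y_prime, f(y_prime))           # density of y' + Lap(f(y'))

# Smallest additive slack for which each direction of the inequality holds with
# multiplicative factor exp(eps), approximated on the grid:
slack_fwd = np.sum(np.clip(p - np.exp(eps) * q, 0.0, None)) * dt
slack_bwd = np.sum(np.clip(q - np.exp(eps) * p, 0.0, None)) * dt
assert max(slack_fwd, slack_bwd) <= delta     # consistent with Lemma 4.3
```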

4.4 Choosing Δ and f for Pareto Efficiency and Accuracy

The internally chosen consumer surplus maximizing privacy level y is noiseless, so its Pareto efficiency follows immediately by the arguments of [GL77] whenever truncation is avoided, i.e., when each v_i ≤ Δ · c. If we expect constant v_i and c, we should set Δ = ω(1) to avoid truncation. The chosen level of privacy differs from the truly Pareto efficient level by Σ (v_i − v̄_i)/c, which grows with the total amount truncated.

Accuracy of the released statistic ŝ ∈ R is inherited directly from M_q with the chosen privacy parameters, and the overall mechanism is (3Δ/f(y − Δ), δ_q + 1/exp(1/f′(y − Δ)))-endogenously private. For v_i, c constant, we will expect y linear in n, so we should have Δ = o(f(n)) to ensure that ε decreases with n, and f′(n) = o(1/ln n) will ensure δ(y) ≤ 1/poly(n) for small enough δ_q. Since y = 0 results in zero privacy utility for any data owner, we should have ε(0) = ∞ with ε(y) finite for y > 0, which we get with f(−Δ) = 0 and f strictly increasing. Note that an immediate consequence of using the taxation scheme of [GL77] is that the budget balances in expectation since Σ p_i ≥ cy and P̂ = c(y + Lap(f(y))). To ensure that the mechanism is not likely to run too great a deficit, we should have f(n) = o(n). In summary:

Lemma 4.4. Let Δ = ω(1), f(Θ(n)) = ω(Δ), f(Θ(n)) = o(n) with f increasing, differentiable, and concave with f(−Δ) = 0, f′(0) ≤ 1, and f′(Θ(n)) = o(1/ln n). If v_i, c = Θ(1), then Mechanism 3 is Pareto efficient, endogenously private for ε(y) = Θ(Δ/f(Θ(n))) = o(1) and δ(y) = δ_q + 1/exp(Θ(f(n))) ≤ δ_q + 1/poly(n), accurate as determined by M_q with ε_q = Θ(Δ/f(Θ(n))) and δ_q, incentive compatible, individually rational, and has a balanced budget in expectation and a deficit greater than t with probability at most exp(−t/f(Θ(n)))/2.

As a concrete example, Δ = ln n and f(y) = √(y + Δ) are reasonable choices.
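To see what these choices give quantitatively, the short sketch below evaluates ε(y), δ(y), and the relative payment noise at y = n for a few values of n, under the assumption (per Lemma 4.4) that y grows linearly in n.

```python
import numpy as np

for n in (10**3, 10**4, 10**5, 10**6):
    Delta = np.log(n)
    f = lambda x: np.sqrt(x + Delta)           # the concrete choice above
    y = float(n)                               # y is assumed linear in n (Lemma 4.4)
    eps = 3 * Delta / f(y - Delta)             # overall epsilon(y) = Theta(log n / sqrt(n))
    extra_delta = np.exp(-2.0 * np.sqrt(y))    # 1/exp(1/f'(y - Delta)), negligible in n
    rel_noise = f(y) / y                       # payment noise scale relative to c*y
    print(f"n={n:>7}  eps(y)={eps:.4f}  extra delta={extra_delta:.3e}  f(y)/y={rel_noise:.4f}")
```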

4.5 Generalized Privacy Valuations

Mechanism 3 relies on the assumption that each data owner's utility for the level of y provided by the mechanism is represented by some v_i(y) = v_i ln(y + 1). This choice of logarithmic utility functions was the convenient one, since it allows us to easily bound the sensitivity of y using a simple truncation rule. However, many other non-negative, increasing, concave functions of y may be appropriate models of the utility to data owners of y.

Consider the case that each data owner has valuation function v_i(y) = v_i y^{1/a} for a > 1. As before, we first truncate the v_i so that v̄_i = min(v_i, v_max) for some v_max to be determined later to adequately control the sensitivity of y, which is the level of privacy that maximizes consumer surplus, i.e., y(v, c) := arg max_{y≥0} Σ_i v̄_i y^{1/a} − cy. Then we have y(v, c) = (Σ_i v̄_i/(ac))^{a/(a−1)}, and

|y(v_{−i} ∥ v′_i, c) − y(v, c)| = (1/(ac))^{a/(a−1)} · |(v̄′_i + Σ_{j≠i} v̄_j)^{a/(a−1)} − (v̄_i + Σ_{j≠i} v̄_j)^{a/(a−1)}|
  ≤ (1/(ac))^{a/(a−1)} · ( (v_max + Σ_{j≠i} v̄_j)^{a/(a−1)} − (Σ_{j≠i} v̄_j)^{a/(a−1)} )
  ≤ (1/(ac))^{a/(a−1)} · ( (n·v_max)^{a/(a−1)} − ((n−1)·v_max)^{a/(a−1)} )
  = ((n−1)·v_max/(ac))^{a/(a−1)} · ( (1 + 1/(n−1))^{a/(a−1)} − 1 )
  ≤ ((n−1)·v_max/(ac))^{a/(a−1)} · (a/(a−1)) · (1 + 1/(n−1))^{1/(a−1)}/(n−1).

In the case that a ≥ 2, we have |y(v_{−i} ∥ v′_i, c) − y(v, c)| ≤ (n·v_max/c)^{a/(a−1)}/n. Then if we set v_max = c·Δ^{(a−1)/a}/n^{1/a}, the sensitivity of y is Δ for some fixed Δ as before, and privacy follows as in Lemma 4.3.
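A small numerical check of this power-valuation variant, under the assumptions above (a ≥ 2 and v_max = cΔ^{(a−1)/a}/n^{1/a}, with arbitrary example inputs): a unilateral change in one reported valuation moves the surplus-maximizing y by at most Δ.

```python
import numpy as np

def y_power(v, c, a, v_max):
    """Surplus-maximizing y for v_i(y) = v_i * y**(1/a), with valuations capped at v_max."""
    v_bar = np.minimum(v, v_max)
    return (v_bar.sum() / (a * c)) ** (a / (a - 1))

rng = np.random.default_rng(0)
n, a, c, Delta = 500, 3.0, 1.0, 10.0
v_max = c * Delta ** ((a - 1) / a) / n ** (1 / a)
v = rng.uniform(0.0, 2 * v_max, size=n)
worst_gap = 0.0
for hi in (0.0, 10 * v_max, 1e6):            # try extreme unilateral deviations for one owner
    v_dev = v.copy()
    v_dev[0] = hi
    worst_gap = max(worst_gap, abs(y_power(v, c, a, v_max) - y_power(v_dev, c, a, v_max)))
assert worst_gap <= Delta + 1e-9             # the sensitivity of y is at most Delta
```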

Incentive compatibility also follows as in Lemma B.1. Individual rationality, however, does not appear to hold for agents with low privacy sensitivity. In particular, when v_i = 0, i will always be charged p_i(v, c) > 0 whenever Σ_{j≠i} v_j > 0. With the exception of individual rationality, the other properties of Lemma 4.4 hold with Θ(n) replaced with Θ(n^{a/(a−1)}). It remains an open problem to identify further classes of valuation functions for which our mechanism or variants of it satisfy all desired properties for endogenous privacy markets.

References

[CCK

+

13] Yiling Chen, Stephen Chong, Ian A. Kash, Tal Moran, and Salil Vadhan. Truthful mechanisms for agents that value privacy. In Proceedings of the Fourteenth ACM Conference on Electronic

Commerce , EC ’13, pages 215–232, New York, NY, USA, 2013. ACM.

[Cla71] Edward H. Clarke. Multipart pricing of public goods.

Public Choice , 11(1):17–33, 1971.

[DFI11] Pranav Dandekar, Nadia Fawaz, and Stratis Ioannidis. Privacy auctions for inner product disclosures.

CoRR , abs/1111.2885, 2011.

[DL09] Cynthia Dwork and Jing Lei. Differential privacy and robust statistics. In Proceedings of the

41st annual ACM symposium on Theory of computing , pages 371–380. ACM, 2009.

[DMNS06] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Shai Halevi and Tal Rabin, editors, Theory of Cryptography , volume 3876 of Lecture Notes in Computer Science , pages 265–284. Springer Berlin Heidelberg,

2006.


[DRV10] Cynthia Dwork, Guy N. Rothblum, and Salil Vadhan. Boosting and differential privacy. In

Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on , pages 51–60.

IEEE, 2010.

[FL12] Lisa Fleischer and Yu-Han Lyu. Approximately optimal auctions for selling privacy when costs are correlated with data. In ACM Conference on Electronic Commerce , pages 568–585, 2012.

[GL77] Theodore Groves and John O Ledyard. Optimal allocation of public goods: a solution to the

“free rider” problem.

Econometrica , 45(4):783–809, May 1977.

[GL13] Arpita Ghosh and Katrina Ligett. Privacy and coordination: computing on databases with endogenous participation. In Proceedings of the Fourteenth ACM Conference on Electronic

Commerce , EC ’13, pages 543–560, New York, NY, USA, 2013. ACM.

[GR11] Arpita Ghosh and Aaron Roth. Selling privacy at auction. In Proceedings of the 12th ACM

Conference on Electronic Commerce , EC ’11, pages 199–208, New York, NY, USA, 2011. ACM.

[Gro70] Theodore Groves.

The Allocation of Resources Under Uncertainty: The Informational and

Incentive Roles of Prices and Demands in a Team . Technical report (University of California,

Berkeley. Center for Research in Management Science). University of California, 1970.

[LR12] Katrina Ligett and Aaron Roth. Take it or leave it: running a survey when privacy comes at a cost. In Proceedings of the 8th International Conference on Internet and Network Economics ,

WINE’12, pages 378–391, Berlin, Heidelberg, 2012. Springer-Verlag.

[NOS12] Kobbi Nissim, Claudio Orlandi, and Rann Smorodinsky. Privacy-aware mechanism design. In

Proceedings of the 13th ACM Conference on Electronic Commerce , EC ’12, pages 774–789, New

York, NY, USA, 2012. ACM.

[NST12] Kobbi Nissim, Rann Smorodinsky, and Moshe Tennenholtz. Approximately optimal mechanism design via differential privacy. In Proceedings of the 3rd Innovations in Theoretical Computer

Science Conference , ITCS ’12, pages 203–213, New York, NY, USA, 2012. ACM.

[NVX14] Kobbi Nissim, Salil Vadhan, and David Xiao. Redrawing the boundaries on purchasing data from privacy-sensitive individuals. In Proceedings of the 5th Conference on Innovations in Theoretical

Computer Science , ITCS ’14, pages 411–422, New York, NY, USA, 2014. ACM.

[Sam54] Paul A. Samuelson. The pure theory of public expenditure.

The Review of Economics and

Statistics , 36(4):387–389, November 1954.

[Vic61] William Vickrey. Counterspeculation, auctions, and competitive sealed tenders.

Journal of

Finance , 16(1):8–37, 03 1961.

[Xia13] David Xiao. Is privacy compatible with truthfulness? In Proc. ITCS 2013, pages 67–86, 2013.

A Proofs of Section 3.3 Lower Bounds

The impossibility result in [GR11] assumes for simplicity that c(v_i, ε_i) = v_i · ε_i and shows that nontrivial accuracy and privacy together require Σ ε_i ≥ ln(4/3). They could more generally assume that the privacy support of the mechanism is nontrivial, i.e., it contains some nonzero ε ∈ (ℝ⁺)^n, and that the cost function family is unbounded, i.e., for any P > 0 and nonzero ε ∈ (ℝ⁺)^n, there exists some v ∈ (ℝ⁺)^n with Σ c(v_i, ε_i) > P. We present the proof for their result in this more general case for comparison to the new lower bounds in our framework. Our new results concern mechanisms in Model 2 with Y = (ℝ⁺)^n, corresponding to a vector of ε_i, and arbitrarily small value to the data owners, i.e., V such that for any P > 0, there exists a v ∈ V^n with Σ v_i(ε_i) < P for any y = ε in the privacy support of M on v.

Theorem A.1 (Theorem 5.1 [GR11]). Any mechanism M in Model 1 with nontrivial privacy support and unbounded cost function family that is ε_i-differentially private for each i ∈ [n] for every ε it outputs, individually rational, and budget-balanced can never charge the analyst any finite payment.

Proof summary. Assume for contradiction that there exists some private, individually rational, and budget-balanced mechanism M that charges the analyst some finite P₀ on some fixed v₀ and other inputs with positive probability. Consider v with Σ c(v_i, ε_i) > P₀ for some nontrivial ε in the privacy support of M. By individual rationality and the balanced budget assumption, M running on v must charge the analyst more than P₀, i.e., M charges at most P₀ with probability zero. Then by privacy, the probability that M on v₀ outputs at most P₀ is also zero, contradicting the initial assumption.

Clearly any moneyless (trivially budget-balanced) standard differentially private mechanism with some fixed privacy parameter (that it privately outputs) satisfies differential privacy and individual rationality when the mechanism has non-negative utility for the data owners. The following theorem shows that these are the only positive-value approximation mechanisms that satisfy all these properties.

Theorem A.2. Any mechanism M in Model 2 with Y = (ℝ⁺)^n and arbitrarily small value to the data owners that is ε_i-differentially private for each i ∈ [n] for every y = ε it outputs, individually rational, and budget-balanced can never make any positive payment to the analyst.

Proof. Fix any P > 0 and let v be a set of valuations such that Σ v_i(ε_i) < P for ε in the privacy support of M on v. Then by individual rationality and the balanced budget assumption, M must pay the analyst less than P when running on v, and by privacy, the probability that M pays the analyst at least P when running on any inputs is also zero.

Underlying the result of Theorem A.2 is the zero-probability event that a strictly budget-balanced mechanism pays the analyst more than it can charge data owners for some fixed inputs and corresponding privacy guarantees. By permitting the mechanism to sometimes lose money by paying the analyst more than the individual rationality-mandated maximum that the data owners can be charged, arbitrarily small Σ v_i(ε_i) no longer forces an arbitrarily small upper bound on P̂. However, if we make only the weak assumption that for any fixed inputs, the mechanism loses money at most half the time, the standard definition of differential privacy still imposes a strict upper bound on the probability of the analyst receiving any positive payment.

Theorem A.3. Let M be any mechanism in Model 2 with Y = (R_+)^n and arbitrarily small value to the data owners that is ε_i-differentially private for each i ∈ [n] for every y = ε it outputs, individually rational, and satisfies Pr[P̂ > Σ_i p_i] ≤ 1/2 for any inputs. Let ε_{i,inf} denote the infimum of y_i = ε_i in the support of the mechanism. Then for any fixed P > 0 and any inputs, M pays the analyst more than P with probability at most exp(Σ_i ε_{i,inf})/2.

Proof. Fix any P > 0 and let v be a set of valuations such that Σ_i v_i(ε_i) < P for ε in the privacy support of M on v. By individual rationality, we have Σ_i p_i ≤ Σ_i v_i(ε_i) < P, so we must have Pr[P̂ > P] ≤ 1/2 when M runs on v. Then by privacy, we have that for any inputs, Pr[P̂ > P] ≤ exp(Σ_i ε_{i,inf})/2.
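To make the bound concrete (the numbers are purely illustrative): with n = 100 data owners and ε_{i,inf} = 0.001 for every i,

    exp(Σ_i ε_{i,inf})/2 = e^{0.1}/2 ≈ 0.55,

so for every P > 0 the analyst is paid more than P with probability at most about 0.55; equivalently, such a mechanism makes no positive payment to the analyst with probability at least about 0.45.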

In other words, any mechanism capable of making strong privacy guarantees must pay the analyst nothing almost half the time if the standard notion of input-independent differential privacy is used. The probability bound arises from collapsing the bounds on the probabilities of the event that the mechanism pays the analyst P̂ > P on neighboring pairs of databases in the chain of databases v^(0) = v′, ..., v^(j) = (v_1, ..., v_j, v′_{j+1}, ..., v′_n), ..., v^(n) = v, where v is such that Σ_i v_i(ε_i) < P and v′ is arbitrary. With (ε, δ)-endogenous privacy for ε the identity function and δ the zero function, the reference databases, and not {ε_{i,inf}}, determine the probability differences across neighboring databases. Let ε^(j) denote the entry-wise minimum privacy parameters in the support of M(v^(j)). Then endogenous privacy yields the bound:

    Pr_{M(v′)}[P̂ > P] ≤ exp(ε^(1)_1) · Pr_{M(v^(1))}[P̂ > P] ≤ · · · ≤ exp(Σ_i ε^(i)_i) · Pr_{M(v)}[P̂ > P] ≤ exp(Σ_i ε^(i)_i)/2.

For v small enough that Σ_i v_i(ε_i) < P, we expect the ε^(i)_i for large i to be large, making this bound loose if not trivial. (We also get a similar bound for every permutation π on [n] identifying a different chain of hybrid databases between v′ and v, but in all of these cases the ε^(i)_{π(i)} for large i should be large.)
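The chained inequality above is a telescoping product of per-step factors. The following sketch (illustrative only; the per-step privacy parameters are made-up numbers, not outputs of any mechanism in this paper) shows how the final bound is assembled.

    import math

    def chained_bound(per_step_eps, base_prob=0.5):
        """Upper bound on Pr[P-hat > P] at the start of a chain of neighboring
        databases, given per-step privacy parameters eps^(1)_1, ..., eps^(n)_n
        and a bound on the probability at the end of the chain (1/2 here,
        as in Theorem A.3)."""
        bound = base_prob
        for eps in reversed(per_step_eps):
            bound *= math.exp(eps)
        return min(bound, 1.0)

    # Standard DP: every step contributes a uniformly small eps_{i,inf}.
    print(chained_bound([0.001] * 100))   # ~0.55, barely above 1/2

    # Endogenous DP: hypothetical per-step parameters growing along the chain
    # toward the low-value database v, which makes the bound vacuous.
    print(chained_bound([0.1 * (i + 1) for i in range(100)]))  # 1.0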

Lower bounds for restricted valuation classes. The above bounds consider mechanisms with arbitrarily small value to the data owners, i.e., for any P > 0 there exists some v ∈ V^n with Σ_i v_i(ε_i) < P for any ε in the support of M on v. If instead we only require v with Σ_i v_i(ε_i) < P_0 for some fixed (rather than arbitrarily small) P_0, then such a mechanism that satisfies standard differential privacy and individual rationality can never pay the analyst more than P_0 and maintain a balanced budget, and it can pay the analyst more than P_0 with probability at most exp(Σ_i ε_{i,inf})/2 while maintaining Pr[P̂ > Σ_i p_i] ≤ 1/2. As before, endogenous privacy escapes this bound.

B Mechanism 3 Properties and Proofs

For notational simplicity, we define the following functions for the optimal privacy level and individual taxes computed by the mechanism for inputs v ∈ (R_+)^n, c ∈ R_+, recalling that v̄_i = min(v_i, c · ∆):

    y(v, c) = ( Σ_i v̄_i / c − 1 )^+,

    p_i(v, c) = c · y(v, c) − Σ_{j≠i} v̄_j ln(y(v, c) + 1) + max_{y_{−i} ≥ 0} ( Σ_{j≠i} v̄_j ln(y_{−i} + 1) − ((n − 1)/n) · c · y_{−i} ).
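As a computational aid (a sketch only, under the notation above; the closed form for the max term and the example numbers are mine, not the paper's), the two quantities can be evaluated as follows.

    import math

    def truncate(v, c, delta):
        """Truncated reports: v-bar_i = min(v_i, c*Delta)."""
        return [min(vi, c * delta) for vi in v]

    def y_opt(v, c, delta):
        """Privacy level y(v, c) = (sum_i v-bar_i / c - 1)^+."""
        vbar = truncate(v, c, delta)
        return max(sum(vbar) / c - 1.0, 0.0)

    def pivot_term(s_minus_i, c, n):
        """max_{y >= 0} ( s_minus_i * ln(y + 1) - ((n-1)/n)*c*y ),
        attained at y + 1 = s_minus_i / (((n-1)/n)*c) when that is >= 1."""
        kappa = (n - 1) / n * c
        y_star = max(s_minus_i / kappa - 1.0, 0.0)
        return s_minus_i * math.log(y_star + 1.0) - kappa * y_star

    def tax(i, v, c, delta):
        """Individual tax p_i(v, c) from the definition above."""
        n = len(v)
        vbar = truncate(v, c, delta)
        s_minus_i = sum(vbar) - vbar[i]
        y = y_opt(v, c, delta)
        return c * y - s_minus_i * math.log(y + 1.0) + pivot_term(s_minus_i, c, n)

    # Illustrative inputs (made up): n = 3 owners, c = 1, Delta = 1.
    v, c, delta = [10.0, 10.0, 10.0], 1.0, 1.0
    print(y_opt(v, c, delta))      # 2.0, since each v-bar_i = 1
    print(tax(0, v, c, delta))     # approximately 0.667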

Lemma B.1. Mechanism 3 is incentive compatible.

Proof. Fix any i ∈ [n], v_{−i} ∈ (R_+)^{n−1}, and c ∈ R_+, and denote

    U_i(v, c) = v_i(y(v, c)) − p_i(v, c)
              = (v_i + Σ_{j≠i} v̄_j) ln(y(v, c) + 1) − c · y(v, c) − max_{y_{−i} ≥ 0} ( Σ_{j≠i} v̄_j ln(y_{−i} + 1) − ((n − 1)/n) · c · y_{−i} ).

We need to show that for any v_i ∈ R_+, v_i ∈ arg max_{v′_i} U_i(v_{−i} ∥ v′_i, c). Observing that max_{y_{−i} ≥ 0}( Σ_{j≠i} v̄_j ln(y_{−i} + 1) − ((n − 1)/n) · c · y_{−i} ) has no dependence on v_i, we see that U_i(v, c) increases with y(v, c) until y*_i = (v_i + Σ_{j≠i} v̄_j)/c − 1. Therefore, by declaring v′_i = v_i, y(v, c) coincides with i's optimal value of y if v_i ≤ c · ∆, and it maximizes i's utility subject to truncation otherwise.
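A quick numerical sanity check of this argument (a self-contained sketch with made-up numbers, restating the formulas above in my own helper names):

    import math

    def utility(i, report_i, v_true, c, delta):
        """U_i when owner i reports report_i and all others report truthfully."""
        n = len(v_true)
        reports = list(v_true)
        reports[i] = report_i
        vbar = [min(vj, c * delta) for vj in reports]
        s_minus_i = sum(vbar) - vbar[i]
        y = max(sum(vbar) / c - 1.0, 0.0)
        kappa = (n - 1) / n * c
        y_pivot = max(s_minus_i / kappa - 1.0, 0.0)
        pivot = s_minus_i * math.log(y_pivot + 1.0) - kappa * y_pivot
        p_i = c * y - s_minus_i * math.log(y + 1.0) + pivot
        return v_true[i] * math.log(y + 1.0) - p_i

    # Illustrative instance: true value 0.8 for owner 0, c = 1, Delta = 1.
    v_true, c, delta = [0.8, 10.0, 10.0], 1.0, 1.0
    best = max((utility(0, r / 100.0, v_true, c, delta), r / 100.0)
               for r in range(0, 301))
    print(best[1])  # the utility-maximizing report on this grid is 0.8, the true value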

Lemma B.2. Mechanism 3 is individually rational on inputs v ∈ (R_+)^n, c ≤ Σ_i v̄_i / e.

Proof. First note that v_i ln(y(v, c) + 1) = v_i ln(Σ_j v̄_j / c) ≥ v̄_i (using c ≤ Σ_j v̄_j / e and v_i ≥ v̄_i), so it is enough to show that p_i(v, c) ≤ v̄_i. Bound p_i as follows:

    p_i(v, c) = c · y(v, c) − Σ_{j≠i} v̄_j ln(y(v, c) + 1) + max_{y_{−i} ≥ 0} ( Σ_{j≠i} v̄_j ln(y_{−i} + 1) − ((n − 1)/n) · c · y_{−i} )
              = ( Σ_j v̄_j − c ) − Σ_{j≠i} v̄_j ln( Σ_j v̄_j / c ) + Σ_{j≠i} v̄_j ( ln( Σ_{j≠i} v̄_j / (((n − 1)/n) c) ) − 1 ) + ((n − 1)/n) c
              = Σ_j v̄_j − c/n − Σ_{j≠i} v̄_j ( 1 + ln( Σ_j v̄_j / c ) − ln( Σ_{j≠i} v̄_j / (((n − 1)/n) c) ) )
              = Σ_j v̄_j − c/n − Σ_{j≠i} v̄_j ln( e · ((n − 1)/n) · Σ_j v̄_j / Σ_{j≠i} v̄_j ).

Since the v̄_j are nonnegative, Σ_j v̄_j ≥ Σ_{j≠i} v̄_j, so it is enough to show that e(n − 1)/n ≥ 1, which clearly holds for any n ≥ 2.
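As a consistency check on the algebra above (a sketch with illustrative numbers; the helper names are mine), the definition of p_i and the closed form in the final line of the derivation agree:

    import math

    def tax_def(i, vbar, c):
        """p_i from its definition (pivot term evaluated at its interior optimum)."""
        n = len(vbar)
        s = sum(vbar)
        s_minus_i = s - vbar[i]
        kappa = (n - 1) / n * c
        y = s / c - 1.0
        y_pivot = s_minus_i / kappa - 1.0
        pivot = s_minus_i * math.log(y_pivot + 1.0) - kappa * y_pivot
        return c * y - s_minus_i * math.log(y + 1.0) + pivot

    def tax_closed_form(i, vbar, c):
        """The last line of the derivation above."""
        n = len(vbar)
        s = sum(vbar)
        s_minus_i = s - vbar[i]
        return s - c / n - s_minus_i * math.log(
            math.e * (n - 1) / n * s / s_minus_i)

    vbar, c = [1.0, 1.0, 1.0], 1.0     # illustrative truncated reports
    print(tax_def(0, vbar, c), tax_closed_form(0, vbar, c))   # both approximately 0.667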

Note that the conditions for Lemma B.2 hold whenever c ≤ n and Σ_i min(v_i, c · ∆)/n ≥ e. The mechanism could easily be modified to enforce c ≤ n, and in many scenarios it may be reasonable to assume a distribution on the v_i satisfying the latter requirement. Note that these qualifications do not affect the impossibility result in [GR11], which relies on the existence of v_i implying arbitrarily high costs for any fixed ε_i. They do affect Theorem A.2, since a mechanism running on this restricted set of inputs cannot output a privacy level with arbitrarily small value to the data owners. However, note that v = (0, ..., 0, ce) satisfies c ≤ Σ_i v_i / e for ∆ ≥ 1, and Mechanism 3 outputs y = Σ_i v_i / c − 1 = e − 1 on v, c, with Σ_i v_i(e − 1) = ce · ln(e − 1 + 1) = ce. Then the note about restricted valuation classes in Appendix A illustrates that, with standard instead of endogenous differential privacy, individually rational mechanisms running on this restricted set of inputs can pay the analyst more than ce only with probability at most roughly 1/2.

(If qualified individual rationality is undesired, one might consider applying the propose-test-release strategy of [DL09], aborting as a first step if the conditions of Lemma B.2 are not met. However, note that Σ_i v̄_i / c has sensitivity ∆. Adding noise Lap(∆/ε) for differential privacy (although there would be some modifications to make this endogenously private) would overwhelm the threshold e when ∆ = ω(1), as in the usual case, so this strategy seems unlikely to work directly. We leave the issue of unqualified individual rationality as a question for future work.)

Lemma B.3 (Lemma 4.3, privacy). Define ε(y) = 2∆/f(y − ∆) and δ(y) = exp(−1/f′(y − ∆)) for increasing, differentiable, concave f : R_+ → R_+ with f′(0) ≤ 1, fixed c ∈ R_+, and any y ∈ R_+. For any neighboring v ∼ v′, let y = y(v, c), y′ = y(v′, c), and let γ and γ′ denote random variables with distributions Lap(f(y)) and Lap(f(y′)), respectively. Then for any T ⊆ R,

    Pr[y + γ ∈ T] ≤ exp(ε(y)) · Pr[y′ + γ′ ∈ T] + δ(y), and    (B.1)
    Pr[y′ + γ′ ∈ T] ≤ exp(ε(y)) · Pr[y + γ ∈ T] + δ(y).         (B.2)
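A minimal sketch of the noise-adding step this lemma analyzes, assuming one admissible choice f(y) = sqrt(y + 1) (increasing, concave, f′(0) = 1/2 ≤ 1); the function names and parameter values are illustrative, not the paper's.

    import math
    import random

    def f(y):
        """One admissible scale function: increasing, concave, f'(0) <= 1."""
        return math.sqrt(y + 1.0)

    def f_prime(y):
        return 0.5 / math.sqrt(y + 1.0)

    def noisy_output(y):
        """Release y + Lap(f(y)): Laplace noise whose scale depends on y itself."""
        return y + random.choice([-1, 1]) * random.expovariate(1.0 / f(y))

    def privacy_params(y, delta_sens):
        """The (eps, delta) pair the lemma associates with output level y."""
        eps = 2.0 * delta_sens / f(y - delta_sens)
        dlt = math.exp(-1.0 / f_prime(y - delta_sens))
        return eps, dlt

    y, delta_sens = 25.0, 1.0              # illustrative privacy level and sensitivity
    print(noisy_output(y))                 # a noisy release of y
    print(privacy_params(y, delta_sens))   # (eps(y), delta(y)) for this y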

Proof. We show B.1 for y < y′ and then for y′ < y. Then B.2 follows by symmetry. First note that if y < y′, then Pr[y + γ = t] ≤ Pr[y′ + γ′ = t] for t sufficiently far from y′, and otherwise their ratios differ maximally at t = y. Then it is enough to show Pr[y + γ = y]/Pr[y′ + γ′ = y] ≤ exp(ε(y)):

    Pr[y + γ = y] / Pr[y′ + γ′ = y]
      = [ (1/(2f(y))) · exp(−0/f(y)) ] / [ (1/(2f(y′))) · exp(−(y′ − y)/f(y′)) ]
      ≤ (f(y′)/f(y)) · exp(∆/f(y′))
      ≤ exp( ln(f(y′)/f(y)) + ∆/f(y′) )
      ≤ exp( (∆/f(y)) · (f′(y) + 1) ).

The final expression is at most exp(ε(y)) as long as f′(y) ≤ f′(0) ≤ 1, which is true by assumption.

Now consider y′ < y. Set t* = y + f(y)/f′(y − ∆). Since Pr[y + γ = t]/Pr[y′ + γ′ = t] increases with t as long as t > y, these probabilities decrease with t, and ∫_{t*}^∞ Pr[y + γ = t] dt = exp(−1/f′(y − ∆))/2 = δ(y)/2, it is enough to show that Pr[y + γ = t*]/Pr[y′ + γ′ = t*] ≤ exp(ε(y)):

    Pr[y + γ = t*] / Pr[y′ + γ′ = t*]
      ≤ Pr[γ = f(y)/f′(y − ∆)] / Pr[γ′ = f(y)/f′(y − ∆) + ∆]
      = (f(y′)/f(y)) · exp( ( f(y)/f′(y − ∆) + ∆ − f(y′)/f′(y − ∆) ) / f(y′) )
      ≤ exp( (∆/f(y′)) · ( f′(y′)/f′(y − ∆) + 1 ) )
      ≤ exp( 2∆/f(y′) ),

where the final inequality follows from f increasing and concave. Since y′ ≥ y − ∆ and f is increasing, exp(2∆/f(y′)) ≤ exp(2∆/f(y − ∆)) = exp(ε(y)), as required.
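As a numerical sanity check of the first case of the proof (illustrative only; f, y, and y′ are arbitrary admissible choices of mine):

    import math

    def lap_density(t, loc, scale):
        return math.exp(-abs(t - loc) / scale) / (2.0 * scale)

    def f(y):
        return math.sqrt(y + 1.0)     # increasing, concave, f'(0) = 1/2 <= 1

    delta_sens = 1.0
    y = 25.0
    y2 = y + delta_sens               # neighboring output level with y < y'
    eps_y = 2.0 * delta_sens / f(y - delta_sens)

    ratio = lap_density(y, y, f(y)) / lap_density(y, y2, f(y2))
    print(ratio <= math.exp(eps_y))   # True: the ratio at t = y respects exp(eps(y))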

