Privacy-Preserving Computation and Verification of Aggregate Queries on Outsourced Databases

Brian Thompson (1), Stuart Haber (2), William G. Horne (2), Tomas Sander (2), and Danfeng Yao (1)
(1) Rutgers University, Dept. of Computer Science, Piscataway, NJ
(2) Hewlett-Packard Labs, 5 Vaughn Dr., Suite 301, Princeton, NJ

PETS 2009
PDAS: Privacy-Preserving Database-As-a-Service

Contributions
• An efficient, distributed architecture for outsourcing databases
• A privacy-preserving protocol for computing aggregate queries that is resistant to collusion of dishonest service providers
• A mechanism that allows users to verify the integrity and correctness of aggregate query responses

Outline
• Motivation
• PDAS Architecture and Protocol
• Secure Computation of Aggregate Queries
• Correctness Verification
• Conclusions and Future Work

Simple Client-Server Model
[Figure: several Clients each send a query to a single Data Owner, which returns a response.]
What if the data owner has insufficient time or resources to answer all queries?

Database-As-a-Service
• Outsource database to a trusted third-party service provider (SP).
• SP supports and maintains DBMS infrastructure, stores data, and responds to queries.
• Applications: census data, medical records, network monitoring, recommendation systems.
• Data may be private or sensitive.
– Only answer queries that follow a pre-defined inference control policy (outside the scope of our work).

Database-As-a-Service
[Figure: the Data Owner sends sensitive data and an inference control policy to the Service Provider. The Client sends query Q; the Service Provider returns the query result A_Q or rejects the query.]
• Security threat! What if the server is compromised or the SP is malicious?
• Integrity issue! How does the Client know that results are correct?

Database-As-a-Service
• Encryption [HIM02, MT06]
– When client is the original data owner.
• Publish only statistics
– Limits utility for complex data mining apps.
• Publish representative subset
– Good for approximate query results.
– No privacy for individuals in released dataset.

Our Solution: Privacy-Preserving Database-As-a-Service (PDAS)
• Outsource database to m service providers.
• Each SP gets a “share” of each data item.
• Each share gives zero information, but the shares can be combined to reconstruct the original data. [Shamir ’79]
• A homomorphic commitment scheme is used to guarantee correctness. [Pedersen ’91]

PDAS Architecture
[Figure: Data Owner; service providers SP1, SP2, SP3; Client. The Client sends aggregate query Q to an SP, which requests shares of A_Q; SP1, SP2, and SP3 calculate shares A_Q1, A_Q2, and A_Q3; the result A_Q and a proof of correctness are returned to the Client.]

PDAS Protocol
1. COMMIT: Data owner generates commitment values, signs root of Merkle hash tree.
2. DISTRIBUTE: Shares of each data item are distributed to SPs using Shamir secret-sharing.
3. QUERY: Client submits aggregate query to SP.
4. RESPOND: SP requests shares of aggregate from other SPs, recovers result, returns to Client.
5. VERIFY: Client checks commitments against signed root hash, verifies commitment for result.

Secret Sharing with Polynomials [Shamir ’79]
• Construct a random (k-1)-degree polynomial P with P(0) = S.
• Each share is a point on the curve.
• k points are both necessary and sufficient to uniquely determine the polynomial.
Note: Computation is in the field $F_q$.
Note: Allows for a threshold scheme.

Secret Sharing with Polynomials
[Figure: polynomial $P_A(x)$ passing through the secret point $(0, A)$ and the shares $(x_1, P_A(x_1))$, $(x_2, P_A(x_2))$, $(x_3, P_A(x_3))$.]
[Figure: polynomial $P_B(x)$ passing through the secret point $(0, B)$ and the shares $(x_1, P_B(x_1))$, $(x_2, P_B(x_2))$, $(x_3, P_B(x_3))$.]

Secret Sharing with Polynomials
Task: secure computation of A+B
[Figure: $P_A(x)$ and $P_B(x)$ plotted on the same axes with their shares at $x_1$, $x_2$, $x_3$.]
• Player i calculates $P_A(x_i) + P_B(x_i)$, for i = 1, 2, 3.
[Figure: the sum polynomial $P_{A+B}(x)$ passing through $(0, A+B)$ and the points $(x_1, P_{A+B}(x_1))$, $(x_2, P_{A+B}(x_2))$, $(x_3, P_{A+B}(x_3))$.]
• The sum A+B is determined without revealing A or B!

Secret Sharing in PDAS
• A secret-sharing polynomial $P_j$ is constructed for each data element $D_j$, i.e. $P_j(0) = D_j$.
• The share of data $D_j$ for $SP_i$ is $(i, P_j(i))$.
• Suppose the client queries for $\mathrm{SUM}(D_1, \ldots, D_n)$.
• $SP_i$ computes and broadcasts $\hat{P}(i) = \sum_{j=1}^{n} P_j(i)$.
• Using polynomial interpolation, the SPs can derive the polynomial $\hat{P}(x) = \sum_{j=1}^{n} P_j(x)$.
• $\hat{P}(0) = \sum_{j=1}^{n} P_j(0) = \mathrm{SUM}(D_1, \ldots, D_n)$.

Secret Sharing in PDAS
• Honest SPs only contribute to a computation if the query follows the data owner’s policy.
• PDAS allows for a (k,m) threshold scheme, where any k of m SPs can answer a query. If fewer than k collaborate, they learn nothing.
• If there are fewer than k dishonest SPs, the system has information-theoretic security.
• Privacy is preserved*: no information is leaked besides the query results!
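
The SUM computation described on the slides above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the prime `Q`, the function names (`make_shares`, `sum_shares`, `interpolate_at_zero`), and the sample data are all hypothetical, and the policy check that honest SPs perform before contributing is omitted.

```python
import secrets

# Assumption: the slides only say "computation in the field F_q";
# this particular prime (2**61 - 1) is chosen for illustration.
Q = 2**61 - 1

def make_shares(secret, k, m):
    """Split `secret` into m shares (i, P(i)); any k of them recover it."""
    # Random polynomial of degree k-1 with constant term P(0) = secret.
    coeffs = [secret % Q] + [secrets.randbelow(Q) for _ in range(k - 1)]

    def P(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation mod Q
            acc = (acc * x + c) % Q
        return acc

    return [(i, P(i)) for i in range(1, m + 1)]

def sum_shares(per_item_shares, i):
    """What SP_i broadcasts: P_hat(i), the sum over j of P_j(i)."""
    return (i, sum(shares[i - 1][1] for shares in per_item_shares) % Q)

def interpolate_at_zero(points):
    """Lagrange interpolation through `points`, evaluated at x = 0."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = (num * -xj) % Q
                den = (den * (xi - xj)) % Q
        total = (total + yi * num * pow(den, -1, Q)) % Q
    return total

# Hypothetical data set D_1..D_4 and a (k, m) = (3, 5) threshold scheme.
data = [12, 30, 7, 51]
k, m = 3, 5
per_item = [make_shares(d, k, m) for d in data]

# Any k SPs (here SP_1, SP_3, SP_5) broadcast their summed shares.
broadcast = [sum_shares(per_item, i) for i in (1, 3, 5)]
result = interpolate_at_zero(broadcast)  # P_hat(0) = SUM(D_1, ..., D_4)
```

Each SP only ever adds up its own shares, so the individual values $D_j$ are never reconstructed; only the interpolated sum is. The modular inverse via `pow(den, -1, Q)` requires Python 3.8 or later.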

Verification in PDAS
The Pedersen Commitment Scheme [’91]
Prover: COMMIT($x$)
• Publish generators $g, h$ of group $G_p$.
• Choose random $r$.
• Calculate commitment value: $C_r(x) = g^x h^r$.
Verifier: VERIFY($x, r, c$)
• Check commitment: $c \stackrel{?}{=} C_r(x) = g^x h^r$.

Verification in PDAS
• Owner computes a commitment $C_{r_j}(D_j)$ to each data entry and signs it to authenticate.
• Given $D_j, r_j, C_j$, the client verifies the commitment: $C_j \stackrel{?}{=} C_{r_j}(D_j) = g^{D_j} h^{r_j}$.
• This requires access to sensitive data $D_j$!
• Problem: How to verify an aggregate query result without access to individual entries?
Use a homomorphic commitment scheme!

Verification in PDAS
The Pedersen commitment scheme is homomorphic:
$C_{r_1}(x_1) \cdot C_{r_2}(x_2) = g^{x_1 + x_2} h^{r_1 + r_2} = C_{r_1 + r_2}(x_1 + x_2)$
[Figure: the Client asks "What is $x_1 + x_2$?". The Service Provider, for which $C_{r_1}(x_1) = g^{x_1} h^{r_1}$ and $C_{r_2}(x_2) = g^{x_2} h^{r_2}$, returns $\hat{x} = x_1 + x_2$ and $\hat{r} = r_1 + r_2$ together with the commitments $C_{r_1}, C_{r_2}$ signed by the data owner.]
Verify: $C_{\hat{r}}(\hat{x}) \stackrel{?}{=} C_{r_1}(x_1) \cdot C_{r_2}(x_2)$

Verification in PDAS
• Use Merkle hash tree to improve efficiency.
• Data owner only signs once: the root hash.
[Figure: binary Merkle hash tree with root $h_{root}$, children $h_0$ and $h_1$, grandchildren $h_{00}, h_{01}, h_{10}, h_{11}$, and leaves $C_{r_1}(x_1), \ldots, C_{r_8}(x_8)$.]

Security Properties of PDAS
• Secrecy: Only query results are revealed.
• Security: Commitments are computationally binding and unconditionally hiding.
• Correctness: Accuracy, integrity guaranteed.
• Collusion resistance: Privacy is protected against k-1 collaborating adversaries.
• Accountability: Malicious SPs will be caught.
In practice, may relax some properties to achieve greater functionality.
Details in corrected version of paper.

Efficiency of PDAS
• Setup cost is O(nm) time* for data owner, but there is no maintenance cost.
• Space required is O(n) for each SP.
• Time complexity to compute a query over subset S is only O(|S|) for each SP, plus O(|S| log n) communication cost.
• Verification has computational and communication cost O(min(|S| log n, n)).

Extensions
• Dynamic databases
– Support efficient addition/deletion
• Multiple data owners
• Load balancing
• Selection over insensitive attributes
– “Mixed” databases
– Guaranteeing completeness

Future Work
• Complex queries
– Nested queries
– Selection over sensitive attributes
– MAX, MIN
• Inference control
– Differential privacy [Dwork06]
• Private Information Retrieval
– [Chor, Goldreich, Kushilevitz, Sudan ’95]

Conclusions
PDAS accomplishes the following goals:
• A distributed architecture for computing aggregate queries over sensitive data in outsourced databases.
• An efficient protocol for verifying the accuracy and integrity of query results.
• A secure system that is robust against a network of k-1 collaborating adversaries.

Thank you!
Corrected version to be available soon: http://www.cs.rutgers.edu/~danfeng/

Extra Slides

Our Solution: Secret Sharing
• How to enforce a query response policy?
Request: “Please give me your share of $\sum_j D_j$!” (SUM = ?)
Reply: “Okay, sure!”

Our Solution: Secret Sharing
• How to enforce a query response policy?
Request: “Please give me your share of x!”
Reply: “No, I’m not supposed to...”

Related Work
• H. Hacigümüs, B. Iyer, S. Mehrotra. “Efficient Execution of Aggregation Queries over Encrypted Relational Databases.” DASFAA, 2004.
• F. Chin. “Security Problems on Inference Control for SUM, MAX, and MIN Queries.” Journal of the ACM, 1986.
• G. Jagannathan, R. Wright. “Private Inference Control for Aggregate Database Queries.” PADM, 2007.
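
Supplementary sketch for the Verification in PDAS slides: the VERIFY step (check each commitment against the root hash, then check the homomorphic commitment for the result) might look roughly like the Python below. Everything concrete here is an assumption made for illustration: the toy group parameters (`P_MOD = 1019`, `Q_ORD = 509`, `G = 4`, `H = 9`) are far too small to be secure and the discrete log of `H` with respect to `G` is not hidden, SHA-256 stands in for an unspecified hash function, the number of leaves is assumed to be a power of two, and the data owner's signature on the root is replaced by simply trusting `signed_root`.

```python
import hashlib
import secrets
from math import prod

# Toy parameters (assumption, NOT secure): p = 2q + 1 with q prime;
# G and H are squares mod p, so both generate the subgroup of order q.
P_MOD, Q_ORD = 1019, 509
G, H = 4, 9

def commit(x, r):
    """Pedersen commitment C_r(x) = g^x * h^r mod p."""
    return (pow(G, x, P_MOD) * pow(H, r, P_MOD)) % P_MOD

def leaf_hash(c):
    return hashlib.sha256(str(c).encode()).digest()

def build_tree(commitments):
    """Merkle tree as a list of levels; the last level holds the root."""
    level = [leaf_hash(c) for c in commitments]
    levels = [level]
    while len(level) > 1:
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def auth_path(levels, index):
    """Sibling hashes from leaf `index` up to (not including) the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path

def root_from_path(c, index, path):
    """Recompute the root from a commitment and its sibling path."""
    node = leaf_hash(c)
    for sibling in path:
        pair = node + sibling if index % 2 == 0 else sibling + node
        node = hashlib.sha256(pair).digest()
        index //= 2
    return node

# Data owner (COMMIT): commit to each entry, publish the root hash.
data = [12, 30, 7, 51]
rand = [secrets.randbelow(Q_ORD) for _ in data]
commitments = [commit(x, r) for x, r in zip(data, rand)]
levels = build_tree(commitments)
signed_root = levels[-1][0]  # stands in for the owner's signed root

# Service provider (RESPOND): aggregate value, aggregate randomness, paths.
x_hat = sum(data)
r_hat = sum(rand) % Q_ORD
paths = [auth_path(levels, i) for i in range(len(data))]

# Client (VERIFY): commitments match the root, and their product
# equals the commitment to the claimed sum.
paths_ok = all(root_from_path(c, i, p) == signed_root
               for i, (c, p) in enumerate(zip(commitments, paths)))
sum_ok = commit(x_hat, r_hat) == prod(commitments) % P_MOD
wrong_sum_ok = commit(x_hat + 1, r_hat) == prod(commitments) % P_MOD
```

The client never sees an individual $D_j$ or $r_j$: it only handles the commitments, the sibling hashes, and the aggregate pair $(\hat{x}, \hat{r})$. A deployment would need a cryptographically sized group with independently generated `G` and `H`, plus a real signature check on the root.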