IEEE INFOCOM 2012, March, Orlando, USA
Efficient Information Retrieval for Ranked Queries in Cost-Effective Cloud Environments
Presenter: Qin Liu (a, b)
Joint work with Chiu C. Tan (b), Jie Wu (b), and Guojun Wang (a)
(a) Central South University, China
(b) Temple University, USA
2012-3-26

Introduction

Cloud Computing Model
o Cloud computing, as a new commercial paradigm, enables users to outsource data to a cloud
o Data is described by a set of keywords
o Users retrieve files with a set of keywords
[Figure: Bob queries the cloud with keywords A, B; the cloud stores F1: {A, B}, F2: {B, D}, F3: {C, D}, … and returns F1 and F2]
o Cloud will learn the user's search pattern and access pattern

Private search (Ostrovsky et al., CRYPTO 2005)
o Given a public dictionary that contains all keywords, e.g., dictionary = <A, B, C, D>
o Bob's query is the encrypted bit string [1] [1] [0] [0] over the dictionary
o Files on the cloud: F1: {A, B}, F2: {B, D}, F3: {C, D}, …
o Key trick: map unmatched files to 0
o The buffer returned to Bob is a compressed version of all files
[Figure: buffer returned to Bob with entries F1, F2, 0, NA]

Homomorphic encryption
o E(x)*E(y) = E(x+y)
o E(x)^y = E(x*y)
o Unmatched file F3: E(0)*E(0) = E(0+0) = E(0), then E(0)^F3 = E(0*F3) = E(0)
o Survival: E(F2)*E(0) = E(F2)
[Figure: buffer slots labeled survival, collision, survival, unmatched]

Problem: Cost Grows Linearly
o Processing each query is expensive.
  Given n users, the cloud needs to execute n queries
  o Performance bottleneck
o Cloud will return all matched files, even if a user is interested in only a smaller percentage
  o Wasted bandwidth

Our Solutions: EIRQ Scheme
Efficient Information Retrieval for Ranked Query
o A trusted proxy server (ADL) is introduced between the users and the cloud
  o Aggregate user queries
  o Distribute search results
  o Support ranked queries
[Figure: users … ADL … cloud]

Rank queries
o Queries are classified into ranks
o ADL constructs a mask matrix
o Cloud filters a certain percentage of matched files
  o Rank-0 query: 100%
  o Rank-1 query: 50%
[Figure: Alice queries {A, B} with rank 0 and Bob queries {A, C} with rank 1; the ADL sends a mask matrix to the cloud, which stores F1: {A, B}, F2: {B, D}, F3: {C, D}; F3 is filtered with probability 50%]
Challenges: the cloud
o Cannot know which files are filtered/returned
o Cannot know each query's rank

Scheme Description

Intuition of EIRQ
o Key techniques:
  o Construct a mask matrix to protect query ranks
  o Filter files without knowing which files are filtered
o Step 1: QueryGen: the user sends keywords and rank to the ADL
o Step 2: Matrix Construct: the ADL sends the mask matrix to the cloud
o Step 3: FileFilter: the cloud returns a buffer to the ADL
o Step 4: File Recovery: the ADL returns a certain percentage of files matching the user's keywords

Goal
o Queries are classified into ranks 0, 1, …, r-1.
o A rank-i query retrieves a (1 - i/r) fraction of matched files
o Files that match rank-0 queries: will not be filtered
o Files that match rank-1 queries: filtered with probability 1/r
o …
o Files that match rank-i queries: filtered with probability i/r
The cloud
o Cannot know which files are filtered/returned
o Cannot know each query's rank

Construct Mask Matrix
o ADL constructs a mask matrix that is encrypted with its public key, and sends it to the cloud
o Example: Alice queries {A, B} with rank 0; Bob queries {A, C} with rank 1
o Rows: number of keywords; columns: number of ranks, r = 2

A  [1] [1]
B  [1] [1]
C  [1] [0]
D  [0] [0]
…  [0] [0]

For a keyword:
o The number of 1s is determined by the rank i of the query in which it appears: r-i
  o High rank takes over (A appears in a rank-0 and a rank-1 query, so its row has r - 0 = 2 ones)
o The ratio of 1s to r determines the probability that a file containing the keyword is returned: (r-i)/r
  o High ratio takes over

Filter Files
o The cloud chooses a random column for each file
o Files on the cloud: F1: {A, B}, F2: {B, D}, F3: {C, D}, …
o For F3, each column is chosen with probability 50%:
  o First column: E(1)*E(0) = E(1), then E(1)^F3 = E(F3)
  o Second column: E(0)*E(0) = E(0), then E(0)^F3 = E(0)
o A file that matches a rank-i query is filtered with probability i/r
o F1 and F2 will be returned to the ADL in the buffer; F3 will be filtered with probability 50%

Evaluation

Setup
o Our simulations are conducted with MATLAB R2010a, running on a local machine with an Intel Core 2 Duo E8400 3.0 GHz CPU and 8 GB RAM. We summarize the parameters in Table.

Percentage of Returned Files
o Queries are classified into ranks 0 to 3
  o Rank-0: 100%
  o Rank-1: 75%
  o Rank-2: 50%
  o Rank-3: 25%
o Our results:
  o Rank-0: 100%
  o Rank-1: 75%
  o Rank-2: 52%
  o Rank-3: 29%

Computation Cost
o ADL: 14.8270 s to 14.8788 s
o EIRQ: 14.8664 s to 14.9269 s

Communication Cost
o EIRQ works better when there are only a few users
o 5 users in each rank, 4 common keywords
  o EIRQ: 439 KB buffer
  o ADL: 834 KB buffer

Conclusion
1 An ADL is introduced to avoid a performance bottleneck at the cloud
2 The EIRQ scheme allows queries with higher rank to retrieve a higher percentage of matched files
3 Our solution protects access pattern, search pattern, and rank privacy from the cloud

Thank you!

Background
o System Model
o Adversary Model
o Ostrovsky Scheme

System Model
o Users in the organization send queries to the ADL
o ADL aggregates user queries and queries the cloud with a combined query
o Cloud returns the files matching the combined query to the ADL
o ADL distributes results to each user
[Figure: users and ADL inside the organization; the ADL communicates with the cloud]

Adversary Model
o ADL is assumed to be trusted by all users
o Cloud is the only adversary
  o Honest but curious
  o Obeys our schemes, but still wants to learn some additional information
o Our goal is to protect the following from the cloud
  o Access pattern
  o Search pattern
  o Rank privacy: hiding the rank of each user query

Ostrovsky Scheme (CRYPTO 2005)
o Public dictionary: <A, B, C, D, E>
o Alice's keywords: A, B
o Files on the cloud: F1: A, B; F2: B; F3: C
o Alice's query is a string of 0s and 1s: [1], [1], [0], [0], [0]
o Encrypted using homomorphic encryption
o Let E() be encryption
  • E(x)*E(y) = E(x+y)
  • E(x)^y = E(x*y)
o For each file, the cloud multiplies the encrypted query entries of the file's keywords: [2] for F1, [1] for F2, [0] for F3
o The result is raised to the power of the file: [2]^F1, [1]^F2, [0]^F3
o Alice's buffer: [2, 2*F1], [1, 1*F2], [0, 0]
o The magic is that the unmatched file F3 is processed to 0
o Alice decrypts to obtain F2 directly
o F1 is obtained by dividing 2*F1 by 2
o The buffer size depends only on the number of matched files

Cloud Security
o The cloud may leak user privacy
o Searchable encryption
  o Will not reveal what the users are searching for (search pattern)
  o Will reveal whether two users are interested in the same files (access pattern)
[Figure: Alice queries {A, B} and receives F1 and F2; Bob queries {A, C} and receives F1 and F3; the cloud stores F1: {A, B}, F2: {B}, F3: {C}]

Construction of EIRQ
o Step 1. Each user runs the QueryGen algorithm to send keywords and query rank to the ADL
o Dictionary: <A, B, C, D>
o Ranks 0~2: Rank 0: 100%, Rank 1: 50%, Rank 2: 0%
o File 1: {A, B}, File 2: {B}, File 3: {C}
[Figure: Alice sends A, B, Rank 1 and Bob sends B, C, Rank 1 to the ADL, which sits between the users and the cloud]
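
The mask-matrix construction and the random-column filter from the Scheme Description slides can be sketched in plaintext as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it leaves out the homomorphic encryption entirely (so the bits are visible here, whereas the cloud would only ever see ciphertexts), it assumes the 1s of each row are placed in the leading columns as in the matrix on the slide, and the names build_mask_matrix, survives, return_probability, and filter_files are hypothetical.

```python
import random


def build_mask_matrix(dictionary, queries, r):
    """Build a plaintext mask matrix with one row per keyword and r columns.

    queries is a list of (keywords, rank) pairs. A keyword in a rank-i query
    gets r - i ones; if it appears in several queries, the largest number of
    ones wins ("high rank takes over").
    """
    ones = {kw: 0 for kw in dictionary}
    for keywords, rank in queries:
        for kw in keywords:
            ones[kw] = max(ones[kw], r - rank)
    # Assumption: ones fill the leading columns, zeros the rest.
    return {kw: [1] * n + [0] * (r - n) for kw, n in ones.items()}


def survives(file_keywords, mask, column):
    """A file survives if any of its keywords has a 1 in the chosen column."""
    return any(mask[kw][column] for kw in file_keywords)


def return_probability(file_keywords, mask, r):
    """Fraction of columns in which the file survives."""
    return sum(survives(file_keywords, mask, c) for c in range(r)) / r


def filter_files(files, mask, r, rng):
    """Pick one random column per file and keep the files that survive."""
    return [name for name, kws in files.items()
            if survives(kws, mask, rng.randrange(r))]


# Example from the "Construct Mask Matrix" and "Filter Files" slides.
dictionary = ["A", "B", "C", "D"]
queries = [({"A", "B"}, 0),   # Alice, rank 0
           ({"A", "C"}, 1)]   # Bob, rank 1
files = {"F1": {"A", "B"}, "F2": {"B", "D"}, "F3": {"C", "D"}}
r = 2

mask = build_mask_matrix(dictionary, queries, r)
probabilities = {name: return_probability(kws, mask, r)
                 for name, kws in files.items()}
print(mask)
print(probabilities)
print(filter_files(files, mask, r, random.Random()))
```

In this sketch F1 and F2 are kept for every column choice, while F3 is kept only when the first column is drawn, which matches the 50% filtering of F3 stated on the slides.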