IEEE INFOCOM 2012, March, Orlando, USA
Efficient Information Retrieval for Ranked Queries in Cost-Effective Cloud Environments
Presenter: Qin Liu (a, b)
Joint work with Chiu C. Tan (b), Jie Wu (b), and Guojun Wang (a)
(a) Central South University, China
(b) Temple University, USA
2012-3-26
Introduction
Cloud Computing Model
o Cloud computing, as a new commercial paradigm, enables users to outsource data to a cloud
  o Data is described by a set of keywords
  o Users retrieve files with a set of keywords
[Figure: Bob queries the cloud with keywords A, B. The cloud stores F1: {A, B}, F2: {B, D}, F3: {C, D}, … and returns F1 and F2.]
o The cloud will learn the user's search pattern and access pattern
Private search (Ostrovsky et al, CRYPTO 2005)
  o Given a public dictionary that contains all keywords, e.g., dictionary = <A, B, C, D>
[Figure: Bob sends the query [1] [1] [0] [0] for keywords A and B to the cloud, which stores F1: {A, B}, F2: {B, D}, F3: {C, D}, … The key trick is to map unmatched files to 0. The cloud returns a buffer that holds a compressed version of all files.]
Homomorphic encryption
E(x)*E(y) = E(x+y)
E(x)^y = E(x*y)
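The two properties above can be checked with a small experiment. The sketch below uses a toy Paillier cryptosystem, which is an assumption: the slides only require an encryption E() with these two properties and do not name a specific scheme. The primes and the names `encrypt` and `decrypt` are illustrative, and the parameters are far too small to be secure.

```python
import math
import random

# Toy Paillier parameters. The primes are far too small to be secure and are
# assumptions chosen only so the arithmetic is easy to follow.
P, Q = 101, 113
N = P * Q                      # public modulus; plaintexts must be < N
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)           # modular inverse (needs Python 3.8+)


def encrypt(m):
    """Encrypt integer m (0 <= m < N) with fresh randomness."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) * pow(r, N, N2) % N2


def decrypt(c):
    """Recover the plaintext from ciphertext c."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


x, y = 12, 30
sum_ct = encrypt(x) * encrypt(y) % N2   # E(x)*E(y) should decrypt to x+y
prod_ct = pow(encrypt(x), y, N2)        # E(x)^y should decrypt to x*y
print(decrypt(sum_ct), decrypt(prod_ct))
```

Multiplying ciphertexts adds the plaintexts, and raising a ciphertext to a plaintext power multiplies them, which is what lets the cloud process files without seeing the query bits.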
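[Figure: the cloud processes F1, F2, and F3 into a buffer. The unmatched file F3 is mapped to 0: E(0)*E(0) = E(0+0) = E(0), then E(0)^F3 = E(0*F3) = E(0). A matched file added to a slot that holds 0 survives: E(F2)*E(0) = E(F2). The buffer entries shown are F1, F2, 0, and NA, and the slots are labeled survival, collision, survival, unmatched.]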
Problem: Cost Grows Linearly
o Processing each query is expensive. Given n users, the cloud needs to execute n queries
  o Performance bottleneck
o The cloud returns all matched files, even if a user is interested in only a small percentage of them
  o Wastes bandwidth
Our Solutions: EIRQ Scheme
Efficient Information Retrieval for Ranked Query
o A trusted proxy server (ADL) is introduced between the users and the cloud
o Aggregates user queries
o Distributes search results
o Supports ranked queries
[Figure: the ADL sits between the users and the cloud.]
Rank queries
o Queries are classified into ranks
o The ADL constructs a mask matrix
o The cloud filters out a certain percentage of matched files
  o Rank-0 query: 100% of matched files returned
  o Rank-1 query: 50% of matched files returned
[Figure: Alice sends {A, B} with rank 0 and Bob sends {A, C} with rank 1 to the ADL, which sends a mask matrix to the cloud. The cloud stores F1: {A, B}, F2: {B, D}, F3: {C, D}. F1 and F2 are returned; F3 is filtered with probability 50%.]
Challenges: the cloud
  o Cannot know which files are filtered/returned
  o Cannot know each query's rank
Scheme Description
Intuition of EIRQ
o Key techniques:
  o Construct a mask matrix to protect query ranks
  o Filter files without knowing which files are filtered
Step 1: QueryGen. The user sends keywords and a rank to the ADL.
Step 2: Matrix Construct. The ADL sends a mask matrix to the cloud.
Step 3: FileFilter. The cloud returns a buffer to the ADL.
Step 4: File Recovery. The user receives a certain percentage of the files matching the user's keywords.
Goal
o Queries are classified into ranks 0, 1, …, r-1
o A rank-i query retrieves a fraction (1 - i/r) of the matched files
  o Files that match rank-0 queries will not be filtered
  o Files that match rank-1 queries are filtered with probability 1/r
  o …
  o Files that match rank-i queries are filtered with probability i/r
The cloud
  o Cannot know which files are filtered/returned
  o Cannot know each query's rank
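As a quick check of the goal stated on this slide, the fraction of matched files retrieved by each rank can be tabulated. The function name `retrieval_fraction` is hypothetical and used only for illustration.

```python
from fractions import Fraction


def retrieval_fraction(i, r):
    """Fraction of matched files a rank-i query retrieves, out of r ranks."""
    if not 0 <= i < r:
        raise ValueError("rank must satisfy 0 <= i < r")
    return 1 - Fraction(i, r)


# With r = 4 ranks this gives 1, 3/4, 1/2, 1/4 for ranks 0 to 3.
fractions_for_four_ranks = [retrieval_fraction(i, 4) for i in range(4)]
print(fractions_for_four_ranks)
```

The complementary value i/r is the probability that a file matching a rank-i query is filtered.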
Construct Mask Matrix
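o The ADL constructs a mask matrix, encrypts it with its public key, and sends it to the cloud
[Figure: Alice sends {A, B} with rank 0 and Bob sends {A, C} with rank 1 to the ADL, which sends the mask matrix to the cloud.]
Mask matrix (rows: number of keywords; columns: number of ranks, r = 2):
A  [1] [1]
B  [1] [1]
C  [1] [0]
D  [0] [0]
…  [0] [0]
For a keyword:
o The number of 1s is determined by the rank i of the query in which it appears: r - i
  o When the keyword appears in several queries, the highest rank takes over
o The ratio of 1s to r determines the probability that a file containing the keyword is returned: (r - i)/r
  o When a file contains several keywords, the highest ratio takes over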
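The construction rule can be sketched in plaintext as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `build_mask_matrix` is hypothetical, the entries are left unencrypted (in the scheme each entry is encrypted under the ADL's public key), and the 1s are placed in the leading columns as in the slide's example.

```python
def build_mask_matrix(dictionary, queries, r):
    """Build a plaintext mask matrix.

    dictionary: ordered list of all keywords
    queries:    list of (set_of_keywords, rank) pairs, rank in 0..r-1
    r:          number of ranks (number of columns)
    """
    # For each keyword, remember the highest rank (smallest i) that uses it.
    best_rank = {}
    for keywords, rank in queries:
        for kw in keywords:
            best_rank[kw] = min(rank, best_rank.get(kw, rank))

    matrix = {}
    for kw in dictionary:
        # A keyword in a rank-i query gets r - i ones; unused keywords get none.
        ones = r - best_rank[kw] if kw in best_rank else 0
        matrix[kw] = [1] * ones + [0] * (r - ones)
    return matrix


# Example from the slide: Alice {A, B} rank 0, Bob {A, C} rank 1, r = 2.
mask = build_mask_matrix(
    ["A", "B", "C", "D"],
    [({"A", "B"}, 0), ({"A", "C"}, 1)],
    r=2,
)
for keyword, row in mask.items():
    print(keyword, row)
```

Keyword A appears in both queries, so it takes the row of the rank-0 query, which matches the "highest rank takes over" rule.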
Filter Files
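o The cloud chooses a random column of the mask matrix for each file
[Figure: the cloud stores F1: {A, B}, F2: {B, D}, F3: {C, D}, … and holds the mask matrix A: [1] [1], B: [1] [1], C: [1] [0], D: [0] [0]. For F3, each column is chosen with probability 50%:
  first column: E(1)*E(0) = E(1), so E(1)^F3 = E(F3)
  second column: E(0)*E(0) = E(0), so E(0)^F3 = E(0)
The cloud returns the buffer to the ADL.]
o A file that matches a rank-i query is filtered with probability i/r
o F1 and F2 will be returned
o F3 will be filtered with probability 50%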
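The filtering step can be illustrated on plaintext values. The sketch below is an assumption-laden simplification: it works on unencrypted mask entries and integer file contents, whereas the cloud performs the same sums and products homomorphically on ciphertexts and never sees the result. The names `column_sum`, `filter_file`, and `return_probability` are hypothetical.

```python
import random
from fractions import Fraction

R = 2  # number of ranks, i.e. number of mask-matrix columns
MASK = {"A": [1, 1], "B": [1, 1], "C": [1, 0], "D": [0, 0]}
FILES = {"F1": {"A", "B"}, "F2": {"B", "D"}, "F3": {"C", "D"}}


def column_sum(keywords, column):
    """Sum of the mask entries of a file's keywords in one column."""
    return sum(MASK[kw][column] for kw in keywords)


def filter_file(keywords, content, rng=random):
    """Pick a random column; the file is mapped to 0 when the sum is 0."""
    column = rng.randrange(R)
    return column_sum(keywords, column) * content


def return_probability(keywords):
    """Exact probability that the file survives, over all column choices."""
    kept = sum(1 for column in range(R) if column_sum(keywords, column) > 0)
    return Fraction(kept, R)


probabilities = {name: return_probability(kws) for name, kws in FILES.items()}
print(probabilities)
```

F1 and F2 survive for every column, while F3 survives only when the first column is chosen, which reproduces the 50% figure on the slide.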
Evaluation
Setup
o Our simulations are conducted with MATLAB R2010a, running on a local machine with an Intel Core 2 Duo E8400 3.0 GHz CPU and 8 GB RAM. We summarize the parameters in Table.
Percentage of Returned Files
o Queries are classified into ranks 0 to 3
  o Rank-0: 100%
  o Rank-1: 75%
  o Rank-2: 50%
  o Rank-3: 25%
o Our results:
  o Rank-0: 100%
  o Rank-1: 75%
  o Rank-2: 52%
  o Rank-3: 29%
Computation Cost
o ADL: 14.8270s-14.8788s
o EIRQ: 14.8664s-14.9269s
Communication Cost
o EIRQ works better when there are only a few users
o 5 users in each rank, 4 common keywords
o EIRQ : 439KB buffer
o ADL: 834KB buffer
Conclusion
1. An ADL is introduced to avoid a performance bottleneck at the cloud
2. The EIRQ scheme allows queries with a higher rank to retrieve a higher percentage of matched files
3. Our solution protects access pattern, search pattern, and rank privacy from the cloud
Thank you!
Background
System Model
Adversary Model
Ostrovsky Scheme
System Model
o Users in the organization send queries to the ADL
o The ADL aggregates user queries and queries the cloud with a combined query
o The cloud returns the files matching the combined query to the ADL
o The ADL distributes results to each user
[Figure: users inside the organization connect to the ADL, which connects to the cloud.]
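The aggregate-and-distribute flow can be sketched in plaintext. Two assumptions are made here: the combined query is taken to be the union of the users' keywords, and a file matches when it shares at least one keyword with the query. The function names are hypothetical, and the encryption used in the real scheme is omitted.

```python
def combine(queries):
    """ADL: merge all user queries into one combined keyword set."""
    combined = set()
    for keywords in queries.values():
        combined |= keywords
    return combined


def cloud_search(files, combined):
    """Cloud: return every file sharing at least one keyword with the query."""
    return {name for name, keywords in files.items() if keywords & combined}


def distribute(returned, files, queries):
    """ADL: hand each user the returned files that match that user's keywords."""
    return {
        user: sorted(name for name in returned if files[name] & keywords)
        for user, keywords in queries.items()
    }


files = {"F1": {"A", "B"}, "F2": {"B"}, "F3": {"C"}}
queries = {"Alice": {"A", "B"}, "Bob": {"A", "C"}}

combined = combine(queries)
returned = cloud_search(files, combined)
per_user = distribute(returned, files, queries)
print(sorted(combined), sorted(returned), per_user)
```

The cloud executes one combined query instead of one query per user, which is the point of introducing the ADL.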
Adversary Model
o The ADL is assumed to be trusted by all users
o The cloud is the only adversary
  o Honest but curious
  o Obeys our schemes, but still wants to learn some additional information
o Our goal is to protect the following from the cloud
  o Access pattern
  o Search pattern
  o Rank privacy: hiding the rank of each user query
Ostrovsky Scheme (CRYPTO 2005)
Public dictionary: <A, B, C, D, E>
Alice's keywords: A, B
Alice's query is a string of 0s and 1s: [1], [1], [0], [0], [0]
Encrypted using homomorphic encryption
Let E() be encryption
• E(x)*E(y) = E(x+y)
• E(x)^y = E(x*y)
[Figure: Alice sends the query to the cloud, which stores F1: {A, B}, F2: {B}, F3: {C}.]
Ostrovsky Scheme (CRYPTO 2005)
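[Figure: the cloud stores F1: {A, B}, F2: {B}, F3: {C} and receives Alice's query [1], [1], [0], [0], [0].]
o For each file, the cloud multiplies the encrypted query bits of the file's keywords: [2] for F1, [1] for F2, [0] for F3
o It then raises each product to the power of the file content: [2]^F1, [1]^F2, [0]^F3
o The magic is that the unmatched file F3 is processed to 0
o Alice's buffer: [2, 2*F1] [1, 1*F2] [0, 0]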
Ostrovsky Scheme (CRYPTO 2005)
[Figure: the cloud returns the buffer [2, 2*F1] [1, 1*F2] [0, 0] to Alice.]
o Alice decrypts the buffer to obtain F2 directly
o F1 is obtained by dividing 2*F1 by 2
o The buffer size depends only on the number of matched files
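The three slides above can be traced end to end with a small sketch. It rests on several assumptions: a toy Paillier cryptosystem with insecure parameters stands in for the homomorphic encryption, file contents are small integers, and each file gets its own buffer entry, so the random slot assignment and collision handling of the real scheme are left out. All function names are hypothetical.

```python
import math
import random

# Toy Paillier setup (assumed; parameters are insecure and illustrative).
P, Q = 101, 113
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # needs Python 3.8+


def encrypt(m):
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) * pow(r, N, N2) % N2


def decrypt(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


DICTIONARY = ["A", "B", "C", "D", "E"]
# File name -> (keywords, content as a small integer stand-in).
FILES = {"F1": ({"A", "B"}, 1234), "F2": ({"B"}, 567), "F3": ({"C"}, 89)}

# Alice: one encrypted bit per dictionary keyword, 1 for A and B.
query = [encrypt(1 if kw in {"A", "B"} else 0) for kw in DICTIONARY]


def process(keywords, content):
    """Cloud: return (E(count), E(count * content)) without decrypting."""
    count_ct = 1  # 1 is a valid encryption of 0 under this toy scheme
    for kw, bit_ct in zip(DICTIONARY, query):
        if kw in keywords:
            count_ct = count_ct * bit_ct % N2   # E(x)*E(y) = E(x+y)
    return count_ct, pow(count_ct, content, N2)  # E(x)^y = E(x*y)


buffer = {name: process(kws, content) for name, (kws, content) in FILES.items()}

# Alice: decrypt each pair and divide by the match count; 0 means unmatched.
recovered = {}
for name, (count_ct, payload_ct) in buffer.items():
    count = decrypt(count_ct)
    if count:
        recovered[name] = decrypt(payload_ct) // count
print(recovered)
```

F1 matches two keywords, so its entry decrypts to (2, 2*F1) and is halved, while the unmatched F3 decrypts to (0, 0) and is dropped.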
Security
o The cloud may leak user privacy
o Searchable encryption
  o Will not reveal what the users are searching for (search pattern)
  o Will reveal whether two users are interested in the same files (access pattern)
[Figure: the cloud stores F1: {A, B}, F2: {B}, F3: {C}. Alice queries {A, B} and receives F1 and F2; Bob queries {A, C} and receives F1 and F3.]
Construction of EIRQ
o Step 1. Each user runs the QueryGen algorithm to send keywords and a query rank to the ADL
Dictionary: <A, B, C, D>
Ranks 0 to 2: Rank 0: 100%, Rank 1: 50%, Rank 2: 0%
File 1: {A, B}
File 2: {B}
File 3: {C}
[Figure: Alice sends A, B with rank 1 and Bob sends B, C with rank 1 to the ADL, which sits between the users and the cloud.]