Fuzzy Keyword Search over Encrypted Data in Cloud Computing

Authors: Jin Li, Qian Wang, Cong Wang, Ning Cao, Kui Ren, and Wenjing Lou
Source: IEEE Transactions on Knowledge and Data Engineering (2011)
Date: 2012/05/15
Presenter: 葉瑞群

Outline
1. INTRODUCTION
2. RELATED WORK
3. PROBLEM FORMULATION
4. THE STRAIGHTFORWARD APPROACH
5. CONSTRUCTIONS OF EFFECTIVE FUZZY KEYWORD SEARCH IN CLOUD
6. CONCLUSION

1. INTRODUCTION (1/2)
As Cloud Computing becomes prevalent, more and more sensitive information is being centralized into the cloud. Although traditional searchable encryption schemes allow a user to securely search over encrypted data through keywords and selectively retrieve files of interest, these techniques support only exact keyword search.

1. INTRODUCTION (2/2)
Typos and format inconsistencies, on the other hand, are typical user searching behavior and happen very frequently. This significant drawback makes existing techniques unsuitable for Cloud Computing: it greatly reduces system usability, making the user search experience frustrating and system efficacy low.

2. RELATED WORK (1/2)
Enabling a fuzzy keyword search service that accommodates various typos and representation inconsistencies in user search inputs is crucially important for system usability and the overall user search experience.

2. RELATED WORK (2/2)

3. PROBLEM FORMULATION (1/4)
We consider a cloud data system consisting of a data owner, data users, and a cloud server. Given a collection of $n$ encrypted data files $C = (F_1, F_2, \ldots, F_n)$ stored in the cloud server and a predefined set of distinct keywords $W = \{w_1, w_2, \ldots, w_p\}$, the cloud server provides the search service for authorized users over the encrypted data $C$.

3. PROBLEM FORMULATION (2/4)
Edit Distance: The edit distance $ed(w_1, w_2)$ between two words $w_1$ and $w_2$ is the minimum number of operations required to transform one of them into the other. The three primitive operations are:
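1) Substitution: changing one character to another in a word;
2) Deletion: deleting one character from a word;
3) Insertion: inserting a single character into a word.

3. PROBLEM FORMULATION (3/4)
Edit Distance Example: AAACAGC → AAATTGAGTC (distance = 4)
AAACAGC
AAAGAGTC (substitute C with G, insert T)
AAATGAGTC (insert T)
AAATTGAGTC (insert T)

3. PROBLEM FORMULATION (4/4)
Fuzzy Keyword Search
1. A collection of encrypted data files $C = (F_1, F_2, \ldots, F_n)$
2. A set of distinct keywords $W = \{w_1, w_2, \ldots, w_p\}$
3. A predefined edit distance $d$
4. A searching input $(w, k)$ with $k \le d$

Symmetric encryption scheme: $\Pi = (\mathrm{Setup}(1^\lambda), \mathrm{Enc}(sk, \cdot), \mathrm{Dec}(sk, \cdot))$
Trapdoor of keyword $w_i$: $T_{w_i} = f(sk, w_i)$

4. THE STRAIGHTFORWARD APPROACH (1/2)
Data Owner: encrypts the identifiers $FID_{w_i}$ of the files containing $w_i$ as $\mathrm{Enc}(sk, FID_{w_i} \| w_i)$ and outsources the index $\{(\{T_{w'_i}\}_{w'_i \in S_{w_i,d}}, \mathrm{Enc}(sk, FID_{w_i} \| w_i))\}_{w_i \in W}$ to the server.
Users: send the trapdoor $T_w = f(sk, w)$ of the search input $w$ to the server.
Server: returns the matching $\mathrm{Enc}(sk, FID_{w_i} \| w_i)$ to the users.

4. THE STRAIGHTFORWARD APPROACH (2/2)
Assume a keyword of length $k$. The number of variants to enumerate is approximately:
d = 1: $2k \times 26$
d = 2: $2k^2 \times 26^2$
For example, assume there are $10^4$ keywords in the file collection with average keyword length 10, $d = 2$, and a hash function output length of 160 bits; the resulting storage cost for the index is then about 30GB. This creates the demand for fuzzy keyword sets of smaller size.

5. CONSTRUCTIONS OF EFFECTIVE FUZZY KEYWORD SEARCH IN CLOUD (1/3)
For example, for the keyword CASTLE with the pre-set edit distance 1, the wildcard-based fuzzy keyword set is
$S_{CASTLE,1}$ = {CASTLE, *CASTLE, *ASTLE, C*ASTLE, C*STLE, ..., CASTL*E, CASTL*, CASTLE*}.
This set has only 13 + 1 elements, whereas exhaustive enumeration yields $13 \times 26 + 1 = (2L + 1) \times 26 + 1$ variants, where $L = 6$ is the keyword length.
The wildcard-based fuzzy set of $w_i$ with edit distance $d$ is denoted as $S_{w_i,d} = \{S_{w_i,0}, S_{w_i,1}, \ldots, S_{w_i,d}\}$.
Size of the fuzzy set for a keyword of length $L$:
d = 1: $2L + 1 + 1$ (wildcard-based), versus $(2L + 1) \times 26 + 1$ (exhaustive enumeration)
d = 2: $C_{L+1}^{1} + C_{L}^{1} \cdot C_{L}^{1} + 2C_{L+2}^{2}$ (wildcard-based)

5. CONSTRUCTIONS OF EFFECTIVE FUZZY KEYWORD SEARCH IN CLOUD (2/3)
With the same settings as in the straightforward approach, the wildcard-based technique reduces the storage of the index from 30GB to approximately 40MB.
Symmetric encryption scheme: $\Pi = (\mathrm{Setup}(1^\lambda), \mathrm{Enc}(sk, \cdot), \mathrm{Dec}(sk, \cdot))$
Trapdoor of keyword $w_i$: $T_{w_i} = f(sk, w_i)$
Trapdoors of the fuzzy set: $T_{w'_i} = f(sk, w'_i)$ for each $w'_i \in S_{w_i,d}$

5. CONSTRUCTIONS OF EFFECTIVE FUZZY KEYWORD SEARCH IN CLOUD (3/3)
Data Owner: encrypts the identifiers $FID_{w_i}$ as $\mathrm{Enc}(sk, FID_{w_i} \| w_i)$ and outsources the index $\{(\{T_{w'_i}\}_{w'_i \in S_{w_i,d}}, \mathrm{Enc}(sk, FID_{w_i} \| w_i))\}_{w_i \in W}$ to the server.
Users: send the trapdoor set $\{T_{w'}\}_{w' \in S_{w,k}}$ for the search input $(w, k)$ to the server.
Server: returns the matching $\mathrm{Enc}(sk, FID_{w_i} \| w_i)$ to the users.

6. CONCLUSION (1/1)
We design an advanced, wildcard-based technique to construct storage-efficient fuzzy keyword sets by exploiting a significant observation about the edit distance similarity metric. Based on the constructed fuzzy keyword sets, we further propose an efficient fuzzy keyword search scheme. Through rigorous security analysis, we show that the proposed solution is secure and privacy-preserving while correctly realizing the goal of fuzzy keyword search.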
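
As a rough illustration of the wildcard-based construction from Section 5, the following Python sketch builds $S_{w,1}$ for a keyword, derives trapdoors, and matches a misspelled query against the index. Several things here are assumptions rather than details from the slides: HMAC-SHA1 stands in for $f$ (chosen only because its output is 160 bits), all function names are hypothetical, only $d = 1$ is handled, and file identifiers are stored in plaintext instead of as $\mathrm{Enc}(sk, FID_{w_i} \| w_i)$. The last lines redo the storage arithmetic using the variant counts quoted on the slides.

```python
import hashlib
import hmac
from math import comb


def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, deletions, and insertions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def wildcard_set(word: str) -> set:
    """Wildcard-based fuzzy set for edit distance 1: '*' marks the edited position."""
    variants = {word}
    for i in range(len(word) + 1):
        variants.add(word[:i] + "*" + word[i:])          # insertion at position i
        if i < len(word):
            variants.add(word[:i] + "*" + word[i + 1:])  # substitution at position i
    return variants


def trapdoor(sk: bytes, w: str) -> bytes:
    """T_w = f(sk, w); HMAC-SHA1 is an assumed choice for f."""
    return hmac.new(sk, w.encode(), hashlib.sha1).digest()


def build_index(sk: bytes, keyword_files: dict) -> list:
    """Data owner side: one (trapdoor set, file IDs) entry per keyword.
    File IDs stay unencrypted here, which the real scheme would not allow."""
    return [({trapdoor(sk, v) for v in wildcard_set(w)}, fids)
            for w, fids in keyword_files.items()]


def search(index: list, query_trapdoors: set) -> list:
    """Server side: return file IDs of every entry sharing a trapdoor with the query."""
    hits = []
    for trapdoors, fids in index:
        if trapdoors & query_trapdoors:
            hits.extend(fids)
    return hits


sk = b"demo-secret-key"  # hypothetical key; Setup(1^lambda) would generate it
index = build_index(sk, {"CASTLE": ["F1", "F3"], "CLOUD": ["F2"]})

# The user misspells CASTLE as CASLE (one deletion) and searches with k = 1.
query = {trapdoor(sk, v) for v in wildcard_set("CASLE")}
print(search(index, query))                               # ['F1', 'F3']
print(len(wildcard_set("CASTLE")), (2 * 6 + 1) * 26 + 1)  # 14 339
print(edit_distance("AAACAGC", "AAATTGAGTC"))             # 4

# Index size for 10^4 keywords of length 10, d = 2, 20-byte (160-bit) trapdoors.
n, L, out_bytes = 10**4, 10, 20
exhaustive = 2 * L**2 * 26**2 * n * out_bytes
wildcard = (comb(L + 1, 1) + comb(L, 1) ** 2 + 2 * comb(L + 2, 2)) * n * out_bytes
print(exhaustive / 1e9, "GB vs", wildcard / 1e6, "MB")
```

The misspelled query matches because both $S_{CASTLE,1}$ and $S_{CASLE,1}$ contain `CAS*LE`: a deletion on the query side appears as an insertion wildcard, so the server only needs to compare trapdoors for equality. The size estimates come out at about 27 GB and 49 MB, the same order of magnitude as the 30GB and 40MB quoted on the slides.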