Dynamic Symmetric Searchable Encryption with Minimal Leakage and Efficient Updates on Commodity Hardware
22nd International Conference on Selected Areas in Cryptography (SAC 2015)
Attila A. Yavuz, Oregon State University, attila.yavuz@oregonstate.edu
Jorge Guajardo, Robert Bosch LLC – RTC, USA, Jorge.GuajardoMerchan@us.bosch.com
Dr. Attila Altay Yavuz, August 13, 2015

Challenge: Privacy versus Data Utilization Dilemma
  The client outsources sensitive information to storage on the cloud in encrypted form.
  Standard encryption: SEARCH? ANALYZE? CAN'T SEARCH! CAN'T ANALYZE!
  Impact: the outsourced data can be neither searched nor analyzed.

Searchable Encryption (Generic Framework)
  Client: encrypts files $f_1, \ldots, f_n$ into ciphertexts $c_1, \ldots, c_n$, extracts keywords $w_1, \ldots, w_n$, computes trapdoors $t_1, \ldots, t_n$, and builds a data structure over them (the searchable representation).
  Cloud: stores $c_1, \ldots, c_n$ and the searchable representation.
  Search keyword $w_1$: the client sends $t_1$; the cloud looks up $t_1$ and returns $c_1$, which decrypts to $f_1$.
  Update file $f_i$: the client sends $(z_i, V)$ to the cloud.

Prior Work on Searchable Encryption (Milestones)
  Curtmola et al. (CCS 2006): single linked list
    (+) Efficient encrypted searches
    (-) No update on files (addition/removal not possible)
  Variants of CCS 2006 with various properties: ranked, multi-keyword, wildcard, …
    (-) No update and inefficient
  Kamara et al. (CCS 2012): multi-linked list
    (+) Updates: new files can be added/removed
    (-) Update leaks information (insecure updates)
  Kamara et al. (FC 2013): red-black trees
    (+) Secure updates
    (-) Searchable words are fixed (cannot add a new keyword later)
    (-) Extremely large cloud storage (multi TBs, impractical)

Prior Work on Searchable Encryption (Milestones)
  Stefanov et al. (NDSS 2014): multi-arrays + oblivious sort
    (+) Efficient and secure updates, leaks less than the schemes above
    (-) Client storage, slower search
  Cash et al. (NDSS 2014): generic dictionaries
    (+) Conjunctive and boolean queries, balanced and efficient search/update
    (+) Tests on very large scale DBMS
    (-) Database grows linearly with updates, permanent client storage, leaks more than Stefanov et al. (NDSS 2014)
  Hahn et al. (CCS 2014)
    (+) Higher security, efficient searches
    (-) Larger client storage and transmission, high server storage
  Naveed et al. (S&P 2015): blind storage
    (+) High security, search/update efficiency
    (-) Single keyword only, interactive (e.g., network delays), cannot update file content (files can only be added or removed)

Contribution: A New Dynamic Symmetric SE Scheme
  (+) The highest privacy among all compared alternatives
  (+) Simple design
  (+) Low update communication overhead, one round only
  (+) Low server storage: 1 bit per keyword/file pair, no growth with updates, no revocation lists…
  (+) Dynamic keywords, parallelism
  (+) Practical efficiency on commodity hardware
  (-) Linear search w.r.t. # of files, $O(m/b)/p$
  (-) $O(n+m)$ client storage due to hash tables (e.g., $n = m = 10^7$ takes ~160 MB)
      The tables can be stored on and fetched from the cloud.
      For comparison: the Monster Inc 2. game on an iPhone is ~200 MB…

Our Scheme: Searchable Representation
  Searchable representation: binary matrix $I$
  Row $i \in \{1, \ldots, m\}$ corresponds to keyword $w_i$; column $j \in \{1, \ldots, n\}$ corresponds to file $f_j$.
  If $I[i,j] = 1$ then keyword $w_i$ appears in file $f_j$, otherwise not.
  [Figure: example binary matrix $I$ with rows labeled by keywords $w_1, \ldots, w_m$ and columns labeled by files $f_1, \ldots, f_n$.]
  Integrates index and inverted index, simple yet efficient:
    Search via row operations (inverted index)
    Update via column operations (index)

Our Scheme: Map keyword/file to the matrix
  Map a keyword $w$ to $\{1, \ldots, m\}$ and a file $f$ to $\{1, \ldots, n\}$: dynamic and efficient.
  Map a keyword to a row $i$:
    $t_x \leftarrow \mathrm{MAC}_{k_1}(w_x)$, a 160-bit number that must be mapped into $\{1, \ldots, m\}$ (with $m$ around $10^6$)
    Open-address hash tables: $i \leftarrow T_W(t_x)$
    Collision-free (one-to-one), $O(1)$ access
  Map a file to a column $j$:
    $z_f \leftarrow \mathrm{MAC}_{k_1}(f_{id})$ and $j \leftarrow T_F(z_f)$
  [Figure: $T_F$ holds entries such as $(1, z_{100}), (2, z_{250}), \ldots, (n, z_6)$ for the columns; $T_W$ holds entries such as $(1, t_{55}), (2, t_{300}), \ldots, (m, t_2)$ for the rows.]

Our Scheme: Encrypt Searchable Representation (basics)
  Derive the row key $r_i \leftarrow \mathrm{KDF}_{k_2}(i \,\|\, pad)$, where $pad$ is random.
  Encrypt each row $i$ with $r_i$ ($b = 1$, or $b = 128$ with AES in CTR mode):
    $I'[1,*] \leftarrow E_{r_1}(I[1,*], st_1)$
    …
    $I'[m,*] \leftarrow E_{r_m}(I[m,*], st_m)$
  Achieving dynamic keywords:
    Static schemes derive keys from keywords: $r_i \leftarrow \mathrm{KDF}_k(w_i)$
    Break the static relation between keys and keywords: $r_i \leftarrow \mathrm{KDF}_{k_2}(i \,\|\, pad)$, and link $r_i$ to a keyword $w$ via $T_W$.

Our Scheme: Search on Encrypted Representation (only basics)
  Client holds $(k_1, k_2, k_3, k_4, T_W, T_F)$; cloud holds $(I', T_W, T_F)$.
  Search keyword $w$ on $I'$ (client):
    1. $t_w \leftarrow \mathrm{MAC}_{k_1}(w)$
    2. $i \leftarrow T_W(t_w)$
    3. $r_i \leftarrow \mathrm{KDF}_{k_2}(i \,\|\, pad)$
    Send $(i, r_i)$ to the cloud.
  Cloud: decrypt the $i$'th row of $I'$ with $r_i$: $I[i,*] \leftarrow D_{r_i}(I'[i,*], st_i)$
  If $I[i,j] = 1$ then the file behind ciphertext $c_j$ contains the keyword of $t_w$; the cloud returns these ciphertexts (e.g., $c_1, c_{55}, c_{253}, \ldots, c_n$).
  Client: decrypt with $k_4$ and get $f_1, f_{55}, \ldots, f_n$.

Our Scheme: Update on Encrypted Representation (b=1)
  Add a new file $f$ to $I'$ (client):
    $z \leftarrow \mathrm{MAC}_{k_1}(f)$, $j \leftarrow T_F(z)$
    Extract the keywords $w_1, w_2, \ldots, w_l$ of $f$.
    Tokens: $t_1, t_2, \ldots, t_l$ via $\mathrm{MAC}_{k_1}(\cdot)$; row indices: $a_1, a_2, \ldots, a_l$ via $T_W(\cdot)$.
    Build the new column with 1 at rows $a_1, \ldots, a_l$ and 0 elsewhere.
    Derive $r_1 \leftarrow \mathrm{KDF}_{k_2}(1 \,\|\, pad), \ldots, r_m \leftarrow \mathrm{KDF}_{k_2}(m \,\|\, pad)$ and encrypt the column entry by entry with $E(\cdot)$.
  Cloud, holding $(I', T_W, T_F)$: replace the $j$'th column of $I'$ with the new column.

Our Scheme: Update on Encrypted Representation (b=128)
  Same steps as for $b = 1$: $z \leftarrow \mathrm{MAC}_{k_1}(f)$, $j \leftarrow T_F(z)$, keywords $w_1, \ldots, w_l$, tokens $t_1, \ldots, t_l$, row indices $a_1, \ldots, a_l$, row keys $r_1, \ldots, r_m$.
  Problem: each encrypted block now covers $b = 128$ columns, so writing column $j$ overrides the other $b-1$ positions of the block. Inconsistency!

Our Scheme: Update on Encrypted Representation (b=128)
  Solution: one round of interaction and key renewal.
    1) The cloud sends the block $B_j$ that contains column $j$; the client decrypts it, $D(B_j)$.
    2) The client places the new column in the block, renews the keys, and sends back the re-encrypted block $E(B_j')$.

Search-Update Coordination for High Privacy
  Various regions of $I$ end up under various distinct keys ($K_1, K_2, K_3, \ldots$), depending on:
    1) # of searches on row $i$ (e.g., w="email" searched 100 times, w="EU-CMA" searched once)
    2) # of updates on column $j$ (e.g., $F_j$ updated 100 times, $F_n$ updated 1000 times)
    3) Sequence of operations
  [Figure: example operation sequences and their effect: Update, Search, Search (exposed, re-encrypt); Update, Update (no exposure, re-encrypt); Search, Update (key update, encrypt).]
  State bits $T_W[i].st$ and $T_F[j].st$; row key $r_i \leftarrow \mathrm{KDF}_{k_2}(i \,\|\, st)$.

Security Analysis of Our DSSE (Very Brief)
  Confidentiality focus (integrity/authentication can be added)
  Access pattern: file identifiers that satisfy a search query (search results)
  Search pattern: history of searches (whether a search token was used in the past)
  IND-CKA2 (adaptive chosen keyword attacks): given $\{I', c_0, \ldots, c_n, z_0, \ldots, z_n, t_0, \ldots, t_m\}$, no adversary can learn any information about $f_0, \ldots, f_n$ and $w_0, \ldots, w_m$ other than the access and search patterns, even if queries are adaptive.
  Leakage functions are critical for updates.
  Theorem 1: Our DSSE scheme is $(L_1, L_2)$-secure in the random oracle model (ROM) based on IND-CKA2, where $L_1$ and $L_2$ leak the access and search pattern, respectively.
  Real and simulated views are indistinguishable due to the PRF and the IND-CPA cipher.

High-Level Comparison
  [Table: comparison with prior schemes; its contents are not present in the slide text.]

Implementation Details of Our DSSE
  C/C++, own lines of code: 10528
  Tomcrypt API:
    Symmetric key encryption: AES-CTR 128-bit
    MAC: CMAC-128
    Key derivation function: CMAC-128
    File encryption: CCM (Counter with CBC-MAC)
  Intel AESNI sample library: AES implementation using assembly language instructions. As KDF, we further exploit AES-ASM by using CMAC.
  Hash tables: Google open-source static C++ data structure

Implementation (Benchmarking Results)
  Operation        Avg time (msec)       Avg time (msec)      Avg time (msec)
                   #keyword: 1,000,000   #keyword: 200,000    #keyword: 2,000
                   #file: 5,000          #file: 50,000        #file: 2,000,000
  Build Index      822.6                 493                  461
  Search Keyword   0.01                  0.27                 10.02
  Add File         2772                  472                  8.83
  Delete File      2362                  329                  8.77

  Setup: Enron email dataset, Ubuntu 13.10 OS, 4 GB RAM, Intel i5 processor, 256 GB hard disk
  All operations are practical.
  Search takes under 1 msec in the first two settings, and only about 10 msec for 2 million files.
  Updates range from about 9 msec to about 2.8 sec, depending on the setting.

Conclusion
  A new DSSE with various desirable properties
  (+) The highest level of privacy
  (+) Simple yet efficient, compact updates and storage
  (+) Keyword updates, parallelism, extendable to multiple keyword queries
  (-) Asymptotically linear search and client storage, but still quite practical on commodity hardware

  TAKEAWAYS:
  Simplicity wins!
  Asymptotic results are not enough to assess practicality (actual implementation, details, and hidden constants matter).
  Practical storage at the client is NOT evil (actually beneficial).

Thank You!
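
Backup: toy sketch of the matrix index (b=1)
  The row-search and column-update idea from the scheme slides can be tried out in a few lines of Python. This is only a sketch under stated assumptions, not the C/C++ implementation from the talk: HMAC-SHA256 stands in for CMAC as both MAC and KDF, Python dicts stand in for the open-address hash tables T_W and T_F, the class names `Client` and `Cloud` and the per-column counter `col_state` are hypothetical simplifications of the state bits, and the key renewal that follows a search is left out.

```python
import hashlib
import hmac
import os


def prf(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA256, an assumed stand-in for the MAC and KDF of the slides."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def mask_bit(row_key: bytes, j: int, st: int) -> int:
    """One keystream bit for column j of a row, bound to the column state st."""
    return prf(row_key, b"%d|%d" % (j, st))[0] & 1


class Client:
    def __init__(self, m: int, n: int):
        self.m, self.n = m, n
        self.k1, self.k2 = os.urandom(16), os.urandom(16)
        self.TW = {}  # keyword token t_w -> row index i
        self.TF = {}  # file token z_f -> column index j
        self.col_state = [0] * n  # hypothetical per-column update counter

    def row_key(self, i: int) -> bytes:
        return prf(self.k2, b"row|%d" % i)

    def _slot(self, table: dict, token: bytes, limit: int) -> int:
        # Assign the next free row/column to a token seen for the first time.
        if token not in table:
            if len(table) >= limit:
                raise ValueError("matrix is full")
            table[token] = len(table)
        return table[token]

    def build(self) -> list:
        """Encrypt an all-zero m x n matrix I into I'."""
        return [[mask_bit(self.row_key(i), j, 0) for j in range(self.n)]
                for i in range(self.m)]

    def add_file(self, file_id: str, keywords: set) -> tuple:
        """Return (j, st, encrypted column) for the cloud to store."""
        j = self._slot(self.TF, prf(self.k1, b"f|" + file_id.encode()), self.n)
        rows = {self._slot(self.TW, prf(self.k1, b"w|" + w.encode()), self.m)
                for w in keywords}
        self.col_state[j] += 1  # fresh keystream for every rewrite of column j
        st = self.col_state[j]
        column = [(1 if i in rows else 0) ^ mask_bit(self.row_key(i), j, st)
                  for i in range(self.m)]
        return j, st, column

    def search_token(self, w: str) -> tuple:
        """Return (i, r_i) for a keyword that has been indexed before."""
        i = self.TW[prf(self.k1, b"w|" + w.encode())]
        return i, self.row_key(i)


class Cloud:
    def __init__(self, enc_matrix: list):
        self.I = enc_matrix
        self.col_state = [0] * len(enc_matrix[0])

    def update(self, j: int, st: int, column: list) -> None:
        # Update = replace the j'th column of I'.
        for i, bit in enumerate(column):
            self.I[i][j] = bit
        self.col_state[j] = st

    def search(self, i: int, row_key: bytes) -> list:
        # Search = decrypt row i and report the columns that hold a 1.
        return [j for j, c in enumerate(self.I[i])
                if c ^ mask_bit(row_key, j, self.col_state[j]) == 1]


client = Client(m=8, n=4)
cloud = Cloud(client.build())
cloud.update(*client.add_file("a.txt", {"email", "budget"}))
cloud.update(*client.add_file("b.txt", {"email"}))
print(cloud.search(*client.search_token("email")))  # prints [0, 1]
```

  In this toy the cloud keeps r_i after a search and could decrypt later writes to that row; the deck addresses this with state bits and re-encryption, which the sketch does not model.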