Dynamic Symmetric Searchable Encryption with Minimal Leakage and Efficient Updates on Commodity Hardware

Attila A. Yavuz
Oregon State University
attila.yavuz@oregonstate.edu

Jorge Guajardo
Robert Bosch LLC – RTC, USA
Jorge.GuajardoMerchan@us.bosch.com

22nd International Conference on Selected Areas in Cryptography (SAC 2015)
Dr. Attila Altay Yavuz, August 13, 2015
Challenge: Privacy versus Data Utilization Dilemma
Client outsources sensitive information (encrypted) to storage on the cloud
Standard Encryption:
  SEARCH? CAN’T SEARCH!
  ANALYZE? CAN’T ANALYZE!
Searchable Encryption (Generic Framework)
[Figure: the client holds files f1, …, fn, extracts keywords w1, …, wn, and outsources to the cloud the ciphertexts c1, …, cn together with a searchable representation (data structure) built from trapdoors t1, …, tn; a search with t1 is shown returning c1, which yields f1]
Search keyword: w1 → t1
Update file: fi → (zi, V)
Prior Work on Searchable Encryption (Milestones)

Curtmola et al. (CCS 2006) → Single linked list
  (+) Efficient encrypted searches
  (-) No update on files (addition/removal not possible)

Variants of CCS 2006 with various properties:
  Ranked, multi-keyword, wildcard, …
  (-) No update and inefficient

Kamara et al. (CCS 2012) → Multi-linked list
  (+) Updates: New files can be added/removed
  (-) Update leaks information (insecure updates)

Kamara et al. (FC 2013) → Red-black trees
  (+) Secure updates
  (-) Searchable words are fixed (cannot add a new keyword later)
  (-) Extremely large cloud storage (multi TBs, impractical)
Prior Work on Searchable Encryption (Milestones)

Stefanov et al. (NDSS 2014) → Multi-arrays + Oblivious sort
  (+) Update efficient, secure updates, leaks less than above
  (-) Client storage, slower search

Cash et al. (NDSS 2014) → Generic dictionaries
  (+) Conjunctive, boolean queries, balanced and efficient search/update
  (+) Tests on very large scale DBMS
  (-) Database grows linearly with update, client permanent storage, leaks more than Stefanov et al. (NDSS 2014)

Naveed et al. (S&P 2015) → Blind storage
  (+) High security, search/update efficiency
  (-) Single keyword only, interactive (e.g., network delays), cannot update file content, add/remove them only

Hahn et al. (CCS 2014)
  (+) Higher security, efficient searches
  (-) Larger client storage and transmission, high server storage
Contribution: A New Dynamic Symmetric SE Scheme
(+) The highest privacy among all compared alternatives
(+) Simple design
(+) Efficient → practicality on commodity hardware
(+) Low update communication overhead, one round only
(+) Low server storage → 1 bit per keyword/file pair
  No growth with updates, no revocation lists…
(+) Dynamic keywords, parallelism
(-) Linear search w.r.t. # of files, O(m/b)/p
(-) O(n+m) client storage due to hash tables (e.g., n=m=10^7, ~160 MB)
  Can store/fetch from cloud
  For scale: Monster Inc 2. game on iPhone ~ 200 MB…
Our Scheme: Searchable Representation
Searchable Representation: Binary matrix I
  Row i ∈ {1,…,m} ↔ keyword wi, column j ∈ {1,…,n} ↔ file fj
  If I[i,j]=1 then keyword wi appears in file fj, otherwise not
[Figure: example m × n binary matrix I, rows labeled with keywords w1, w2, …, wm and columns labeled with files f1, f2, …, fn]
Integrates index and inverted index, simple yet efficient
  Search via row operations → inverted index
  Update via column operations → index
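The row/column duality on this slide can be sketched in a few lines of Python. This is a plain, unencrypted illustration only; the class name `KeywordFileMatrix` and its method names are hypothetical and do not come from the slides, and indices are 0-based here while the slides count from 1.

```python
class KeywordFileMatrix:
    """Unencrypted m x n binary matrix: rows are keywords, columns are files."""

    def __init__(self, m, n):
        self.m, self.n = m, n
        self.I = [[0] * n for _ in range(m)]

    def update_file(self, j, keyword_rows):
        """Column operation (index view): overwrite column j with a file's keywords."""
        for i in range(self.m):
            self.I[i][j] = 1 if i in keyword_rows else 0

    def search(self, i):
        """Row operation (inverted-index view): files whose bit is set in row i."""
        return [j for j in range(self.n) if self.I[i][j] == 1]


matrix = KeywordFileMatrix(m=3, n=4)
matrix.update_file(0, {0, 2})   # file 0 contains keywords 0 and 2
matrix.update_file(3, {2})      # file 3 contains keyword 2
result = matrix.search(2)       # files that contain keyword 2
```

A search touches one row of n bits and an update touches one column of m bits, which is why a single matrix can serve as both index and inverted index.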
Our Scheme: Map keyword/file to the matrix
Keyword w → {1,…,m} and file f → {1,…,n}: Dynamic and efficient
Map a keyword to a row i: $t_x \leftarrow \mathrm{MAC}_{k_1}(w_x)$, 160-bit number → {1,…,m ≈ 10^6}
  Open address hash tables: $i \leftarrow T_W(t_x)$
  Collision-free (one-to-one), O(1) access
Map a file to column j: $z_f \leftarrow \mathrm{MAC}_{k_1}(f_{id})$ and $j \leftarrow T_F(z_f)$
[Figure: hash table TW maps keyword tokens to rows, with entries such as (1, t55), (2, t300), …, (m, t2); hash table TF maps file tokens to columns, with entries such as (1, z100), (2, z250), …, (128, zl), …, (257, zr), …, (n, z6); both index into the binary matrix]
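A minimal sketch of the token-to-index mapping, under stated assumptions: HMAC-SHA1 from the Python standard library stands in for the 160-bit MAC, a Python `dict` stands in for the open-address hash table, and the class name `IndexTable` with its `assign`, `lookup`, and `release` methods is hypothetical. How the actual scheme allocates free rows and columns is not shown on the slide.

```python
import hashlib
import hmac


def mac(key: bytes, data: bytes) -> bytes:
    """Stand-in for MAC_k1: HMAC-SHA1 yields a 160-bit tag."""
    return hmac.new(key, data, hashlib.sha1).digest()


class IndexTable:
    """Maps MAC tokens one-to-one onto matrix indices (role of TW or TF)."""

    def __init__(self, capacity: int):
        self.index_of = {}                               # token -> index
        self.free = list(range(capacity - 1, -1, -1))    # pop() yields 0, 1, 2, ...

    def lookup(self, token: bytes):
        return self.index_of.get(token)                  # None if the token is unknown

    def assign(self, token: bytes) -> int:
        """Give a new token an unused index, so keywords can be added later."""
        if token not in self.index_of:
            if not self.free:
                raise RuntimeError("no free index left")
            self.index_of[token] = self.free.pop()
        return self.index_of[token]

    def release(self, token: bytes) -> int:
        """Free the index of a removed keyword or file for reuse."""
        idx = self.index_of.pop(token)
        self.free.append(idx)
        return idx


k1 = b"demo-key-k1"
TW = IndexTable(capacity=4)
row_email = TW.assign(mac(k1, b"email"))
row_crypto = TW.assign(mac(k1, b"crypto"))
```

Because each token owns exactly one index, the mapping is collision-free by construction, and both lookup and assignment are expected O(1).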
Our Scheme: Encrypt Searchable Representation (basics)
Derive row key $r_i \leftarrow \mathrm{KDF}_{k_2}(i \,\|\, \mathrm{pad})$, pad is random
Encrypt each row i with $r_i$ (b=1, or AES b=128 CTR mode):
  $I'[1,*] \leftarrow E_{r_1}(I[1,*], st_1)$
  ...
  $I'[m,*] \leftarrow E_{r_m}(I[m,*], st_m)$
[Figure: rows 1,…,m of the binary matrix (columns 1,…,128,…,256,…,n), each row encrypted under its own key r1,…,rm]
Achieving Dynamic Keywords:
  Static schemes: Derived keys from keywords, $r_i \leftarrow \mathrm{KDF}_k(w_i)$
  Break static relation between keys and keywords: $r_i \leftarrow \mathrm{KDF}_{k_2}(i \,\|\, \mathrm{pad})$, link $r_i$ to a w via $T_W$
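The row-key derivation and row encryption can be illustrated roughly as follows. This is a sketch under assumptions, not the authors' implementation: the Python standard library has no AES, so an HMAC-SHA256 keystream stands in for AES-CTR and for the KDF, and the way the state `st` enters the keystream is a guess made for illustration. The helper names `kdf`, `keystream`, and `xor_row` are hypothetical.

```python
import hashlib
import hmac
import os


def kdf(k2: bytes, i: int, pad: bytes) -> bytes:
    """r_i = KDF_k2(i || pad); depends on the row index, not on the keyword."""
    return hmac.new(k2, i.to_bytes(4, "big") + pad, hashlib.sha256).digest()


def keystream(row_key: bytes, st: int, nbytes: int) -> bytes:
    """Counter-mode keystream (stand-in for AES-CTR), bound to the state st."""
    out = b""
    counter = 0
    while len(out) < nbytes:
        block_input = st.to_bytes(8, "big") + counter.to_bytes(8, "big")
        out += hmac.new(row_key, block_input, hashlib.sha256).digest()
        counter += 1
    return out[:nbytes]


def xor_row(row_key: bytes, row: bytes, st: int) -> bytes:
    """E and D are the same operation: XOR the packed row with the keystream."""
    return bytes(a ^ b for a, b in zip(row, keystream(row_key, st, len(row))))


k2 = os.urandom(16)
pad = os.urandom(16)
row = bytes([0b10100001, 0b00000110])   # one matrix row: 16 file bits packed into 2 bytes
r_3 = kdf(k2, 3, pad)                   # key for row 3
enc = xor_row(r_3, row, st=1)           # I'[3,*]
dec = xor_row(r_3, enc, st=1)           # back to I[3,*]
```

Since the key is tied to the row index rather than to a keyword, a row can be handed to a different keyword through TW without changing how keys are derived.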
Our Scheme: Search on Encrypted Representation (only basics)
Client holds (k1, k2, k3, k4, TW, TF); Cloud holds (I’, TW, TF)
Search keyword w on I’:
  1. $t_w \leftarrow \mathrm{MAC}_{k_1}(w)$
  2. $i \leftarrow T_W(t_w)$
  3. $r_i \leftarrow \mathrm{KDF}_{k_2}(i \,\|\, \mathrm{pad})$
Client → Cloud: $(i, r_i)$
Cloud: decrypt the i’th row I’[i,*] with $r_i$ → I[i,*], i.e., $I[i,*] \leftarrow D_{r_i}(I'[i,*], st_i)$
  If I[i,j]=1 then ciphertext cj contains tw
Client: decrypt the returned ciphertexts with k4, get f1, f55, …, fn
[Figure: decrypted row i of I with bits set in columns such as 1, 55, 253 and n, selecting ciphertexts c1, c55, c253, cn]
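The three client steps and the server-side row decryption can be traced end to end in a toy example. The sketch below rests on assumptions: HMAC-SHA256 stands in for the MAC, the KDF, and the row cipher; TW is a plain `dict`; the state `st` is omitted; and the function names `client_search_token` and `server_search` are hypothetical.

```python
import hashlib
import hmac


def prf(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def row_cipher(row_key: bytes, bits):
    """XOR a list of bits with keystream bits; the same call encrypts and decrypts."""
    stream = b""
    ctr = 0
    while len(stream) * 8 < len(bits):
        stream += prf(row_key, ctr.to_bytes(4, "big"))
        ctr += 1
    return [bit ^ ((stream[j // 8] >> (j % 8)) & 1) for j, bit in enumerate(bits)]


# Client setup (toy values, 0-based indices).
k1, k2, pad = b"k1-demo", b"k2-demo", b"pad-demo"
TW = {prf(k1, b"email"): 0, prf(k1, b"invoice"): 1}
I = [[1, 0, 1, 0],      # row 0: "email" appears in files 0 and 2
     [0, 1, 1, 0]]      # row 1: "invoice" appears in files 1 and 2


def row_key(i: int) -> bytes:
    return prf(k2, i.to_bytes(4, "big") + pad)


I_enc = [row_cipher(row_key(i), I[i]) for i in range(len(I))]   # stored on the cloud


def client_search_token(w: bytes):
    t_w = prf(k1, w)        # 1. t_w = MAC_k1(w)
    i = TW[t_w]             # 2. i = TW(t_w)
    return i, row_key(i)    # 3. r_i = KDF_k2(i || pad); the token is (i, r_i)


def server_search(I_enc, token):
    i, r_i = token
    row = row_cipher(r_i, I_enc[i])                    # decrypt only row i
    return [j for j, bit in enumerate(row) if bit == 1]


matches = server_search(I_enc, client_search_token(b"invoice"))
```

The server learns only the one row it is given a key for, which matches the access-pattern leakage discussed later in the talk.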
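Our Scheme: Update on Encrypted Representation (b=1)
Client holds (k1, k2, k3, k4, TW, TF); Cloud holds (I’, TW, TF)
Add a new file f to I’:
Client:
  $z \leftarrow \mathrm{MAC}_{k_1}(f)$, $j \leftarrow T_F(z)$
  Extract keywords: f → $w_1, w_2, \ldots, w_l$
  $\mathrm{MAC}_{k_1}(\cdot)$ → tokens $t_1, t_2, \ldots, t_l$
  $T_W(\cdot)$ → rows $a_1, a_2, \ldots, a_l$
  Derive $r_1 \leftarrow \mathrm{KDF}_{k_2}(1 \,\|\, \mathrm{pad})$, …, $r_m \leftarrow \mathrm{KDF}_{k_2}(m \,\|\, \mathrm{pad})$ and encrypt the new column with E(.)
Cloud:
  Replace the j’th column of I’ with the new column
[Figure: encrypted matrix I’ with rows 1,…,m and columns 1,…,j,…,n; the new column is written into position j]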
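For b=1 the update reduces to building one encrypted column and overwriting column j. The following sketch makes several assumptions: HMAC-SHA256 stands in for the KDF and for the one-bit cipher, state handling is left out entirely (so a real deployment would need the key renewal covered on the later slides to avoid reusing a keystream bit), and all function names are hypothetical.

```python
import hashlib
import hmac


def prf(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def row_key(k2: bytes, pad: bytes, i: int) -> bytes:
    return prf(k2, i.to_bytes(4, "big") + pad)          # r_i = KDF_k2(i || pad)


def keystream_bit(r_i: bytes, j: int) -> int:
    """One keystream bit for position j of a row (b = 1)."""
    return prf(r_i, j.to_bytes(4, "big"))[0] & 1


def client_new_column(k2, pad, m, j, keyword_rows):
    """Bit i is 1 iff row i is among a_1..a_l, then masked with row i's keystream bit."""
    return [(1 if i in keyword_rows else 0) ^ keystream_bit(row_key(k2, pad, i), j)
            for i in range(m)]


def server_replace_column(I_enc, j, column):
    """The cloud overwrites column j without learning the plaintext bits."""
    for i, bit in enumerate(column):
        I_enc[i][j] = bit


def decrypt_column(k2, pad, I_enc, j):
    return [I_enc[i][j] ^ keystream_bit(row_key(k2, pad, i), j)
            for i in range(len(I_enc))]


k2, pad = b"k2-demo", b"pad-demo"
m, n = 3, 4
# Start from an encrypted all-zero matrix.
I_enc = [[keystream_bit(row_key(k2, pad, i), j) for j in range(n)] for i in range(m)]
rows_of_new_file = {0, 2}     # a_1..a_l: rows of the keywords found in the file
j = 2                         # column assigned to the file by TF
server_replace_column(I_enc, j, client_new_column(k2, pad, m, j, rows_of_new_file))
```

The client always sends all m bits of the column, so the message size does not reveal how many keywords the file contains.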
Our Scheme: Update on Encrypted Representation (b=128)
Add a new file f to I’, with the same client-side steps as for b=1: $z \leftarrow \mathrm{MAC}_{k_1}(f)$, $j \leftarrow T_F(z)$, keywords $w_1, \ldots, w_l$, tokens $t_1, \ldots, t_l$, rows $a_1, \ldots, a_l$, row keys $r_1, \ldots, r_m$
Problem: Overrides on b-1 regions! → Inconsistency
[Figure: the new column j written into I’ with b=128; the other b-1 positions of each affected block are marked "?"]
Our Scheme: Update on Encrypted Representation (b=128)
Add a new file f to I’, with the same client-side steps as for b=1: $z \leftarrow \mathrm{MAC}_{k_1}(f)$, $j \leftarrow T_F(z)$, keywords $w_1, \ldots, w_l$, tokens $t_1, \ldots, t_l$, rows $a_1, \ldots, a_l$, row keys $r_1, \ldots, r_m$
One round of interaction and key renewal:
  1) D(B_j): decrypt the block B_j of I’ that contains column j
     Renew keys
  2) E(B_j’): encrypt the updated block B_j’ and place it back in I’
[Figure: encrypted matrix I’ with the b=128 block around column j decrypted, updated, and re-encrypted]
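The decrypt, renew, re-encrypt round can be sketched as below. This is an illustration under assumptions rather than the scheme itself: the block width is shrunk to `B = 4` bits for readability (the slides use b=128), HMAC-SHA256 stands in for AES-CTR and the KDF, and key renewal is modeled as a per-block integer counter `st` fed into the key derivation. The names `block_keystream` and `update_block` are hypothetical.

```python
import hashlib
import hmac

B = 4  # block width in bits for this toy example; the slides use b = 128


def prf(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def block_keystream(k2: bytes, i: int, blk: int, st: int):
    """B keystream bits for row i and block blk under state st."""
    r_i = prf(k2, i.to_bytes(4, "big") + st.to_bytes(4, "big"))   # key depends on st
    digest = prf(r_i, blk.to_bytes(4, "big"))
    return [(digest[t // 8] >> (t % 8)) & 1 for t in range(B)]


def xor_bits(a, b):
    return [x ^ y for x, y in zip(a, b)]


def update_block(k2, enc_rows, blk, st, j_in_block, keyword_rows):
    """Client side of the single round: D(B_j), set column j, renew state, E(B_j')."""
    new_st = st + 1                                                 # renew keys
    out = []
    for i, enc_row in enumerate(enc_rows):
        plain = xor_bits(enc_row, block_keystream(k2, i, blk, st))  # 1) D(B_j)
        plain[j_in_block] = 1 if i in keyword_rows else 0           # write the new column
        out.append(xor_bits(plain, block_keystream(k2, i, blk, new_st)))  # 2) E(B_j')
    return out, new_st


k2 = b"k2-demo"
plain_block = [[1, 0, 0, 1],
               [0, 1, 0, 0]]
st = 0
enc_block = [xor_bits(row, block_keystream(k2, i, 0, st))
             for i, row in enumerate(plain_block)]

# Add a file in position 2 of this block whose only keyword lives in row 1.
enc_block, st = update_block(k2, enc_block, blk=0, st=st, j_in_block=2, keyword_rows={1})
dec_block = [xor_bits(row, block_keystream(k2, i, 0, st))
             for i, row in enumerate(enc_block)]
```

Re-encrypting the whole block under a fresh state keeps the other b-1 columns consistent, which is the inconsistency the previous slide warns about.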
Search-Update Coordination for High Privacy
Various regions, various distinct keys!
  1) # of searches on row i
  2) # of updates on column j
  3) Sequence of operations
State is kept in TW[i].st and TF[j].st (state bit): $r_i \leftarrow \mathrm{KDF}_{k_2}(i \,\|\, st)$
[Figure: matrix I split into regions encrypted under distinct keys K_1, …, K_5, …, K_x. Example labels: F_j, Update 100; F_n, Update 1000; w=“email”, searched 100; w=“EU-CMA”, searched 1. Operation timelines are labeled Update, Search, Search (Exposed, Re-encrypt); Update, Update (No expose, Re-encrypt); Search, Update (Key update, encrypt). The label gc also appears.]
Security Analysis of Our DSSE (Very Brief)
Confidentiality focus (integrity/auth can be added)
Access Pattern: File identifiers that satisfy a search query (search results)
Search Pattern: History of searches (whether a search token was used in the past)
IND-CKA2 (Adaptive Chosen Keyword Attacks): Given {I’, c0,…,cn, z0,…,zn, t0,…,tm}, no adversary can learn any information about f0,…,fn and w0,…,wm other than the access and search pattern, even if queries are adaptive.
Leakage functions are critical for updates
Theorem 1: Our DSSE scheme is (L1,L2)-secure in the random oracle model (ROM) based on IND-CKA2, where L1 and L2 leak the access and search pattern, respectively.
Real and simulated views are indistinguishable due to the PRF and the IND-CPA cipher.
High-Level Comparison
[Comparison table not recoverable from the extracted slide]
Implementation Details of Our DSSE
C/C++
Own lines of code: 10528
Tomcrypt API
  Symmetric Key Encryption: AES-CTR 128-bit
  MAC: CMAC-128
  Key Derivation Function: CMAC-128
  File encryption: CCM (Counter with CBC-MAC)
Intel AESNI sample library
  For AES implementation using assembly language instructions.
  As KDF, we further exploit AES-ASM by using CMAC.
Hash tables: Google open source static C++ data structure
Implementation (Benchmarking Results)
Average time (msec) per operation:

Operation        #keyword: 1,000,000   #keyword: 200,000   #keyword: 2,000
                 #file: 5,000          #file: 50,000       #file: 2,000,000
Build Index      822.6                 493                 461
Search Keyword   0.01                  0.27                10.02
Add File         2772                  472                 8.83
Delete File      2362                  329                 8.77

Enron email dataset, Ubuntu 13.10 OS, 4 GB RAM, Intel i5 processor, 256 GB hard disk
All operations are practical
Search takes under a msec for up to 50,000 files, and only about 10 msec for 2 million files
Update time varies from about 9 msec to about 2.8 sec
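As a back-of-the-envelope check, the matrix size for each benchmark configuration follows from the 1 bit per keyword/file pair stated earlier in the talk. The helper name `matrix_size_bytes` is hypothetical, and the figures cover only the bit matrix, not the hash tables or the encrypted files.

```python
def matrix_size_bytes(num_keywords: int, num_files: int) -> int:
    """One bit per keyword/file pair, packed eight bits to a byte."""
    return num_keywords * num_files // 8


# (#keyword, #file) pairs from the benchmark table.
configs = [(1_000_000, 5_000), (200_000, 50_000), (2_000, 2_000_000)]
sizes_mb = [matrix_size_bytes(m, n) / 1e6 for m, n in configs]
```

Under these assumptions the three matrices come to 625 MB, 1250 MB, and 500 MB (decimal megabytes).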
Conclusion
A new DSSE with various desirable properties
(+) The highest level of privacy
(+) Simple yet efficient, compact updates and storage
(+) Keyword updates, parallelism, extendable to multiple keyword queries
(-) Asymptotically linear search and client storage
  But still quite practical on commodity hardware

TAKEAWAYS:
  Simplicity wins!
  Asymptotic results are not enough to assess the practicality (actual implementation, details, hidden constants)
  Practical storage at the client is NOT evil (actually beneficial)

Thank You!