Vanish: Increasing Data Privacy with Self-Destructing Data
Roxana Geambasu, Tadayoshi Kohno, Amit Levy, et al.
University of Washington
USENIX Security Symposium, 2009
Presented by Joseph Del Rocco
University of Central Florida
Outline
• Distributed Hash Tables (DHT)
• Data Destruction w/ Vanish
– Motivations
– Architecture
– Implementations
– Results
– Contributions / Weaknesses / Improvements
• References
Hash Tables (review) [5]
• A tag or the data itself is hashed into a table index
• Hashing functions: MD5, SHA-1, CRC32, FSB, HAS-160, etc.
• Hash collisions are unavoidable, so linked lists (chaining) are used (see the birthday paradox); a minimal sketch follows below
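The chaining just described fits in a few lines. Below is a minimal Python sketch: the class and method names are invented for illustration, and Python's built-in hash() stands in for MD5/SHA-1.

# Minimal chained hash table: colliding keys share a per-bucket list.
class ChainedHashTable:
    def __init__(self, num_buckets=16):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # any hash function could be used here (MD5, SHA-1, ...)
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # new key or collision: chain it

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

table = ChainedHashTable()
table.put("song.mp3", "peer-42")
print(table.get("song.mp3"))              # -> peer-42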
Distributed Hash Tables
• Hash tables… split up across many machines
• Key space is astronomically large (2^128, 2^160)
  2^128 = 340,282,366,920,938,463,463,374,607,431,768,211,456 (≈ 3.4 × 10^38)
• In 2001, four main DHTs ignited research:
  - Chord (MIT)
  - CAN (Berkeley, AT&T)
  - Pastry (Rice, Microsoft)
  - Tapestry (Berkeley)
• Availability, scale, decentralization, churn!
Viewed as a Distributed Hash Table [3]
[Figure: the hash-table key space, from 0 to 2^128 − 1, partitioned across the peer nodes]
• Each peer node is responsible for a range of the hash table, according to the peer's hash key
• Location information about an object is placed on the peer with the closest key (information redundancy)
DHT in action: put() [3]
[Figure: peer nodes, each holding (K, V) entries; a peer that wants to share a file calls insert(K1, V1)]
• Operation: route the message “I have the file” to the node holding key K1
DHT in action: get() [3]
[Figure: the same peer nodes; a peer calls retrieve(K1)]
• Operation: retrieve the value V1 from the node holding key K1
(A toy put()/get() sketch follows below.)
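The sketch referenced above is a toy, single-process model (not Vuze's or Chord's real protocol; all names are invented): every node owns the keys numerically closest to its ID, so both put() and get() route to the node with the nearest ID.

import hashlib

# Toy DHT: node IDs and keys live in the same 2**160 space, and a (key, value)
# pair is stored on the node whose ID is numerically closest to the key's hash.
def sha1_int(data: str) -> int:
    return int(hashlib.sha1(data.encode()).hexdigest(), 16)

class ToyDHT:
    def __init__(self, node_names):
        # node ID = hash of the node's name (in Vuze, a hash of IP and port)
        self.nodes = {sha1_int(name): {} for name in node_names}

    def _closest(self, key_id):
        # stand-in for multi-hop routing: jump straight to the closest node
        return min(self.nodes, key=lambda node_id: abs(node_id - key_id))

    def put(self, key, value):
        self.nodes[self._closest(sha1_int(key))][key] = value

    def get(self, key):
        return self.nodes[self._closest(sha1_int(key))].get(key)

dht = ToyDHT([f"peer{i}" for i in range(50)])
dht.put("K1", "I have the file")   # routed to the node responsible for K1
print(dht.get("K1"))               # -> I have the file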
Chord [4]
• Keys and nodes are hashed into an m-bit identifier space of size 2^m (keys from filenames, node IDs from IPs)
• Each node stores ~K/N keys (K keys, N nodes)
• “Finger table”: entry i = successor((n + 2^(i−1)) mod 2^m); see the sketch below
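A small Python sketch of that finger-table rule on a toy 6-bit ring (the node IDs are chosen only for illustration):

# Chord finger table: entry i points at the first live node whose ID is at
# or after (n + 2**(i-1)) mod 2**m on the identifier ring.
def finger_table(n, node_ids, m):
    ring = sorted(node_ids)
    def successor(ident):
        for node in ring:
            if node >= ident:
                return node
        return ring[0]                  # wrap around the identifier ring
    return [successor((n + 2 ** (i - 1)) % 2 ** m) for i in range(1, m + 1)]

nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
print(finger_table(8, nodes, 6))        # fingers of node 8 -> [14, 14, 14, 21, 32, 42]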
CAN – Content Addressable Network [3]
• Each peer is responsible for one zone, i.e., stores all (key, value) pairs of that zone
• Each peer knows the neighbors of its zone
• Random assignment of peers to zones at startup – a zone is split if it is not empty
• Dimension-ordered multi-hop routing
CAN: Object Publishing [3]
[Figure: node I publishes (K, V); the point (a, b) = (hx(K), hy(K)) falls inside node J's zone]
• node I::publish(K,V)
  (1) a = hx(K), b = hy(K)
  (2) route (K,V) → node J, whose zone contains the point (a, b)
(A sketch of this two-step publish follows below.)
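The two-step publish can be sketched directly in Python: hash the key once per dimension to get a point (a, b), then hand (K, V) to the peer whose zone contains that point. Routing is collapsed to a lookup here, where a real CAN forwards greedily through zone neighbors; the hash construction and zone layout are assumptions of this sketch.

import hashlib

def h(key: str, salt: str) -> float:
    # map a key into [0, 1) along one coordinate axis
    digest = hashlib.sha1((salt + key).encode()).hexdigest()
    return int(digest, 16) / float(16 ** len(digest))

def publish(zones, key, value):
    a, b = h(key, "x"), h(key, "y")                   # (1) a = hx(K), b = hy(K)
    for owner, (x0, x1, y0, y1) in zones.items():     # (2) deliver to the zone holding (a, b)
        if x0 <= a < x1 and y0 <= b < y1:
            print(f"stored ({key!r}, {value!r}) at node {owner}, point=({a:.2f}, {b:.2f})")
            return owner

# four peers, each owning one quadrant of the coordinate space
zones = {"I": (0.0, 0.5, 0.0, 0.5), "J": (0.5, 1.0, 0.0, 0.5),
         "P": (0.0, 0.5, 0.5, 1.0), "Q": (0.5, 1.0, 0.5, 1.0)}
publish(zones, "K1", "V1")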
Modern, Popular P2P w/ DHTs
• Vuze / Azureus (Java BitTorrent client)
• BitTorrent Mainline DHT (based on Kademlia)
• IBM WebSphere
• Apache Cassandra
• OpenDHT
• Kazaa, eDonkey, etc. (KAD)
(Slide image: Dendrobates azureus, the blue poison dart frog and Azureus's namesake)
Vuze (Azureus) Specifics
• Nodes in the network are assigned a “random” 160-bit ID, hashed from IP & port (the DHT index range)
• A client sends “put” messages to the 20 nodes closest to the hashed key index in the DHT
• Nodes re-put() entries from their local hash tables every 30 minutes to combat churn
• Nodes supposedly remove key/value pairs older than 8 hours if they are not re-put() by the originator
• So must the originator node keep re-put()ing to persist data? (A toy model of these timeouts follows below.)
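The toy model referenced above: the 8-hour and 30-minute figures come from this slide, while the class and constant names are illustrative rather than Vuze configuration.

import time

ORIGINAL_LIFETIME = 8 * 3600      # seconds an entry survives without a re-put()
REPUBLISH_INTERVAL = 30 * 60      # how often nodes re-put() their local entries

class StoredValue:
    def __init__(self, value):
        self.value = value
        self.stored_at = time.time()

    def needs_republish(self, now=None):
        return (now or time.time()) - self.stored_at > REPUBLISH_INTERVAL

    def expired(self, now=None):
        return (now or time.time()) - self.stored_at > ORIGINAL_LIFETIME

entry = StoredValue("share of key K")
print(entry.expired(now=entry.stored_at + 7 * 3600))   # False: still within 8 hours
print(entry.expired(now=entry.stored_at + 9 * 3600))   # True: purged unless re-put()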
Vanish Motivations
• Data frequently cached/archived by email
providers, ISPs, network backup systems
• Often available after account termination
• Forensic examination of hard drives (e.g., after a raid)
• Laptops stolen or taken in for repair
• High-profile political scandals
• Some argue the right and ability to destroy
data is as fundamental as privacy & liberty
Vanish Motivations
• The Hushmail email encryption service turned over cleartext contents of encrypted messages to the federal government
• Trusted third parties (e.g., the Ephemerizer) that supposedly destroy data after a timeout never caught on… a trust issue?
• Subpoenas…
Vanish Goals
• Create a Vanishing Data Object (VDO)
• Becomes unreadable after a timeout, even if someone retroactively obtains a pristine copy of the VDO made before expiration
• Accessible until the timeout
• Leverage existing infrastructure
• NO required passwords, keys, or special security hardware…
Vanish Architecture
• Encrypt data D with a random key K to produce ciphertext C
• Use threshold secret sharing (T.S.S.) [6] to split K into N shares
• Pick a random access key L and use a cryptographically secure PRNG (keyed by L) to derive N DHT indices at which to put() the shares (see the sketch below)
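The sketch referenced above, in Python, follows the three bullets (encrypt D under a random K, Shamir-split K into N shares, derive N indices from a random access key L) but is only a toy: the small prime field, the repeated-digest XOR standing in for AES, and random.Random standing in for a keyed cryptographically secure PRNG are all simplifications, not details from the paper.

import hashlib, os, random

PRIME = 2 ** 127 - 1                           # toy field for the secret sharing

def shamir_split(secret, n, threshold):
    # random polynomial of degree threshold-1 with the secret as constant term
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(threshold - 1)]
    poly = lambda x: sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, poly(x)) for x in range(1, n + 1)]

def derive_indices(access_key_L, n):
    rng = random.Random(access_key_L)          # stand-in for a keyed CSPRNG
    return [rng.getrandbits(160) for _ in range(n)]

def encapsulate(dht, data, n=20, threshold=10):
    K = random.randrange(PRIME)                                        # random data key
    keystream = hashlib.sha256(str(K).encode()).digest() * len(data)   # stand-in for AES
    C = bytes(b ^ k for b, k in zip(data, keystream))
    L = os.urandom(16)                                                 # random access key
    for idx, share in zip(derive_indices(L, n), shamir_split(K, n, threshold)):
        dht[idx] = share                       # put() each share at its derived index
    return (L, C, n, threshold)                # the VDO; K and the shares are forgotten locally

dht = {}
vdo = encapsulate(dht, b"secret email body")
print(len(dht), "shares pushed; VDO carries N =", vdo[2], "and threshold =", vdo[3])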
Vanish Architecture
• The T.S.S. threshold (threshold ratio) determines how many of the N shares are needed to reconstruct K
• Ex: N = 20, threshold = 10 → any 10 of the 20 shares can be used
• Ex: N = 50, threshold = 50 → better have all the shares…
• VDO = (L, C, N, threshold), which is sent / stored
Vanish Decapsulation
• Given a VDO:
  - extract the access key L
  - derive the DHT locations of the shares of K
  - get() the number of shares required by the threshold
  - reconstruct K
  - decrypt C to obtain D
• The number of retrieved shares must be at least the threshold (a continuation of the earlier sketch follows below)
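Continuing the toy encapsulation sketch from the architecture slide (it reuses that sketch's imports, PRIME, derive_indices(), and the dht/vdo objects), decapsulation re-derives the share locations from L, fetches whatever shares survive, rebuilds K by Lagrange interpolation at x = 0, and undoes the XOR "encryption":

def reconstruct_key(shares):
    # Lagrange interpolation at x = 0 over the prime field recovers the
    # constant term of the sharing polynomial, i.e. the data key K
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

def decapsulate(dht, vdo):
    L, C, n, threshold = vdo
    shares = [dht[idx] for idx in derive_indices(L, n) if idx in dht]
    if len(shares) < threshold:
        raise RuntimeError("too few shares survive: the VDO has vanished")
    K = reconstruct_key(shares[:threshold])
    keystream = hashlib.sha256(str(K).encode()).digest() * len(C)
    return bytes(b ^ k for b, k in zip(C, keystream))

print(decapsulate(dht, vdo))   # -> b'secret email body' while enough shares remain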
Benefits of Churn!
• Nodes continually leave and re-enter the network
• Supposedly 80% of IPs change within 7 days
• Nodes change IDs (their locations in the network) as their IPs change
• Also, each node's local hash table purges entries after some time period
• So data is guaranteed NOT to last long at its original node…
The Big Question… [1][7]
• How long are we talking, with churn?
  - Vuze: unclear… (7 h, 3 h, 2 h, …)
  - OpenDHT: 1 hour – 1 week
  - Kazaa: ~“several minutes” (2.5)
• A refresh reuses K, re-splits it into new shares, uses L to derive new indices, and re-put()s them (see the sketch below)
• “Naturally, refreshes require periodic Internet connectivity.” [1]
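A sketch of that refresh step, again continuing the earlier toy code (derive_indices() and shamir_split() come from the encapsulation sketch). For simplicity it re-put()s the fresh shares at the same L-derived locations used at encapsulation time; how refreshed share locations are actually chosen is not taken from the paper here.

def refresh(dht, K, L, n=20, threshold=10):
    # the refresher must still hold K and L: it re-splits K into brand-new
    # shares and re-put()s them, so they survive another churn/expiry period
    for idx, share in zip(derive_indices(L, n), shamir_split(K, n, threshold)):
        dht[idx] = share
# run periodically while the VDO should stay alive; each run needs connectivity [1]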
Implementations
Results
“two minor changes (<50 lines of code)”
Results
“[With single share VDO model] … ~5% of VDOs continue to live long after 8 hours.” [1]
“…the cause for the latter effect demands more investigation, we suspect that some of the single VDO keys are stored by DHT peers running non-default configurations.” [1]
“These observations suggest that the naive (one share) approach [does not meet our goals] … thereby motivating our need for redundancy.” [1]
Contributions
• The solution utilizes existing, popular, well-researched technology, in use since 2001
• Interesting idea of using a DHT as general temporary storage
• No special security hardware or special operations required on the part of the user
• Utilizes the inherent half-life (churn) of nodes in the DHT – data is definitely destroyed
Weaknesses
• Requirements:
  - a network connection (for put/get, though not for destruction)
  - complex analysis of DHT networks
  - “refresher” hardware for a reasonable data lifetime
• Clearly not for all data! (network flooding)
• The lifetime of the data is not mathematically determinate or even guaranteed (it depends completely on churn)
• Assumes no hard copies of the data exist…
Improvement / Future Work
• Instead of refresher hardware, DHT maintenance could refresh shares automatically
• Utilize many P2P networks in parallel, choosing the appropriate one based upon its churn
• Analyze the network with many data objects over very long timeout periods
• Make sure the VDO is well encrypted, or someone could easily defeat the threshold scheme
References
1. Geambasu, R., et al. Vanish: Increasing Data Privacy with Self-Destructing Data. USENIX Security Symposium, 2009.
2. Wiley, B. Distributed Hash Tables, Part I. Linux Journal, 2003. http://www.linuxjournal.com/article/6797
3. Hua, K. P2P Search. COP5711 Parallel & Distributed Databases, University of Central Florida, 2008. http://www.cs.ucf.edu/~kienhua/classes/
4. Chord (peer-to-peer). http://en.wikipedia.org/wiki/Chord_%28peer-to-peer%29
5. Hash table. http://en.wikipedia.org/wiki/Hash_table
6. Shamir, A. How to Share a Secret. Communications of the ACM, 1979.
7. Stutzbach, D., Rejaie, R. Characterizing Churn in P2P Networks. Technical Report, University of Oregon, 2005.