Vanish: Increasing Data Privacy with Self-Destructing Data
Roxana Geambasu, Tadayoshi Kohno, Amit Levy, et al., University of Washington
USENIX Security Symposium, 2009
Presented by Joseph Del Rocco, University of Central Florida

Outline
• Distributed Hash Tables (DHTs)
• Data Destruction with Vanish
  – Motivations
  – Architecture
  – Implementations
  – Results
  – Contributions / Weaknesses / Improvements
• References

Hash Tables (review) [5]
• A tag or datum is hashed into a table index
• Hash functions: MD5, SHA-1, CRC32, FSB, HAS-160, etc.
• Hash collisions are unavoidable (see the birthday paradox), so colliding entries are chained in linked lists

Distributed Hash Tables
• A hash table split up across machines
• The key space is astronomically large (2^128 or 2^160);
  2^128 = 340,282,366,920,938,463,463,374,607,431,768,211,456 ≈ 3.4 × 10^38
• In 2001, four main DHTs ignited research:
  – Chord (MIT)
  – CAN (Berkeley, AT&T)
  – Pastry (Rice, Microsoft)
  – Tapestry (Berkeley)
• Availability, scale, decentralization, churn!

Viewed as a Distributed Hash Table [3]
[Figure: the hash table index space, 0 to 2^128 - 1, partitioned across peer nodes]
• Each peer node is responsible for a range of the hash table, according to the peer's hash key
• Location information about an object is placed on the peer with the closest key (information redundancy)

DHT in action: put() [3]
[Figure: insert(K1, V1) routed across nodes holding (key, value) pairs]
• To share a file, route the message "I have the file" to the node holding key K1

DHT in action: get() [3]
[Figure: retrieve(K1) routed across nodes holding (key, value) pairs]
• Retrieve the value V1 from the node holding key K1

Chord [4]
• Keys and nodes are hashed into the same m-bit identifier space of size 2^m (keys from the filename, nodes from the IP)
• Each node stores about K/N keys (K total keys, N nodes)
• "Finger table" entry i of node n points to successor((n + 2^(i-1)) mod 2^m)

CAN – Content Addressable Network [3]
• Each peer is responsible for one zone, i.e., stores all (key, value) pairs of that zone
• Each peer knows the neighbors of its zone
• Peers are randomly assigned to zones at startup – a zone is split if it is not empty
• Dimension-ordered multi-hop routing

CAN: Object Publishing [3]
[Figure: node I::publish(K, V): (1) compute a = hx(K), b = hy(K); (2) route (K, V) to node J, whose zone contains the point (a, b)]

Modern, Popular P2P Systems with DHTs
• Vuze / Azureus (Java BitTorrent client)
• BitTorrent DHT (Kademlia-based)
• IBM WebSphere
• Apache Cassandra
• OpenDHT
• Mainline (the official BitTorrent client)
• Kazaa, eDonkey, etc. (KAD)
[Image: Dendrobates azureus, the blue poison dart frog]

Vuze (Azureus) Specifics
• Nodes in the network are assigned a "random" 160-bit ID, hashed from their IP and port (the DHT index range)
• A client sends "put" messages to the 20 nodes closest to the hashed key index in the DHT
• Nodes re-put() entries from their local hash tables every 30 minutes to combat churn
• Nodes supposedly remove key/value pairs older than 8 hours if they are not re-put() by the originator
• So the originator node must re-put() to persist data?
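To make the put()/get() mechanics above concrete, here is a minimal, single-process sketch of a Vuze-style DHT in Python. It is not Vuze or Vanish code: the ToyDHT class, its node_id() helper, and the Kademlia-style XOR closeness metric are illustrative assumptions; only the 20-node replication, the IP/port-derived 160-bit IDs, and the 8-hour expiry come from the slide above.

import hashlib
import time

def node_id(ip, port):
    # Hypothetical helper: derive a 160-bit node ID from IP and port,
    # mirroring how Vuze hashes IP/port into the DHT index range.
    return int.from_bytes(hashlib.sha1(f"{ip}:{port}".encode()).digest(), "big")

class ToyDHT:
    # Single-process stand-in for a Vuze-style DHT (an assumption, not Vuze code).
    REPLICAS = 20                 # a put() goes to the 20 closest node IDs
    EXPIRY_SECONDS = 8 * 60 * 60  # entries lapse after 8 hours without a re-put()

    def __init__(self, node_ids):
        self.nodes = {nid: {} for nid in node_ids}  # node ID -> local hash table

    def _closest(self, key):
        # Kademlia-style XOR distance is assumed as the closeness metric.
        return sorted(self.nodes, key=lambda nid: nid ^ key)[:self.REPLICAS]

    def put(self, key, value):
        # Store (key, value) with a timestamp on the closest nodes.
        for nid in self._closest(key):
            self.nodes[nid][key] = (value, time.time())

    def get(self, key):
        # Return the value if any close node still holds an unexpired copy.
        now = time.time()
        for nid in self._closest(key):
            entry = self.nodes[nid].get(key)
            if entry and now - entry[1] < self.EXPIRY_SECONDS:
                return entry[0]
        return None

# Example: 100 simulated peers, one put(), one get().
dht = ToyDHT(node_id(f"10.0.0.{i}", 6881 + i) for i in range(100))
key = int.from_bytes(hashlib.sha1(b"some lookup tag").digest(), "big")
dht.put(key, b"I have the file")
assert dht.get(key) == b"I have the file"

The sketch only captures the replication and timestamp-based expiry that Vanish relies on; real deployments also handle churn (nodes joining and leaving), routing across machines, and the 30-minute replica refresh.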
Vanish Motivations
• Data is frequently cached and archived by email providers, ISPs, and network backup systems
• It is often still available after account termination
• Forensic examination of hard drives (e.g., after a raid)
• Laptops are stolen or taken in for repair
• High-profile political scandals
• Some argue the right and ability to destroy data is as fundamental as privacy and liberty
• The Hushmail email encryption service offered cleartext contents of encrypted messages to the federal government
• A trusted third party (Ephemerizer) was supposed to destroy data after a timeout, but this never caught on… a trust issue?
• Subpoenas…

Vanish Goals
• Create a Vanishing Data Object (VDO)
• The VDO becomes unreadable after a timeout, even for someone who retroactively obtains a pristine copy of it made before expiration
• The data remains accessible until the timeout
• Leverage existing infrastructure
• NO required passwords, keys, or special security hardware…

Vanish Architecture
• Encrypt the data D with a random key K to produce ciphertext C
• Use threshold secret sharing [6] to split the key K into N shares
• Pick a random access key L and use a cryptographically secure PRNG (keyed by L) to derive N DHT indices
• The threshold of the secret-sharing scheme (the threshold ratio) determines how many of the N shares are needed to reconstruct K
• Example: N = 20, threshold = 10 – any 10 of the 20 shares can be used
• Example: N = 50, threshold = 50 – you had better retrieve all of the shares…
• The VDO = (L, C, N, threshold) is then sent or stored

Vanish Decapsulation
• Given a VDO:
  – extract the access key L
  – derive the locations of the shares of K
  – get() the number of shares required by the threshold
  – reconstruct K
  – decrypt C to obtain D
• At least the threshold number of shares must still be retrievable

Benefits of Churn!
• Nodes continually leave and re-enter the network
• Supposedly 80% of IPs change within 7 days
• Nodes change IDs (locations in the network) as their IPs change
• Also, each node's local hash table purges entries after some time period
• So data is effectively guaranteed NOT to last long at its original node…

The Big Question… [1][7]
• How long are we talking about, with churn?
  – Vuze: unclear… (7 h, 3 h, 2 h, …)
  – OpenDHT: 1 hour – 1 week
  – Kazaa: ~"several minutes" (2.5)
• A refresh reuses K, re-splits it into new shares, uses L to derive new indices, and re-put()s them
• "Naturally, refreshes require periodic Internet connectivity." [1]
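The encapsulation and decapsulation flow just described can be sketched end to end. The code below is a toy illustration rather than the authors' implementation: it uses a small hand-rolled Shamir secret-sharing routine [6] over a Mersenne-prime field, a SHA-256 XOR keystream in place of a real cipher, and SHA-1 of (L, i) as a stand-in for the keyed cryptographically secure PRNG; the names encapsulate(), decapsulate(), and share_locations() are invented for illustration. It assumes a DHT object with the put()/get() interface of the ToyDHT sketch above.

import hashlib
import secrets

PRIME = 2**521 - 1   # a known Mersenne prime; the field used for Shamir sharing

def shamir_split(secret, n, threshold):
    # Split `secret` into n shares; any `threshold` of them reconstruct it [6].
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    poly = lambda x: sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, poly(x)) for x in range(1, n + 1)]

def shamir_combine(shares):
    # Lagrange interpolation at x = 0 recovers the secret.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

def xor_stream(key, data):
    # Toy stream cipher: XOR with a SHA-256 keystream (a real system would use AES).
    out, kb = bytearray(), key.to_bytes(16, "big")
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(kb + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

def share_locations(access_key, n):
    # Derive n 160-bit DHT indices from L (stand-in for the keyed secure PRNG).
    lb = access_key.to_bytes(16, "big")
    return [int.from_bytes(hashlib.sha1(lb + i.to_bytes(4, "big")).digest(), "big")
            for i in range(n)]

def encapsulate(data, dht, n=20, threshold=10):
    k = secrets.randbits(128)              # random data key K
    ciphertext = xor_stream(k, data)       # C = Enc_K(D)
    access_key = secrets.randbits(128)     # random access key L
    for loc, (x, y) in zip(share_locations(access_key, n),
                           shamir_split(k, n, threshold)):
        dht.put(loc, x.to_bytes(4, "big") + y.to_bytes(66, "big"))
    return (access_key, ciphertext, n, threshold)   # the VDO; K itself is discarded

def decapsulate(vdo, dht):
    access_key, ciphertext, n, threshold = vdo
    shares = []
    for loc in share_locations(access_key, n):
        blob = dht.get(loc)
        if blob is not None:
            shares.append((int.from_bytes(blob[:4], "big"),
                           int.from_bytes(blob[4:], "big")))
        if len(shares) == threshold:
            break
    if len(shares) < threshold:
        raise ValueError("VDO expired: too few shares remain in the DHT")
    return xor_stream(shamir_combine(shares), ciphertext)

Usage, with the ToyDHT sketch from earlier purely for illustration: vdo = encapsulate(b"meet me at noon", dht) encrypts the message, pushes 10-of-20 key shares into the DHT, and forgets K; decapsulate(vdo, dht) succeeds only while at least 10 shares are still retrievable, which is exactly what churn and the 8-hour expiry eventually take away.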
Implementations
[Figures]

Results
• "two minor changes (<50 lines of code)" [1]
• "[With single share VDO model] … ~5% of VDOs continue to live long after 8 hours." [1]
• "…the cause for the latter effect demands more investigation, we suspect that some of the single VDO keys are stored by DHT peers running non-default configurations." [1]
• "These observations suggest that the naive (one share) approach [does not meet our goals] … thereby motivating our need for redundancy." [1]

Contributions
• The solution uses existing, popular, well-researched technology, in use since 2001
• An interesting idea: using a DHT as general temporary storage
• No special security hardware or special operations required of the user
• Exploits the inherent half-life (churn) of nodes in the DHT – the data is definitely destroyed

Weaknesses
• Requirements:
  – a network connection (for put and get; none needed to destroy)
  – complex analysis of DHT networks
  – a "refresher" machine to achieve a reasonable lifetime
• Clearly not for all data! (network flooding)
• The lifetime of the data is not mathematically determinate or even guaranteed (it depends entirely on churn)
• Assumes no hard copies of the data exist…

Improvement / Future Work
• Instead of refresher hardware, DHT maintenance could refresh shares automatically
• Use many P2P networks in parallel and choose the appropriate one based on its churn
• Analyze the network with many data objects over very long timeout periods
• Make sure the VDO is well encrypted, or someone could easily attack the threshold scheme

References
1. Geambasu, R., Kohno, T., Levy, A., et al. Vanish: Increasing Data Privacy with Self-Destructing Data. USENIX Security Symposium, 2009.
2. Wiley, B. Distributed Hash Tables, Part I. Linux Journal, 2003. http://www.linuxjournal.com/article/6797
3. Hua, K. P2P Search. COP5711 Parallel & Distributed Databases, University of Central Florida, 2008. http://www.cs.ucf.edu/~kienhua/classes/
4. Chord (peer-to-peer). http://en.wikipedia.org/wiki/Chord_%28peer-to-peer%29
5. Hash table. http://en.wikipedia.org/wiki/Hash_table
6. Shamir, A. How to Share a Secret. Communications of the ACM, 1979.
7. Stutzbach, D., Rejaie, R. Characterizing Churn in P2P Networks. Technical Report, University of Oregon, 2005.