Samsara: Honor Among Thieves in Peer-to-Peer Storage

Introduction

Peer-to-Peer Paradigm
- A node stores some data in remote nodes
- Agree to do the same in return
- Replication: fault tolerance
- Decentralized
- Self-administering
- Scalable

Problem with the P2P Model
- The tragedy of the commons
- Consume without contributing

Some existing solutions
- Third parties: centralized administration
- Currency: trusted infrastructure
- Certified evidence of storage consumption: centralized authority

Observation
- Problem is simplified if we have symmetric exchange of resources
  - Guarantees consumption <= contribution
- However, symmetric relationships are rare
  - Replica A needs 1 GB; replica B needs 1 MB

Samsara
- An infrastructure to enforce fairness in peer-to-peer systems
  - No trusted third parties
  - No monetary models
  - No certified identities

Another Observation
- Symmetric storage relationships can be manufactured
- A claim-based system
  - Based on incompressible storage claims

Storage Claims
- A node periodically checks its peer
  - Make sure that the peer is adhering to the contract
- If the peer breaches the contract
  - The node is free to drop the peer's data
- Each node can now act selfishly
- Collectively, all nodes need to play fair

Some Questions
- How to reduce the storage overhead
- How to punish cheaters
- How to tell failures from cheating

Background
- Pastiche
  - Peer-to-peer, cooperative backup system
  - Unsolved problem: unchecked storage consumption
- [Diagram labels: Pastiche, Samsara, Pastry, OS, Disk]

Design
- Goal: ensure that nodes consume no more resources than they contribute
- Manufacture symmetric storage exchange relationships
  - Through storage claims
- A claim can be passed along to form a dependency chain
- A claim can be removed if it forms a circular dependency
- Punish cheaters by deleting their data probabilistically
- Short outages can recover from surviving copies
- Cheaters will eventually lose data

Claim Construction
- Requires three values
  - A secret pass phrase
  - A private, symmetric key
  - A location in the storage space
- Storage space initially filled with hash values
  - SHA(pass phrase, 0)
  - SHA(pass phrase, 1)
  - SHA(pass phrase, 2)
  - ...
- [Diagram label: Key]

Querying Nodes
- Queries
  - Monitor remote storage
  - Once every few hours
  - Need not be answered immediately
- Query sends h_0 to verify data 1..n
- Remote site computes
  - h_1 = SHA(data_1, h_0)
  - h_2 = SHA(data_2, h_1)
  - ...
  - h_n = SHA(data_n, h_(n-1))
- Remote site returns h_n

Transient Failure
- Difficult to discern cheating from transient failures
- One solution: grace periods before deletion
- Problem: revolving credits...

Samsara Solution
- Replication + independent probabilistic deletion
- Deletion rate is an exponentially growing function of the number of failed queries
- A cheater (> 32GB) cannot replicate fast enough to get a free ride
  - Need to replicate 10 times in 3 days
- A node should only lose all of its data if it fails queries for an entire grace period
- Most outages are within 3 days

Probabilistic Discard Example
- [Figure: copies of blocks 1 to 6 with discarded blocks marked X; failed queries = 2, 1, 0, 3]

Overhead Reduction
- Storage claims can be forwarded
- However, if something goes wrong
  - The forwarding replica is responsible
  - Increases the incentive not to forward

Diffie-Hellman Key Exchange
- Need a prime number p
- Need a base integer g between 1 and p - 1
- Site A picks x between 1 and p - 2
- Site B picks y between 1 and p - 2
- Example: p = 13, g = 7, A picks x = 3, B picks y = 5
- Site A computes g^x mod p
  - A: 7^3 mod 13 = 5
- Site B computes g^y mod p
  - B: 7^5 mod 13 = 11
- Site A and B exchange public values
  - A: 3, 11 (from B)
  - B: 5, 5 (from A)
- Site A computes (g^y mod p)^x mod p
  - A: 11^3 mod 13 = 5
- Site B computes (g^x mod p)^y mod p
  - B: 5^5 mod 13 = 5
- Now A and B have a shared secret
- Problem: prone to man-in-the-middle attacks

Forwarding and Reliability
- Longer forwarding chain: lower reliability
- Cyclic chains are okay, because the accountability is wrapped around
- Unfortunately, cycles are rarely found

Limitations
- Cannot handle malicious nodes
- Cannot force nodes to store data for others
- Cannot create placeholders for bandwidth and processing power

Implementation
- Written in C
- Three layers
  - Messaging layer
  - Replica manager
  - Storage layer
    - A single flat file
    - Linked list of free space

File copy benchmark
- [Figure: y-axis Seconds (0 to 10); x-axis NFS, Samsarad; legend: store claims, fetch claims, store data]
- 13MB file copied between two nodes

Query benchmark
- [Figure: y-axis Seconds (0 to 6); x-axis Query File, Query Claims; legend: verify, answer query]
- 2 hours to verify 32GB of claims at 550MHz

Reliability simulations
- Examine chain length and reliability
  - What percentage of files is lost?
- Simulate the absolute worst case
  - Limit chain length
  - Transfer as much as possible within the limit
- All failures occur
  - Permanently
  - Simultaneously
  - Before new replicas can be created

Reliability results
- [Figure: y-axis Percent Lost Objects (0% to 5%); x-axis Failure Rate (1%, 2%, 4%, 8%, 16%); series: 1, 2, 4, 8, unlim (presumably the chain length limit)]
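
The Claim Construction slides fill the storage space with the values SHA(pass phrase, 0), SHA(pass phrase, 1), and so on. A minimal Python sketch of that fill step follows. It assumes SHA-1 as the hash, an 8-byte big-endian encoding of the index, and a 512-byte block; `hash_value` and `claim_block` are hypothetical names. The step that uses the private symmetric key is left out, because the slides do not say how the key is applied.

```python
import hashlib

DIGEST_SIZE = hashlib.sha1().digest_size  # 20 bytes for SHA-1


def hash_value(pass_phrase, index):
    """Return SHA(pass phrase, index). The index encoding is an assumption."""
    return hashlib.sha1(pass_phrase + index.to_bytes(8, "big")).digest()


def claim_block(pass_phrase, location, block_size=512):
    """Build the claim data for one block-sized location (hypothetical layout).

    The block is filled with consecutive hash values, so only the holder
    of the pass phrase can regenerate it on demand.
    """
    hashes_per_block = -(-block_size // DIGEST_SIZE)  # ceiling division
    first = location * hashes_per_block
    data = b"".join(
        hash_value(pass_phrase, first + i) for i in range(hashes_per_block)
    )
    return data[:block_size]
```

Because the block is derived only from the pass phrase and the location, the claim owner can recompute it at query time instead of keeping a copy.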
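
The Querying Nodes slides describe a hash chain: the querier sends h_0, the remote site folds each stored item into the chain, and only h_n comes back. The sketch below illustrates that exchange in Python. SHA-1, the byte-concatenation order, and the names `answer_query` and `verify_answer` are assumptions made for illustration.

```python
import hashlib
import hmac
import os


def answer_query(h0, blocks):
    """Remote site: compute h_i = SHA(data_i, h_(i-1)) and return h_n."""
    h = h0
    for data in blocks:
        h = hashlib.sha1(data + h).digest()
    return h


def verify_answer(h0, expected_blocks, answer):
    """Querier: recompute the chain over the expected data and compare."""
    return hmac.compare_digest(answer_query(h0, expected_blocks), answer)


# Usage: a fresh random h0 per query, then compare the returned h_n.
blocks = [b"block-1", b"block-2", b"block-3"]
h0 = os.urandom(20)
honest = answer_query(h0, blocks)        # site still holds every block
cheater = answer_query(h0, blocks[:2])   # site silently dropped a block
```

Since h_0 changes with every query, an answer computed for an earlier query cannot be replayed; the remote site needs the data itself to produce h_n.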
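
The Samsara Solution slides state that the deletion rate grows exponentially with the number of failed queries, and that a node should lose all of its data only after failing queries for an entire grace period. The slides do not give the actual curve, so the sketch below uses a purely hypothetical one: the discard probability doubles with each failed query and reaches 1.0 at `grace_queries`. All function names are illustrative.

```python
import random


def discard_probability(failed_queries, grace_queries):
    """Hypothetical curve: doubles per failed query, 1.0 at the grace limit."""
    if failed_queries <= 0:
        return 0.0
    if failed_queries >= grace_queries:
        return 1.0
    return 2.0 ** (failed_queries - grace_queries)


def surviving_blocks(block_ids, failed_queries, grace_queries, rng):
    """One holder independently discards each block with the current probability."""
    p = discard_probability(failed_queries, grace_queries)
    return [b for b in block_ids if rng.random() >= p]


def loss_probability(p, replicas):
    """Chance that every one of `replicas` independent holders discards a block."""
    return p ** replicas


# Usage: one holder's view after 2 failed queries out of a grace limit of 4.
rng = random.Random(0)
kept = surviving_blocks(list(range(1, 7)), 2, 4, rng)
```

Under this assumed curve, a block replicated on three independent holders that each discard with probability 0.5 is lost everywhere with probability 0.125, which is why a short outage can usually recover from surviving copies.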
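
The Diffie-Hellman example from the slides (p = 13, g = 7, x = 3, y = 5) can be checked with Python's built-in three-argument `pow`, which performs modular exponentiation. The helper names `public_value` and `shared_secret` are hypothetical, and the tiny parameters are the slides' teaching values, not something to use in practice.

```python
def public_value(g, secret, p):
    """Compute g^secret mod p, the value a site sends to its peer."""
    return pow(g, secret, p)


def shared_secret(peer_public, secret, p):
    """Compute (peer_public)^secret mod p, the value both sites end up with."""
    return pow(peer_public, secret, p)


p, g = 13, 7          # prime modulus and base from the slides
x, y = 3, 5           # private values picked by site A and site B

a_public = public_value(g, x, p)            # 7^3 mod 13
b_public = public_value(g, y, p)            # 7^5 mod 13
a_shared = shared_secret(b_public, x, p)    # 11^3 mod 13
b_shared = shared_secret(a_public, y, p)    # 5^5 mod 13
```

Both sites arrive at the same value without sending x or y. Nothing in the exchange authenticates the peers, which is the man-in-the-middle weakness the slides point out.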