Copyset - Stanford Computer Forum

Copysets: Reducing the Frequency of Data Loss in Cloud Storage
Asaf Cidon, Stephen Rumble, Ryan Stutsman, Sachin Katti, John Ousterhout, and Mendel Rosenblum
Stanford University
Each Power Outage Causes Data Loss
• Cloud storage systems use random replication
• Random replication is vulnerable to power outages
• ~1% of nodes fail to reboot after a power outage
• Each data loss event has a fixed cost:
  • It is better to lose data infrequently, at the expense of losing more data in each event
Minimize Copysets → Minimize Data Loss Events
• Copyset: a unique set of nodes that contains all replicas of a chunk of data
• The system loses data when all nodes of at least one copyset fail simultaneously
• Random replication creates too many copysets
• Minimum Copysets: statically split the nodes into copysets, so that each node belongs to a single copyset
  • Place the first replica on a random node
  • Place the other replicas deterministically on the first node’s copyset
  • On a 5000-node cluster: one data loss event every 625 years, and each event loses an entire node’s worth of data
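The Minimum Copysets scheme and its loss frequency can be illustrated with a short calculation. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, the replication factor of 3 and the rate of one power outage per year are assumptions not stated on the poster, and the loss probability is a union-bound estimate.

```python
from math import comb
import random


def split_into_copysets(nodes, replication):
    """Statically split nodes into disjoint copysets of `replication` nodes each.

    Nodes left over when len(nodes) is not a multiple of `replication`
    are simply unused in this sketch.
    """
    return [tuple(nodes[i:i + replication])
            for i in range(0, len(nodes) - replication + 1, replication)]


def place_chunk(copysets, rng):
    """Pick a random first node, then put every replica on that node's only copyset."""
    first = rng.choice([node for cs in copysets for node in cs])
    return next(cs for cs in copysets if first in cs)


def loss_probability(num_copysets, num_nodes, replication, failed):
    """Union-bound estimate of P(some copyset lies entirely inside the failed set)."""
    p_one = comb(failed, replication) / comb(num_nodes, replication)
    return num_copysets * p_one


copysets = split_into_copysets(list(range(5000)), replication=3)
replicas = place_chunk(copysets, random.Random(0))
p_loss = loss_probability(len(copysets), 5000, 3, failed=50)  # 1% of 5000 nodes fail
years_between_losses = 1 / p_loss  # assumption: one power outage per year
print(len(copysets), replicas, p_loss, years_between_losses)
```

Under these assumptions the estimate comes out at roughly 0.16% per outage, which is the same order of magnitude as the 625-year figure quoted above.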
Copyset Replication
• Problem: most systems need to scatter data across a number of nodes (scatter width)
• Otherwise, recovery time increases and load balancing suffers
• Copyset Replication: Given a scatter width, minimize the number of copysets:
[Figure: Copyset Replication on a nine-node cluster (Node 1 to Node 9).
Permutation Phase: Permutation 1 (Nodes 7, 5, 1, 6, 4, 9, 3, 2, 8) is split into Copysets 1 to 3; Permutation 2 (Nodes 9, 7, 2, 3, 6, 1, 4, 5, 8) is split into Copysets 4 to 6.
Replication Phase: the primary is Node 2; randomly pick a copyset.]
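The two phases of Copyset Replication can be sketched as follows. This is a minimal illustration under stated assumptions rather than the HDFS or RAMCloud implementation: the function names are hypothetical, the number of permutations is taken to be ceil(scatter width / (R - 1)) for replication factor R (a rule not stated on the poster), and duplicate copysets across permutations are not filtered out.

```python
import math
import random


def build_copysets(nodes, replication, scatter_width, rng):
    """Permutation phase: cut random permutations into groups of `replication` nodes."""
    # Assumption: each permutation adds (replication - 1) to a node's scatter width.
    num_permutations = math.ceil(scatter_width / (replication - 1))
    copysets = []
    for _ in range(num_permutations):
        perm = list(nodes)
        rng.shuffle(perm)
        # Leftover nodes at the tail of a permutation are skipped in this sketch.
        for i in range(0, len(perm) - replication + 1, replication):
            copysets.append(tuple(perm[i:i + replication]))
    return copysets


def place_replicas(primary, copysets, rng):
    """Replication phase: randomly pick one copyset that contains the primary."""
    candidates = [cs for cs in copysets if primary in cs]
    return rng.choice(candidates)


rng = random.Random(0)
nodes = list(range(1, 10))  # Node 1 .. Node 9
copysets = build_copysets(nodes, replication=3, scatter_width=4, rng=rng)
replicas = place_replicas(primary=2, copysets=copysets, rng=rng)
print(copysets)
print(replicas)
```

With nine nodes, three replicas, and a scatter width of 4, the sketch generates two permutations and six copysets, so every node belongs to exactly two copysets, and the replicas of a chunk whose primary is Node 2 land on one of the two copysets that contain Node 2.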
• Copyset Replication significantly reduces the frequency of data loss events while preserving the system’s scatter width and node recovery time
• Implemented and evaluated on HDFS and RAMCloud
• Minimal overhead on normal operations and recovery