236601 - Coding and Algorithms for Memories
Lecture 13

Large Scale Storage Systems
• Big Data players: Facebook, Amazon, Google, Yahoo, …
[Figure: Cluster of machines running Hadoop at Yahoo! (Source: Yahoo!)]
• Failures are the norm

[Figure: Node failures at Facebook (x-axis: Date)]
XORing Elephants: Novel Erasure Codes for Big Data
M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur, VLDB 2013

Problem Setup
• Disks are stored together in a group (rack)
• Disk failures should be supported
• Requirements:
– Support as many disk failures as possible
– And yet…
• Optimal and fast recovery
• Low complexity

Reed Solomon Codes
• A code with a parity-check matrix of the form
$$H = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \alpha & \alpha^{2} & \cdots & \alpha^{n-1} \\ 1 & \alpha^{2} & \alpha^{4} & \cdots & \alpha^{2(n-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \alpha^{d-1} & \alpha^{2(d-1)} & \cdots & \alpha^{(d-1)(n-1)} \end{pmatrix}, \qquad H_{i,j} = \alpha^{ij},\ 0 \le i \le d-1,\ 0 \le j \le n-1$$
where $\alpha$ is a primitive element of some extension field and $\operatorname{ord}(\alpha) > n-1$
• Claim: Every $d \times d$ sub-matrix of $H$ has full rank (any $d$ columns form a Vandermonde matrix in the distinct elements $\alpha^{j}$, which are distinct because $\operatorname{ord}(\alpha) > n-1$), so any $d$ erasures can be corrected
• Advantages:
– Support the maximum number of disk failures
– Are very common in practice and have relatively efficient encoding/decoding schemes
• Disadvantages:
– Require working over large fields. Solution: EvenOdd codes
– Need to read k entire disks in order to recover even a single disk failure, so the rebuild is not efficient. Solution: ZigZag codes

The Repair Problem
• Facebook's storage scheme: RS code
– 10 data blocks
– 4 parity blocks
– Can tolerate any four disk failures
[Figure: data blocks 1–10 and parity blocks P1–P4]
• A disk is lost
– Repair job starts
• Access, read, and transmit the data of 10 disks!
• Overuse of system resources during a single repair
• Goal: Reduce the repair cost of a single disk

ZigZag Codes
• Designed by Itzhak Tamo, Zhiying Wang, and Jehoshua Bruck
• Goal: construct codes that correct the maximum number of erasures and yet allow efficient reconstruction when only a single drive fails
• Lower bound: the minimum amount of data that must be read to recover a single drive failure
– (n,k) code: n drives, k information drives, and n-k redundancy drives
– M: size of a single drive in bits
• For an (n,n-2) code it is required to read at least 1/2 of the data on the remaining drives, that is, at least (1/2)(n-1)M bits
– The last example is optimal
• In general, for an (n,n-r) code it is required to read at least 1/r of the data on the remaining drives, that is, at least (1/r)(n-1)M bits
• Example (5 drives, 4 elements per drive). Each information entry is the index of the ZigZag set the element belongs to; row parity element i is computed from row i, and ZigZag parity element j from the elements labeled j:

  info 1 | info 2 | info 3 | Row parity | ZigZag parity
  0      | 2      | 1      |            | 0
  1      | 3      | 0      |            | 1
  2      | 0      | 3      |            | 2
  3      | 1      | 2      |            | 3

Network Coding for Distributed Storage
• Goal: show the following: in general, for an (n,n-r) code it is required to read at least 1/r of the data on the remaining drives, that is, at least (1/r)(n-1)M bits
• Network Coding for Distributed Storage, Dimakis, Godfrey, Wu, Wainwright, Ramchandran
• A file of size M is partitioned into k pieces of size M/k
• The k pieces are encoded into n encoded pieces using an (n,k) MDS code
[Figure: encoded pieces stored on nodes x1, …, x4]
[Figure: a newcomer node x5 downloads β = ? from the surviving nodes]
[Figure: information flow graph: source S; each storage node split into x_i in → x_i out with capacity α = 1; newcomer x5 with incoming edges of capacity β = ?; data collector DC; edges out of S and into DC have capacity ∞]

ZigZag Codes
• Example:

  info 1 | info 2 | Row parity | ZigZag parity
  a      | b      | a+b        | a+2d
  c      | d      | c+d        | c+b
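The (4,2) ZigZag example can be checked mechanically. The sketch below is a minimal illustration, not the authors' implementation. It assumes arithmetic over GF(3): over GF(2) the coefficient 2 in a+2d vanishes and the code is no longer MDS (info 1 plus the ZigZag parity would then read a, c, a, c+b). All names (`encode`, `is_mds`, `Reader`, `repair_info1`, `repair_info2`, `repair_lower_bound`) are hypothetical. The script verifies that any two drives determine the data, and that either information drive can be rebuilt by reading one of the two symbols on each of the three surviving drives, i.e. 3 symbols, which matches (1/r)(n-1)M with n = 4, r = 2, M = 2 symbols.

```python
from fractions import Fraction
from itertools import combinations, product

P = 3     # assumption: symbols live in GF(3), i.e. integers mod 3
INV2 = 2  # inverse of 2 in GF(3): 2 * 2 = 4 = 1 (mod 3)


def encode(a, b, c, d):
    """Return the 4 drives (2 symbols each): info 1, info 2, row parity, ZigZag parity."""
    return [
        [a % P, c % P],                  # info drive 1
        [b % P, d % P],                  # info drive 2
        [(a + b) % P, (c + d) % P],      # row parity
        [(a + 2 * d) % P, (c + b) % P],  # ZigZag parity
    ]


def is_mds():
    """True if any 2 of the 4 drives determine the message (any 2 erasures are correctable)."""
    for cols in combinations(range(4), 2):
        seen = set()
        for msg in product(range(P), repeat=4):
            cw = encode(*msg)
            key = tuple(s for i in cols for s in cw[i])
            if key in seen:  # two different messages agree on these two drives
                return False
            seen.add(key)
    return True


class Reader:
    """Hypothetical helper: access to the surviving drives that counts every symbol read."""

    def __init__(self, cw, failed):
        self.cw, self.failed, self.reads = cw, failed, 0

    def read(self, col, row):
        assert col != self.failed, "cannot read the failed drive"
        self.reads += 1
        return self.cw[col][row]


def repair_info1(rd):
    """Rebuild info drive 1 = [a, c] from b, a+b and c+b (one symbol per surviving drive)."""
    b = rd.read(1, 0)
    a = (rd.read(2, 0) - b) % P  # (a + b) - b
    c = (rd.read(3, 1) - b) % P  # (c + b) - b
    return [a, c]


def repair_info2(rd):
    """Rebuild info drive 2 = [b, d] from a, a+b and a+2d (one symbol per surviving drive)."""
    a = rd.read(0, 0)
    b = (rd.read(2, 0) - a) % P           # (a + b) - a
    d = (INV2 * (rd.read(3, 0) - a)) % P  # ((a + 2d) - a) / 2
    return [b, d]


def repair_lower_bound(n, r, M):
    """(1/r)(n-1)M: lower bound on the symbols read to rebuild one drive of an (n, n-r) code."""
    return Fraction(n - 1, r) * M


if __name__ == "__main__":
    assert is_mds()
    bound = repair_lower_bound(n=4, r=2, M=2)
    for msg in product(range(P), repeat=4):
        cw = encode(*msg)
        for failed, repair in ((0, repair_info1), (1, repair_info2)):
            rd = Reader(cw, failed)
            assert repair(rd) == cw[failed]
            assert rd.reads == bound  # half of the 6 surviving symbols
    print("all checks passed; symbols read per repair =", bound)
```

The sketch only rebuilds the two information drives; the lower bound on the slides refers to any single drive, and repairing the parity drives is not attempted here.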
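As a rough worked comparison (simple arithmetic, not from the slides), plug Facebook's RS code from the Repair Problem slide (n = 14 drives, k = 10, r = 4) into the lower bound (1/r)(n-1)M, with M the size of one drive. The comparison assumes the RS repair reads k = 10 entire surviving drives:

```latex
\underbrace{kM = 10M}_{\text{RS repair: read } k \text{ drives}}
\qquad \text{vs.} \qquad
\underbrace{\tfrac{1}{r}(n-1)M = \tfrac{13}{4}M = 3.25M}_{\text{lower bound}}
```

So a code meeting the bound would read roughly a third of what the straightforward RS repair reads.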