distributed storage and NC - Institute of Network Coding

BASIC Regenerating Codes for
Distributed Storage Systems
Kenneth Shum
(Joint work with Minghua Chen, Hanxu Hou and Hui Li)
Windows Azure data centers
Aug 2013
kshum
http://technoblimp.com
Inside a data center
Data distribution
• Encode and distribute a data file to n storage
nodes.
Data File: “INC”
Data collector
• Data collector can retrieve the whole file by
downloading from any k storage nodes.
“INC” 
Three kinds of disk failures
• Transient error due to noise corruption
– repeat the disk access request
• Disk sector error
– partial failure
– detected and masked by the operating system
• Catastrophic error
– total failure, for instance due to a disk controller fault
– the whole disk is regarded as erased
Frequency of node failures
Figure: Number of failed nodes over a single month in a 3000-node production cluster of Facebook. From “XORing elephants: novel erasure codes for Big Data” by Sathiamoorthy et al.
Outline of this talk
• Repetition scheme
• Traditional erasure-correcting codes
– Reed-Solomon codes
• Network-coding-based scheme
– BASIC regenerating codes
Distributed storage system
• Encode a data file and distribute it to n disks
• (n,k) recovery property
– The data file can be rebuilt from any k disks.
• Repair
– If a node fails, we regenerate a new node by connecting to any d surviving disks and downloading data from them.
– The aim is to minimize the repair bandwidth (Dimakis et al., 2007).
• A coding scheme with the above properties is
called a regenerating code.
Repetition scheme
• GFS: Replicate data 3 times
• Gmail: Replicate data 21 times
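2x Repetition scheme
• Divide the data file into 2 parts, A and B, of 1G each.
• Store A, B, A and B on the four nodes (1G per node).
• The data collector downloads from 2 nodes.
• Cannot tolerate double disk failures: if both copies of A (or both copies of B) are lost, the file cannot be rebuilt.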
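The failure case can be checked mechanically. The sketch below assumes the layout read off the figure (nodes 1 to 4 store A, B, A, B); the names `LAYOUT`, `recoverable` and `fatal_double_failures` are hypothetical. It enumerates every pair of failed nodes and reports those after which some part of the file has no surviving copy.

```python
from itertools import combinations

# Assumed layout, read off the figure: nodes 1-4 store A, B, A, B (1G each).
LAYOUT = {1: "A", 2: "B", 3: "A", 4: "B"}
FILE_PARTS = {"A", "B"}

def recoverable(nodes):
    """True if the parts held by `nodes` cover the whole file."""
    return {LAYOUT[i] for i in nodes} == FILE_PARTS

def fatal_double_failures():
    """Pairs of failed nodes after which the file can no longer be rebuilt."""
    return [failed for failed in combinations(LAYOUT, 2)
            if not recoverable(i for i in LAYOUT if i not in failed)]

print(fatal_double_failures())  # [(1, 3), (2, 4)]
```

Since n = 4 and k = 2 here, these are also exactly the node pairs from which a data collector cannot rebuild the file, so the (4,2) recovery property fails.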
Repair is easy for a repetition-based system
• If a node storing A fails, the new node copies A (1G) from the surviving node that stores A.
• Repair bandwidth = 1G.
Reed-Solomon Code
• Divide the file into 2 parts, A and B.
• The four nodes store A, B, A+B and A+2B.
• The data collector can rebuild the file from any 2 nodes, so the code can tolerate double disk failures.
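A toy sketch of the (4,2) recovery property for this code. It assumes, for simplicity, that each part is a single symbol of the prime field GF(257); practical Reed-Solomon implementations usually work in GF(2^8) instead, and the names `COEFFS`, `encode` and `decode` are hypothetical. Decoding from any two nodes amounts to solving a 2x2 linear system by Cramer's rule, which works because every pair of coefficient vectors has a nonzero determinant.

```python
from itertools import combinations

P = 257  # assumption: toy prime field GF(257), one symbol per part

# Node i stores a*A + b*B with (a, b) = COEFFS[i]: A, B, A+B, A+2B.
COEFFS = [(1, 0), (0, 1), (1, 1), (1, 2)]

def encode(A, B):
    """Return the symbol stored on each of the four nodes."""
    return [(a * A + b * B) % P for a, b in COEFFS]

def decode(i, j, yi, yj):
    """Recover (A, B) from the contents yi, yj of nodes i and j (0-based indices)."""
    (a1, b1), (a2, b2) = COEFFS[i], COEFFS[j]
    det = (a1 * b2 - a2 * b1) % P   # nonzero for every pair of distinct nodes
    inv = pow(det, -1, P)           # modular inverse (Python 3.8+)
    A = (b2 * yi - b1 * yj) * inv % P
    B = (a1 * yj - a2 * yi) * inv % P
    return A, B

A, B = 42, 200
stored = encode(A, B)
print(stored)  # [42, 200, 242, 185]
for i, j in combinations(range(4), 2):
    assert decode(i, j, stored[i], stored[j]) == (A, B)
```

The coefficient 2 in A+2B is why finite field arithmetic is needed: over plain binary XOR, 2B would vanish.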
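Repair requires essentially decoding the whole file
• To regenerate the node storing A, the new node downloads 1G from the node storing B and 1G from the node storing A+B, and computes A = (A+B) - B.
• Repair bandwidth = 2G.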
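The repair above can be sketched the same way, again assuming symbols in GF(257) and modelling each 1G part as a list of 1000 symbols (both assumptions, chosen only for illustration; `encode` and `repair_node_1` are hypothetical names). The new node has to download two full parts to rebuild one.

```python
import random

P = 257  # assumption: symbols live in the prime field GF(257)

def encode(A, B):
    """Contents of the four nodes, A, B, A+B, A+2B, computed symbol by symbol."""
    return [list(A), list(B),
            [(a + b) % P for a, b in zip(A, B)],
            [(a + 2 * b) % P for a, b in zip(A, B)]]

def repair_node_1(node2, node3):
    """Rebuild node 1 (A) from node 2 (B) and node 3 (A+B): A = (A+B) - B."""
    return [(s - b) % P for b, s in zip(node2, node3)]

size = 1000  # symbols per part; stands in for 1G
A = [random.randrange(P) for _ in range(size)]
B = [random.randrange(P) for _ in range(size)]
nodes = encode(A, B)

rebuilt = repair_node_1(nodes[1], nodes[2])
downloaded = len(nodes[1]) + len(nodes[2])   # symbols fetched by the new node
assert rebuilt == A
print(downloaded / size)  # 2.0: two parts' worth of traffic to rebuild one part
```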
BASIC regenerating code
• Divide the data file into 4 parts of 0.5G each.
• BASIC: Binary Addition and Shift Implementable Convolutional.
• Utilization of bit-wise shift in storage was proposed by Piret and Krol (1983), and by Qureshi, Foh and Cai (2012).
Download from nodes 1 and 2
• The data collector downloads 1G from each of nodes 1 and 2 and recovers the four 0.5G parts.
Download from nodes 1 and 3
• The data collector downloads 1G from each of nodes 1 and 3 and recovers the four 0.5G parts.
Download from nodes 1 and 4
• The data collector downloads 1G from each of nodes 1 and 4 and recovers the four 0.5G parts.
Download from nodes 2 and 3
• The data collector downloads 1G from each of nodes 2 and 3 and recovers the four 0.5G parts.
Download from nodes 2 and 4
• The data collector downloads 1G from each of nodes 2 and 4 and recovers the four 0.5G parts.
Download from nodes 3 and 4
• The data collector downloads 1G from each of nodes 3 and 4 and recovers the four 0.5G parts.
Zigzag decoding
à la Gollakota and Katabi (2008)
• Goal: solve for P1 and P2.
• Given: P1 ⊕ P2 and P1 ⊕ P2’, where P2’ is a bit-wise shifted copy of P2.
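A minimal sketch of zigzag decoding, assuming that packets are equal-length bit lists and that P2’ is P2 shifted by one bit position (a zero bit in front), so the second coded packet is one bit longer. The function names `shift`, `xor` and `zigzag_decode` are hypothetical. Each step recovers one bit of P1 from P1 ⊕ P2’ and then one bit of P2 from P1 ⊕ P2, zigzagging between the two packets.

```python
import random

def shift(bits, s=1):
    """Bit-wise shift by s positions: prepend s zeros, so the packet grows by s bits."""
    return [0] * s + list(bits)

def xor(a, b):
    """Binary addition (XOR) of two bit lists; the shorter one is padded with zeros."""
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return [x ^ y for x, y in zip(a, b)]

def zigzag_decode(x, y, length):
    """Recover P1 and P2 from x = P1 xor P2 and y = P1 xor shift(P2, 1).

    Bit i of y is P1[i] xor P2[i-1], so each step uses the P2 bit found in
    the previous step to peel off one bit of P1, then one bit of P2 from x.
    """
    p1, p2 = [0] * length, [0] * length
    for i in range(length):
        prev = p2[i - 1] if i > 0 else 0  # the shifted P2 starts with a zero
        p1[i] = y[i] ^ prev
        p2[i] = x[i] ^ p1[i]
    return p1, p2

L = 16  # hypothetical packet length in bits
P1 = [random.randint(0, 1) for _ in range(L)]
P2 = [random.randint(0, 1) for _ in range(L)]
x = xor(P1, P2)             # coded packet P1 xor P2 (L bits)
y = xor(P1, shift(P2, 1))   # coded packet P1 xor P2' (L + 1 bits)
assert zigzag_decode(x, y, L) == (P1, P2)
print(len(x), len(y))  # 16 17
```

The extra bit of the shifted packet is an instance of the small storage overhead that shifting introduces; only XOR and shifts are used, with no finite field multiplication.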
Repair of BASIC regenerating code
• The new node regenerates the lost content using only bit-wise shift and XOR operations.
• Repair bandwidth = 1.5G.
Repair of BASIC regenerating code
• Decode the blue and red packets by zigzag decoding.
• Interference alignment
Comparison of the three examples
• Repetition scheme
– Storage efficiency: 1/2
– Reliability: tolerates one disk failure
– Repair bandwidth: 1G
– Computational complexity: very small
• Reed-Solomon codes
– Storage efficiency: 1/2
– Reliability: tolerates two disk failures
– Repair bandwidth: 2G
– Computational complexity: finite field arithmetic
• BASIC regenerating codes
– Storage efficiency: 1/2
– Reliability: tolerates two disk failures
– Repair bandwidth: 1.5G
– Computational complexity: binary addition and bit-wise shift
Summary
• We can reduce repair bandwidth by network
coding.
• BASIC regenerating codes
– A failed storage node can be repaired by simple
bit-wise shift and XOR operations.
– Shifting causes only a small storage overhead.
References
• Piret and Krol, MDS convolutional codes, IEEE Trans. on Information Theory, 1983.
• Dimakis, Godfrey, Wainwright and Ramchandran, Network coding for distributed storage systems, Proc. IEEE INFOCOM, 2007.
• Gollakota and Katabi, ZigZag decoding: combating hidden terminals in wireless networks, Proc. ACM SIGCOMM, 2008.
• Qureshi, Foh and Cai, Optimal solution for the index coding problem using network coding over GF(2), Proc. IEEE Conf. on Sensor, Mesh and Ad Hoc Communications and Networks, 2012.
• Sung and Gong, A zigzag decodable code with MDS property for distributed storage systems, Proc. IEEE Int. Symp. on Information Theory, 2013.
• Hou, Shum, Chen and Li, BASIC regenerating code: binary addition and shift for exact repair, Proc. IEEE Int. Symp. on Information Theory, 2013.
Two modes of repair
• Exact repair
– The content of the new node is exactly the same as the content of the failed node.
• Functional repair
– Only requires that the (n,k) recovery property is preserved.
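The difference between the two modes can be illustrated with the Reed-Solomon example, under the toy assumption that coefficients live in GF(257). For k = 2, the (4,2) recovery property holds exactly when every pair of nodes has linearly independent coefficient vectors. A functional repair may store A+3B instead of A (the new node can compute it from B and A+B as (A+B)+2B); the property still holds even though the content differs from the failed node. The helper name `any_two_suffice` is hypothetical.

```python
from itertools import combinations

P = 257  # assumption: toy prime field GF(257)

def any_two_suffice(coeffs):
    """(4,2) recovery: every pair of coefficient vectors (a, b) must be invertible mod P."""
    return all((a1 * b2 - a2 * b1) % P != 0
               for (a1, b1), (a2, b2) in combinations(coeffs, 2))

original = [(1, 0), (0, 1), (1, 1), (1, 2)]   # nodes store A, B, A+B, A+2B
exact = [(1, 0)] + original[1:]              # exact repair of node 1: stores A again
functional = [(1, 3)] + original[1:]         # functional repair: stores A+3B instead
copy_of_b = [(0, 1)] + original[1:]          # not a valid repair: duplicates node 2

print(any_two_suffice(exact))       # True
print(any_two_suffice(functional))  # True
print(any_two_suffice(copy_of_b))   # False
```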
Download