Coding for Modern Distributed Storage Systems II (slides)

Coding for Modern Distributed Storage Systems: Part 2.
Locally Repairable Codes
Parikshit Gopalan
Windows Azure Storage, Microsoft.
Rate-distance-locality tradeoffs
Def: An 𝑛, π‘˜, 𝑑 π‘ž linear code has locality 𝒓 if each co-ordinate can be expressed
as a linear combination of π‘Ÿ other coordinates.
What are the tradeoffs between 𝑛, π‘˜, 𝑑, π‘Ÿ?
[G.-Huang-Simitci-Yekhanin’12]: In any linear code with information locality π‘Ÿ,
𝑛 ≥ ((π‘Ÿ + 1)/π‘Ÿ) π‘˜ + 𝑑 − 2. (A numeric check appears after the proof list below.)
• Algorithmic proof using linear algebra.
• [Papailiopoulos-Dimakis’12] Replace rank with entropy.
• [Prakash-Lalitha-Kamath-Kumar’12] Generalized Hamming weights.
• [Barg-Tamo’13] Graph theoretic proof.
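To make the bound concrete, here is a small sanity check in Python (my own sketch; the parameters 𝑛 = 24, π‘˜ = 15, 𝑑 = 6, π‘Ÿ = 5 are just an illustrative choice, assuming π‘Ÿ | π‘˜):

    # Sketch (not from the slides): check the rate-distance-locality bound
    # n >= ((r + 1) / r) * k + d - 2 for a hypothetical parameter set.
    from math import ceil

    def locality_bound(k: int, d: int, r: int) -> int:
        """Smallest n allowed by the [G.-Huang-Simitci-Yekhanin'12] bound (r | k assumed)."""
        return ceil((r + 1) * k / r) + d - 2

    # Hypothetical LRC parameters: 15 data symbols, distance 6, locality 5.
    n, k, d, r = 24, 15, 6, 5
    print(locality_bound(k, d, r))        # 22
    print(n >= locality_bound(k, d, r))   # True: n = 24 is long enough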
Generalizations
• Non-linear codes
[Papailiopoulos-Dimakis, Forbes-Yekhanin].
• Vector codes
[Papailiopoulos-Dimakis, Silberstein-Rawat-Koyluoglu-Vishwanath, Kamath-Prakash-Lalitha-Kumar]
• Codes over bounded alphabets
[Cadambe-Mazumdar]
• Codes with short local MDS codes
[Prakash-Lalitha-Kamath-Kumar, Silberstein-Rawat-Koyluoglu-Vishwanath]
Explicit codes with all-symbol locality.
[Tamo-Papailiopoulos-Dimakis’13]
• Optimal length codes with all-symbol locality for π‘ž = exp(π‘˜).
• Construction based on RS code, analysis via matroid theory.
[Silberstein-Rawat-Koyluoglu-Vishwanath’13]
• Optimal length codes with all-symbol locality for π‘ž = 2^𝑛.
• Construction based on Gabidulin codes (aka linearized RS codes).
[Barg-Tamo’14]
• Optimal length codes with all-symbol locality for π‘ž = 𝑂(𝑛).
• Construction based on Reed-Solomon codes.
Stronger notions of locality
• Codes with local regeneration
[Silberstein-Rawat-Koyluoglu-Vishwanath, Kamath-Prakash-Lalitha-Kumar…]
• Codes with short local MDS codes [Prakash-Lalitha-Kamath-Kumar,
Silberstein-Rawat-Koyluoglu-Vishwanath]
Avoids the slowest node bottleneck [Shah-Lee-Ramachandran]
• Sequential local recovery [Prakash-Lalitha-Kumar]
• Multiple disjoint local parities [Wang-Zhang, Barg-Tamo]
Can serve multiple read requests in parallel.
Problem: Consider an [𝑛, π‘˜]_π‘ž linear code where even after 𝑑 arbitrary failures,
every (information) symbol has locality π‘Ÿ. How large does 𝑛 need to be?
[Barg-Tamo’14] might be a good starting point.
Tutorial on LRCs
Part 1.1: Locality
1. Locality of codeword symbols.
2. Rate-distance-locality tradeoffs: lower bounds and constructions.
Part 1.2: Reliability
1. Beyond minimum distance: Maximum recoverability.
2. Constructions of Maximally Recoverable LRCs.
Beyond minimum distance?
Is minimum distance the right measure of reliability?
Two types of failures:
• Large correlated failures
Power outage, upgrade.
Whole data center offline.
• Can assume further failures are independent.
Beyond minimum distance?
4 Racks
6 Machines per Rack
• Machines fail independently with probability 𝑝.
• Racks fail independently with probability π‘ž ≈ 𝑝^3.
• Some 7-failure patterns are more likely than 5-failure patterns.
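To see why, a rough back-of-the-envelope comparison in Python (my own sketch with a made-up value of 𝑝; one rack plus one extra machine is 7 failures with probability about π‘ž · 𝑝 ≈ 𝑝^4, versus 𝑝^5 for 5 independent machines):

    # Sketch (hypothetical numbers): compare a "1 rack + 1 machine" pattern
    # (7 machines down) with 5 independent machine failures.
    p = 1e-3            # assumed per-machine failure probability
    q = p ** 3          # rack failure probability, q ~ p^3 as on the slide

    rack_plus_one = q * p        # one whole rack (6 machines) + 1 more machine
    five_independent = p ** 5    # a fixed set of 5 machines failing independently

    print(rack_plus_one)      # 1e-12
    print(five_independent)   # 1e-15 -> the 7-failure pattern is ~1000x more likely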
Beyond minimum distance
4 Racks
6 Machines per Rack
Want to tolerate 1 rack failure + 3 additional machine failures.
Beyond minimum distance
• Want to tolerate 1 rack + 3 more failures (9 total).
Solution 1: Use a [24, 15, 10] Reed-Solomon code.
Corrects any 9 failures (𝑑 − 1 = 9 = 𝑛 − π‘˜).
Poor locality after a single failure.
Beyond minimum distance
• Want to tolerate 1 rack + 3 more failures (9 total).
[Plank-Blaum-Hafner’13]:
Sector-Disk (SD) codes.
Solution 2: Use [24, 15, 6] LRCs derived from Gabidulin codes.
Rack failure gives an [18, 15, 4] MDS code.
Stronger guarantee than minimum distance.
Beyond minimum distance
• Want to tolerate 1 rack + 3 more failures (9 total).
[Plank-Blaum-Hafner’13]:
Partial MDS codes.
Solution 2: Use [24, 15, 6] LRCs derived from Gabidulin codes.
Rack failure gives an [18, 15, 4] MDS code.
Stronger guarantee than minimum distance.
Maximally Recoverable Codes
[Chen-Huang-Li’07, G.-Huang-Jenkins-Yekhanin’14]
A code has a topology that determines the linear relations between symbols (locality).
Any erasure pattern with sufficiently many (independent) constraints is correctable.
[G.-Huang-Jenkins-Yekhanin’14]: Let 𝛼1, …, 𝛼𝑑 be variables.
1. The topology is given by a parity check matrix, where each entry is a linear
function in the 𝛼𝑖's.
2. A code is specified by a choice of the 𝛼𝑖's.
3. The code is Maximally Recoverable if it corrects every error pattern that its
topology permits:
• the relevant determinant is non-zero as a polynomial in the 𝛼𝑖's,
• i.e., there is some choice of the 𝛼's that corrects it.
Example 1: MDS codes
β„Ž global equations: βˆ‘_{𝑖=1}^{𝑛} 𝛼𝑖,𝑗 𝑋𝑖 = 0, for 𝑗 = 1, …, β„Ž.
Reed-Solomon codes are Maximally Recoverable.
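One standard way to see this (a sketch I am adding; the Vandermonde choice of coefficients is the usual argument, not something spelled out on the slide): pick distinct 𝛼1, …, 𝛼𝑛 and set 𝛼𝑖,𝑗 = 𝛼𝑖^{𝑗−1}. Any β„Ž erased coordinates then give an β„Ž × β„Ž Vandermonde submatrix of the check matrix, whose determinant is non-zero, so every pattern of at most β„Ž erasures permitted by the topology is corrected:

    \det\begin{pmatrix}
      1 & 1 & \cdots & 1\\
      \alpha_{i_1} & \alpha_{i_2} & \cdots & \alpha_{i_h}\\
      \vdots & \vdots & & \vdots\\
      \alpha_{i_1}^{h-1} & \alpha_{i_2}^{h-1} & \cdots & \alpha_{i_h}^{h-1}
    \end{pmatrix}
    = \prod_{1 \le a < b \le h} (\alpha_{i_b} - \alpha_{i_a}) \ne 0.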
Example 2: LRCs (PMDS codes)
Assume π‘Ÿ|π‘˜, (π‘Ÿ + 1)|𝑛. Want length 𝑛 codes satisfying
1. Local constraints: Parity of each column is 0.
2. β„Ž Global constraints: Linear constraints over all symbols.
The code is MR if puncturing one entry per column gives a [π‘˜ + β„Ž, π‘˜]_π‘ž MDS code.
The code is SD if puncturing any row gives a [π‘˜ + β„Ž, π‘˜]_π‘ž MDS code.
Known constructions require fairly large field sizes.
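To make the MR condition concrete, a toy symbolic check in Python/sympy (my own example with made-up parameters π‘Ÿ = 2, β„Ž = 1, two local groups, 𝑛 = 6, ignoring the divisibility assumptions; not taken from the slides):

    # Sketch (toy sizes): the MR-LRC topology with two local groups of size
    # r + 1 = 3 and h = 1 global check, so n = 6 and k = n - 2 - 1 = 3.
    import sympy as sp

    a = sp.symbols('a1:7')               # global-row coefficients a1..a6 (the alphas)
    H = sp.Matrix([
        [1, 1, 1, 0, 0, 0],              # local parity of group 1
        [0, 0, 0, 1, 1, 1],              # local parity of group 2
        list(a),                         # h = 1 global constraint
    ])

    # Erasure pattern: one symbol per group (positions 0 and 3) plus one more (position 1).
    erased = [0, 1, 3]
    sub = H.extract(list(range(H.rows)), erased)

    # The pattern is permitted by the topology iff this determinant is a non-zero
    # polynomial; an MR code picks the alphas so that all such determinants are
    # non-zero in the field.
    print(sub.det())                     # a1 - a2  (a non-zero polynomial)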
Example 3: Tensor Codes
Assume π‘Ÿ|π‘˜, (π‘Ÿ + 1)|𝑛. Want length 𝑛 codes satisfying
1. Column constraints: Parity of each column is 0.
2. β„Ž constraints per row: Linear constraints over symbols in the row.
Problem: When is an error pattern correctable?
Tensor of Reed-Solomon with Parity is not necessarily MR.
Maximally Recoverable Codes
[Chen-Huang-Li’07, G.-Huang-Jenkins-Yekhanin’14]
Let 𝛼1, …, 𝛼𝑑 be variables.
1. Each entry in the parity check matrix is a linear function in the 𝛼𝑖's.
2. A code is specified by a choice of the 𝛼𝑖's.
3. The code is Maximally Recoverable if it corrects every error pattern possible
given its topology.
[G.-Huang-Jenkins-Yekhanin’14]: For any topology, random codes over sufficiently
large fields are MR codes.
Do we need explicit constructions?
• Verifying that a given construction is good might be hard.
• Large field size is undesirable.
How encoding works
Encoding a file using an [𝑛, π‘˜]_π‘ž code 𝐢.
Ideally field elements are byte vectors, so π‘ž = 2^{8𝑐}.
1. Break file into π‘˜ equal sized parts.
2. Treat each part as a long stream over πΉπ‘ž .
3. Encode each row (of π‘˜ elements) using 𝐢, to create 𝑛 − π‘˜ more streams.
4. Distribute them to the right nodes.
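A schematic of these four steps (my own sketch; C_encode is a hypothetical stand-in for the real encoder of 𝐢, here a toy XOR parity just so the pipeline runs):

    # Sketch of the striping + encoding workflow (not the actual Azure encoder).
    # C_encode maps k data symbols of a row to n - k parity symbols; here a toy
    # XOR parity repeated, NOT a real MDS code.
    def C_encode(row, n_minus_k):
        p = 0
        for x in row:
            p ^= x
        return [p] * n_minus_k

    def encode_file(data: bytes, n: int, k: int):
        # 1. Break the file into k equal-sized parts (pad the last part).
        part_len = -(-len(data) // k)
        parts = [bytearray(data[i * part_len:(i + 1) * part_len].ljust(part_len, b'\0'))
                 for i in range(k)]
        # 2./3. Treat each part as a stream; encode each row of k symbols with C
        #       to produce n - k parity streams.
        parities = [bytearray(part_len) for _ in range(n - k)]
        for pos in range(part_len):
            row = [parts[j][pos] for j in range(k)]
            for j, sym in enumerate(C_encode(row, n - k)):
                parities[j][pos] = sym
        # 4. The n streams (k data + n - k parity) go to n different nodes.
        return parts + parities

    streams = encode_file(b"example payload", n=6, k=4)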
Step 3 requires finite field arithmetic over πΉπ‘ž.
• Can use log tables up to 2^24 (a few Gb).
• Speed up via specialized CPU instructions.
• Beyond that, matrix-vector multiplication (dimension = bit-length).
Field size matters even at encoding time.
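For concreteness, a minimal log/antilog-table multiplier for 𝐺𝐹(2^8) (my own sketch; the primitive polynomial 0x11d, i.e. x^8 + x^4 + x^3 + x^2 + 1, is an assumption, and any primitive polynomial for 𝐺𝐹(2^8) works the same way):

    # Sketch: GF(2^8) multiplication via log/antilog tables, the kind of table
    # lookup the slide refers to.
    EXP = [0] * 512
    LOG = [0] * 256
    x = 1
    for i in range(255):
        EXP[i] = x
        LOG[x] = i
        x <<= 1                   # multiply by the primitive element alpha = 2
        if x & 0x100:
            x ^= 0x11d            # reduce modulo the field polynomial
    for i in range(255, 512):
        EXP[i] = EXP[i - 255]     # duplicate so (log a + log b) never needs a mod 255

    def gf_mul(a: int, b: int) -> int:
        if a == 0 or b == 0:
            return 0
        return EXP[LOG[a] + LOG[b]]

    assert gf_mul(3, 7) == 9      # (x+1)(x^2+x+1) = x^3+1, no reduction needed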
How decoding works
Decoding from erasures = solving a linear system of equations.
• Whether an erasure pattern is correctable can be deduced from the generator
matrix.
• If correctable, each missing stream is a linear combination of the available
streams.
Random codes are as “good” as explicit codes for a given field size.
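As a toy illustration of both bullets (my own sketch: a [5, 3] code over the prime field 𝐺𝐹(7) instead of a real storage code, with a hypothetical helper solve_mod), the message, and hence each missing stream, is recovered by solving a linear system read off from the generator matrix:

    # Sketch (toy parameters, prime field GF(p) for simplicity): erasure decoding
    # as solving a linear system given the generator matrix.
    p = 7                                   # toy field size (assumption)
    # Generator matrix of a [5, 3] code over GF(7): identity | Vandermonde parities.
    G = [[1, 0, 0, 1, 1],
         [0, 1, 0, 1, 2],
         [0, 0, 1, 1, 4]]
    k, n = 3, 5

    def solve_mod(A, b, p):
        """Solve A x = b over GF(p) by Gauss-Jordan elimination (A has full column rank)."""
        m, cols = len(A), len(A[0])
        M = [row[:] + [bi] for row, bi in zip(A, b)]
        r = 0
        for c in range(cols):
            piv = next(i for i in range(r, m) if M[i][c] % p)
            M[r], M[piv] = M[piv], M[r]
            inv = pow(M[r][c], p - 2, p)
            M[r] = [v * inv % p for v in M[r]]
            for i in range(m):
                if i != r and M[i][c] % p:
                    f = M[i][c]
                    M[i] = [(vi - f * vr) % p for vi, vr in zip(M[i], M[r])]
            r += 1
        return [M[i][cols] for i in range(cols)]

    codeword = [2, 5, 3, 3, 3]              # encodes the message (2, 5, 3)
    erased = [1, 4]                         # two streams are lost
    avail = [j for j in range(n) if j not in erased]

    # Unknowns: the message msg with msg * G[:, avail] = codeword[avail]; transpose to solve.
    A = [[G[i][j] for i in range(k)] for j in avail]   # rows indexed by available positions
    b = [codeword[j] for j in avail]
    msg = solve_mod(A, b, p)                           # recover the message
    missing = {j: sum(msg[i] * G[i][j] for i in range(k)) % p for j in erased}
    print(msg, missing)                     # [2, 5, 3] {1: 5, 4: 3}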
Maximally Recoverable Codes
[Chen-Huang-Li’07, G.-Huang-Jenkins-Yekhanin’14]
Thm: For any topology, random codes over sufficiently large fields are MR codes.
• Large field size is undesirable.
• Is there a better analysis of the random construction?
[Kopparty-Meka’13]: Random [π‘˜ + 𝑑 − 1, π‘˜]_π‘ž codes are MDS only with
probability exp(−π‘˜) for π‘ž ≤ π‘˜^{𝑑−1}.
Random codes are MR with constant probability for π‘ž = 𝑂(𝑑 ⋅ π‘˜^𝑑).
Could explicit constructions require smaller field size?
Maximally Recoverable LRCs
1. Local constraints: Parity of each column is 0.
2. β„Ž Global constraints.
The code is MR if puncturing one entry per
column gives a [π‘˜ + β„Ž, π‘˜]_π‘ž MDS code.
1. Random gives MR LRCs for π‘ž = 𝑂(π‘˜^β„Ž ⋅ π‘Ÿ^{π‘˜/π‘Ÿ}), SD for π‘ž = 𝑂(π‘˜^β„Ž).
2. [Silberstein-Rawat-Koyluoglu-Vishwanath’13] Explicit MR LRCs with π‘ž = 2^𝑛.
[G.-Huang-Jenkins-Yekhanin]
• Basic construction: gives π‘ž = 𝑂(π‘˜^β„Ž).
• Product construction: gives π‘ž = 𝑂(π‘˜^{(1−πœ–)β„Ž}) for suitable β„Ž, π‘Ÿ.
Open Problems:
• Are there MR LRCs over fields of size 𝑂(𝑛)?
• When is a tensor code MR? Explicit constructions?
• Are there natural topologies for which MR codes only exist over
exponentially large fields? Super-linear-sized fields?
Thank you
• The Simons Institute, David Tse, Venkat Guruswami.
• Azure Storage + MSR: Brad Calder, Cheng Huang, Aaron Ogus, Huseyin
Simitci, Sergey Yekhanin.
• My former colleagues at MSR-Silicon Valley.