Codes with local decoding procedures Sergey Yekhanin Microsoft Research Error-correcting codes: paradigm Sender Receiver π ∈ πΉ2π E(π) ∈ πΉ2π E(π) +noise π 0110001 011000100101 01*00*10010* 0110001 Encoder Channel Decoder Erases up to π coordinates. • The paradigm dates back to 1940s (Shannon / Hamming) • One limitation: recovering a single message coordinate requires processing all corrupted codeword Local decoding: paradigm π ∈ πΉ2π 0110001 E(π) ∈ πΉ2π E(π) +noise 011000100101 01*00*10010* Encoder ππ 1 Channel Local Decoder Erases up to π coordinates. Reads up to π coordinates. Local decoder runs in time much smaller than the message length! • First account: Reed’s decoder for Muller’s codes (1954) • Implicit use: (1950s-1990s) • Formal definition and systematic study (late 1990s) [Levin’95, STV’98, KT’00] ο§ Original applications in computational complexity theory ο§ Cryptography ο§ Most recently used in practice to provide reliability in distributed storage (Microsoft Azure, Windows Server, Windows, Hadoop, etc.) Local decoding: example E(X) X1 X X1 X2 X3 X1 X2 X2 X1 X3 X1 X2 X3 Message length: k = 3 Codeword length: n = 7 Erased locations: π = 3 Locality: π = 2 X3 X2 X3 Local decoding: example E(X) X1 X X1 X2 X3 X1 X2 X2 X1 X3 X1 X2 X3 Message length: k = 3 Codeword length: n = 7 Erased locations: π = 3 Locality: π = 2 X3 X2 X3 Local decoding: Decoding tuples for X1 E(X) X1 X2 X3 X X1 X2 X3 X1ο X2 X1ο X3 X1ο X2ο X3 X2ο X3 Local decoding: Decoding tuples for X2 E(X) X1 X2 X3 X X1 X2 X3 X1ο X2 X1ο X3 X1ο X2ο X3 X2ο X3 Codes with local decoding Setting: Encode π dimensional messages to π dimensional codewords. Main parameters: Redundancy π − π, locality π, and noise level π. Goal: Understand the true shape of the tradeoff between redundancy and locality, for different settings of noise. (e.g., π = πΏπ, ππ , π 1 .) Applications in crypto / complexity ππ (log π)π Multiplicity codes Local reconstruction codes Projective geometry codes Locally decodable codes π Applications to data storage Reed Muller codes Matching vector codes π(1) π(1) ππ πΏπ π Taxonomy of known families of codes Plan • Part I: (Locally decodable codes) • Private Information Retrieval (PIR) schemes • PIR schemes from smooth codes • Reed Muller codes • Part II: (Codes with locality for distributed data storage) • Erasure coding for data storage • Local reconstruction codes for data storage • Constructions and limitations Part I: Locally decodable codes Smooth codes E(X) X X1 X2 c1 … Xk c1 ci c5 … c4 c7 c8 c2 c6 c3 cj cn Definition: Consider a code πΈ that encodes π–dimensional messages π to π −dimensional codewords πΈ(π). For every π in [π], we have a family of decoding π −tuples π·π. We say that E is π −smooth if for each π in [π], π −tuples in π·π partition the set [π]. Note: If the code πΈ is π −smooth; then each Xπ can be recovered by π reading π coordinates after π = − 1 erasures in πΈ(π). π Private information retrieval [CGKS] Protocols that allow users to privately retrieve items from replicated DBs X→E(X) Protocol: • • • • • … X→E(X) Each server encodes the π-bit database π with the same π-query smooth code The user interested in ππ, picks a random π-tuple π from π·π The user sends π elements of π to π different servers Servers respond with respective coordinates of πΈ(π) User finds out ππ (Each server observes a sample from a uniform distribution on [π].) Private information retrieval Properties of the protocol: • Information theoretic privacy ↔ smoothness • Number of DB replicas needed ↔ locality • Communication complexity ↔ codeword length Short smooth codes with low query complexity yield efficient PIR schemes. Example: 3-server schemes with 2π log π -communication to access an π-bit DB. [Dvir Gopi’ 2014]: 2-server scheme with the same communication. Reed Muller codes • Parameters: π, π, π = π − 2. • Codewords: evaluations of degree π polynomials in π variables over πΉπ . • Polynomial π ∈ πΉπ π§1 , … , π§π , deg f ≤ π yields a codeword: π(π₯) • Encoder is systematic. • Parameters: π = π π , π = • π+π . π We argue that the code is π-smooth for π = π − 1. π₯∈πΉππ Reed Muller codes: local decoding • Key observation: Restriction of a codeword to an affine line yields an evaluation of a univariate polynomial π πΏ of degree at most π. q, m , d . • Decoding tuples for the value at π₯: – Consider all affine lines through π₯. – Use polynomial interpolation. πΉππ π₯ • Smooth code: Affine lines partition the space. Decoder reads π − 1 coordinates. Reed Muller codes: parameters π = ππ , π= π+π , π π = π − 2, π = π − 1, π = πΏπ. Setting parameters: 1 π−1 • q = π 1 , π → ∞: π = π 1 , π = exp π • q = π2 π = (log π)2 , π = ππππ¦ π . βΆ • q → ∞, π = π 1 : . π = ππ , π = π π . Better codes are known Reducing codeword length of locally decodable codes is a major open problem. Part II: Distributed storage Data storage • Store data reliably • Keep it readily available for users Data storage: Replication • Store data reliably • Keep it readily available for users • Very large overhead • Moderate reliability • Local recovery: Lose one machine, access one Data storage: Erasure coding • Store data reliably • Keep it readily available for users … … … • Low overhead • High reliability • No local recovery: Loose one machine, access π π data chunks π − π parity chunks Need: Erasure codes with local decoding Codes for data storage X1 X2 … Xk P1 … Pn-k • Goals: • (Cost) minimize the number of parities. • (Reliability) tolerate any pattern of h + 1 simultaneous failures. • (Availability) recover any data symbol by accessing at most π other symbols • (Computational efficiency) use a small finite field to define parities. Local reconstruction codes • Def: An (π, β) – Local Reconstruction Code (LRC) encodes π symbols to π symbols, and • Corrects any pattern of β + 1 simultaneous failures; • Recovers any single erased data symbol by accessing at most π other symbols. Local reconstruction codes • Def: An (π, β) – Local Reconstruction Code (LRC) encodes π symbols to π symbols, and • Corrects any pattern of β + 1 simultaneous failures; • Recovers any single erased data symbol by accessing at most π other symbols. • Theorem[GHSY]: In any (π, β) – (LRC), redundancy π − π satisfies π − π ≥ π π + β. Local reconstruction codes • Def: An (π, β) – Local Reconstruction Code (LRC) encodes π symbols to π symbols, and • Corrects any pattern of β + 1 simultaneous failures; • Recovers any single erased data symbol by accessing at most π other symbols. • Theorem[GHSY]: In any (π, β) – (LRC), redundancy π − π satisfies π − π ≥ π π + β. • Theorem[GHSY]: If π π and β < π + 1; then any (π, β) – LRC has the following topology: Light parities Data symbols Heavy parities … L1 X1 … Lg Xr … Xk-r H1 … Hh … Local group Xk Local reconstruction codes • Def: An (π, β) – Local Reconstruction Code (LRC) encodes π symbols to π symbols, and • Corrects any pattern of β + 1 simultaneous failures; • Recovers any single erased data symbol by accessing at most π other symbols. • Theorem[GHSY]: In any (π, β) – (LRC), redundancy π − π satisfies π − π ≥ π π + β. • Theorem[GHSY]: If π π and β < π + 1; then any (π, β) – LRC has the following topology: Light parities Data symbols Heavy parities … L1 X1 … Lg Xr … Xk-r H1 … Hh … Local group Xk • Fact: [HCL] There exist (π, β) – LRCs with optimal redundancy over a field of size π + β. Reliability Set π = 8, π = 4, and β = 3. L1 X1 X2 L2 X3 X5 X4 H1 H2 H3 X6 X7 X8 Reliability Set π = 8, π = 4, and β = 3. L1 X1 X2 L2 X3 X5 X4 H1 • All 4-failure patterns are correctable. H2 H3 X6 X7 X8 Reliability Set π = 8, π = 4, and β = 3. L1 X1 X2 L2 X3 X5 X4 H1 H2 • All 4-failure patterns are correctable. • Some 5-failure patterns are not correctable. H3 X6 X7 X8 Reliability Set π = 8, π = 4, and β = 3. L1 X1 X2 L2 X3 X5 X4 H1 H2 • All 4-failure patterns are correctable. • Some 5-failure patterns are not correctable. • Other 5-failure patterns might be correctable. H3 X6 X7 X8 Reliability Set π = 8, π = 4, and β = 3. L1 X1 X2 L2 X3 X5 X4 H1 H2 • All 4-failure patterns are correctable. • Some 5-failure patterns are not correctable. • Other 5-failure patterns might be correctable. H3 X6 X7 X8 Combinatorics of correctable failure patterns Def: A regular failure pattern for a (π, β)-LRC is a pattern that can be obtained by failing at most one symbol in each local group and β extra symbols. L1 X1 X2 L2 X3 X4 H1 X5 H2 X6 L1 X7 X8 X1 X2 H3 L2 X3 X4 H1 X5 H2 X6 H3 Theorem: • If a failure pattern that is not regular; then it is not correctable by any LRC. • There exist LRCs that correct all regular failure patterns. X7 X8 Maximally recoverable codes Def: An (π, β)-LRC is maximally recoverable if it corrects all regular failure patterns. Theorem: [BHH] Maximally recoverable (π, β)-LRCs exist. Proof sketch: Pick the coefficients in heavy parities at random from a large finite field. Asymptotic setting: β = π 1 , π = π 1 , π → ∞. Random choice needs a field of size at least [KM]: Ω π β−1 . The tradeoff: Larger fields allow for more reliable codes up to maximal recoverability. We want both: small field size (efficiency) and maximal recoverability. Explicit maximally recoverable codes Theorem[GHJY]: There exist maximally recoverable (π, β)-LRC over a field of size ππ 1 β−1 1− π 2 . Comparison: • Our alphabet grows as π π β−1 or slower. • Beats random codes for small β and large β. • Our only lower bound for the alphabet size thus far is π + 1 independent of β. Code construction We use dual constraints to specify the code. π π ππ ππ … ππ π³π 1 1 … 1 1 … ππ−π ππ−π+π 1 1 π π + 1 local groups. … ππ π³π/π … 1 1 π―π π―π … h πΌππ 2 πΌππ … β−1 2 πΌππ Element πΌππ appears in the j-th column of the i-th group. We consider a sequence field extensions πΉ2 ⊆ πΉ2π ⊆ πΉ2π . {ππ } ⊆ πΉ2π form a basis over πΉ2 . {ππ } ⊆ πΉ2π are β-independent over πΉ2π . πΌππ =ππ × ππ . π―π Erasure correction k=8, r=4, h=2. ππ ππ ππ ππ π³π 1 1 1 1 1 ππ ππ ππ ππ π³π π―π π―π π―π 1 1 1 1 1 1 1 1 1 πΌ11 πΌ12 πΌ21 πΌ22 πΌ31 πΌ11 πΌ12 πΌ21 πΌ22 πΌ31 2 2 πΌ11 πΌ12 2 2 πΌ21 πΌ22 2 πΌ31 2 2 2 2 2 πΌ11 πΌ12 πΌ21 πΌ22 πΌ31 4 4 πΌ11 πΌ12 4 4 πΌ21 πΌ22 4 πΌ31 4 4 4 4 4 πΌ11 πΌ12 πΌ21 πΌ22 πΌ31 (πΌ11 +πΌ12 ) (πΌ21 +πΌ22 ) πΌ31 πΌ11 +πΌ12 πΌ21 +πΌ22 πΌ31 2 2 πΌ11 +πΌ12 2 2 πΌ21 +πΌ22 2 πΌ31 (πΌ11 +πΌ12 ) 2 (πΌ21 +πΌ22 )2 2 πΌ31 4 4 πΌ11 +πΌ12 4 4 πΌ21 +πΌ22 4 πΌ31 (πΌ11 +πΌ12 ) 4 (πΌ21 +πΌ22 )4 4 πΌ31 (πΌ11 +πΌ12 ) (πΌ21 +πΌ22 ) πΌ31 (π1 + π2 ) × π1 (π1 + π2 ) × π2 π1 × π3 Looking forward • Codes with locality allow super-fast recovery of individual message coordinates from corrupted codewords. • Such codes are used to provide reliability in distributed storage and have many applications in theoretical computer science. • Many questions regarding these codes remain wide open: – π=Ω π : • π = 3: π 2 ≤ π π ≤ 22 log π . • π = π π(1) : π ≤ n(k) ≤ π(π). - π = 1: π(π) is well understood. Maximally recoverable codes? - π = π 1 : Tight bounds for π π ?