When Are LDCs a False Promise?
Moni Naor
Weizmann Institute of Science

Talk Based on:
• The Complexity of Online Memory Checking [Naor and Rothblum]
• Fault Tolerant Storage And Quorum Systems [Nadav and Naor]
• On the Compressibility of NP Instances and Cryptographic Applications [Harnik and Naor]
Theme: cases where LDC should be helpful but
• Either provably not helpful
• Or open problem

Authentication
Verifying a string has not been modified
– Central problem in cryptography
– Many variants
Our Setting:
• User works on large file residing on a remote server
• User stores a small secret ‘fingerprint’ (hash) of file
  – Used to detect corruption
• What is the size of the fingerprint?
  – A well understood problem

Online Memory Checking
Problem with the model: Read entire file?!
What if we don’t want to read the entire file? What if we only want small part?
Idea: Don’t verify the entire file, verify what you need!
– How much of the file do you read per authenticated bit?
– How large a fingerprint do you need?

Online Memory Checkers
User makes store and retrieve requests to memory
• a vector in {0,1}^n under adversary’s control
Checker checks: answer to retrieve = last stored value
Checker:
– Has secret reliable memory: space complexity s(n)
– Makes its own reads/writes: query complexity q(n)
Want small s(n) and small q(n)!
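To make the store/retrieve interface and the two costs concrete, here is a minimal Python sketch of a chunked online checker. It is an illustration only, not a construction from the talk: the class name `ChunkedChecker`, the `BUG` sentinel, the chunk layout, and the use of truncated HMAC-SHA256 digests (a computational assumption) are all assumptions of this example. Each request reads one whole chunk of public memory, so q(n) is the chunk size, and the secret memory holds one short digest per chunk, so s(n) grows like n/q(n).

```python
import hashlib
import hmac
import os

BUG = "BUG"  # hypothetical sentinel for a detected inconsistency


class ChunkedChecker:
    """Toy online memory checker: one secret digest per chunk of public memory."""

    def __init__(self, n, chunk_bits):
        self.chunk_bits = chunk_bits      # q(n): bits read per request
        self.public = [0] * n             # public memory, the adversary may modify it
        self._key = os.urandom(16)        # secret key (part of the secret memory)
        num_chunks = -(-n // chunk_bits)  # ceil(n / chunk_bits)
        # Secret memory: one truncated digest per chunk, about n / q(n) entries.
        self._digests = [self._digest(c, self._read_chunk(c)) for c in range(num_chunks)]

    def _read_chunk(self, c):
        lo = c * self.chunk_bits
        return list(self.public[lo:lo + self.chunk_bits])

    def _digest(self, c, bits):
        msg = f"{c}:{bits!r}".encode()
        return hmac.new(self._key, msg, hashlib.sha256).digest()[:8]

    def _verified_chunk(self, c):
        # Read the chunk and compare it against the digest kept in secret memory.
        bits = self._read_chunk(c)
        ok = hmac.compare_digest(self._digest(c, bits), self._digests[c])
        return bits if ok else None

    def store(self, i, b):
        c = i // self.chunk_bits
        if self._verified_chunk(c) is None:
            return BUG
        self.public[i] = b
        self._digests[c] = self._digest(c, self._read_chunk(c))
        return None

    def retrieve(self, i):
        c = i // self.chunk_bits
        bits = self._verified_chunk(c)
        if bits is None:
            return BUG
        return bits[i - c * self.chunk_bits]
```

With n = 16 and chunk_bits = 4, flipping `public[5]` behind the checker's back makes `retrieve(5)` return `BUG`, while requests to untouched chunks are still answered normally.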
[Figure: the user sends store(i,b) and retrieve(i) requests to the memory checker C and gets back b; C keeps a secret memory of s(n) bits and reads/writes (R/W) q(n) bits of public memory]

Memory Checker Requirements:
For ANY sequence of user requests and ANY responses from public memory:
Completeness: If every read from public memory = last write
  Guarantee: user retrieve = last store (w.h.p.)
Soundness: If some read from public memory ≠ last write
  Guarantee: user retrieve = last store or BUG (w.h.p.)

[Figure: on retrieve(i), the memory checker C (secret memory of s(n) bits, public memory) answers the user with b or BUG]

Past Results: [Blum, Evans, Gemmell, Kannan and Naor 1991]
Offline Memory Checkers: Detect errors only at end of long request sequence
– q(n) = O(1) (amortized)
– s(n) = O(log n)
– Very simple. No crypto assumptions!
Online Memory Checkers:
– With One-Way Functions (are they necessary?!):
  • q(n) = O(log n)
  • s(n) = n^ε (for any ε > 0)
– No Computational Assumptions (in chunks):
  • q(n): any query complexity
  • s(n) = O(n/q(n)), i.e. s(n) x q(n) = O(n)
Other Results:
– Optimal [Gemmell Naor 92]
– Must be invasive [Ajtai 2003]

Authenticators
Memory checkers allow reliable local decodability.
What about reliable local testability?
Authenticators:
• Encode the file x ∈ {0,1}^n into:
  – a large public encoding p_x
  – a small secret encoding s_x. Space complexity: s(n)
• Decoding Algorithm D:
  – Receives a public encoding p and decodes it into a vector x ∈ {0,1}^n
• Consistency verifier checks (repeatedly) whether the public encoding was (significantly) corrupted, reading only a few bits: t(n).
  – If not corrupted: verifier should output “Ok”
  – If verifier outputs “Ok”, decoder can (w.h.p.) retrieve the file

Pretty Good Authenticator with computational assumptions
• Idea: encode file X using a good error correcting code C
  – Actually erasures are more relevant
  – As long as a certain fraction of the symbols of C(X) is available, can decode X
  – Good example: Reed Solomon
• Add to each symbol a tag F_k(a,i), a function of
  – secret information k ∈ {0,1}^s, seed of a PRF
  – symbol a (from the code’s alphabet)
  – location i
• Verifier picks a random location i, reads symbol a and tag t
  – Checks whether t = F_k(a,i) and rejects if not
• Decoding process removes all symbols with inappropriate tags and uses the decoding procedure of C

Memory Checker ⇒ Authenticator
If there exists an online memory checker with
– space complexity s(n)
– query complexity t(n)
then there exists an authenticator with
– space complexity O(s(n))
– query complexity O(t(n))
Idea: Use a high-distance code

Improve the Information Theoretic Upper Bound(s)?
Maybe we can use:
• Locally Decodable Codes?
• Locally Testable Codes?
• PCPs of proximity?

The Lower Bound
Theorem 1 [Tight lower bound]: For any online memory checker secure against a computationally unbounded adversary,
s(n) x q(n) = Ω(n)
True also for authenticators

Memory Checkers and One-Way Functions
Breaking the lower bound implies one-way functions.
Theorem 2: If there exists an online memory checker:
– Working in polynomial time
– Secure against polynomial time adversaries
– With query and space complexity: s(n) x q(n) < c · n (for a constant c > 0)
Then there exist functions that are hard to invert for infinitely many input lengths (“almost one-way” functions)

This Talk:
• Will not say much about the proof
  – It is involved
• Initial insight: connection to the simultaneous message model

Simultaneous Messages Protocols [Yao 1979]
[Figure: ALICE holds x ∈ {0,1}^n and sends message mA to CAROL; BOB holds y ∈ {0,1}^n and sends message mB to CAROL; CAROL outputs f(x,y), here: x=y?]
• For the equality function:
  – |mA| + |mB| = Ω(√n)
  – |mA| x |mB| = Ω(n)
  [Newman Szegedy 1996] [Babai Kimmel 1997]

Ingredients for Full Proof:
• Consecutive Messages Model: Generalized communication complexity lower bound.
• Adversary “learns” public memory access distribution: Learning Adaptively Changing Distributions [NR06].
• “Bait and Switch” technique: Handle adaptive checkers.
• One-Way functions: Breaking the generalized communication complexity lower bound in a computational setting requires one-way functions.

Conclusions for OMC
• Settled the complexity of online memory checking
• Characterized the computational assumptions required for good online memory checkers
Open Questions:
• Do we need logarithmic query complexity for online memory checking with computational assumptions?
• Understanding relationships of crypto/complexity objects
• Quantum Memory Checkers?

Goal
• Distributed file storage system
  – Peer-to-peer environment
  – Processors join and leave the system continuously
  Want to be able to store and retrieve files distributively
• Partial Solutions
  – Distributed File sharing applications [Gnutella, Kazaa]
  – Distributed Hash Tables [DH, Chord, Viceroy]
    • Store (key, value) pairs and perform lookup on key

Fault-Tolerant Storage System
• Censor
  – Aims to eliminate access to some files
  – Can take down some servers
• Design Goal:
  – A reader should be able to reconstruct each file with high probability even after faults have occurred
  Probability taken over coins of the writer and reader

Adversarial Behavior
• How are the faulty processors chosen?
  – What is the influence of the adversary?
• Type of faults
  – Complete/Partial control

Adversarial Model
• Adversary chooses the set of processors to crash
• Different degrees of adaptiveness
  – Non adaptive adversary
    • Choice of faulty processors is not based on their content
  – Adversary with a limited number of queries
    • May query some processors
• Fail-stop failures
  – We do not consider Byzantine failures

Other Fault Models
• Random faults model:
  – Examples: Distance Halving DHT, Chord
  – Standard technique:
    • Replication to log(n) processors
    • Assures survival with high probability
• Adversarial faults [Fiat, Saia]
  – Large fraction accessible after adversary crashes a linear fraction of the processors
• Still, a censor can target a specific file

Measures of Quality
• Read/Write complexity:
  – Average number of processors accessed during a read/write operation
• Number of rounds:
  – Number of rounds required from an adaptive reader
• Blowup Ratio:
  – Ratio between the total number of bits used for the storage of a file and its size

Connection to LDC
• If you are willing to have high write complexity:
  – Can encode ALL the data with an LDC
  – Parameters of the LDC determine how good the data storage is

Probabilistic Storage system based on intersecting quorum system
• Storage System:
  – To store a file: pick a set of size Θ(√n) uniformly at random
    • replicate the file to all members of the quorum set
  – Retrieval: Choose a random set of size Θ(√n) and probe its members
  – Intersection follows from the birthday paradox

Properties of the Probabilistic Storage System
• Pros:
  – Simplicity
  – Resilient against linear number of faults
    • Even if the processors are chosen by the adversary adaptively
  – Adapted to a dynamic environment [Abraham, Malkhi]
• Cons:
  – High read/write complexity
  – High blowup-ratio
Want a storage system with better parameters

Non-adaptive readers are wasteful!
• Non-adaptive reader:
  – Processors are chosen without accessing any processor
Theorem: A fault tolerant storage system, in the non-adaptive reader model, resilient against Ω(n) faults, cannot do better than the intersecting storage system example.
• Read Complexity · Write Complexity is Ω(n)
• Blowup Ratio is Ω(√n)

Open Question
• Do the lower bounds for the case when both the reader and the adversary are non-adaptive hold when both are fully adaptive?

The Problem
Is it possible to have an efficient procedure:
• Given CNF formulae φ1 and φ2 on same variables and same length, come up with a CNF formula φ that is:
  1. Satisfiable if and only if φ1 ∨ φ2 is satisfiable
  2. Shorter than |φ1|+|φ2|
     Sufficiently short to apply recursively: (1-ε)(|φ1|+|φ2|)
If yes: There is a construction of Collision Resistant Hash functions from any one-way function
• No “black box” construction of CRH from OWF [Simon98]
• Construction uses the code of the one-way function
If no: there is hope for:
• Efficient everlasting encryption in the hybrid bounded storage model
• Forward-Secure-Storage [Dziembowski]
• Derandomization of Sampling [Dubrov-Ishai]

No Witness Retrievable Compression
• Given CNF formulae φ1 and φ2 on same variables, come up with a formula φ that is:
  1. Satisfiable if and only if φ1 ∨ φ2 is satisfiable
  2. Shorter than |φ1|+|φ2|
Claim: if one-way functions exist, then a witness (satisfying assignment) for either φ1 or φ2 cannot yield a witness for φ efficiently.
Most natural ideas are witness retrievable
Proof intuition based on broadcast encryption lower bounds

[Cartoon, speech bubbles:]
– “I can’t find an algorithm for the problem”
– “Find an algorithm that usually works?”
– “Maybe I can just approximate it”
– “Solve it in time 2^n”
– “Could we postpone it?”
– “Solve it for some fixed parameters”
(Garey and Johnson, 1979)

Approaches for dealing with NP-complete problems:
• Approximation algorithms
• Sub-exponential time algorithms
• Parameterized complexity
• Average case complexity
• Save it for the future

Verdict on LDCs?
Uncompressed paper on compressibility: www.wisdom.weizmann.ac.il/~naor/PAPERS/compressibility.html
Compressed version: FOCS 2006

THE END
Thank You

Slides for the Proof of OMC

Simultaneous Consecutive Messages Protocols
[Figure: ALICE holds x ∈ {0,1}^n, BOB holds y ∈ {0,1}^n; the messages are mP (public), mA (Alice) and mB (Bob); CAROL decides x=y?]
Theorem (lower bound for CM protocols): For any equality protocol, as long as |mP| ≤ n/100,
|mA| x |mB| = Ω(n)

Program for This Talk:
• Define online memory checkers
• Review some past results
• Describe new results
• Proof sketch:
  – Define communication complexity model
  – Sketch lower bound for a simple case
  – Ideas for extending to the general case

The Reduction
Use online memory checker to construct a consecutive messages equality protocol
[Figure: Online Checker (Space: s(n), Query: q(n)) → Reduction → Equality Protocol (Alice msg: s(n), Bob msg: O(q(n)))]
Conclusion: s(n) x q(n) = Ω(n) (From communication complexity lower bound)

Simplifying Assumption (With loss of generality)
Assumption: checker chooses indices to read from public memory independently of secret memory
Checker Operation:
1. Get an index i in the original file
2. Choose which indices to read from the public memory, and read them.
3. Get the secret memory
4. Retrieve i-th bit or say BUG

The Reduction: Outline
• Use online memory checker: Construct “random index” protocol, Bob chooses random index i:
  – If x = y, then Carol accepts
  – If xi ≠ yi, then Carol rejects
  Use online checker to build this protocol
• Use error correcting code: Go from “random index” to equality testing:
  – Alice, Bob encode inputs and run “random index” protocol
  – If Alice’s and Bob’s inputs differ at even one index, encodings differ at many indices.
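The error correcting code step relies on distance: distinct inputs must have encodings that disagree at many indices, so a random index exposes the difference with constant probability. Below is a small Python sketch of that property. It uses the Hadamard code purely for illustration, which is an assumption of this example: its exponential length makes it unsuitable for the actual bound, and the slides do not name the code used. The function names are hypothetical.

```python
from itertools import product


def hadamard_encode(x):
    """Encode a bit tuple x as the inner products <x, r> mod 2 over all r in {0,1}^n."""
    n = len(x)
    return [sum(a & b for a, b in zip(x, r)) % 2 for r in product((0, 1), repeat=n)]


def disagreement_fraction(x, y):
    """Fraction of codeword indices at which the encodings of x and y differ."""
    cx, cy = hadamard_encode(x), hadamard_encode(y)
    return sum(a != b for a, b in zip(cx, cy)) / len(cx)


x = (1, 0, 1, 1)
y = (1, 0, 1, 0)  # differs from x at a single index
print(disagreement_fraction(x, y))  # 0.5
print(disagreement_fraction(x, x))  # 0.0
```

Inputs that differ in only one of four positions have encodings that differ in half of the sixteen codeword positions, so a "random index" test on the encodings catches the difference with probability 1/2.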
[Figure: the “random index” protocol built from the checker.
 ALICE, x ∈ {0,1}^n: runs store(x) on the checker (public memory P(x), secret memory S(x)) and sends S(x), s(n) bits, to CAROL.
 BOB, y ∈ {0,1}^n: runs store(y) on the checker (public memory P(y), secret memory S(y)), gets a random index i, and sends i, yi and the bits for Carol, q(n)+1 bits.
 CAROL: runs retrieve(i) with secret memory S(x) to get Ci = xi / BUG; accepts if yi = Ci (x=y: accept, xi≠yi: reject).]
WANT: An adversary that can find bad x,y for protocol can be used to find bad x,P(y),i for memory checker
PROBLEM: Protocol adversary sees randomness!
SOLUTION: Re-Randomize!
– Alice re-computes S(x) with different randomness
– New S(x) independent of public randomness (given P(x))
– Requires exponential time Alice
Conclusion [Weak Theorem]: For “restricted” online memory checkers, s(n) x q(n) = Ω(n)

Recall Simplifying Assumption
Assumption: checker chooses indices to read from public memory independently of secret memory
Do we really need the assumption?
Idea: If checker uses secret memory to choose indices, adversary learns something about the secret memory from indices the checker reads.

Access Pattern Distribution
For a retrieve request:
• Access Pattern: Bits of public memory accessed by checker
• Access Pattern Distribution: Distribution of the checker’s access pattern (given its secret memory)
• Randomness: over checker’s coin tosses

Where Do We Go From Here?
Observation: If adversary doesn’t know the access pattern distribution, then the checker is “home free”.
Lesson for adversary: Activate checker many times, “learn” its access pattern distribution!
[NR05]: Learning to Impersonate.
Learning The Access Pattern Distribution
Theorem (Corollary from [NR05])
Learning algorithm for adversary:
– Adversary stores x, secret memory s
– Adversary makes O(s(n)) retrieves, p: Final public memory (after the stores and retrieves)
– Adversary learns L, can generate distribution DL(p).
– “Real” distribution is DS(p)
Guarantee: With high probability, the distributions DL(p) and DS(p) are ε-close.
L is of size O(q(n) x s(n)) bits.
Guarantee is only for the public memory p reached by checker!

[Figure: the protocol with a learner.
 ALICE, x ∈ {0,1}^n: runs store(x) and retrieves on the checker (public memory P(x), secret memory S(x)), runs the Learner with public coins; the learned L, O(s(n) x q(n)) bits, is public; she sends S(x), s(n) bits.
 BOB, y ∈ {0,1}^n: runs store(y) and retrieves on the checker (public memory P(y), secret memory S(y)), runs the Learner with the same coins, gets a random index i, and sends i, yi and the bits for Carol, q(n)+1 bits.
 CAROL: runs retrieve(i) with secret memory S(x) and learned L to get Ci = xi / BUG; accepts if yi = Ci (x=y: accept, xi≠yi: reject).]
Soundness: An adversary that finds x≠y s.t. Carol doesn’t reject, also fools memory checker
Completeness: Adversary that finds x s.t. Carol rejects when Alice AND Bob’s inputs are x, also fools memory checker
Does this work???
Access pattern distributions by “real” S and “learned” L are close on P(x).
PROBLEM: distributions by “real” S and “learned” L are close on original P(x)! They may be very far on P(y)!
Protocol adversary sees L, checker adversary learns it!

Does it Work?
• Will the protocol work when y≠x?
• No! Big problem for the adversary: Can learn access pattern distribution on correct and unmodified public memory… really wants the distribution on different modified memory!
• Learned information L may be:
  – Good on unmodified memory (DL(P(x)), DS(P(x)) close)
  – Bad on modified memory (DL(P(y)), DS(P(y)) far)
• Can’t hope to learn distribution on modified public memory

Bait and Switch
Carol knows S and L, if only she could check whether DL(P(y)), DS(P(y)) are ε-close…
If far: P(y)≠P(x) (not “real” public memory)! Reject!
If close: OK for Bob to use L for access pattern!
Bob always uses L to determine access pattern. This is a “weakening” of the checker.

Bait and Switch: Carol Approximates the Distance
Main Observation: Carol (computationally unbounded) can compute probabilities of any access pattern for which all the bits read from P(y) are known. (Probabilities by both DL(P(y)) and DS(P(y)))
Solution: Sample O(1) access patterns by DL(P(y)), use them to approximate distance between the distributions.
In the protocol Bob sends these samples to Carol, she approximates the distance.

Putting It Together
From any memory checker, we get a CM protocol for equality with:
• Public message: length O(s(n) x q(n))
• Alice message: length s(n)
• Bob message: length O(q(n))
Conclusion: s(n) x q(n) = Ω(n)
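The distance approximation in the Bait and Switch step can be illustrated with a short Python sketch. One standard way to estimate the statistical distance from samples of DL alone uses the identity Δ(DL, DS) = E over a drawn from DL of max(0, 1 − DS(a)/DL(a)), which only needs the two probabilities of each sampled pattern, exactly what Carol can compute. Whether the proof uses this particular estimator is not stated in the slides; the function name, the dictionary representation of the distributions, and the toy numbers are assumptions of this example.

```python
import random


def estimate_distance(prob_l, prob_s, num_samples, rng):
    """Estimate the statistical distance between D_L and D_S from samples of D_L.

    prob_l, prob_s: dicts mapping each access pattern to its probability.
    """
    patterns = list(prob_l)
    weights = [prob_l[a] for a in patterns]
    # Bob's role: draw access patterns according to the learned distribution D_L.
    samples = rng.choices(patterns, weights=weights, k=num_samples)
    total = 0.0
    for a in samples:
        # Carol's role: evaluate both probabilities on the sampled pattern.
        ratio = prob_s.get(a, 0.0) / prob_l[a]
        total += max(0.0, 1.0 - ratio)
    return total / num_samples


# Toy distributions over three hypothetical access patterns; true distance is 0.3.
d_l = {"p1": 0.5, "p2": 0.3, "p3": 0.2}
d_s = {"p1": 0.2, "p2": 0.3, "p3": 0.5}
estimate = estimate_distance(d_l, d_s, 20000, random.Random(0))
```

Each sample contributes a value in [0, 1], so a constant number of samples already separates "ε-close" from "far" with constant confidence; the large sample count here only makes the toy estimate land near 0.3.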