- Give outline of talk
  o Outline: intro to complexity; NP-completeness; PCPs; remarks.
  o Note: Venkat Guruswami is teaching a class on this topic in the fall in CSE.
  o Note: Not related to P vs. NP.
  o References: See the survey by Guruswami for an outline of the proof of the PCP Theorem. See the survey by Trevisan for applications to hardness of approximation. Original proofs are in Arora and Safra, and in Arora, Lund, Motwani, Sudan, and Szegedy.
  o Caveat: I'm no expert.
- Computational problems as languages
  o Def: Sigma = {0, 1}; Sigma* is the set of all (finite) binary strings.
  o Def: A language is just a subset of Sigma*. Computational complexity tries to classify languages.
  o Def: A Turing machine M runs in polynomial time if there exists a constant c > 0 such that on inputs of size n, M runs for at most O(n^c) steps.
  o Def: P = { L subset Sigma* | there exists a TM M that decides L in polynomial time }.
  o Comment: we show membership in P by giving an efficient algorithm.
- NP as polynomial-time verification
  o NP stands for nondeterministic polynomial time, originally formulated in terms of nondeterministic Turing machines.
  o Canonical problem. Def: SAT = { phi | phi is a Boolean formula with a satisfying assignment }. We require the formula to be an AND of clauses, each an OR of 3 literals.
  o We'll view NP as the class of languages whose solutions can be quickly verified.
    Graph coloring is in NP: just give the coloring.
    CLIQUE is in NP: just give the clique.
  o Each time, we've given a "certificate" showing that the instance is in the language.
  o Def: L in NP iff there exists a polynomial-time verifier V such that: for all x in L, there exists c in Sigma* such that V(x, c) accepts; and for all x notin L and all c in Sigma*, V(x, c) rejects.
  o Adversarial model.
  o Draw Venn diagram.
- NP-completeness
  o Def: An input to 3SAT is a conjunction of clauses, each of which is a disjunction of 3 literals. x in 3SAT iff there is a setting of the variables that satisfies the formula. Examples.
  o Cook-Levin Theorem: If you can solve 3SAT, you can solve anything in NP.
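The verifier view of NP above can be made concrete with a small Python sketch (the data representation and names here are my own illustrative choices, not from the talk): a 3SAT certificate is just a truth assignment, and the verifier checks it clause by clause in polynomial time.

```python
# A minimal sketch of an NP verifier for 3SAT (illustrative, not from
# the talk).  A formula is a list of clauses; each clause is a list of
# 3 literals, where literal +i means x_i and -i means NOT x_i.  The
# certificate is a dict mapping variable index -> bool.

def verify_3sat(formula, assignment):
    """Polynomial-time verifier V(x, c): accept iff the certificate
    satisfies every clause."""
    def lit_true(lit):
        value = assignment[abs(lit)]
        return value if lit > 0 else not value
    return all(any(lit_true(lit) for lit in clause) for clause in formula)

# (x1 or x2 or not x3) and (not x1 or x2 or x3)
phi = [[1, 2, -3], [-1, 2, 3]]
print(verify_3sat(phi, {1: True, 2: True, 3: True}))    # -> True
print(verify_3sat(phi, {1: False, 2: False, 3: True}))  # -> False
```

Note the asymmetry in the definition: a single good certificate suffices for x in L, while for x not in L every certificate must be rejected.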
    Shown by turning a polynomial-sized verifier computation into a large logic formula that checks the correctness of the computation.
  o Def: L is NP-complete iff L in NP, and for all L' in NP there exists a polynomial-time f such that x in L' implies f(x) in L, and x notin L' implies f(x) notin L.
  o Intuition: if we could decide L, we could decide all of NP.
  o Surprising result that makes the whole field important: many, many important problems all turn out to be equivalent, so solving one would solve all problems in NP. Examples: many graph problems (clique, coloring, min cut); lattice problems (shortest vector, closest vector); coding problems; constraint problems (3SAT, polynomial zeros, integer programming).
  o Update Venn diagram.
- Definition and intuition of PCPs
  o Intuition: in lots of algorithms randomness seems to help (ex: primality testing). Can we use randomness to speed up verification? If we only want a probabilistic guarantee of correctness, can we read less of the certificate?
  o Def: An (r, q)-restricted verifier is a polynomial-time verifier V that, on inputs of size n, uses r(n) random bits and reads only q(n) bits of the certificate.
  o Describe the interaction with the prover.
  o Def: PCP_e[r(n), q(n)] is the set of languages L such that there exists an (r, q)-restricted verifier V satisfying: for all x in L, there exists c in Sigma* with Pr(V(x, c) accepts) = 1; and for all x notin L and all c in Sigma*, Pr(V(x, c) accepts) < e.
  o Comments:
    NP = PCP[0, poly(n)].
    Think of PCPs as an adversarial prover trying to fool the verifier into accepting.
    We require e to be constant; soundness arbitrarily close to 1 is easy to achieve and not useful.
    Error can be reduced exponentially by repeating the verification l times, using l*r random bits and l*q queries.
  o A prover/verifier system:
    Prover: for each clause, writes down an assignment to the 3 variables in that clause. Then writes down an assignment to all n variables separately.
    Verifier: picks a random clause and a random variable in that clause. Accepts if the clause is satisfied and the two assignments are consistent.
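The prover/verifier system just described can be sketched in Python (a minimal illustration; the data layout and function names are my own). The proof has two parts, a per-clause table and a global assignment, and the verifier reads one clause's local assignment plus one bit of the global one.

```python
import random

# Sketch of the simple restricted verifier for 3SAT (illustrative).
# formula: list of clauses, each a list of 3 literals (+i / -i).
# clause_tables[i]: the prover's local assignment for clause i.
# global_assignment: the prover's separate assignment to all variables.

def restricted_verify(formula, clause_tables, global_assignment, rng=random):
    i = rng.randrange(len(formula))              # random clause: O(log n) bits
    clause, local = formula[i], clause_tables[i]
    def lit_true(lit):
        v = local[abs(lit)]
        return v if lit > 0 else not v
    if not any(lit_true(lit) for lit in clause):
        return False                             # clause not satisfied locally
    var = abs(rng.choice(clause))                # random variable in the clause
    return local[var] == global_assignment[var]  # consistency check

# Honest prover: derive clause tables from a satisfying assignment.
phi = [[1, 2, -3], [-1, 2, 3]]
a = {1: True, 2: True, 3: True}
tables = [{abs(l): a[abs(l)] for l in cl} for cl in phi]
print(all(restricted_verify(phi, tables, a) for _ in range(50)))  # -> True
```

With an honest prover the verifier always accepts; a cheating prover whose tables disagree with the global assignment is caught with constant probability per run.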
  o If x in 3SAT then the prover can use a satisfying assignment, and Pr(V accepts) = 1. If at most an epsilon fraction of the clauses can be satisfied, then Pr(V accepts) < epsilon.
    This is an (O(log n), 4)-restricted verifier.
    Problem: epsilon could be arbitrarily close to 1 for large inputs.
- The PCP Theorem: NP = PCP[O(log n), O(1)]
  o This is very surprising: no matter how large the input, we can verify membership by looking at only a constant amount of information.
    Think of examples from NP-complete problems.
  o Optimal result [Hastad, Guruswami]: NP = PCP[O(log n), 3], with soundness 1/2 + epsilon. Very surprising!
- PCPs via polynomial codes
  o Intuition: we want to get a sufficient amount of information while reading only a portion of the input. This is reminiscent of error-correcting codes: a large portion of the input may be corrupted, but we can still recover the message. Indeed, error-correcting codes are the foundation of PCPs.
  o Codes based on polynomials over a finite field have very nice properties:
    A degree-d polynomial over a finite field is completely determined by d+1 points.
    If the polynomial is nonzero, then it is nonzero on many (at least |F| - d) points.
  o Quick outline: encode a SAT instance as a polynomial over a finite field. Checking that all clauses are satisfied becomes checking that the polynomial is zero on some subset of the domain. If the polynomial is low degree and a random point evaluates to zero, then very likely the entire polynomial is zero. The prover gives a table of the polynomial evaluated at all points; the verifier picks a random point. Two main problems:
    Assumes low degree.
    Doesn't give a constant number of queries.
  o Low-degree extension: can encode a length-n bit string as a polynomial of degree at most n-1. Associate each x_i with a field element h_i, and require P(h_i) = x_i. By reading ANY n values of the polynomial we can interpolate and reconstruct x_1, ..., x_n.
  o Arithmetization of 3SAT:
    Ex: for the clause (x or y or not z), (A(x) - 1)(A(y) - 1)A(z) = 0 iff one of x, y, not z is true, i.e., 0 iff the clause is satisfied.
    Combine all clauses into one big polynomial.
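The arithmetization step above can be checked directly in a few lines of Python (the prime p is an arbitrary illustrative choice): over the Boolean points, the clause polynomial vanishes exactly when the clause is satisfied.

```python
# Arithmetization of the clause (x OR y OR NOT z) over F_p.
# The polynomial (X - 1)(Y - 1)Z is 0 iff x = 1, y = 1, or z = 0,
# i.e. iff the clause is satisfied.  p = 97 is an arbitrary small prime.

P = 97

def clause_poly(x, y, z):
    """0 in F_P iff the clause (x or y or not z) is satisfied."""
    return ((x - 1) * (y - 1) * z) % P

# Verify agreement with the clause on all 8 Boolean points.
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            satisfied = bool(x or y or (not z))
            assert (clause_poly(x, y, z) == 0) == satisfied
print("clause polynomial vanishes exactly on satisfying assignments")
```

The point of moving to a field much larger than {0, 1} is the second property above: a nonzero low-degree polynomial is nonzero on most of the domain, so a single random evaluation is a meaningful spot-check.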
    The prover provides a table with the value of the polynomial at each point. We want to check quickly that the entire polynomial is 0 on the domain. Roughly, a random point evaluates to zero only if (with high probability) the entire polynomial is zero.
  o Missing steps (a lot):
    All that we've said holds for low-degree polynomials, but we're asking the prover for P(x) with no guarantee that the prover is really returning the evaluations of a low-degree polynomial. There are very simple tests for low degree, with very sophisticated analysis.
    This gives PCP[log n, log n]; to get constant queries, we build a PCP[poly(n), O(1)] system and then compose the two very carefully.
- PCPs via expander constructions
  o The above construction is very complicated. It is spread over many papers; a full self-contained proof would be book length.
  o This year Irit Dinur proved the PCP theorem by completely different methods, mostly self-contained in 15 pages. Much, much prettier.
  o It uses expander graphs, which have been very popular recently in CS.
- Relationship to locally testable codes.
- Hardness results from PCPs
  o Introduce hardness of approximation.
    Knowing that all these combinatorial optimization problems are NP-hard, and thus unlikely to have efficient solutions, a natural next question is: can we approximate the solutions quickly?
    Ex: 3SAT: a random assignment satisfies an expected 7/8 of the clauses, which gives a 7/8-approximation algorithm. Can we do better? The PCP theorem implies there is some limit; Hastad proved that 7/8 is optimal.
  o The great thing about PCP[O(log n), O(1)]: there are only a polynomial number of query sets. For each random string, there is a simple constraint that must be satisfied. So we can represent the PCP structure explicitly and use it in reductions.
  o Independent Set: Given a graph G = (V, E), find a maximum I subset V such that no two vertices in I are adjacent.
    Reduction from PCP to Independent Set:
    Make a vertex for each pair of a random string and answers that cause the verifier to accept. (The random string r determines which queries are asked.)
    Connect two vertices if their answers are inconsistent. This is all polynomial time.
    If x in 3SAT then the honest proof gives consistent accepting answers for every random string, so |I| >= 2^r.
    If |I| > epsilon * 2^r, then there is a set of consistent answers that causes V to accept with probability > epsilon, so x in 3SAT.
    Therefore, if we can approximate Independent Set to within a 1/epsilon factor, we can decide whether x in 3SAT. So approximation is hard.
    We can repeat a constant number of times, showing that approximation to within any constant factor is NP-hard. Using expander graphs, we can show hardness of approximation to within n^c for some c > 0.
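The graph construction in the reduction above can be sketched as follows (a toy illustration with my own names; it assumes we are handed, for each random string, the query positions and the answer patterns that make the verifier accept).

```python
# Toy sketch of the PCP-to-Independent-Set graph (illustrative).
# accepting: dict mapping a random string r to a list of
# (positions, answers) pairs, where positions is a tuple of proof
# indices queried under r and answers a tuple of bits that make the
# verifier accept.  One vertex per accepting configuration; an edge
# joins two configurations that read the same proof bit differently.

def fglss_graph(accepting):
    vertices = [(r, pos, ans)
                for r, configs in accepting.items()
                for pos, ans in configs]
    edges = set()
    for i, (r1, p1, a1) in enumerate(vertices):
        view1 = dict(zip(p1, a1))
        for j in range(i + 1, len(vertices)):
            r2, p2, a2 = vertices[j]
            view2 = dict(zip(p2, a2))
            # inconsistent: some shared proof position answered differently
            if any(view1[k] != view2[k] for k in view1.keys() & view2.keys()):
                edges.add((i, j))
    return vertices, edges

# Tiny example: two random strings with overlapping queries.
accepting = {
    'r0': [((0, 1), (1, 1)), ((0, 1), (0, 1))],
    'r1': [((1, 2), (1, 0))],
}
vertices, edges = fglss_graph(accepting)
print(len(vertices), len(edges))  # 3 vertices, 1 inconsistency edge
```

An independent set that picks one vertex per random string corresponds to a single consistent proof, accepted with probability |I| / 2^r, which is exactly the completeness/soundness argument above.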