TDDD65 Introduction to the Theory of Computation
Gustav Nordh, Department of Computer and Information Science
gustav.nordh@liu.se
2012-09-19

Complexity

What can be computed efficiently? Efficient in terms of what? Energy? Space? Time?

Time Complexity

Definition. The running time or time complexity of a Turing machine M (which halts on all inputs) is the function $f : \mathbb{N} \to \mathbb{N}$ where $f(n)$ is the maximum number of steps that M uses on any input of length n. This is worst-case complexity.

Given an algorithm, we would like to have a measure for the running time that
- is simple (the exact running time is often a very complicated expression)
- is independent of the machine model we implement the algorithm in (Turing machine/Java/Pascal ...)
We use the asymptotic running time as our measure.

Time Complexity: Asymptotics

In asymptotic analysis of the running time we try to understand the running time of the algorithm when it is run on large inputs.

Example. Suppose the running time of an algorithm/Turing machine is $3n^2 + 10n + 200$, where n is the length of the input. Then, for large n, the running time is "similar" to $n^2$.
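A quick numerical check can make the word "similar" concrete. The sketch below (the function names are illustrative and not part of the lecture) evaluates the example running time and its ratio to $n^2$. The ratio settles near the leading coefficient 3, so for large n the lower-order terms $10n + 200$ contribute almost nothing.

```python
def running_time(n: int) -> int:
    """Exact step count from the example: 3n^2 + 10n + 200."""
    return 3 * n * n + 10 * n + 200


def ratio_to_n_squared(n: int) -> float:
    """How many times larger than n^2 the running time is."""
    return running_time(n) / (n * n)


if __name__ == "__main__":
    # The ratio shrinks toward 3 as n grows.
    for n in (10, 100, 1_000, 10_000):
        print(n, running_time(n), ratio_to_n_squared(n))
```

The ratio never drops below 3, but it gets arbitrarily close to it, which is the sense in which the running time behaves like a constant multiple of $n^2$.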
Only consider the highest-order term (here $3n^2$) and ignore constant factors: $3n^2 + 10n + 200 = O(n^2)$.

Time Complexity: Asymptotics, big-O notation

Definition. For functions f and g we say that $f(n) = O(g(n))$ if positive integers c and $n_0$ exist such that for every integer $n \ge n_0$,

$f(n) \le c\,g(n)$

When $f(n) = O(g(n))$ we say that $g(n)$ is an asymptotic upper bound for $f(n)$.

Recall: ignore constant factors and only consider the highest-order term. For example, $3n^2 + 10n + 200 = O(n^2)$, as the definition confirms by taking $c = 100$ and $n_0 = 3$.

Example. Describe a Turing machine that recognizes the language $L = \{0^k 1^k 2^k \mid k \ge 0\}$.
1. Scan the input from left to right and make sure it is of the form $0^* 1^* 2^*$ (if it is not, then reject).
2. Repeat the following as long as 0s, 1s, and 2s all remain on the tape:
3.    Return the head to the left end of the tape.
4.    Cross off the first 0, then continue to the right, crossing off the first 1 and the first 2 that are found.
5. Scan the tape and check that no 0s, 1s, or 2s remain on the tape, and accept (should a 0, 1, or 2 remain on the tape, then reject).

$t(n) = O(n) + \frac{n}{3}\,(O(n) + O(n) + O(n)) + O(n) = O(n^2)$

T[1, ..., n] is an ordered list (in increasing order) and we want to determine whether the key k is in the list.

    function LOOKUPTABLE(table T[1, ..., n], key k)
        for i from 1 to n do
            if T[i] = k then return true
            if T[i] > k then return false
        return false

$t(n) = n\,(O(1) + O(1)) = O(n)$

    function bubblesort(A : list[1..n]) {
        var int i, j;
        for i from n downto 1 {
            for j from 1 to i-1 {
                if (A[j] > A[j+1])
                    swap(A[j], A[j+1])
            }
        }
    }

$t(n) = n(n - 1)\,(O(1) + O(1)) = O(n^2)$

    int summation(int m) {
        int sum = 0;
        for (int i = 1; i <= m; i++) {
            sum = sum + i;
        }
        return sum;
    }

$t(n)$ is not $O(n)$.

Recall the definition: the running time or time complexity of a Turing machine M (which halts on all inputs) is
the function $f : \mathbb{N} \to \mathbb{N}$ where $f(n)$ is the maximum number of steps that M uses on any input of length n.

For summation, the input length is $n = \log_2 m$ (m is given in binary), so the loop alone takes $m \cdot O(1) = O(2^n)$ steps. Assuming a linear time algorithm for addition, we get

$t(n) = m \cdot O(\log_2 m) = 2^n \cdot O(n) = O(2^n n)$

Time Complexity

Definition. Let $t : \mathbb{N} \to \mathbb{R}^+$ be a function. The time complexity class TIME(t(n)) is the collection of all languages that are decidable by an O(t(n)) time Turing machine.

What can be computed efficiently?
- In terms of the time required
- We use an asymptotic measure for the worst-case running time
- In terms of this measure, what do we consider to be efficient?

n     n log2 n    n^2     n^3           2^n
2     2           4       8             4
16    64          256     4096          6.5 · 10^4
64    384         4096    2.6 · 10^5    1.84 · 10^19

1.84 · 10^19 nanoseconds = 2.14 · 10^5 days (about 584 years)

Algorithms having running times of the form $2^{cn}$, $c > 0$ (exponential time) are rarely considered to be efficient!
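The table entries and the 584-year figure can be reproduced with a few lines of Python. This is only a sketch under two assumptions that the slide does not state explicitly: one step takes one nanosecond, and a year has 365.25 days. The helper names are hypothetical.

```python
import math


def growth_row(n: int) -> dict:
    """Values of the growth functions from the table for one input size n."""
    return {
        "n": n,
        "n log2 n": n * math.log2(n),
        "n^2": n ** 2,
        "n^3": n ** 3,
        "2^n": 2 ** n,
    }


def nanoseconds_to_years(ns: int) -> float:
    """Convert a step count to years, assuming one step per nanosecond."""
    seconds = ns / 1e9
    days = seconds / 86_400          # 86,400 seconds per day
    return days / 365.25             # assumed average year length


if __name__ == "__main__":
    for n in (2, 16, 64):
        print(growth_row(n))
    print(nanoseconds_to_years(2 ** 64))
```

Already at n = 64 the exponential column dwarfs every polynomial column, which is the point of the comparison.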
Time Complexity: P

Definition. P is the class of all languages that are decidable in polynomial time on a deterministic Turing machine. So,

$P = \bigcup_k \mathrm{TIME}(n^k)$

efficient computation = P
- Real-world problems in P seem to be practically solvable on computers
- All (reasonable) deterministic computational models can simulate each other with only a polynomial increase in running time

Example (PATH). Given a directed graph G and two vertices s and t, is there a path in the graph from s to t?

Algorithm I. Generate each sequence of at most n vertices (where n is the number of vertices in G) and check whether the sequence is a directed path from s to t. The number of such sequences is roughly $n^n$, so the running time of the algorithm is exponential in n.

Algorithm II. Place a mark on s. Repeat the following until no new vertices get marked: scan all edges of G and, for all edges (a, b) where a is marked and b is not marked, place a mark on b.
Finally, if t is marked then accept, otherwise reject.

$t(n) = n \cdot O(n^2) = O(n^3)$, so PATH is in P.

Theorem. Any context-free language L is in P.
See Sipser, Theorem 7.16 for the proof.

Example. Consider the following algorithm where the input is an undirected graph and we define the size of the input to be the number of vertices of the graph. We assume a representation of the graph such that the existence of an edge between two vertices can be determined in constant time.

TRIANGLE: For each set of 3 distinct vertices from the graph, check whether the set forms a triangle, and in that case output true. If none of the 3-element sets forms a triangle, then output false.
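The TRIANGLE procedure can be sketched directly in Python. The representation is an assumption made for this illustration: vertices are numbered 0..n-1 and the edges are stored in a hash set of unordered pairs, so that an edge lookup takes (expected) constant time as the example requires. The names `has_triangle` and `make_edges` are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple


def make_edges(pairs: Iterable[Tuple[int, int]]) -> Set[FrozenSet[int]]:
    """Store each undirected edge as a frozenset so (a, b) and (b, a) coincide."""
    return {frozenset(p) for p in pairs}


def has_triangle(n: int, edges: Set[FrozenSet[int]]) -> bool:
    """Try every 3-element set of vertices and test its three edges."""
    for a, b, c in combinations(range(n), 3):
        if (frozenset((a, b)) in edges
                and frozenset((b, c)) in edges
                and frozenset((a, c)) in edges):
            return True
    return False


if __name__ == "__main__":
    with_triangle = make_edges([(0, 1), (1, 2), (0, 2), (2, 3)])
    four_cycle = make_edges([(0, 1), (1, 2), (2, 3), (3, 0)])
    print(has_triangle(4, with_triangle), has_triangle(4, four_cycle))
```

`combinations(range(n), 3)` yields each 3-element set exactly once, and each set costs three constant-time lookups.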
There are $n(n - 1)(n - 2)/6$ distinct sets of 3 vertices, so

$t(n) = \frac{n(n - 1)(n - 2)}{6}\,(O(1) + O(1) + O(1)) = O(n^3)$

which is polynomial time.

Example. 3COL: This algorithm outputs true if the graph can be properly colored by 3 colors (i.e., if every vertex can be assigned a color, say Red, Blue, or Green, such that no two adjacent vertices get the same color) and false otherwise.

For each possible assignment of colors (i.e., Red, Blue, or Green) to all the vertices in the graph, check whether the graph is properly 3-colored, and in that case output true. If none of the assignments is a proper 3-coloring, then output false.
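A minimal Python sketch of this brute-force 3COL procedure follows. It assumes, for illustration only, that vertices are numbered 0..n-1 and that the graph is given as a list of edges; the function names are hypothetical.

```python
from itertools import product
from typing import Iterable, Sequence, Tuple

COLORS = ("Red", "Blue", "Green")


def is_proper_coloring(coloring: Sequence[str],
                       edges: Iterable[Tuple[int, int]]) -> bool:
    """A coloring is proper if no edge joins two vertices of the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)


def three_colorable(n: int, edges: Sequence[Tuple[int, int]]) -> bool:
    """Enumerate every assignment of the three colors to the n vertices."""
    for coloring in product(COLORS, repeat=n):
        if is_proper_coloring(coloring, edges):
            return True
    return False


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
    print(three_colorable(3, triangle), three_colorable(4, k4))
```

The check of a single assignment is cheap; the cost comes from the loop over all assignments produced by `product`, whose number grows exponentially with n.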
There are $3^n$ possible assignments of 3 colors to the n vertices in the graph. Checking whether an assignment of colors is a proper 3-coloring can be done in time $O(n^2)$.
$t(n) = O(3^n n^2)$ (or $O(3^n)$ if we ignore polynomial factors), which is exponential time.

Time Complexity: Nondeterministic Turing machines

Definition. A nondeterministic Turing machine is a decider if all its computation branches halt on all inputs.

Definition. The running time of a nondeterministic Turing machine (which is a decider) is the function $f : \mathbb{N} \to \mathbb{N}$ where $f(n)$ is the maximum number of steps that the machine uses on any branch of its computation on any input of length n.

Definition. Let $t : \mathbb{N} \to \mathbb{R}^+$ be a function. The nondeterministic time complexity class NTIME(t(n)) is the collection of all languages that are decidable by an O(t(n)) time nondeterministic Turing machine.

Time Complexity: NP

Definition. NP is the class of problems solvable in polynomial time on a nondeterministic Turing machine, so

$NP = \bigcup_k \mathrm{NTIME}(n^k)$

Definition. A verifier for a language L is an algorithm V that can verify that w ∈ L with the help of a certificate c. A polynomial time verifier runs in polynomial time in the length of w.
Definition. NP is the class of languages that have polynomial time verifiers.

Definition. $NP = \bigcup_k \mathrm{NTIME}(n^k)$

Theorem. The class of languages that have polynomial time verifiers equals $\bigcup_k \mathrm{NTIME}(n^k)$.
See Sipser, Theorem 7.20 for the proof.

Definition. NP is the class of problems for which the correctness of solutions can be verified in polynomial time.

LINEQ: Given a system of linear equations over $\mathbb{N}$, is there a solution?

LINEQ is in NP: let the certificate c be a solution.
To verify that c is a solution, check that all equations are satisfied by c (which can be done in polynomial time).

HAMPATH: Given a directed graph G and two vertices s and t, is there a path from s to t that passes through all the vertices of G exactly once?

HAMPATH is in NP: let the certificate c be a solution (a Hamilton path). To verify that c is a solution, check that c is a path from s to t that passes through all vertices of G exactly once (which can be done in polynomial time).

By definition, P ⊆ NP.

Is P ≠ NP?

Why do we care?
- Most of the problems we need to solve are in NP but not known to be in P
  - Scheduling (university classes, processor instructions ...)
  - Planning (traveling salesperson, circuit design ...)
  - Solving systems of linear equations over the natural numbers ...
  - Factoring and Crypto
- Concerns a fundamental property of computation and the world we live in

Is P ≠ NP? Why do most researchers believe that P ≠ NP?
- We have failed to prove P = NP
- "If P = NP, then the world would be a profoundly different place than we usually assume it to be. There would be no special value in 'creative leaps', no fundamental gap between solving a problem and recognizing the solution once it's found. Everyone who could appreciate a symphony would be Mozart; everyone who could follow a step-by-step argument would be Gauss..." (Scott Aaronson)

Time Complexity: NP, trying to prove P = NP

Recall the proof/algorithm that showed that any nondeterministic Turing machine can be simulated by a deterministic Turing machine:

Theorem. Every t(n) time nondeterministic Turing machine has an equivalent $2^{O(t(n))}$ time deterministic Turing machine.

So, given a polynomial time nondeterministic Turing machine, this only gives us an exponential time deterministic Turing machine :-(

NP-completeness

The general belief is that
P ≠ NP (there are problems in NP that cannot be solved efficiently, i.e., that are not in P).

It would be good to know which are the most difficult problems in NP, and in particular which problems in NP are not in P, assuming P ≠ NP.

The answer to this is the theory of NP-completeness.

NP-complete problems are problems in NP that have the remarkable property that if any single one of them is in P, then P = NP.
NP-completeness
Stephen Cook (1939-), Leonid Levin (1948-)

NP-completeness: Reductions

Definition. A function $f : \Sigma^* \to \Sigma^*$ is a polynomial time computable function if there is some polynomial time Turing machine M that on input w halts with f(w) on its tape.

Definition. Language A is polynomial time mapping reducible to language B if there is a polynomial time computable function $f : \Sigma^* \to \Sigma^*$ such that for every w,

$w \in A$ iff $f(w) \in B$

The function f is called a polynomial time reduction from A to B. If A is polynomial time mapping reducible to B then we write $A \le_m^P B$.

NP-completeness: Definition

Definition. We say that a problem B is NP-complete if
1. B ∈ NP
2. For every problem A in NP, $A \le_m^P B$

Corollary. If B is NP-complete and solvable in polynomial time (in P), then P = NP.

So, to prove that P = NP it is sufficient to find a polynomial time algorithm to solve a single NP-complete problem.
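The corollary rests on one mechanical step: a reduction f from A to B, followed by a decider for B, is a decider for A, and the composition of two polynomial time procedures is again polynomial time. The sketch below shows only that mechanism. The two toy languages and all names are invented for illustration (both languages are trivially in P and have nothing to do with NP-completeness).

```python
from typing import Callable


def decider_via_reduction(reduction: Callable[[str], str],
                          decide_b: Callable[[str], bool]) -> Callable[[str], bool]:
    """Build a decider for A from a mapping reduction A -> B and a decider for B."""
    def decide_a(w: str) -> bool:
        # w is in A iff reduction(w) is in B, so asking B's decider suffices.
        return decide_b(reduction(w))
    return decide_a


# Toy language A: binary strings containing an even number of 1s.
# Toy language B: strings of even length.
def drop_zeros(w: str) -> str:
    """Toy reduction: w has evenly many 1s iff the output has even length."""
    return w.replace("0", "")


def has_even_length(w: str) -> bool:
    return len(w) % 2 == 0


decide_even_ones = decider_via_reduction(drop_zeros, has_even_length)

if __name__ == "__main__":
    print(decide_even_ones("1010"), decide_even_ones("111"))
```

If B is NP-complete, every A in NP comes with such a reduction, so a polynomial time `decide_b` would give a polynomial time decider for every problem in NP.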
NP-completeness: SAT problem

Recall:
- Variables that can take the values TRUE (1) and FALSE (0) are called Boolean variables
- The Boolean operations are AND (∧), OR (∨), and NOT (¬)
- A Boolean formula is an expression involving Boolean variables and operations

Definition. A Boolean formula is satisfiable if some assignment of 0s and 1s to the variables makes the formula evaluate to 1 (TRUE).

The satisfiability problem (SAT) is to test whether a Boolean formula is satisfiable.

SAT = {⟨ϕ⟩ | ϕ is a satisfiable Boolean formula}

Theorem (Cook-Levin Theorem). SAT is NP-complete.

Corollary. SAT ∈ P iff P = NP.

Lemma. SAT ∈ NP.

Proof. Given a Boolean formula ϕ and an assignment c of values to the variables, we can verify that the formula evaluates to 1 (TRUE) by substituting the values given by c for the variables and simplifying.
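The verifier in this proof can be sketched as a recursive evaluator. The encoding of formulas as nested tuples (`("var", name)`, `("not", f)`, `("and", f, g)`, `("or", f, g)`) is an assumption made for this illustration, as are the function names. Each node of the formula is visited at most once, so the check takes time linear in the size of the formula.

```python
from typing import Dict, Tuple, Union

Formula = Union[Tuple[str, str], Tuple[str, tuple], Tuple[str, tuple, tuple]]


def evaluate(formula: tuple, assignment: Dict[str, bool]) -> bool:
    """Substitute the assigned values for the variables and simplify bottom-up."""
    op = formula[0]
    if op == "var":
        return assignment[formula[1]]
    if op == "not":
        return not evaluate(formula[1], assignment)
    if op == "and":
        return evaluate(formula[1], assignment) and evaluate(formula[2], assignment)
    if op == "or":
        return evaluate(formula[1], assignment) or evaluate(formula[2], assignment)
    raise ValueError(f"unknown operation: {op!r}")


def verify_sat(formula: tuple, certificate: Dict[str, bool]) -> bool:
    """Accept iff the certificate is a satisfying assignment of the formula."""
    return evaluate(formula, certificate)


# Example formula: (x OR y) AND (NOT x)
PHI = ("and", ("or", ("var", "x"), ("var", "y")), ("not", ("var", "x")))

if __name__ == "__main__":
    print(verify_sat(PHI, {"x": False, "y": True}))
    print(verify_sat(PHI, {"x": True, "y": True}))
```

Note that the verifier only checks a given assignment; it makes no attempt to find one, which is where the difficulty of SAT lies.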
Theorem. SAT is NP-complete.

Proof idea.
- We need to show that for any L ∈ NP, $L \le_m^P \mathrm{SAT}$
- L ∈ NP implies that it is decided in polynomial time by some NTM N
- We need to simulate N by a Boolean formula
- Given N and a string w we construct a Boolean formula $\varphi_{N,w}$ that is satisfiable iff N accepts w

Now, to prove that a problem B is NP-complete, it is sufficient to prove that
1. B is in NP
2. $\mathrm{SAT} \le_m^P B$

A Boolean formula is in conjunctive normal form (CNF) if it is a conjunction of disjunctive clauses, for example

(x ∨ y ∨ z) ∧ (y ∨ z ∨ y) ∧ (w ∨ y ∨ x)

A formula is in kCNF if each clause contains exactly k literals.

3SAT = {⟨ϕ⟩ | ϕ is a satisfiable 3CNF formula}

Theorem. 3SAT is NP-complete.
Proof. $\mathrm{SAT} \le_m^P \mathrm{3SAT}$

DOUBLESAT = {⟨ϕ⟩ | ϕ has at least two satisfying assignments}

Theorem. DOUBLESAT is NP-complete.

Proof.
We give a reduction from 3SAT, i.e., $\mathrm{3SAT} \le_m^P \mathrm{DOUBLESAT}$.

First note that DOUBLESAT is in NP, since given ϕ' and two distinct variable assignments c1 and c2 we can verify in polynomial time that c1 and c2 both satisfy ϕ' (substitute the values given by c1, respectively c2, for the variables in ϕ', simplify, and check that ϕ' evaluates to TRUE).

Given a 3SAT instance/formula ϕ, reduce it to ϕ' = ϕ ∧ (x ∨ ¬x), where x is a new variable not used in ϕ. Every satisfying assignment of ϕ extends to two satisfying assignments of ϕ' (one with x = 0 and one with x = 1), and if ϕ is unsatisfiable then so is ϕ'.

NP-completeness: HAMPATH

HAMPATH: Given a directed graph G and two vertices s and t, is there a path from s to t that passes through all the vertices of G exactly once?

HAMPATH is in NP: let the certificate c be a solution (a Hamilton path). To verify that c is a solution, check that c is a path from s to t that passes through all vertices of G exactly once (which can be done in polynomial time).

HAMPATH = {⟨G, s, t⟩ | there is a Hamilton path from s to t in G}

Theorem. HAMPATH is NP-complete.
Proof. Reduction from 3SAT, i.e., $\mathrm{3SAT} \le_m^P \mathrm{HAMPATH}$

NP-completeness

Why is it important to know that the problem you want to solve is NP-complete?
Because it tells you something about what kind of algorithm you should try to come up with. In particular, trying to come up with a fast (polynomial time) algorithm that always works is probably not a good idea...

What to do when you have to solve an NP-complete problem?
- Large inputs?
- Modify the problem
- Approximation algorithms
- Heuristics

NP

Is P ≠ NP? Why has it not been resolved yet?
- To prove P ≠ NP, we need to exclude all possible polynomial time algorithms!
- We do not have a very good understanding of the internal structure of NP

Summary of (time) complexity
- The time complexity of a Turing machine is the number of steps the machine takes (in the worst case) as a function of the size of the input
- We use big-O notation for running times since we are interested in large inputs and need something robust
- P is the class of problems solvable in polynomial time on a deterministic Turing machine
- NP is the class of problems solvable in polynomial time on a nondeterministic Turing machine
- NP is the class of problems for which we can verify a solution in polynomial time
- We believe that P ≠ NP but have no idea how to prove it
- NP-complete problems are the most difficult problems in NP
- SAT is NP-complete; other problems can be proved NP-complete by polynomial time mapping reductions

Unlike the other two parts of this course, complexity theory is a very active research area!

Summary of regular languages
- Regular languages are the languages recognized by DFAs
- For every NFA there is an equivalent DFA
  - The subset construction
- A language can be described by a regular expression if and only if it can be recognized by a DFA
  - GNFA construction, closure properties
- There are simple non-regular languages
  - Pumping lemma

Summary of context-free languages
- The context-free languages are the languages generated by context-free grammars
- A context-free grammar is ambiguous if the same string can be derived using two different leftmost derivations
- A language is context-free iff it is recognized by a PDA
- There are simple languages that are not context-free

Summary of computability
- A Turing machine is a mathematical model of a general computer
  - Church-Turing thesis (anything that can be computed can be computed by a Turing machine)
- There are "simple" and important algorithmic problems that cannot be solved on computers (undecidability)
  - $A_{TM}$ is undecidable (proof by diagonalization)
  - Other problems can be shown to be undecidable by (mapping) reductions