Presented by Yuval Shimron
Course 236801, 1.12.2010

Find solutions to sub-cases of the Subgraph Isomorphism Problem in polynomial time.
Find more efficient solutions to some sub-cases that already had polynomial-time solutions.
Find simple paths and cycles of a specific length k.
  ▪ This was the initial goal of the authors…

(1) For a fixed k, if G=(V,E) contains a cycle of length k, it can be found in $O(V^\omega)$ expected time or $O(V^\omega \log V)$ worst-case time ($\omega < 2.376$ is the exponent of matrix multiplication).
(2) For a fixed k, if a planar graph G=(V,E) contains a cycle of length k, it can be found in $O(V)$ expected time or $O(V \log V)$ worst-case time (this also applies to any non-trivial minor-closed family of graphs).

(3) If G=(V,E) contains a subgraph isomorphic to a bounded tree-width graph $H=(V_H,E_H)$ where $|V_H| = O(\log V)$, then such a subgraph can be found in polynomial time.
  ▪ This was not previously known even if H were just a simple path of length $O(\log V)$.
  ▪ It shows that the LOG PATH problem is in NC (and not just in P).

Randomized method
Vertices are randomly colored using $k = |V_H|$ colors.
If $|V_H| = O(\log V)$, then with a small (but only polynomially small) probability all the vertices of the subgraph isomorphic to H are colored with distinct colors.
This makes the task of finding this 'color-coded' subgraph much easier.
  ▪ Be patient…

De-randomized algorithm?
It needs a family of colorings of G such that every subset of k vertices of G is assigned distinct colors by at least one of these colorings.
  ▪ In other words, a family of perfect hash functions from {1, 2, …, |V|} to {1, 2, …, k}.
Only a "small" loss of efficiency.

If the graph is directed and acyclic, a simple algorithm finds a path of length k in $O(E)$ time.
So eliminate cycles:
Choose a random permutation $\pi : V \to \{1, 2, \dots, |V|\}$.
Build $G' = (V, E')$ by using $\pi$:
  ▪ Direct the edges: for every $(u,v) \in E$, if $\pi(u) < \pi(v)$ then $(u,v) \in E'$, and if $\pi(v) < \pi(u)$ then $(v,u) \in E'$.

Every directed path of length k in G' is a simple path of length k in G.
Every simple path of length k in G has a $2/(k+1)!$ chance of becoming a directed path in G'.
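The random-orientation idea above can be sketched in Python. This is only an illustrative sketch under stated assumptions: vertices are the integers 0..n-1, the undirected graph is an edge list, and the names `orient_by_permutation`, `find_directed_path` and `find_simple_path` are hypothetical, not taken from the slides. The trial budget `trials` is also an assumption; the slides only bound the expected number of repetitions.

```python
import random

def orient_by_permutation(edges, pi):
    """Direct each undirected edge from the endpoint with smaller pi to the larger."""
    return [(u, v) if pi[u] < pi[v] else (v, u) for u, v in edges]

def find_directed_path(n, arcs, pi, k):
    """Return a directed path with exactly k edges in the DAG, or None.

    Increasing pi is a topological order, so one pass over the arcs suffices.
    """
    out = [[] for _ in range(n)]
    for u, v in arcs:
        out[u].append(v)
    best = [0] * n      # best[v] = edges on the longest directed path ending at v
    pred = [None] * n   # predecessor of v on such a path
    for u in sorted(range(n), key=lambda x: pi[x]):
        for v in out[u]:
            if best[u] + 1 > best[v]:
                best[v] = best[u] + 1
                pred[v] = u
    end = max(range(n), key=lambda v: best[v])
    if best[end] < k:
        return None
    path = [end]
    while len(path) < k + 1:   # walk back over the last k edges
        path.append(pred[path[-1]])
    return path[::-1]

def find_simple_path(n, edges, k, trials, rng):
    """Retry with fresh random permutations; hypothetical wrapper with a trial budget."""
    for _ in range(trials):
        order = list(range(n))
        rng.shuffle(order)
        pi = [0] * n
        for position, v in enumerate(order):
            pi[v] = position
        path = find_directed_path(n, orient_by_permutation(edges, pi), pi, k)
        if path is not None:
            return path
    return None
```

A path returned this way is automatically simple in G, because G' is acyclic. On a complete graph every permutation already yields a directed Hamiltonian path, so a single trial is enough there.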
So if no path of length k was found in G', repeat the process.
The expected number of times this process is repeated is at most $(k+1)!/2$.

So we get $O(E \cdot (k+1)!)$ expected time complexity.
This is also the result for the directed case.
  ▪ Delete edges that don't agree with $\pi$.
Use the following fact + DFS to reduce it to $O(V \cdot (k+1)!)$ for the undirected case:
Every graph with |V| vertices and at least k|V| edges contains a path of length k.
So first run a DFS on the original graph. Apply the above algorithm only if no vertex of depth k was found (answered in $O(k|V|)$ time).

Choose a random acyclic orientation G'.
Raise the adjacency matrix of G' to the power of k-1 using $O(\log k)$ matrix multiplications.
This gives all the pairs of vertices connected by a path of length k-1.
Check if any of these pairs is also connected by an edge.
If so, a cycle of length k has been found. If not, repeat the process.
  ▪ The expected number of repetitions is at most $k!/2$.
Complexity: $O(k! \, (\log k) \, V^\omega) = O(V^\omega)$ for a fixed k.

To find a path of length k-1 in a graph G we can choose a random coloring of the vertices of G with k colors.
Every simple path of length k-1 in G has a chance of $k!/k^k > e^{-k}$ to become colorful.
  ▪ Colorful: each vertex on the path is colored with a different color.
We can find a colorful path using Lemma 3.1.

Use Color-Coding to find a colorful path of length k-1 in $2^{O(k)} \cdot E$ worst-case time (if one exists).
Actually, it finds a colorful path of length k that starts at a specific vertex s,
  ▪ but we can always add such a vertex s to G (with a new color).
The algorithm uses a given (random) coloring $c : V \to \{1, 2, \dots, k\}$.
The algorithm uses a dynamic programming approach.

Suppose we have found, for each vertex v, the sets of colors on colorful paths of length i that connect s and v.
  ▪ A collection of at most $\binom{k}{i}$ color sets.
For that we only need to record the color sets appearing on paths of length i, and not the paths themselves…
We inspect every color set C of that collection.

We also inspect every edge (v,u) in E.
If $c(u) \notin C$, we add $C \cup \{c(u)\}$ to the collection of u that corresponds to colorful paths of length i+1.
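The dynamic-programming step just described can be sketched as follows. This is a minimal sketch, not the authors' implementation, under these assumptions: vertices are 0..n-1, colors are 0..k-1, the graph is undirected, and color sets are stored as bitmasks (an implementation choice made here). Instead of adding an extra start vertex s, the sketch seeds every vertex with its own color, which has the same effect. The function name `colorful_path_exists` is hypothetical.

```python
def colorful_path_exists(n, edges, color, k):
    """Return True iff some path on k vertices uses k distinct colors.

    color[v] is an integer in range(k). A color set is a bitmask in which
    bit j is set when color j already appears on the path.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Paths of length 0: each vertex alone, carrying only its own color.
    collections = [{1 << color[v]} for v in range(n)]
    for _ in range(k - 1):
        longer = [set() for _ in range(n)]
        for v in range(n):
            for C in collections[v]:
                for u in adj[v]:
                    bit = 1 << color[u]
                    if not C & bit:            # c(u) is not in C
                        longer[u].add(C | bit)  # record C united with {c(u)}
        collections = longer
    # Non-empty final collection at any vertex means a colorful path exists.
    return any(collections)
```

Only color sets are stored, never the paths themselves, so each vertex holds at most $\binom{k}{i}$ masks in round i. A path whose colors are all distinct cannot repeat a vertex, which is why no explicit simplicity check is needed.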
The graph G contains a colorful path of length k-1 iff the final collection, corresponding to paths of length k-1, of at least one vertex is non-empty.

The number of operations is at most $O\left(\sum_{i=0}^{k} i \binom{k}{i} |E|\right) = O(k \, 2^k E)$.
The proof holds for both directed and undirected graphs.

We can find all pairs of vertices connected by colorful paths of length k-1 in $2^{O(k)} VE$ or $2^{O(k)} V^\omega$ worst-case time.
To get $2^{O(k)} VE$ time, simply run the algorithm of Lemma 3.1 |V| times, once from each vertex of G=(V,E).
Use a recursive approach to get $2^{O(k)} V^\omega$ time.

Enumerate all partitions of {1,2,…,k} into two subsets $C_1, C_2$ of size k/2 each.
There are $\binom{k}{k/2} < 2^k$ such partitions.
For each partition, split G into two graphs $G_1, G_2$, induced by the vertex sets $V_1, V_2$ that are colored with colors from $C_1$ and $C_2$.
Recursively find the pairs of vertices connected by colorful paths of length k/2-1.
Store the results in Boolean matrices $A_1, A_2$.

Define B to be the Boolean matrix of adjacency relations between the vertices of $V_1$ and $V_2$.
Compute $A_1 B A_2$.
You get all pairs connected by colorful paths of length k-1 in which:
  ▪ the first k/2 vertices are colored by colors from $C_1$;
  ▪ the last k/2 vertices are colored by colors from $C_2$.
By OR-ing all the matrices obtained from all the partitions you get your answer.
Time complexity?

A simple path of length k-1 in a directed / undirected graph G=(V,E) can be found (if one exists) in:
  $2^{O(k)} V$ expected time for an undirected graph.
  ▪ DFS…
  $2^{O(k)} E$ expected time for a directed graph.
A simple cycle of size k in a directed / undirected graph G=(V,E) can be found (if one exists) in either $2^{O(k)} VE$ or $2^{O(k)} V^\omega$ expected time.
Simply use Lemma 3.2.

The previous randomized algorithms can be derandomized with a small loss of efficiency.
  ▪ An extra log V factor in the complexity.
What we need is a family of k-perfect hash functions from {1, 2, …, |V|} to {1, 2, …, k}.
If we use these hash functions, we know that for every subset of k vertices there exists a coloring that gives each vertex in it a distinct color.

There exists an algorithm that constructs a k-perfect family of hash functions from {1, 2, ..., n} to {1, 2, ..., k}.
But its size is $2^{O(k)} \log^2 n$.
There also exists an algorithm that constructs a k-perfect family of hash functions from {1, 2, ..., n} to $\{1, 2, ..., k^2\}$ whose size is $k^{O(1)} \log n$.

So we use 2-level hashing:
  ▪ Mapping from {1, 2, ..., n} to $\{1, 2, ..., k^2\}$ by using the second algorithm.
  ▪ Mapping from $\{1, 2, ..., k^2\}$ to {1, 2, ..., k} by using the first algorithm.
And we get just the promised extra $O(\log V)$ factor.
The value of each hash function on each element can be evaluated in $O(1)$ time.

Use k-perfect hash coloring functions.
Choose a random coloring (and not a permutation) $c : V \to \{1, 2, \dots, k\}$.
Remove edges (u,v) s.t. $|c(v) - c(u)| \neq 1$.
Direct each remaining edge (u,v) from u to v, where $c(v) = c(u) + 1$.
Again G', the obtained graph, is acyclic.
A simple path on k vertices in G has a probability of $2k^{-k}$ to become a directed path in G'.
This is different from the Color-Coding method.

An undirected graph G is d-degenerate if every subgraph of it has a vertex of degree at most d.
The smallest such d is called the degeneracy or the max-min degree of G, denoted d(G).
  ▪ It is the maximum over the minimum degrees of all subgraphs of G.
If G is d-degenerate then clearly $|E| \le d \cdot |V|$.

Let G be a connected undirected graph.
An acyclic orientation of G=(V,E) such that for every $v \in V$ we have $d_{out}(v) \le d(G)$ can be found in $O(E)$ time.

A graph H is a minor of an undirected graph G if it can be obtained from G by the removal and the contraction of edges.
A family C of graphs is minor-closed if a minor of any graph in it is also a member of the family.
If such a C is non-trivial then all graphs in C are of bounded degeneracy:
  $\exists d_C$ s.t. $\forall G \in C : d(G) \le d_C$.

Consider the family of planar graphs $C_{planar}$.
It is minor-closed.
Each planar graph has a vertex whose degree is at most 5.
  $d_{C_{planar}} = 5$.

Let C be a non-trivial minor-closed family of graphs and let $k \ge 3$ be a fixed parameter.
There exists a randomized algorithm that, given an undirected graph in C, finds a $C_k$ (a cycle of size k) in it, if one exists, in $O(V)$ expected time.
Proof:
Let G = (V,E) be a graph in C that contains a $C_k$.
Choose a random coloring $c : V \to \{1, 2, 3, \dots, k\}$.
A $C_k$ is considered well-colored if its vertices are colored consecutively by the colors 1, 2, …, k.

The $C_k$ in G has a chance of $2/k^{k-1}$ to be well-colored.
Can we find it efficiently? Yes, but with some probability…
Assume that the degeneracy of C is d = O(1).
We describe a randomized algorithm that, given a coloring c, finds a well-colored $C_k$ with probability at least $1/(2d)^k$.
Combining both gives a probability of at least $\frac{2}{(2d)^k \, k^{k-1}}$, so the expected time is $O((2dk)^k \cdot V)$.

We can assume all edges of G connect vertices that are colored by consecutive colors (mod k).
  ▪ Edges that don't may be safely removed.
We orient the graph so that the out-degree of every vertex is at most d.
  ▪ This takes only $O(V)$ time.
The algorithm tries to find the edge that connects the vertices of the $C_k$ colored k and k-1: $v_k, v_{k-1}$.
It "flips coins" to guess its orientation and index: 2d possible combinations.

For each guess of such an index i:
If the orientation is from $v_{k-1}$ to $v_k$:
  ▪ For every vertex colored k-1, all outgoing edges to vertices colored k whose index is not i are removed.
Otherwise it does the opposite
  ▪ (for edges that leave vertices colored k).
The result is a graph G' that contains a well-colored $C_k$ with a probability of at least $1/(2d)$.
The edges between the colors k-1 and k in G' form a forest of rooted stars.

Each such star is contracted into a single vertex and assigned the color k-1.
The obtained graph is denoted by G''.
G'' contains a well-colored $C_{k-1}$ iff G' contains a well-colored $C_k$,
  ▪ since each edge of G', and therefore of G'', connects consecutively colored vertices.
G'' is also a graph in the minor-closed family C.
So we recursively look for a $C_{k-1}$.

It will take us $O((k-1)V)$ expected time.
And it yields a $C_{k-1}$ with a probability of at least $1/(2d)^{k-1}$.
Obviously it's easy to reconstruct the $C_k$ from the $C_{k-1}$.
We can stop the recursion when k=3 and use an existing algorithm for finding triangles in a general graph in $O(E \cdot d(G))$ time.
  ▪ Any triangle in a three-colored graph is well-colored.
  ▪ $O(E \cdot d(G))$ is $O(V)$ in our case.
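The proof above relies on an acyclic orientation in which every out-degree is at most the degeneracy d. One standard way to obtain such an orientation is to repeatedly remove a minimum-degree vertex and direct its remaining edges away from it. The sketch below illustrates that idea under stated assumptions: vertices are 0..n-1, the function name `degeneracy_orientation` is hypothetical, and a binary heap is used for brevity, giving $O(E \log V)$ time rather than the $O(E)$ bound quoted in the slides (which needs bucket queues).

```python
import heapq

def degeneracy_orientation(n, edges):
    """Return (d, arcs): the degeneracy d and an acyclic orientation
    in which every vertex has out-degree at most d."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deg = [len(a) for a in adj]
    heap = [(deg[v], v) for v in range(n)]
    heapq.heapify(heap)
    removed = [False] * n
    arcs = []
    d = 0
    while heap:
        dv, v = heapq.heappop(heap)
        if removed[v] or dv != deg[v]:
            continue                  # stale heap entry, skip it
        removed[v] = True
        d = max(d, dv)                # largest minimum degree seen so far
        for u in adj[v]:
            if not removed[u]:
                arcs.append((v, u))   # edge leaves the vertex removed first
                deg[u] -= 1
                heapq.heappush(heap, (deg[u], u))
    return d, arcs
```

Every arc goes from an earlier-removed vertex to a later-removed one, so the orientation is acyclic, and a vertex's out-degree equals its degree at the moment of removal, which never exceeds d.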
There exists a deterministic algorithm that, given a graph in C, finds a $C_k$, if one exists, in $O(V \log V)$ worst-case time.
Proof:
Instead of using a random coloring we exhaust a list of $k^{O(k)} \log V$ colorings that has this property:
  ▪ Every sequence of k vertices is consecutively colored by 1,2,…,k by at least one coloring of the list.
Instead of guessing the direction and index of each edge in the $C_k$, we exhaust for each coloring all the $(2d)^k$ possible choices.
  ▪ If G contains a $C_k$ then at least one $C_k$ will be found this way.

A graph $G_1$ is said to be isomorphic to a graph $G_2$ if there exists a bijection $f : V(G_1) \to V(G_2)$ such that any two vertices u and v of $G_1$ are adjacent in $G_1$ iff f(u) and f(v) are adjacent in $G_2$.

Let F be a directed/undirected forest on k vertices.
Let G be a directed/undirected graph.
A subgraph isomorphic to F can be found, if one exists, in:
  $2^{O(k)} E$ expected time in the directed case.
  $2^{O(k)} V$ expected time in the undirected case.

Proof:
Start as usual, by choosing a random coloring $c : V \to \{1, \dots, k\}$ of G.
With a probability of at least $e^{-k}$ the copy of F in G becomes colorful.
  ▪ Meaning, each vertex is assigned a different color.
Suppose that F is composed of l (directed) trees $T_1, T_2, \dots, T_l$ with $k_1, k_2, \dots, k_l$ vertices each.
Let $F_i$ be the (directed) forest composed of $T_1, T_2, \dots, T_i$.

For each $1 \le i \le l$ we find the color sets that appear on colorful copies of $T_i$ in G.
Note that copies of $T_i$, $T_j$ with disjoint color sets are necessarily disjoint.
Then, in $2^{O(k)}$ time we find the color sets that appear on colorful copies of $F_i$ for $1 \le i \le l$.
If the collection corresponding to $F = F_l$ is not empty then G contains a colorful copy of F.
How do we find it…?

How do we find the color sets that appear on colorful copies of $T_i$ in G?
Let r be an arbitrary vertex in $T_i = T$.
For each vertex v in G we find the color sets that appear on colorful copies of T in which v plays the role of r.
If T is a single vertex then it's easily done…
Otherwise let e=(r,r') be a (directed) edge in T.
  ▪ We break T into two (directed) sub-trees T', T'' by removing e: T' contains r and T'' contains r'.

We recursively find, for each vertex v in G, the color sets on colorful copies of T' in which v plays the role of r, and on colorful copies of T'' in which v plays the role of r'.
For every (directed) edge (u,v), each color set in u's collection for T' is united with each disjoint color set in v's collection for T'', and the union is added to u's collection for T.
The complexity of this recursive algorithm is $2^{O(k)} E$, as required.
For the undirected case we use the fact that a graph with at least k|V| edges contains as a subgraph any forest on k vertices.

Remember the tree-width of a graph G?
  ▪ The minimum width over all possible tree-decompositions (X,T) of G, where the width of a decomposition is $\max_i |X_i| - 1$.
T = (I, F) is a tree.
$X = \{X_i : i \in I\}$ is a set of subsets of V such that:
  ▪ The union of all $X_i$ equals V.
  ▪ For every edge (u,v) of G there exists an i such that $u, v \in X_i$.
  ▪ If $i, j, k \in I$ and j is on the path from i to k in T, then $X_i \cap X_k \subseteq X_j$.

Let H be a directed or undirected graph on k vertices with tree-width t.
Let G be a directed or undirected graph.
A subgraph of G isomorphic to H, if one exists, can be found in $2^{O(k)} V^{t+1}$ expected time and in $2^{O(k)} V^{t+1} \log V$ worst-case time.
The proof is similar to that of Theorem 6.1, so we will skip it...

In [RS86b] it is shown that if C is a minor-closed family of graphs that excludes at least one planar graph G', then there exists a (huge) constant $c_{G'}$ such that every graph in C has a tree-width of at most $c_{G'}$.
So we can use Theorem 6.3 whenever $|V_H| = O(\log V)$ and H belongs to a minor-closed family that excludes at least one planar graph, and decide in polynomial time whether G contains a subgraph isomorphic to H.

As a very special case of Theorem 6.3 we get that the LOG PATH problem is in P.
  ▪ A path on log V vertices is a tree.
In addition, all the algorithms we described are easily parallelizable.
So we get that the LOG PATH problem and other problems are in NC.

The Color-Coding method efficiently finds k-vertex simple paths, k-vertex cycles, and other small subgraphs within a given graph using probabilistic algorithms.
The Color-Coding method is a good example for demonstrating de-randomization techniques.
The algorithms presented can be easily parallelized, yielding efficient NC algorithms.