Final Exam CS 600 Instruction: Answer the following questions in this document or another document and submit it in Canvas according to the Final Exam Procedure. 1. (12 Points) Consider a connected communication network of routers that form a free tree T. Assume the time-delay of a packet transfer from one router to another is determined by multiplying a small, fixed constant by the number of communication links between the two routers. Develop an efficient algorithm, better than O(n3), that computes the maximum possible time delay in the network T. Answer: Find the largest distance between source point S and all other vertices in a weighted directed acyclic network For directed acyclic graphs, the longest path problem has a linear time solution. Find the topological order by initializing the distance to all vertices to negative infinity and the distance to the source point to 0. The topological ordering of the graphs represents the linear order of a graph. When the topological order is found, all the vertices in IT are processed one by one. Update the distance of each processed vertex’s neighbor using the current vertex. Algorithm: FindLongestPath() Input: Tree roots Output: maximum possible time delay 1. Initialize dist[] = {NINF, NINF,....} , dist[s] = 0 . ## s: source point ## NINF: negative infinity. ## Dist: longest distance from the source point to the other point. 2. Create a topological sequence contain all vertices. 3. Repeat the following process for each vertex u in the topology sequence For each adjacent point v of u: If (dist[v] < dist[u] + weight(u, v)) dist[v] = dist[u] + weight(u, v) Time complexity: V and E are the symbol to represent the running time, like n. 1. topological sorting is O (V+E). 2. After finding the topological order, for each vertex, all adjacent points are found by loop , and the time complexity is O (E). 3. Internal loop runs O (V+E) times. 4. Total time complexity of the algorithm is O (V+E). 2. (12 Points) Suppose you are told that you have a goat and a wolf that need to go from a node s to a node t, in a directed acyclic graph G. To avoid the wolf eating the goat, their paths must never share an edge. Design an efficient algorithm for finding two edge-disjoint paths in G, if such path exists, to provide a way for the goat and the wolf to go from s to t without risk to the goat. Answer: Consider the directed graph 'G with two vertices: source s and destination t. We want to find a route that avoid them sharing any edges. In that case, we can formula it as a maximum flow problem using the Ford-Fulkerson technique. First, we consider the supplied source and destination as sources and sinks in a flow network. Assign unit capacity to each edge. Second, we calculate the maximum flow from source to sink using the Ford-Fulkerson procedure. Then the maximum flow equals the number of edge-disjoint pathways. Ford-Fulkerson Algorithm: 1. Initialize the flow total = 0 2. Repeat the following process until no path exists between s and t. a. Run depth first search from the source vertex to find a flow path to the end vertex. b. Let f be the minimum capacity value on the path c. Add f to flow_total 3. For each edge on the path a. Decrease capacity from one side to another b. Increase capacity of the edge from other side to back 3. (12 Points) Consider a graph G and two distinct vertices, v and w in G. Define HAMILTONIAN-PATH to be the problem of determining whether there is a path that starts at v and ends at w and visits all the vertices of G exactly once. Show that the HAMILTONIAN-PATH problem is NP-complete. Answer: We first demonstrate that this belongs to the NP class, and then identify aknown NP-complete issue that can be reduced to a Hamiltonian route. Suppose a graph G, and the Hamiltonian path is solved by non-deterministically selecting the edges from G that should be included in the path. Next, we traverse the path and ensure that each vertex is visited exactly once, which can be done in polynomial time. Therefore, it belongs to the NP class. For identifying an NP-complete issue that can be reduced to a Hamiltonian route. We look for a Hamiltonian cycle in the graph, which is a route that start and finish at the same vertex. We can reduce this to the Hamiltonian route, since the Hamiltonian cycle is NP-complete. We create a graph G0 from the graph G = (V,E) in which G has a Hamiltonian cycle and G0 contains a Hamiltonian route. This is accomplished by selecting an arbitrary vertex u in G and copying it along with all of its edges, u0. Then, in the graph, add vertices v and v0 and connect v with u and v0 with u0. Thus, if we start in v, follow the Hamiltonian cycle of G back to u0 instead of u, and eventually end in v0, we get a Hamiltonian route in G0. The route of G0 have both v and v0 endpoints, and it can be converted to a G cycle. The path must have endpoints in u and u0 if we ignore v and v0. And removing u0 causes a G cycle if we close the path back to u instead of u0. The construction will fail if G is a single edge, so it must be handled separately. Therefore, we demonstrated that G contains a Hamiltonian cycle only if and only if G0 contains a Hamiltonian path, completing the proof that Hamiltonian Path is NP-complete. 4. (12 Points) Suppose we are given an undirected graph G with positive weights on its edges and asked to find a tour that visits the vertices of G exactly once and returns to the start to minimize the cost of maximum-weight edge in the tour. Assuming that the weights in G satisfy the triangle inequality, design a polynomial-time 3-approximation algorithm for this version of traveling salesperson problem. Note that this version of TSP is different than the 2-approximation for METRIC-TSP in Section 18.1, where G is assumed to be a complete graph. Hint: Show that it is possible to turn an Euler-tour traversal, E, of an MST for G into a tour visiting each vertex exactly once such that each edge of the tour skips at most two vertices of E. Answer: Let S be the sequence of vertices v1 to vn. Check the following requirements in this traveling salesperson problem: 1. Every edge traversed by adjacent vertices is an edge in G 2. every vertex in G is in V(vertices), and every node has been traversed. To demonstrate that the TSP is NP-hard, we have to show that NP reduces TSP in the polynomial time. We can prove it using the Hamiltonian Cycle. HC is NPhard and every problem in NP reduces to HC in polynomial time because every HC is NP complete. Since the sum of two polynomials is still a polynomial, so if we reduce HC to TSP in polynomial time, then we will show every vertex in NP reduces to TSP in polynomial time. We already demonstrated that given a graph G=(V,E), a simple cycle in G existsthat travels each vertex precisely once in answer 3. For a simple cycle with n vertices and n edges. We use the following algorithm to decrease HC to TSP: 1. For G=(V,E), set all edge weights to 1 and let k = |V|=n ## n is the number of nodes in G. 2. The weight of edges that are outside the G at begining is 2 3. test TSP using the changed graph, if exists a Tour on G with a cost of at least k 4. If the answer of TSP is yes, the answer of HC is yes. If TSP is NO, HC is NO As we mentioned before, reduction takes polynomial time, we will demonstrate that HC solutions are in 1-1 correspondence with TSP solutions using the reduction. There are two possible answers to this question: 1. HC has one => TSP has a YES. Thus the simple cycle C exists that visits each node precisely once, creating n edges to C. For a tour of weight n, each node weighs 1 and k = n. So TSP has a YES answer, 2. HC has a NO = > TSP has a NO. Then there is no simple cycle C that visits every node precisely once, creating n edges to C. Assume TSP's response is yes. Then there's a tour that goes around every vertex once with a maximum weight of k. Suppose there is a loop with a weight of n, then the loop needs to traverse each node whose weight is 1 and k=n. This proves that edges exist in the HC graph. This is where HC comes into play. Thus, TSP is both NP and NP-Hard, and it is NP-Complete, as requirement. 5. (12 Points) Suppose we have a Monte Carlo algorithm, A, and a deterministic algorithm, B, for testing if the output of A is correct. How can we use A and B to construct a Las Vegas algorithm? Also, if A succeeds with probability ½ and both A and B run O(n) time, what is the expected running time of the Las Vegas algorithm that is produced? Answer: Las Vegas Algorithm: a randomized algorithm that always produces the correct result but whose running time is determined by random events (probabilistic). Monte Carlo Algorithm: a randomized algorithm that always has a deterministic running time but whose output may be incorrect with some probability. We will demonstrate that the estimated run-time is <=O(n log n): First, suppose we have a (Monte Carlo) randomized algorithm called WeirdSort that accepts an unsorted array I of elements as input and returns another array O. The run time of algorithm is T(n), and the output O is always a permuted version of the input I. In this case, we know that Pr[sorted output O] >= p. We can summarize that the algorithm runs correct with probability at least p. The most basic concept works: run the algorithm once, see if the result is sorted, and if not, repeat (with fresh randomness, so that the outcome is independent of the previous rounds). To execute the algorithm once will take T(n) time. To check if the output is sorted in one "round" takes O(n) time. Thus, each run of the algorithm succeeds is independent with a probability of at least p. And the expected number of it before we succeeds is 1/p. It takes O(n) time to run the algorithm once, and O(n) time to check if the output is sorted. Each run succeeds with the probability of 1/2, so the expected running time is O(nlogn) and the worst case is O(n2). 6. (12 Points) Let S be a set of n intervals of the form [a, b], where a < b. Design an efficient data structure that can answer, in O(log n +k) time, queries of the form contains(x), which asks for an enumeration of all intervals in S that contain x, where k is the number of such intervals. What is the space usage of your data structure? Hint: Think about reducing this to a two-dimensional problem. Answer: We can use the priority search tree to solve this problem. The input of the priority search tree is (x1, x2, y1) and the output is a set of (x, y) that satisfy x1 ≤ x ≤ x2 and y1 ≤ y. For this problem, we use all intervals in S as 2 dimensional items to build the priority search tree. The input should be x and the output should be all intervals [a, b] that contain x. We have x ≥ a and x ≤ b. So we should let y1 be x, x2 be x as well, and let x1 be −∞. Then if the input of priority search tree is x, then the output of it should be all intervals than contain x. The priority search tree has n nodes and if the number of output is k, the running time of this algorithm is the same as using PST search to answer three-sided range queries, which is O(log n + k). The space usage of the data structure is O(n), as shown in our text book. 7. (10 Points) Given a set P of n points, design an efficient algorithm for constructing a simple polygon whose vertices are the points of P. Answer: We can use Graham's scan algorithm to create a simple polygon whose vertices are the points of P. For the points in the range of (0, n-1): 1. We first locate the points at the y-bottom most coordinate. If two points with the same y-coordinate value, select the one with the smaller x-coordinate value. Make and place the lowest point A0 in the first position in the output hull. 2. In the counterclockwise sequence around points [0], sort the remaining n-1 points by polar angle. If two points have equal polar angles, place the closest point at first. 3. Check if two or more points have the same angle. If yes, select the one that is farthest away from A0. Let the array's new size be m. 4. Return, if we can use m to create a triangle. 5. Create an empty stack and add the points p0, p1, and p2 to it. 6. Identify the fourth vertex by processing the remaining points (which is the minimum requirement for a polygon). 7. Remove points from the stack until it is not oriented counterclockwise. Thus, we will have a point near to the stack's top, b point at the stack's top, and c push points. Time Complexity: The approach takes O(n log n) time for n input points. 8. (10 Points) DNA strings are sometimes spliced into other DNA strings as a product of re-combinant DNA processes. But DNA strings can be read in what would be either the forward or backward direction for a standard character string. Thus it is useful to be able to identify prefixes and their reversals. Let T be a DNA text string of length n. Describe an O(n)-time method for finding the longest prefix of T that is a substring of the reversal of T. Hint: Consider using a prefix trie. Answer: At first we should compute the reversal of the DNA text string T, and denote the reversal as T’, which can be done in O(n) time. Then we build a suffix trie with T’ in O(n) time as shown in the text book. And we can use suffix trie to find the longest prefix of T that is a substring of T’, which can be done in O(n) time in the worst case. Each node at most has 4 nodes because of the type of DNA is 4. For the first character in T, we search it in the suffix trie, if it is in a node of the suffix trie, we continue to find the next character in T starting from the current node. When we compare all the characters in the current node and all of them match, we move to it child node and repeat until there is no match any more. So in the worst case, the running time is O(n). So the total time of this algorithm is O(n). 9. (8 Points) Solve the following linear program using the Simplex Method: Maximize: 18 x1 + 12.5 x2 Subject to: x1 + x2 ≤ 20 x1 ≤ 12 x2 ≤ 16 x1 , x2 ≥ 0. Answer: We first convert it to slack form. z = 18x1+12.5x2 x3 = 20−x1−x¬2 x4 = 12−x1 x5 = 16−x2 x1, x2, x3, x4, x5 > = 0 The variable x1 has a positive coefficient in the objective function. With respect to x1, the second constraint is the most restrictive. The objective value is z = 0. We can substitute x1 with x1 = 12 − x4. We get: z = 216+12.5x2−18x4 x1 = 12−x4 x3 = 8−x2+x4 x5 = 16−x2 x1, x2, x3, x4, x5 > = 0 x2 has a positive coefficient in the new objective function, and now the first constraint is the most restrictive. The objective value is z = 216. We replace it by x2 = 8 − x3 + x4 and substitute x2 . z = 316−12.5x3−5.5x4 x1 = 12−x4 x2 = 8−x3+x4 x5 = 8+x3−x4 x1, x2, x3, x4> = 0 all coefficients are negative now, so the optimal solution is x1 = 12 and x2 = 8. Thus, the final objective value is z = 316.