CSE 830: Design and Theory of Algorithms
Final Programming Assignment/Contest
Due Monday, April 29th 2002

The Problem:

The vertex cover problem takes as input a graph G = (V, E). The goal is to find the smallest subset of vertices (i.e. the smallest subset V' of V) such that every edge in E is incident on at least one vertex in V'.

This is best understood through an example. Let G be the graph on V = {1, 2, 3, 4} consisting of the edges E = {(1, 2), (1, 3), (1, 4), (3, 4)}. A minimum vertex cover consists of two vertices, for example vertices 1 and 4.

Vertex cover arises when we seek a small subgraph that is representative of the graph. Finding a cover is easy: you can simply take all the vertices. However, the smallest vertex cover can be much smaller than the total number of vertices. Consider a star, the tree in which only one vertex has degree greater than 1: its center alone covers every edge.

What is known about algorithms for vertex cover? The problem is NP-complete, meaning that it is exceedingly unlikely that you will be able to find an algorithm with polynomial worst-case running time. It remains NP-complete even for certain restricted graphs. However, since the goal of the problem is to find a subset of V, a backtracking program which iterates through all 2^n possible subsets of vertices and tests whether each covers all m edges gives an easy O(n m 2^n) algorithm. But the goal of this assignment is to find as practically good an algorithm as possible.

Input and Output:

There are a variety of data files available on the CSE unix system at ~torng/web/830/Contest/Public. There is also a graph generator program available in that directory that you can use to generate test graphs. Each graph is in a format such that the first two lines give the number of edges and vertices, and the rest of the file consists of a pair of vertices per line representing an edge. Vertices are numbered from 1 to n.
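
To make the input format and the brute-force bound from The Problem concrete, here is a minimal C++ sketch, not a reference solution. The names Graph, readGraph, covers, countBits and bruteForceMinCover are hypothetical and introduced only for illustration. It ASSUMES that the first line of the file holds the number of edges and the second line the number of vertices; the text above does not pin down that order, so check it against the actual contest files. It also assumes fewer than 63 vertices so that a subset fits in one 64-bit word.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>   // only needed for the std::istringstream usage described below
#include <utility>
#include <vector>

// Hypothetical container for illustration; not part of the assignment files.
struct Graph {
    int n = 0;                               // vertices are numbered 1..n
    std::vector<std::pair<int, int>> edges;  // one (u, v) pair per edge
};

// ASSUMPTION: line 1 holds the edge count m, line 2 the vertex count n,
// followed by m lines of "u v". Verify this order against the contest files.
Graph readGraph(std::istream& in) {
    Graph g;
    int m = 0;
    in >> m >> g.n;
    for (int i = 0; i < m; ++i) {
        int u = 0, v = 0;
        in >> u >> v;
        assert(1 <= u && u <= g.n && 1 <= v && v <= g.n);
        g.edges.emplace_back(u, v);
    }
    return g;
}

// Bit (v - 1) of `subset` is set when vertex v belongs to the candidate cover.
bool covers(const Graph& g, std::uint64_t subset) {
    for (const auto& e : g.edges) {
        const bool hasU = ((subset >> (e.first - 1)) & 1ULL) != 0;
        const bool hasV = ((subset >> (e.second - 1)) & 1ULL) != 0;
        if (!hasU && !hasV) return false;    // this edge is left uncovered
    }
    return true;
}

// Number of set bits, i.e. the number of vertices in the subset.
int countBits(std::uint64_t x) {
    int c = 0;
    for (; x != 0; x &= x - 1) ++c;
    return c;
}

// Tries all 2^n subsets and keeps the smallest one that covers every edge.
// Exponential in n, so only usable on very small graphs.
std::uint64_t bruteForceMinCover(const Graph& g) {
    assert(g.n >= 0 && g.n < 63);
    const std::uint64_t total = 1ULL << g.n;
    std::uint64_t best = total - 1;          // all vertices always form a cover
    int bestSize = g.n;
    for (std::uint64_t s = 0; s < total; ++s) {
        const int size = countBits(s);
        if (size < bestSize && covers(g, s)) {
            best = s;
            bestSize = size;
        }
    }
    return best;
}
```

For the example graph from The Problem (where m = 4 and n = 4, so the order of the first two lines happens not to matter), wrapping the text "4 4 1 2 1 3 1 4 3 4" with one entry per line in a std::istringstream, passing it to readGraph, and calling bruteForceMinCover yields a cover with 2 vertices. Because a bitmask makes the per-edge test constant time, this sketch does O(m 2^n) work rather than O(n m 2^n), but it is still far too slow for the contest and is useful mainly as a correctness check on tiny inputs.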
In a combinatorially explosive problem such as this, adding one to the problem size can roughly double the running time of an exhaustive search, so test on the smaller files first.

Your program must be able to take a single argument: the name of the file containing the graph on which to compute the minimum vertex cover. You may assume that this file contains a legal graph. Your program must output the size of your vertex cover and then a list of the nodes in your cover. Do NOT include any extra words in your output, as this will confuse my correctness checker.

Implementation:

You will be graded on how fast and clever your program is, not on style. Incorrect programs will receive no credit. A program will be deemed incorrect if it fails to find a minimum vertex cover for some input graph. You must use C or C++ for uniformity/efficiency. The programs must be able to run under UNIX on the CSE cluster.

Writing efficient programs is an iterative process. Build your first solution so you can throw it away, and start it early enough to go through several iterations, especially if you feel your programming background is weak. Use a simple backtracking approach first; don't get fancy until you have something that works. It is possible to get a program working in about 100 lines. Don't forget to use the code optimizer on your compiler when you make the final run, to get a free 20% or so speedup. The system profiler tool, gprof, will help you tune your program.

What you turn in:

You are to turn in via handin a listing of your program, sample runs, and a README file containing a brief description of your algorithm, any interesting optimizations you included in your program, the largest test files your program could handle in one minute or less of wall clock time, and instructions on how to compile your code. The top five self-reported times / largest sizes will be collected and tested by me to determine the winner.

Other Details:

Everyone must do this question individually.
The idea is to think about the problem from scratch and on your own. If you do not completely understand the definition of minimum vertex cover, you don't have the slightest chance of producing a working program. Don't be afraid to ask for clarification or explanation!

What kind of approaches might you consider? Instead of testing every subset of V, you should be able to develop a backtracking algorithm that prunes partial solutions, i.e. abandons a partial cover as soon as no way of completing it could beat the best solution found so far. Can you order the vertices so that the search is likely to proceed more quickly? Starting off with a good approximate solution may help achieve faster cutoffs. There is room for cleverness in selecting data structures to minimize the time needed to test whether a given set of vertices covers G.

I will also include a sample program on the CSE UNIX machines at ~torng/web/830/Contest/Public/graph.exe. This program takes the name of the file to solve and reports only the size of the minimum cover; it will help you verify your solution. Remember that your program must also output the cover itself.

Grading:

The assignment has a base value of 10 basic points. In order to get the 10 basic points, your program must correctly solve random graph input instances where the number of nodes in the graph is at least 25.

Bonuses:

Grid graphs (which can be generated by the graph generator in the directory specified above) can be solved more efficiently if you can recognize that you are dealing with a grid graph. A special case of a grid graph is a path. You get 1 extra credit advanced point if you can handle paths of arbitrary size. You get 2 extra credit advanced points if you can handle grids of arbitrary size. The person whose program handles the largest input instances gets a bonus of 5 extra credit advanced points.
Bonuses will also be awarded to those whose programs exceed the mean/median value of the class by at least the standard deviation.

Acknowledgements:

Thanks to Charles Ofria for the contest idea and all the source code.

Good luck!