Approximation Algorithms for Stochastic Optimization
Exercises #1
Anupam Gupta
July 13, 2010

1. Secretary Problems. You are shown n people one by one; each person has a numerical value, and you want to pick one of them (hopefully one with large value). Upon seeing a person (when his value is revealed to you), you can either decide to pick him (in which case the procedure ends), or you can reject him irrevocably and go on to the next person. The length of the sequence is known, but the values are not known: they are non-negative and distinct, but drawn from an unbounded range. Given a (possibly randomized) algorithm A and a sequence σ of people, let A(σ) be the random variable denoting the value of the person picked by the algorithm.

(a) Suppose we want to maximize the competitive ratio min_σ E[A(σ)]/max(σ), where the expectation is over the randomness of the algorithm. Show that any algorithm has a competitive ratio arbitrarily close to zero.

Answer: For any deterministic algorithm A, consider what happens on the two sequences ⟨M, 1⟩ and ⟨M, M²⟩ of length 2. On both these sequences, when the algorithm sees the first person it has no information about the future, and hence must make the same decision. If the algorithm chooses the first person, it has competitive ratio 1/M on the second sequence; otherwise it has competitive ratio 1/M on the first sequence. Setting M arbitrarily large shows that the competitive ratio is arbitrarily close to zero, even for n = 2.

Randomization helps a little: the algorithm that chooses a uniformly random position i ∈ {1, 2, …, n} and picks the person at the i-th position achieves a competitive ratio of at least 1/n, since it correctly guesses the position of the highest-valued person with probability 1/n. To show that no randomized algorithm can do much better, consider the n possible sequences σ_i := ⟨M, M², M³, …, M^i, 0, …, 0⟩ for i = 1, 2, …, n. Let p_i be the probability that the algorithm stops and picks the i-th person when the sequence is ⟨M, M², …, M^n⟩. We have Σ_i p_i ≤ 1, and hence there exists some p_k ≤ 1/n. Since σ_k agrees with ⟨M, M², …, M^n⟩ on the first k values, the algorithm behaves identically on both up to position k. But now, on input sequence σ_k, the algorithm picks the person with value M^k only with probability p_k ≤ 1/n, so its expected value is at most

  p_k · M^k + Σ_{i<k} p_i · M^i ≤ M^k/n + (n−1) · M^{k−1},

and hence its competitive ratio is at most 1/n + (n−1)/M, which is arbitrarily close to 1/n for large M.

(b) Suppose, instead, you assume that the adversary picks just a set S of people/values (not a sequence), and you see this set in a uniformly random order. Hence, the secretary competitive ratio is now min_S E[A(σ_S)]/max(S), where σ_S is a random permutation of the set S. Give a simple algorithm/analysis that achieves a competitive ratio of at least 1/4. Can you do better?

Answer: Consider the algorithm that rejects the first n/2 people, and picks the first person among the rest whose value is higher than the highest value seen among the first n/2 people. The chance that the person with the highest overall value lies in the second half while the person with the second-highest value lies in the first half is at least 1/2 × 1/2; in this case we definitely pick the optimal person, and hence this gives a competitive ratio of at least 1/4. The optimal strategy is to reject the first n/e people, and pick the first person among the rest whose value is higher than all values seen so far: this gives a competitive ratio of 1/e. There are several proofs online, including one on the Wikipedia page.
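As a quick empirical check (an illustration, not part of the original notes), here is a minimal Python sketch of the wait-then-pick rule from part (b); the instance size and value range below are arbitrary choices.

    import math
    import random

    def wait_then_pick(values):
        # Reject the first floor(n/e) people, then accept the first one whose
        # value beats everything seen during the rejection phase; if no one
        # qualifies, we are forced to take the last person.
        n = len(values)
        cutoff = math.floor(n / math.e)
        threshold = max(values[:cutoff]) if cutoff > 0 else float("-inf")
        for v in values[cutoff:]:
            if v > threshold:
                return v
        return values[-1]

    # The fraction of runs picking the maximum should approach 1/e ≈ 0.368.
    n, trials, wins = 100, 20000, 0
    for _ in range(trials):
        vals = random.sample(range(1, 10 * n), n)  # distinct values, random order
        if wait_then_pick(vals) == max(vals):
            wins += 1
    print(f"picked the best in {wins / trials:.3f} of runs; 1/e ≈ {1 / math.e:.3f}")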
Suppose, instead of picking just one person, you are allowed to pick several people (as long as the people picked satisfy some "independence" condition). Again, the people/values are chosen by the adversary, but the algorithm sees them in random order. For the following problems, give algorithms with constant competitive ratio, assuming that you have an algorithm that achieves a secretary competitive ratio of α for the single-secretary problem.

(a) Each person has one of k colors, and you are allowed to pick up to k people, subject to the constraint that no two people chosen have the same color. (Assume you know how many people n_i there are of color i, and you want to maximize the sum of the values.)

Answer: Just run the single-secretary algorithm separately within each color class; by linearity of expectation, this gives an α-competitive algorithm for the sum.

(b) You are allowed to pick any set of at most k people, and again want to maximize the sum of the values of these k people.

Answer: One possible solution: imagine the first n/k people you see in the random ordering to have color 1, the next n/k people to have color 2, and so on, and run the algorithm from the previous part on this. It remains to show that the expected value achieved by picking the top person from each of these length-n/k blocks is comparable to the total value of the top k people overall. To see this, let T be the set of the top k people. For any person t ∈ T, the chance that t is the only member of T lying in its block of length n/k in a random permutation is

  C(n − n/k, k−1) / C(n − 1, k−1)
    = [(n − n/k)/(n − 1)] · [(n − n/k − 1)/(n − 2)] ⋯ [(n − n/k − k + 2)/(n − k + 1)]
    ≥ (1 − 1/k)^{k−1} ≥ 1/e.

When this happens, t is the top person of its block, since everyone else in the block lies outside the top k. Now, applying linearity of expectation, the expected value of the people from T who are the top person in their block is at least 1/e times the total value of the elements of T. Hence an α-competitive algorithm for the previous part implies an (α/e)-competitive algorithm for this problem. (A sketch of this block-based reduction appears below.)

One can do better: R. Kleinberg (SODA 2005) gives an algorithm with competitive ratio 1 − O(1/√k), and this is the best possible. If you are interested in such problems, see the paper on the Matroid Secretary Problem by Babaioff, Immorlica and Kleinberg (SODA 2007), and the many follow-up papers. Also, other papers study combinatorial optimization problems in this model: see, e.g., the papers by Karp, Vazirani and Vazirani on Online Bipartite Matching (STOC 1990) and by Meyerson on Online Facility Location (FOCS 2001).
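Below is a minimal Python sketch (an illustration, not the original analysis) of the block-based reduction from part (b), using the 1/e rule as a stand-in for the assumed α-competitive single-secretary algorithm; all names and the toy instance are hypothetical.

    import math
    import random

    def single_secretary(block):
        # The classic 1/e rule, standing in for the assumed α-competitive
        # single-secretary subroutine; returns the value picked, or None.
        m = len(block)
        cutoff = math.floor(m / math.e)
        threshold = max(block[:cutoff]) if cutoff > 0 else float("-inf")
        for v in block[cutoff:]:
            if v > threshold:
                return v
        return None

    def k_secretary_by_blocks(stream, k):
        # Treat the i-th group of n/k consecutive people as color i and run
        # the single-secretary rule inside each block: at most k picks total.
        n = len(stream)
        size = n // k
        picks = []
        for b in range(k):
            block = stream[b * size:(b + 1) * size] if b < k - 1 else stream[(k - 1) * size:]
            choice = single_secretary(block)
            if choice is not None:
                picks.append(choice)
        return picks

    # Toy run: compare against the sum of the true top k values.
    stream = random.sample(range(1, 10**6), 1000)
    print(sum(k_secretary_by_blocks(stream, 10)), sum(sorted(stream)[-10:]))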
2. Stochastic Two-Stage Vertex Cover (StocVC). We are given a graph G = (V, E), a non-negative first-stage cost c^0_v for each vertex v ∈ V, a non-negative second-stage cost c^S_v for each "scenario" S ⊆ E and each vertex v ∈ V, and a probability distribution π over subsets of E. A solution to StocVC is a set C_0 ⊆ V of vertices to buy in the first stage and, if the scenario S ⊆ E is realized (which happens with probability π(S)), a set C_S ⊆ V of vertices to buy for scenario S, such that C_0 ∪ C_S is a vertex cover for the graph G_S = (V, S). We want to minimize the expected cost

  Σ_{v∈C_0} c^0_v + Σ_{S⊆E} π(S) Σ_{v∈C_S} c^S_v.

The natural ILP formulation for this problem is:

  min Σ_v c^0_v x^0_v + Σ_{S⊆E} π(S) Σ_v c^S_v x^S_v                        (1)
  subject to  (x^0_u + x^S_u) + (x^0_v + x^S_v) ≥ 1   ∀S ⊆ E, ∀{u, v} ∈ S   (2)
              x^0_v, x^S_v ∈ {0, 1}                                          (3)

The first few parts recap in more detail what we did in lecture.

(a) Convince yourself that this is an exact formulation of StocVC (i.e., the optimal solutions to this ILP are precisely the optimal solutions to the StocVC instance).

(b) Now suppose the integrality constraints x^0_v, x^S_v ∈ {0, 1} are relaxed to the fractional constraints x^0_v, x^S_v ≥ 0. Show that, given a fractional solution to the LP relaxation, setting each fractional variable to 1 if it is at least 1/4, and to 0 otherwise, gives an integer solution of cost at most 4 times the fractional cost. Infer that this gives a 4-approximation for the stochastic VC problem.

Now, let us try to extend the ideas a bit further:

(c) Even though the support of π could be exponential, suppose someone solved the LP optimally for you, but they gave you the values of only the x^0_v variables for all v ∈ V, and none of the x^S_v values. Can you use these values to output a set C′_0, and also give an algorithm that, given a scenario S ⊆ E, uses these {x^0_v} values to output a set C′_S, such that the sets (C′_0, {C′_S}) form a 4-approximation to the stochastic problem? (Assume that this algorithm can solve an LP with O(n) variables and O(|S|) constraints.)

Answer: Given the x^0_v values, use the set C′_0 := {v ∈ V | x^0_v ≥ 1/4}. Now, given that the scenario S appears, observe that if you fix the values x^0_v and solve the LP

  min Σ_v c^S_v y^S_v                                           (4)
  subject to  y^S_u + y^S_v ≥ 1 − x^0_u − x^0_v   ∀{u, v} ∈ S   (5)
              y^S_v ≥ 0                                          (6)

the values y^S_v you get back have the same cost as the x^S_v values in the exponentially large LP solution (both are optimal completions of the fixed first stage). Hence you can round them the same way, setting C′_S := {v ∈ V | y^S_v ≥ 1/4}. This gives the desired algorithm for the second stage. So, regardless of the size of the support, it suffices to know only the n fractional values {x^0_v} to get the 4-approximation.

The techniques of Shmoys & Swamy, and those of Charikar, Chekuri & Pál, show how to find values {x^0_v} that correspond to a solution of cost at most (1 + ε) times the optimal cost, using only black-box sampling access to the distribution π. Their algorithms are randomized and succeed with probability 1 − δ in time polynomial in the parameters n, ε^{−1}, log δ^{−1}, and max_{v,S} {c^S_v / c^0_v}. The ideas are related to those used for the sampling questions in Problem 3 below.

Finally, we can try to improve the approximation ratio:

(d) Give an algorithm that achieves an approximation better than 4. In particular, can you give a 2-approximation?

Answer: Choose a value α uniformly at random from the interval [0, 1/2]. For each v ∈ V, if x^0_v ≥ α then add v to C_0; note that v is added to C_0 with probability at most 2x^0_v. For each S, if x^0_v < α but x^0_v + x^S_v ≥ α, then add v to C_S; note that v ∈ C_S with probability at most 2x^S_v. So the expected cost is at most twice the LP cost. Finally, note that since (x^0_u + x^S_u) + (x^0_v + x^S_v) ≥ 1 for each edge {u, v} ∈ S, at least one of u or v has its two variables summing to at least 1/2. Say v does; since α ≤ 1/2, v will be chosen in one of the two stages, and hence the chosen sets form a valid vertex cover. (A sketch of this rounding appears below.)

(e) Can you achieve this approximation factor if you only know the x^0_v values in the first stage?

Answer: Same as in part (c): fix the x^0_v values, solve the LP (4)–(6) when the scenario S arrives, and apply the random-threshold rounding above to the resulting y^S_v values.
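The following Python sketch illustrates the random-threshold rounding from part (d); the fractional values in the toy instance are hypothetical numbers chosen to satisfy constraint (2).

    import random

    def round_stocvc(x0, xS, scenario_edges):
        # Part (d)'s rounding: draw α uniform on [0, 1/2]; buy v in the first
        # stage if x0[v] ≥ α, and in the scenario if x0[v] < α ≤ x0[v] + xS[v].
        # Pr[v ∈ C0] ≤ 2·x0[v] and Pr[v ∈ CS] ≤ 2·xS[v], so the expected cost
        # is at most twice the LP cost.
        alpha = random.uniform(0, 0.5)
        C0 = {v for v in x0 if x0[v] >= alpha}
        CS = {v for v in xS if v not in C0 and x0[v] + xS[v] >= alpha}
        # Feasibility: constraint (2) forces one endpoint of each edge to have
        # x0 + xS ≥ 1/2 ≥ α, so that endpoint is bought in some stage.
        assert all(u in C0 | CS or v in C0 | CS for (u, v) in scenario_edges)
        return C0, CS

    # Toy fractional solution on the single edge {a, b}.
    x0 = {"a": 0.30, "b": 0.10}
    xS = {"a": 0.20, "b": 0.55}
    print(round_stocvc(x0, xS, [("a", "b")]))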
3. Stochastic Two-Stage Steiner Tree (StocST). Consider a graph G = (V, E) with a root vertex r, first-stage ("Monday") edge costs c^M_e, and second-stage ("Tuesday") edge costs c^T_e ≥ c^M_e (hence the inflation is different for different edges, but is independent of the scenario realized). Moreover, we are given a probability distribution π on subsets S ⊆ V. A feasible solution to StocST is a set of edges E_M ⊆ E bought in the first stage, and, when the scenario S is realized, another set of edges E_{T,S} ⊆ E bought in the second stage, such that E_M ∪ E_{T,S} connects S to the root vertex r. The expected cost is

  Σ_{e∈E_M} c^M_e + Σ_{S⊆V} π(S) Σ_{e∈E_{T,S}} c^T_e,

and the goal is to minimize the expected cost.

(a) Single Edge. If G is a single edge e = {r, v}, when would we buy {e} in the first-stage solution E_M?

Answer: Let p = Pr_{S∼π}[v ∈ S] be the probability that v needs to be connected. Buy e in the first stage if c^M_e ≤ p × c^T_e; else it makes sense to wait until Tuesday and incur an expected cost of p × c^T_e.

(b) Path Graph. Now suppose G is a path r, v_1, v_2, …, v_n.

i. If π has support only on singletons, i.e., |S| > 1 ⟹ π(S) = 0, what is the optimal solution?

Answer: For each edge e = (v_{i−1}, v_i) (with v_0 = r), let p_e = Σ_{j≥i} π({v_j}) be the probability that some vertex that e separates from the root needs to be connected. Now, for each edge e, use the algorithm for the single edge. In other words, for each e, if c^M_e ≤ p_e × c^T_e, add e to E_M.

ii. What is the optimal solution for general distributions (which may be supported on larger sets)?

Answer: Given a general distribution π, let π′({v_i}) = Σ_{S : v_i ∈ S, S ∩ {v_{i+1},…,v_n} = ∅} π(S) be the probability that v_i is the demanded vertex furthest from the root. Use the above algorithm with the distribution π′; this works because, on a path, connecting the furthest demanded vertex to r automatically connects every closer demanded vertex.

iii. Now to some computational/sampling issues; assume we are in the singleton-scenarios case.

A. For the case where you are explicitly given values π̂(v_j) ∈ (1 ± ε)π(v_j) for each node v_j, for some ε ∈ [0, 1/2], devise a (1 + O(ε))-approximation algorithm.

Answer: Given the estimates π̂, just use them in the above algorithms, and note that the expected cost incurred is at most (1+ε)/(1−ε) = 1 + O(ε) times the optimal cost.

B. If you only have sampling access to the distribution π, how many samples suffice to get estimates π̂(v_j) ∈ (1 ± ε)π(v_j)? Show that if some π(v_i) is tiny, you may need a very large number of samples.

Answer: Imagine π({v}) = 2^{−n}; then just to see a single occurrence of v, you need 2^n samples in expectation.

C. Suppose you only have additive estimates π̄(v_j) ∈ π(v_j) ± γ. Show that if γ ≤ ε/(n × max_e (c^T_e/c^M_e)), you can get a good approximation. How many samples do you need to get such good estimates with high probability?

Answer: First consider the single-edge case, where we are given p̄ ∈ p ± γ. Now it may be the case that OPT bought the edge in the first stage (i.e., c^M_e ≤ p · c^T_e), but, since p̄ < p, we defer it to the second stage (since c^M_e > p̄ · c^T_e ≥ (p − γ) · c^T_e). The worry is that we might suffer a large expected cost in the second stage. However, this is not the case: since γ ≤ ε/(c^T_e/c^M_e), we get c^M_e > p · c^T_e − γ · c^T_e ≥ p · c^T_e − ε · c^M_e. Hence p · c^T_e < (1 + ε) · c^M_e, and our second-stage cost is bounded by (1 + ε) times the optimum cost. Applying similar arguments to the analysis from the previous parts gives the result for the path: note that we sum up to n probability values, so we need more accuracy in the estimates, and hence the presence of the factor n in the bound for γ.

For the sample complexity: take N = (2/γ²) log(1/δ) samples and set π̄(v) to be the fraction of samples where the scenario is {v}. If X_t is the indicator random variable that the t-th sample returns {v}, then E[X_t] = π(v), and since we set π̄(v) = (1/N) Σ_t X_t, we have E[π̄(v)] = π(v). Moreover, using a Chernoff bound, it follows that N samples are enough to give us a value π̄(v) ∈ [π(v) − γ, π(v) + γ] with probability 1 − δ.

iv. Wrapping up, show that for any distribution π, given only sampling access to π, you get a randomized (1 + ε)-approximation for StocST on a path in time poly(n, ε^{−1}, max_e (c^T_e/c^M_e)).

Answer: Follows from the above parts; a sketch combining the sampling with the per-edge rule appears below.
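Here is a minimal Python sketch combining parts (b)i and (b)iii: it assumes singleton scenarios, with each sample being the index of the demanded vertex; the function name and toy costs are hypothetical choices for illustration.

    import random

    def stocst_path_first_stage(cM, cT, samples, n):
        # Estimate π̄({v_j}) by empirical frequencies, form the suffix sums
        # p_e = Σ_{j≥i} π̄({v_j}) for each edge (v_{i-1}, v_i), and buy edge i
        # on Monday iff cM[i] ≤ p_e · cT[i].
        N = len(samples)
        pi_bar = [0.0] * (n + 1)
        for j in samples:          # each sample: index of the demanded vertex
            pi_bar[j] += 1.0 / N
        EM = set()
        p_e = 0.0
        for i in range(n, 0, -1):  # walk from the far end to accumulate p_e
            p_e += pi_bar[i]
            if cM[i] <= p_e * cT[i]:
                EM.add(i)          # identify edge (v_{i-1}, v_i) by i
        return EM

    # Toy instance: three edges with rising Tuesday costs, uniform demand.
    n = 3
    cM = {1: 1.0, 2: 1.0, 3: 1.0}
    cT = {1: 1.2, 2: 2.0, 3: 4.0}
    samples = [random.randint(1, n) for _ in range(10000)]
    print(stocst_path_first_stage(cM, cT, samples, n))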
(c) Trees. Does the algorithm above also work for a tree T with a root vertex r?

Answer: Yes: the argument above just required that the decision for each edge can be made independently, and this holds not just for paths but also for trees. For each tree edge e, let p_e be the probability that the realized scenario contains a vertex in the subtree below e, and buy e on Monday if and only if c^M_e ≤ p_e × c^T_e. (A sketch for trees appears after the aside below.)

An Aside: A very useful result of [Bartal FOCS 1996; Fakcharoenphol, Rao, Talwar STOC 2003] shows that for every metric space, one can generate a set of trees such that distances in each of these trees are at least the distances in the metric space, but, for each pair x, y, the average distance between x and y (averaging over all these trees) is only O(log n) times the distance between x and y in the metric. Using this result, and the result for trees above, we get an O(log n)-approximation for StocST on general graphs.
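Finally, a hypothetical Python sketch of the per-edge rule on trees from part (c), estimating each p_e from sampled scenarios; the tree encoding (children lists, edges named by their lower endpoints) is an assumption made purely for illustration.

    def stocst_tree_first_stage(children, root, cM, cT, sampled_scenarios):
        # Identify each non-root edge by its lower endpoint v, estimate
        # p_e = Pr[the scenario meets the subtree below v] from the sampled
        # scenarios, and buy the edge on Monday iff cM[v] ≤ p_e · cT[v].
        subtree = {}
        def collect(v):  # vertex sets of all subtrees, via a post-order walk
            s = {v}
            for w in children.get(v, []):
                s |= collect(w)
            subtree[v] = s
            return s
        collect(root)

        N = len(sampled_scenarios)
        EM = set()
        for v in subtree:
            if v == root:
                continue  # the root has no parent edge
            p_e = sum(1 for S in sampled_scenarios if S & subtree[v]) / N
            if cM[v] <= p_e * cT[v]:
                EM.add(v)
        return EM

    # Toy tree r → {a, b}, a → c, with hypothetical costs and scenarios.
    children = {"r": ["a", "b"], "a": ["c"]}
    cM = {"a": 1.0, "b": 1.0, "c": 1.0}
    cT = {"a": 3.0, "b": 3.0, "c": 3.0}
    scenarios = [{"c"}, {"b"}, {"b", "c"}, set()]
    print(stocst_tree_first_stage(children, "r", cM, cT, scenarios))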