>> Nikhil Devanur: It's my pleasure to welcome Balu Sivan to our talk on near optimal online algorithms and fast approximation algorithms. Balu is a Ph.D. student at the University of Wisconsin-Madison. He was an intern here last year, and most of the work he's going to present was done while he was an intern. He's back for a couple of weeks, and he'll tell us about his work. >> Balu Sivan: Thank you. I'll be talking about recent joint work with Nikhil, Kamal and Chris Wilkens. Chris was an intern here last year, too. The title is near optimal online algorithms and fast approximation algorithms. The focus of this work is really two-fold. Part one is stochastic analysis of online algorithms, by which I mean online algorithms whose input satisfies some stochastic property; that is, the elements of the input could be drawn from some i.i.d. distribution, or they could be picked by an adversary but arrive in a random order. That's what I mean by a stochastic property. We do this analysis for a fairly general class of problems, resource allocation problems, which I'll describe in a minute. That's the first part. The second part is fast approximation algorithms for large mixed packing and covering LPs, with both packing and covering constraints. I'll describe all of these. So, moving on to part one. This area of online algorithms with stochastic input has recently received a lot of revived interest, partly motivated by applications to online advertising. Almost all problems in online advertising squarely fit within this framework of online algorithms with stochastic input. And the trend in all these areas has been to go beyond worst case analysis. This is because in worst case analysis there are lower bounds saying you cannot go beyond constant factor approximations for many of these problems. We're not happy with that; we'd like something like a 99 percent approximation, not a 60 percent approximation. The idea is to see if you can get such near-optimal approximations in a stochastic setting. In the stochastic setting, as I said, people assume that the input is drawn from some distribution. The distribution could be known to the algorithm designer or unknown to the algorithm, and there are all those variants. To make things concrete, I'm going to fix a representative problem: the now famous AdWords problem. You have a bipartite graph; one side is advertisers, the other side is queries. Queries arrive online, and advertisers have specified bids on these queries. So b_ij is the bid of advertiser i on query j, and as each query arrives, the search engine has to assign it to one of the advertisers, or drop it if it wants. The goal is for the search engine to maximize its own revenue subject to budget constraints. Each advertiser has specified a daily budget, or a budget for some unit of time, and that is the first constraint of the LP: respect the budget constraints. Of course, no query can be given to more than one advertiser; that's what the second constraint of the LP says. Maximize the revenue. This is the AdWords problem. We'll be using it to present our results, which apply to a more general class of problems. Here m is the number of queries and n is the number of advertisers.
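As a reference, here is the LP relaxation just described, reconstructed from the definitions above (x_ij is the fraction of query j assigned to advertiser i; the slide's exact notation may differ):

```latex
\begin{align*}
\max\ & \sum_{i=1}^{n} \sum_{j=1}^{m} b_{ij}\, x_{ij} \\
\text{s.t.}\ & \sum_{j=1}^{m} b_{ij}\, x_{ij} \le B_i \quad \forall i
  && \text{(budget of advertiser } i\text{)} \\
& \sum_{i=1}^{n} x_{ij} \le 1 \quad \forall j
  && \text{(each query assigned at most once)} \\
& x_{ij} \ge 0 \quad \forall i, j
\end{align*}
```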
So the first question is what's known about the AdWords problem. All the results are going to be specified in terms of one parameter, the largest bid-to-budget ratio: take all the bids of a particular advertiser and divide by that advertiser's budget -- this is the bid-to-budget ratio for that advertiser. Do this for all advertisers, take the maximum, and call it gamma. Now, the first result for the AdWords problem was by [inaudible] in 2005, the paper that really coined the problem. They give a one minus one by e approximation for the AdWords problem in the worst case setting, meaning that the adversary picks the set of queries and also the order in which they arrive. For this worst case setting they give a one minus one by e approximation, and they also prove that you cannot go beyond one minus one by e in the worst case; even randomized algorithms cannot do better than one minus one by e. But the result requires that this parameter gamma goes to 0, meaning the bids are insignificantly small compared to the budgets. The first paper to show that a big improvement is possible in a stochastic setting was a paper two years ago by Nikhil and Tom Hayes at EC. They solved the same problem, but with the assumption that the queries arrive in random order, not in adversarial order. Once you assume this, you get a one minus epsilon approximation for any epsilon. But, again, there's a restriction: the parameter gamma has to depend on epsilon -- something like epsilon cubed over n log m, some function of epsilon, n and m. So that's good; this is what we wanted in the stochastic setting. But since that paper, there have been a lot of extensions, applying the same result to different kinds of problems, or improving this parameter gamma. Ideally we want gamma to be as big as one; it could be anything between 0 and 1, and we don't want to restrict it to be very small. The goal is to make gamma as big as possible. The best improvement so far has been a paper by Agrawal and [inaudible], where they use the same idea except with a technique called doubling, and cut off one factor of epsilon, getting epsilon squared. They also show an upper bound of epsilon squared over log n: you cannot take gamma to be as big as one. There's an inherent restriction: if you want a one minus epsilon approximation, gamma has to depend on epsilon quadratically like this. This lower bound is for a slightly more general version of the problem, the resource allocation problem, which I'll describe. So this is the status of the AdWords problem. The first result of this paper is a threefold improvement of the current status. In our model we assume that the inputs are drawn from some i.i.d. distribution which is unknown to the algorithm designer. In this model, firstly, we give the same one minus epsilon approximation with almost the best dependence on gamma possible: our bid-to-budget ratio gamma is epsilon squared over log of n by epsilon. And the results work not only for i.i.d. distributions but even when the adversary is allowed to vary the distribution over time; even for such a time-varying model our results hold. So we introduce this model for stochastic analysis, called the adversarial stochastic input model, and I'll describe what it is.
It basically allows the adversary to pick distributions which vary over time, as long as the distributions are not too terrible. As long as that holds, our algorithm still works; I'll make this formal in a minute. And these results, as I said, are applicable to a fairly general class of problems called resource allocation problems. So I'll now move on to define the resource allocation framework. >>: Is it clear that i.i.d. is easier than random permutation -- formally? >> Balu Sivan: Yes, for the AdWords problem, i.i.d. is easier than random permutation. That is, if you solved random permutation then there's a way to solve i.i.d.: once you fix the number of queries of each type, the i.i.d. input is just a random permutation of that multiset. So you can condition on the counts and consider random permutations for each count profile; if you can solve random permutation, you can solve each of these cases. >>: [inaudible]. >> Balu Sivan: Yes, it is a distribution over counts. But it's true that i.i.d. is somewhat weaker than random permutation, whereas this adversarial stochastic input model, where the distribution can vary over time, is incomparable to random permutations. I'll describe how the distributions can vary over time. Okay. So first the resource allocation framework. It has the same kind of online flavor: requests are arriving online and you want to serve these requests. There are a bunch of options for serving any given request, and once you pick an option to serve a request, this request-option pair consumes some amount of every resource. There are n resources available and m requests arriving online. Once you pick an option k, the request-option pair (j, k) gives you a profit w_jk, and it consumes a_ijk amount of resource i. As before, you cannot serve a request with more than one option; that's what the second constraint says. The goal is to maximize your profit subject to capacity constraints: each resource i has a capacity c_i. And K, the number of options, could be exponential. Just to have a concrete example in mind, here is network routing as a resource allocation problem. Think of the algorithm as having some graph; requests are requests to route from a source to a sink in the graph, and the options are the exponentially many paths available -- you choose one of these paths. The resources are the edges, and the edges have capacities. Of course, depending on which path you choose, different edges get consumed. For this problem, the previous best result required gamma at most epsilon squared over n log of mK over epsilon. K, as I said, is the number of options; for a graph you have exponentially many paths, K could be like 2 to the n, so this is really like epsilon squared over n squared. Our bound is epsilon squared over log of n by epsilon, so it's like a quadratic improvement. The difference is that the previous bound holds for random permutations, and our result is for i.i.d. or time-varying distributions.
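In symbols, the resource allocation LP being discussed is the following (a reconstruction from the definitions above):

```latex
\begin{align*}
\max\ & \sum_{j=1}^{m} \sum_{k=1}^{K} w_{jk}\, x_{jk} \\
\text{s.t.}\ & \sum_{j=1}^{m} \sum_{k=1}^{K} a_{ijk}\, x_{jk} \le c_i \quad \forall i
  && \text{(capacity of resource } i\text{)} \\
& \sum_{k=1}^{K} x_{jk} \le 1 \quad \forall j
  && \text{(at most one option per request)} \\
& x_{jk} \ge 0 \quad \forall j, k
\end{align*}
```

AdWords is the special case where resource i is advertiser i's budget (c_i = B_i), the options for query j are the n advertisers, and choosing advertiser i for query j has w_jk = b_ij and consumes a_ijk = b_ij of resource i only.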
Okay. So I'll now describe this time-varying model. Here the algorithm designer is given some target opt_T; this is the benchmark against which the competitive ratio is defined. The goal of the algorithm is to get as good an approximation as possible to this target, and the adversary picks time-varying distributions. What we ask is that for every distribution the adversary picks, the expected value of the optimal solution on that distribution -- meaning, if that distribution were used throughout, the expected optimal value -- is at least the target given to you. That's about the minimum one would ask for: the optimal algorithm should, in expectation, do at least as well as the target. If that is true, then no matter how skewed the distributions may be, as long as this expectation condition holds, all our algorithms work and the one minus epsilon guarantee holds. That is the adversarial stochastic input model. So to recap: in the adversarial stochastic input model, for the resource allocation framework, we get one minus epsilon with the best dependence on gamma. I'm now going to go on to the second result in this paper. So far I've been talking about the regime where the bid-to-budget ratio gamma is really small -- it depends quadratically on epsilon for a one minus epsilon approximation. The question is what happens when gamma is as big as one, meaning that bids are as big as budgets. This problem has remained largely open since the problem was coined. The best competitive ratio known, even in stochastic settings, not just the worst case, was half, and this comes from the trivial greedy algorithm; nothing better was known. But special cases of this problem have seen a lot of progress recently. Online bipartite matching is a special case -- an oversimplified special case, meaning all the bids are the same and all the budgets are the same; it's just a matching problem. For this problem, two years ago Feldman [inaudible] gave an algorithm which beats one minus one by e, but for known distributions: the right-hand side of the graph is basically drawn from some distribution which is known to the algorithm designer, and they beat one minus one by e. After that there is a series of results by [inaudible], and the best is by Manshadi, Oveis Gharan and Saberi, which gives .702 for this problem. And recently, Mahdian and Yan, and Karande, Mehta and Tripathi, analyzed the same online bipartite matching problem in the random permutation setting -- you could think of it as i.i.d. with unknown distributions, or something more powerful than that. In that setting they gave a .696 approximation, and that's the current best known. In this work I'm going to focus on the more general problem of AdWords in the unbounded gamma setting, in the stochastic setting. The second result is that with i.i.d. unknown distributions, or even in the adversarial stochastic input model, the simple greedy algorithm gets a one minus one by e approximation against the expected optimal fractional solution. The greedy algorithm is really simple: when a query arrives, you assign it to the advertiser with the maximum effective bid for that query. The effective bid is the minimum of the bid and the remaining budget of the advertiser -- effectively what he can contribute at that point. You compute it for all advertisers and choose the advertiser with the maximum effective bid. This gets one minus one by e.
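As a sketch of the greedy rule just described (illustrative code, not from the paper; the data structures are my own):

```python
def greedy_adwords(queries, bid, budgets):
    """Greedy for AdWords with unbounded gamma: assign each arriving
    query to the advertiser with the largest effective bid,
    min(bid, remaining budget), and charge that amount.  In the
    i.i.d. / adversarial stochastic input models this gets a
    1 - 1/e approximation in expectation."""
    remaining = dict(budgets)              # advertiser -> unspent budget
    revenue = 0.0
    for j in queries:                      # queries arrive online
        best_adv, best_eff = None, 0.0
        for i, left in remaining.items():
            eff = min(bid.get((i, j), 0.0), left)   # effective bid
            if eff > best_eff:
                best_adv, best_eff = i, eff
        if best_adv is not None:           # otherwise drop the query
            remaining[best_adv] -= best_eff
            revenue += best_eff
    return revenue
```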
Previously the same greedy algorithm was analyzed by Goel and Mehta in 2008. They analyzed it for the same AdWords problem in the random arrival model, and they also get one minus one by e, except they make an assumption on bids and budgets which almost boils down to saying that gamma goes to zero. Here we have an unbounded gamma. So that is the second result. The third result in this paper -- now I'm moving to offline instances. Last year at EC, Charles, Nikhil, Kamal and others had a paper which gives sampling-based algorithms, randomized algorithms, for problems like matchings on huge graphs. These graphs are so huge that no classical algorithm is useful; you want an algorithm which makes a single simple sweep through the graph. What we do in this paper is generalize these kinds of algorithms to more general problems: we take mixed packing and covering LPs and solve them approximately with pretty fast approximation algorithms. Here is the precise problem. You have a bunch of packing constraints, the first set of constraints, and a bunch of covering constraints, the greater-than-or-equal-to inequalities. And you have a polytope constraint; for us this is the unit simplex constraint. The goal is to decide whether this set of constraints has a feasible solution or not. We distinguish between a yes case and a no case. Yes: there is a feasible solution. No: even if I slightly relax the right-hand sides -- multiply by one plus epsilon or one minus epsilon, depending on the direction of the inequality -- even with that relaxation the LP has no feasible solution. We can distinguish between these two cases with very high probability, one minus delta. That's the third result. Before stating it precisely, here is some notation. Let P_j be the unit simplex. We assume there is an oracle which does the following job for us: if I give it a cost vector, the oracle returns some vector x_j from the polytope P_j that minimizes the inner product of the cost vector with x_j. This might look strange, but it really isn't. For the network routing problem, for example, what this translates to is: given a vector of edge lengths, over the exponentially many paths, return the shortest path -- and we know shortest path is easy in polynomial time. For many problems you have such natural oracles. Given such an oracle, we can solve this gap version of the mixed packing-covering problem, yes or no, with high probability, with about gamma m over epsilon squared, times a log factor, oracle calls. That is the third result, for offline instances. Of course, for mixed packing and covering LPs there's a long line of previous work. Young in '95 and Plotkin, Shmoys and Tardos in '91 solved a very general class of mixed packing and covering LPs. The thing is, they require about gamma squared m squared over epsilon squared, times some log factor, oracle calls -- but that is for a very general class of LPs. What we say is that if you add the special polytope, the unit simplex constraint -- which is very natural, because all the resource allocation problems fall squarely within it: the sum over k of x_jk is at most one, don't give a request to more than one option -- then you can cut down the gamma squared m squared to gamma m. The quadratic becomes linear. So that's the comparison to the previous work on this.
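To recap the offline gap problem in symbols (a hedged reconstruction; the matrix notation A_j, C_j is mine): given an oracle that, for any cost vector w, returns arg min over x_j in P_j of w . x_j, distinguish

```latex
\textbf{YES:}\ \exists\, x \text{ with } x_j \in P_j\ \forall j:\quad
  \sum_j A_j x_j \le b \ (\text{packing}), \qquad
  \sum_j C_j x_j \ge d \ (\text{covering}); \\[4pt]
\textbf{NO:}\ \text{even }
  \sum_j A_j x_j \le (1+\epsilon)\, b,\quad
  \sum_j C_j x_j \ge (1-\epsilon)\, d
  \ \text{ has no solution with } x_j \in P_j\ \forall j.
```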
Now, the common idea behind all these results is the following two-stage approach. The first stage is to develop an omniscient, knowledgeable algorithm. This algorithm has knowledge of the distribution, and, based on the distribution, could have knowledge of the optimal solution of the offline instance. By offline instance I mean: you construct an expected instance of the problem, where each request arrives its expected number of times; that is an offline instance. Once you know the distribution, you can compute this expected instance. Suppose you even solve it optimally, and suppose that is known to this knowledgeable algorithm. Then you can use the knowledgeable algorithm to achieve the required competitive ratio -- I'll describe how. The analysis of this algorithm should satisfy certain properties, and once it does, you can remove the dependence on this knowledge: you get a distribution-oblivious algorithm that achieves the same competitive ratio. I'm now going to describe step one, and this is best illustrated through a toy problem. In this toy problem, items are arriving online, and each item has a cost; item i has cost c_i. When an item arrives, you can do two things: you can put the item in a bag, which has capacity G -- only G items can go inside the bag -- or you pay the cost of the item, and then you don't have to put it in the bag. The goal is to minimize the total cost you pay while respecting the capacity constraint of the bag. The items' costs are drawn from some distribution which is unknown to the algorithm designer, and, as I said, you minimize the total cost. For simplicity I'm going to assume that the optimal cost turns out to be equal to the total capacity G; it doesn't matter, I use this only for simplicity. For the first step, suppose we knew the distribution from which these costs are drawn. Then we can use a very simple algorithm. Set some threshold alpha, and say: if the cost is more than alpha, I'm going to put the item in the bag -- it's too much for me to pay -- and if the cost is less than alpha, I pay for it. You choose this threshold alpha so that the probability p that the cost is more than alpha satisfies mp = G; that is, the expected number of items going into the bag equals G, so the bag doesn't spill in expectation. You calculate this threshold and run this as the online algorithm, and you can prove that this online algorithm does very well. That's what I'm going to analyze now, and then I'm going to use the analysis to drop knowledge of the distribution. Some simple notation before proceeding. X_t says whether, at step t, you put the item into the bag or not: it's 1 if you put it into the bag, and 0 otherwise. Y_t, similarly, is how much cost you paid at step t: if you paid for item i at step t, Y_t is c_i; if you didn't pay for an item, Y_t is zero. Now, note that the expected value of X_t is exactly p, which is G over m by construction. Similarly, the expected value of Y_t is opt over m, which is G over m, since opt equals G. So these two quantities have the same expectation.
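A minimal sketch of this known-distribution threshold policy (illustrative only; alpha would be computed from the distribution so that m * Pr[cost > alpha] = G):

```python
def threshold_policy(costs, alpha):
    """Known-distribution toy algorithm: bag an item if its cost
    exceeds alpha, otherwise pay its cost.  Returns the bag usage
    S_m = sum of X_t and total payment B_m = sum of Y_t; the analysis
    bounds the probability that either exceeds (1 + eps) * G."""
    bag_used, total_paid = 0, 0.0
    for c in costs:          # item costs arrive online, drawn i.i.d.
        if c > alpha:        # too expensive to pay: use bag space
            bag_used += 1    # X_t = 1, Y_t = 0
        else:                # cheap enough: pay the cost
            total_paid += c  # X_t = 0, Y_t = c
    return bag_used, total_paid
```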
>>: What's the difference between i and t? >> Balu Sivan: So i is used to index the items, and t indexes the steps; the algorithm proceeds in steps. You could think of n distinct items and the algorithm proceeding in m steps; the same item can arrive many times, since the arrivals are drawn from the distribution. So, a bit more notation: B_t is the sum of the Y's up to step t, and S_t is the sum of the X's up to step t. And the questions are about these quantities. After m steps, what is the probability that the total size you've accumulated in the bag is more than G times one plus epsilon? That is, what's the probability that you're spilling over capacity? And you can ask the same thing for cost: what's the probability that you paid more than opt times one plus epsilon? Opt is G here; this is the probability of suboptimal cost, paying more than necessary. If these two probabilities are small, then we're approximately good, right? The standard way to analyze this is the following. You take this probability and you exponentiate both sides to base one plus epsilon, and apply Markov's inequality. In the next step I'm just writing S_m as the sum over all the X's, and then, using convexity of the exponential function, I can pull the sum in the exponent apart into a product. Now I condition: this probability is conditioned on what has happened in the first t steps, so X_1 through X_t are no longer random variables; the remaining ones are. Then I take an expectation over each remaining X, and I told you the expectation of each X is just G over m, so each factor is one plus epsilon G over m. So this is an upper bound on the conditional failure probability of the algorithm -- the failure probability that you're going to overspill. You can compute the same quantity for Y; it's the same kind of upper bound on the conditional failure probability. So now we look at the sum of these two failure probabilities, the union bound over them. The point is, all these common scaling terms can be dropped, and we get a simple scaled version of the probability; we can just analyze what happens to that. I've written the same thing here: the failure probability after t steps is upper bounded by the sum of those two quantities, up to scaling. And the question really is: what happens to this quantity as the algorithm proceeds? In expectation, over one step -- the t plus first step -- what happens? Basically, the X term gets multiplied by one more factor of one plus epsilon to the X_{t+1}, and the Y term gets multiplied by one plus epsilon to the Y_{t+1}. But both have the same expectation. So you have the same old quantity multiplied by some constant; that's what happens in one step. And this is the main ingredient in the proof for the knowledgeable algorithm: the failure probability estimate doesn't increase by more than a constant factor in any step. Actually, I scaled things here; that's why you see an increase there. If we hadn't scaled, there would really be no increase. And if you just proceed through the steps, this algorithm gets a very good approximation: the failure probability is very small.
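For concreteness, here is my reconstruction of the standard Chernoff-style calculation being described, for the bag-size bound (the cost bound is symmetric):

```latex
\Pr\big[S_m > (1+\epsilon)G\big]
  = \Pr\big[(1+\epsilon)^{S_m} > (1+\epsilon)^{(1+\epsilon)G}\big]
  \le \frac{\mathbb{E}\big[\prod_{t=1}^{m}(1+\epsilon)^{X_t}\big]}
           {(1+\epsilon)^{(1+\epsilon)G}}
  \le \frac{\big(1+\tfrac{\epsilon G}{m}\big)^{m}}
           {(1+\epsilon)^{(1+\epsilon)G}}
```

using Markov's inequality and then, by convexity of z -> (1+eps)^z on [0, 1], E[(1+eps)^{X_t}] <= 1 + eps E[X_t] = 1 + eps G/m for each step independently.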
If only we could ensure that the algorithm which does not know the distribution also has this property -- that the failure probability estimate does not increase by more than a constant factor per step -- that would be great. And that's really what we are going to do. The question is: did we require knowledge of the distribution to ensure that last step? No -- you could instead explicitly minimize this upper bound on the failure probability. You have a pessimistic estimate of the failure probability, an upper bound; why not explicitly minimize it? By which I mean: you have two options, put the item in the bag or pay the cost. If you put it into the bag, then the bag term of the failure probability estimate after t plus one steps is what gets disturbed: it gets multiplied by one plus epsilon. The other term stays the same, because Y is zero. Or you pay the cost for the item; then the bag term isn't disturbed, and the cost term gets multiplied by one plus epsilon to the c_{t+1}. You decide based on which of these is smaller -- the quantity in the red rectangle. Basically, you choose option X or Y based on which of FP_X and FP_Y increases less. And rearranging -- I'm just dividing through -- this amounts to setting a threshold on the cost. Without knowing the distribution, you can do this. It's the same kind of algorithm as the original one, the same kind of threshold, except the threshold is time-varying now; it's a simple algorithm with an easy multiplicative update of the threshold. And without knowing the distribution, you do the same thing as the original algorithm. So, to recap, this was a two-step process: first develop an algorithm which knows the distribution, then drop the dependence on the distribution and on the optimal solution. I want to point out what the necessary ingredients were for developing these kinds of algorithms. One: the algorithm proceeds in steps. Two: the performance of the algorithm is measured by some random variable, call it Q. Three: the proof should basically rest on the fact that a pessimistic estimate of Q -- an upper bound on Q -- does not increase by too much in any step. These are the requisite properties of your hypothetical algorithm. Once they're satisfied, you need the fourth thing, which is the quality of the pessimistic estimate you choose. If you're clever enough in choosing your estimate, minimizing it doesn't require knowledge of the distribution. That's what we did: we minimized an upper bound on the failure probability, and that did not require knowledge of the distribution. Then you're done, basically; you can remove knowledge of the distribution. Whenever you see a potential function based argument, this is really what is going on. Phi_t, the potential function at step t, is basically a pessimistic estimate of Q conditioned on the first t steps. People generally argue that at step t plus 1, in expectation, the potential function doesn't increase, and also that minimizing the potential function does not require knowledge of the distribution; therefore an algorithm can do this without knowledge of the distribution, and you're done. This approach is quite general, as you can see. For this problem, the metric Q used to measure performance was the failure probability.
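Before moving on, here is a minimal sketch of the distribution-oblivious toy algorithm as I understand the description above (illustrative; it assumes opt = G and costs normalized to [0, 1], and the rescaling constants are my own simplification):

```python
def oblivious_threshold(costs, G, eps):
    """Distribution-oblivious toy algorithm: maintain pessimistic
    estimators phiX (bag overflow) and phiY (overpayment) and, for
    each arriving cost c, take the action that increases their sum
    the least.  Bagging multiplies phiX by (1+eps); paying multiplies
    phiY by (1+eps)**c.  This is equivalent to a multiplicatively
    updated, time-varying threshold on c."""
    phiX, phiY = 1.0, 1.0
    bag_used, total_paid = 0, 0.0
    m = len(costs)
    for c in costs:
        # bag iff eps*phiX <= ((1+eps)**c - 1)*phiY, i.e. bagging
        # hurts the combined estimator less than paying would
        if eps * phiX <= ((1 + eps) ** c - 1) * phiY:
            phiX *= (1 + eps)
            bag_used += 1
        else:
            phiY *= (1 + eps) ** c
            total_paid += c
        # rescale both by the per-step expected growth (both equal
        # 1 + eps*G/m here since opt = G), keeping estimators bounded
        scale = 1 + eps * G / m
        phiX /= scale
        phiY /= scale
    return bag_used, total_paid
```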
For the AdWords problem with unbounded gamma, the same metric Q is going to be the unspent budget. Same analysis: just substitute the unspent budget for Q instead of the failure probability, and the analysis goes through. And the knowledge you remove can be quite general. Previously this technique was used for derandomizing randomized algorithms, but you can use it for online algorithms with stochastic input, as I said -- you can make the algorithm distribution-oblivious -- and of course it works for both offline and online. So that's it for the proof for the resource allocation problem. Now I'm going to discuss a bunch of special cases which fall into the resource allocation framework. Just to remind you of the framework: requests, resources with capacities, and options. The first special case is combinatorial auctions. Here you have n items for sale, with c_i copies of each item, and buyers arriving online. These buyers have utilities over subsets of items -- there are n items, so two to the n subsets, and a buyer has a utility over each of these subsets. The goal is the following: you post prices on the items, and when the buyers arrive online they pick their favorite bundle, that is, their utility-maximizing bundle given these prices. Can you approximate the social welfare subject to the capacity constraints? Social welfare means the sum total of utility obtained by all buyers. Can you get a good approximation to social welfare through simple posted prices -- that's the question. We make two assumptions. The first is that given posted prices, buyers are able to pick their utility-maximizing bundles, even though there are exponentially many bundles. You could ask why; it's the bare minimum for solving a problem like this through posted prices, that buyers can pick their favorite bundle. We also assume that bidders, once they leave the mechanism, reveal their utility function -- after participating, not before. The mapping to resource allocation is fairly straightforward: the items correspond to the resources, the requests correspond to the buyers, and the options correspond to the exponentially many bundles. And gamma, which was the bid-to-budget ratio, here is the ratio of resource consumption to capacity: whenever you buy a bundle you consume one copy of each item in it, the capacity is the number of copies, so gamma is one over the minimum of the c_i's. Now, ignore incentive constraints for a moment: suppose the bidders were to act however the algorithm tells them to act, and not to maximize their utility. Then you can just apply the previous resource allocation algorithm here and get a one minus epsilon approximation to the social welfare, which is good -- under the assumption that gamma satisfies the usual bound, which means the number of copies of every item is large enough. But, of course, bidders have incentive constraints; they're not going to act as we tell them. So the question is whether the old algorithm respects the incentives of bidders. I didn't present the algorithm for the general problem -- I only presented it for the toy problem, because it's complicated to write down -- but the algorithm at each step chooses an option to minimize a certain expression, and you can factor terms out so that it looks something like this, where a term p_i is pulled out as the multiplicative factor in the denominator. So the term p_i is this. And what does this expression look like? It looks like the utility of a buyer: utility minus the sum of the prices paid for the bundle. So if I'm able to post these prices p_i on the items, the buyer is going to do precisely what I want done as the algorithm: the algorithm wants to pick the option k maximizing that expression, and the buyer will do exactly that if we post these prices.
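A toy illustration of what a buyer does under posted prices (brute force over all 2^n bundles; 'utility' is a hypothetical function here -- real instances rely on the demand-oracle assumption above rather than enumeration):

```python
from itertools import chain, combinations

def favorite_bundle(items, utility, prices):
    """Return the bundle maximizing utility(bundle) - total price.
    This is exactly the buyer behavior the mechanism relies on:
    posting the right prices p_i makes the buyer's selfish choice
    coincide with the allocation algorithm's intended choice."""
    all_bundles = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    def surplus(bundle):
        return utility(frozenset(bundle)) - sum(prices[i] for i in bundle)
    return max(all_bundles, key=surplus)
```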
Okay. So that solves the combinatorial auction problem with incentives. And I want to point out that this requirement that bidders reveal their utility after leaving can actually be relaxed, if you assume that some target social welfare W star is given to you. If you're only asked to approximate W star, then there's no need for bidders to reveal their utility functions, and the prices actually remain the same for all the buyers -- you post these prices and you're done. >>: Worst case or for -- >> Balu Sivan: This is for the stochastic case; the bidders are drawn from some distribution. >>: [inaudible]. >> Balu Sivan: No, if we know the utility. >>: [inaudible] is -- >> Balu Sivan: Oh, okay, so yeah, the prices are going to get updated over time. So we don't require buyers to reveal their utility, but the prices get updated over time, yes. There are many other special cases. One is display ad allocation. In this problem advertisers are getting their advertisements shown on Web pages. Basically, they come to the Web page publishers and sign a contract saying, in this month we require one million impressions to be shown. The publishers maximize their revenue while respecting the contracts they've already signed; this fits squarely into the resource allocation framework. Network routing I already discussed, and load balancing. Another application: many algorithms dealing with distributions require a training phase -- they train on a few samples, and then, based on the training, the algorithm performs on the remaining samples. But if you use our algorithm, given some target W star, you don't need to train at all; it works straightaway. So for all those algorithms it can be used as a replacement, and we point out some instances in the paper. Then the natural question is: what if W star is unknown? The potential function basically depended on W star. If it's not known, you periodically get increasingly better estimates of W star as the algorithm runs. The initial estimates of W star are going to be very bad, because you have few samples, but they get increasingly better. To offset the fact that the initial estimates are erroneous, the phases of the algorithm are exponentially increasing in length, so the later phases have much more effect on the quality of the output than the initial phases. The inaccuracy is offset by this, and you get the same performance as before without knowledge of W star. >>: Excuse me, one minus epsilon? >> Balu Sivan: One minus epsilon, without knowledge of W star. >>: Nothing lost along the way -- you don't lose a constant here? >> Balu Sivan: We don't lose a constant, because of the exponentially increasing phase sizes: the initial phase has some size, then it becomes twice that, then twice that, and so on.
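A sketch of the doubling phase schedule just described (illustrative; the parameter names are my own):

```python
def doubling_phases(m, first_len):
    """Exponentially growing phases for the unknown-W* setting:
    W* is re-estimated at each phase boundary, so the early,
    inaccurate estimates influence only a vanishing fraction of
    the m requests and no constant factor is lost overall."""
    phases, start, length = [], 0, first_len
    while start < m:
        phases.append((start, min(start + length, m)))
        start += length
        length *= 2        # each phase is twice the previous one
    return phases
```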
This also yields an improvement over the previous algorithm: the previous algorithm periodically required computing the entire optimal solution, whereas we only require an estimate of the optimal value W star -- meaning that if somebody were to give us W star, we'd be done, with no estimation required at all. Okay. So that's the first problem. Now, the second problem, with unbounded gamma, the bid-to-budget ratio: bids could be as big as budgets. For this problem, as I said, the approach is the same two-step process of developing a knowledgeable algorithm and then dropping knowledge of the distribution, except that the potential function is now the unspent budget. You want to consume as much of the budget as possible of every advertiser, which means your goal is to minimize the unspent budget of every advertiser. You can prove that the hypothetical knowledgeable algorithm brings down the unspent budget by a factor of one minus one over m in every step. And, as before, our goal is to choose the algorithm which explicitly minimizes the unspent budget at any given step -- and that is the greedy algorithm, because the greedy algorithm consumes the maximum amount in a given step. By the previous argument, the greedy algorithm does the same thing: it also brings down the unspent budget by a factor of one minus one over m per step. One minus one over m, raised to the m, is about one over e, so the unspent budget ends up at most a one over e fraction, and you get a one minus one by e approximation. I'll briefly mention what we do for the offline problem, the mixed packing-covering problem. We basically solve this offline problem as if it were an online instance: we sample the requests -- we sample from the right-hand side of the LP -- and deal with them as if they were arriving online. Now it becomes just like the online problem: you minimize the pessimistic estimate of the failure probability at each step, and you only require about gamma m over epsilon squared oracle calls. In particular, it's an improvement over the previous quadratic dependence in the number of oracle calls, owing to the fact that we assume the unit simplex as part of the polytope. That's all I'll say about mixed packing-covering. >>: Is the simplex for one version, or for the whole [inaudible] -- the additional constraint, what does that mean? >> Balu Sivan: No, it's for all. For every request we assume that you can give it at most one option: for all j, the sum over k of x_jk is at most one. >>: Many. >> Balu Sivan: Yeah, many, yes. Correct. So I'll summarize and present some open problems. We saw the resource allocation problem in the small resource consumption setting, with the best dependence possible on gamma: a one minus epsilon approximation in the i.i.d. unknown distribution model. We improved the factor for the unbounded gamma setting from half to one minus one by e, via the simple greedy algorithm. For the greedy algorithm the analysis is tight -- greedy cannot go beyond one minus one by e -- but it's not clear that this is the best any algorithm can do; I believe it can be improved. We also gave fast approximation algorithms for mixed packing and covering. And all these results apply not just to i.i.d. inputs but to the new model we introduced in this paper, the adversarial stochastic input model: distributions change over time, but no single distribution is terrible. Open problems -- the most interesting open problems are two-fold.
One is: in general, any result which held for the unknown distribution model has also held for the random permutation model. But for our problem we were not able to prove that it holds for random permutations -- we don't have counterexamples either. I definitely believe the algorithm is good for random permutations, but we don't know how to prove it. It would be nice to prove; it would be surprising if there were a counterexample. The second question is the following. For the worst case setting of the AdWords problem, I said initially that there's a one minus one by e approximation which cannot be improved. For the stochastic setting there are various one minus epsilon approximation algorithms. But these are two different algorithms. What I want is a single algorithm which simultaneously gets one minus one by e in the worst case and one minus epsilon in the stochastic setting. That would be really good. Recently there seems to be some progress on this question, showing that for the most general problem you cannot achieve both simultaneously; but probably you could do it for simpler settings. Okay. That's it. Thanks for listening. [applause]. >> Nikhil Devanur: Any questions? >>: Just clarifying that improvement over -- I mean, this seems constrained -- is it significantly different from [inaudible]? >> Balu Sivan: No. The thing is, Young, for mixed packing-covering, had to use [indiscernible] Chernoff bounds, and those are weaker than multiplicative Chernoff bounds. The reason for using those bounds was allowing arbitrary polytopes. If you have the unit simplex, you can use multiplicative Chernoff bounds; that's the main difference. It's the same multiplicative update kind of algorithm, but the potential function is different -- it's based on the multiplicative [inaudible]. >>: So replacing it by some other constant? >> Balu Sivan: That goes to -- >>: Distributed? >> Balu Sivan: Yes. >>: You have an arbitrary call? >> Balu Sivan: Yeah. >>: Everything is possible. >> Balu Sivan: Yes. >>: Actually, if you had equality constraints, we don't know? >> Balu Sivan: If you have equality constraints, it's a problem, yes. If you have equalities, you can't use multiplicative Chernoff bounds; you have to go to [inaudible] bounds. Basically I think the equality case almost covers all the others -- it's the most difficult case. If you can solve equalities, you can basically handle all polytopes. >>: You mean prove larger-or-equal and smaller-or-equal at the same time. >>: No, no, the assumption is that the c_i's and d_i's are large compared to [inaudible], so those are easy constraints, in the sense that the a_ijk's are small compared to the c_i's. Those are the easy constraints. These are the hard constraints, which you have to satisfy in every step; the hard constraints look like this. Then you can get this gamma factor, but if the hard constraint is an equality, then we would need to resort to [inaudible]. >>: The dual is standard, so you can just use one [inaudible] works for certain. >> Balu Sivan: Basically, the dual variable for this will be positive. If it were an equality, the dual would not have any fixed sign, so you can't apply multiplicative bounds for the duals. >> Nikhil Devanur: So let's thank Balu. [applause]