>> Nikhil Devanur: It's my pleasure to welcome Balu Sivan to our talk on near
optimal online algorithms and fast approximation algorithms. So Balu is a Ph.D.
student at University of Wisconsin-Madison. He was an intern here last year,
and most of the work that he's going to present was done while he was an intern
and he's back for a couple of weeks. So he'll tell us about his work.
>> Balu Sivan: Thank you. So I'll be talking about recent joint work with Nikhil,
Kamal and Chris Wilkens. So Chris was an intern here last year, too.
And this is going to be near optimal online algorithms and fast approximation
algorithms. So the focus of this work is really two-fold. Part one of this work is
going to be on stochastic analysis of online algorithms, by which I mean online
algorithms whose input satisfies some stochastic property; that is, the elements
of the input could be drawn from some i.i.d. distribution, or the elements of the
input could be picked by an adversary but arrive in some random order, or
satisfy some other stochastic property. That's what I mean there. We do this
analysis for a fairly general class of problems, resource allocation problems,
which I'll describe in a minute.
That's the first part. The second part is fast approximation algorithms for large
mixed packing and covering LPs, with both packing and covering constraints. I'll
describe all these.
So moving on to part one. This area of online algorithms with stochastic input
has recently received a lot of revived interest. This is partly motivated by its
applications to online advertising. Almost all problems in online advertising
squarely fit within this framework of online algorithms with stochastic input.
And the trend in all these areas has been to go beyond worst-case analysis.
This is because in worst-case analysis we have known bounds saying that you
cannot go beyond constant-factor approximations for many of these problems.
We're not happy with that. We want something like a 99 percent approximation,
not something like a 60 percent approximation. And the idea is to see if you can
get such 99 percent approximations in a stochastic setting. So stochastic, as I
said, means people assume that the input comes in random order or is drawn
from some distribution. The distribution could be known to the algorithm
designer or unknown to the algorithm, and all those variants.
So to make things concrete, I'm going to fix a representative problem. This is the
now famous AdWords problem. So it's basically you have a bipartite graph, one
side is advertisers, the other side is queries. Queries are arriving online.
And the advertisers have specified bids on these queries. So b_ij is the bid of
advertiser i on query j, and as each query arrives the designer, or search engine,
has to assign the query to one of the advertisers, or drop it, if you will.
The goal is for the search engine to maximize its own revenue subject
to budget constraints. So each advertiser has specified a daily budget, or a
budget for some unit of time, and that is the first constraint of the LP:
you respect the budget constraints.
Of course, no query can be given to more than one advertiser; that's what
the second constraint of the LP says. Maximize the revenue. This is the
AdWords problem. We'll be using this to present our results, though they apply
to a more general class of problems. Here m is the number of queries and n is
the number of advertisers.
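Written out, the LP just described would look something like the following (a reconstruction from the description above: x_ij is the fraction of query j assigned to advertiser i, and B_i is advertiser i's budget):

```latex
\begin{align*}
\max \quad & \sum_{i=1}^{n} \sum_{j=1}^{m} b_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_{j=1}^{m} b_{ij}\, x_{ij} \le B_i && \forall i
  \quad \text{(budget constraints)} \\
& \sum_{i=1}^{n} x_{ij} \le 1 && \forall j
  \quad \text{(each query assigned at most once)} \\
& x_{ij} \ge 0 && \forall i, j
\end{align*}
```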
So the first step is what's known about the AdWords problem. All the results for
the AdWords problem are going to be specified in terms of one parameter. So
this is basically the largest bid-to-budget ratio. So you take the largest bid of a
particular advertiser and divide it by his budget; this is the bid-to-budget ratio for
that advertiser. You do this for all advertisers, take the maximum, and call
this gamma.
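As a quick illustration with hypothetical numbers (not from the talk), gamma can be computed directly from the bids and budgets:

```python
def bid_to_budget_ratio(bids, budgets):
    """gamma = max over advertisers i of (largest bid of i) / (budget of i).

    bids[i] is the list of advertiser i's bids; budgets[i] is i's budget.
    """
    return max(max(b) / B for b, B in zip(bids, budgets))

# Hypothetical example: two advertisers.
bids = [[2.0, 5.0, 1.0], [3.0, 4.0]]
budgets = [100.0, 20.0]
print(bid_to_budget_ratio(bids, budgets))  # 0.2, from the second advertiser
```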
Now, the first result for the AdWords problem was by [inaudible] in 2005, and
that is where this problem was really coined.
So they give a 1 - 1/e approximation for the AdWords problem in the
worst-case setting, meaning that the adversary picks the set of keywords and
also picks the order in which the keywords arrive.
For this worst-case setting, they give a 1 - 1/e approximation, and they
also prove that you cannot go beyond 1 - 1/e in the worst case. Even
randomized algorithms cannot do anything beyond 1 - 1/e. But the
results require that this parameter gamma goes to 0. What that means is the
bids are insignificantly small compared to the budgets.
So the first paper to really show that a big improvement is possible in a
stochastic setting was a paper two years ago by Nikhil and Tom Hayes at EC.
What they solved was the same problem, but with the assumption that the
keywords arrive in random order, not in adversarial order. Once you assume
this, you get a 1 - ε approximation for any ε. But, again, there's a restriction:
the parameter gamma now depends on ε. It's something like ε³/(n log m),
some function of ε, n and m.
So that's good; this is what we wanted in the stochastic setting. But from the
time that paper came out, there have been a lot of extensions applying the
same result to different kinds of problems, or improving this parameter gamma.
Ideally we want gamma to be as big as one; we don't want to restrict
it. It could be anything between 0 and 1. You don't want to put a restriction
saying it has to be very small. The goal is to make gamma as big as possible.
And the best improvement so far has been a paper by Agrawal and [inaudible],
where they use the same ideas except that they use a technique called
doubling and cut off one factor of ε here, making it ε².
And they also show an upper bound of ε²/log n; you cannot take gamma to be
as big as one. There's a restriction: if you want a 1 - ε approximation, gamma
is going to depend on ε quadratically like this. This lower bound is for a slightly
more general version of the problem, called the resource allocation problem,
which I'll describe.
And so this is the status of the AdWords problem. The first result of this paper
is a threefold improvement on the current status of this problem.
So in our model we are going to assume that the inputs are drawn from some
i.i.d. distribution, but it is unknown to the algorithm designer.
And in this model, firstly, we give the same 1 - ε approximation with almost the
best dependence on gamma possible. Our bid-to-budget ratio gamma is
ε²/log(n/ε).
And the results here work not only for i.i.d. distributions but even when the
adversary is allowed to vary the distribution over time. The distribution is going
to be time-varying, and even for such a model our results hold. So we introduce
this model for stochastic analysis, called the adversarial stochastic input model.
I'll describe what that model is. It basically allows the adversary to pick
distributions which vary over time, provided the distributions are not too terrible.
As long as that holds, our algorithm still works; I'll make these things formal in a
minute. And these results, as I said, are going to be applicable to a fairly more
general class of problems called resource allocation problems.
So I'll now move on to define what the resource allocation framework is.
>>: Is it clear that i.i.d. is easier than random permutation -- formally?
>> Balu Sivan: Yes, for the AdWords problem, yes, i.i.d. is easier than random
permutation. That is, if you solved random permutation then there's a way to
solve i.i.d. Once you fix the number of keywords of any given category, then it's
just a random permutation within that category. So you can consider random
permutations over the different categories; if you can solve random permutation,
you can solve each of these categories.
>>: [inaudible].
>> Balu Sivan: Yes, it is a distribution over those. But it's true that i.i.d. is
somewhat weaker than random permutation; however, this adversarial stochastic
input, where the distribution can vary over time, is incomparable to random
permutations. I'll describe how the distributions can vary over time.
Okay. So I'll first describe what the resource allocation framework is. It has the
same kind of online flavor: requests are arriving online and you want to serve
these requests. There are a bunch of options for serving any given request. And
once you pick an option to serve a request, this request-option pair is going to
consume some amount of every resource available.
So there are n resources available, and m requests arriving online. Once you
pick an option k, this request-option pair (j, k) is going to give you a profit w_jk,
and it's also going to consume a_ijk amount of resource i.
Same thing as before: you cannot serve a request with more than one option;
that's what the second constraint says. And the goal is to maximize your profit
subject to capacity constraints, because each resource has a capacity; c_i is the
capacity of resource i.
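In LP form, the resource allocation problem just described would look something like this (a reconstruction; x_jk indicates serving request j with option k):

```latex
\begin{align*}
\max \quad & \sum_{j=1}^{m} \sum_{k} w_{jk}\, x_{jk} \\
\text{s.t.} \quad & \sum_{j,k} a_{ijk}\, x_{jk} \le c_i && \forall i
  \quad \text{(capacity of resource } i\text{)} \\
& \sum_{k} x_{jk} \le 1 && \forall j
  \quad \text{(at most one option per request)} \\
& x_{jk} \ge 0
\end{align*}
```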
So K is the number of options, and this could be exponential. Just to have
a concrete example in mind, I'm going to describe network routing as a resource
allocation problem. So think of yourself as an algorithm given some graph, and
requests are basically requests to route from a source to a sink in the graph, and
the options are the exponentially many paths available for you to choose;
you could choose one of these paths.
And the resources are the edges, and the edges have capacities. And, of course,
depending on which option, which path, you choose, different edges are going
to get consumed. There's a large previous literature on this network routing
problem.
So for this problem, the previous best result was ε²/(n log(mK/ε)). K, as I
said, is the number of options; for a graph, you have exponentially many paths,
so it could be something like 2^n, and this is really like ε²/n². Our bound is
ε²/log(n/ε), so it's like a quadratic improvement. The difference is that their
result holds for random permutations, and our result is for i.i.d. distributions, or
the time-varying distribution model.
Okay. So I now describe this time-varying model. Here the algorithm
designer is given some target OPT_T. This is the benchmark against which the
competitive ratio is going to be defined.
And now the goal of the algorithm is to get as good an approximation as possible
to OPT_T, and the adversary is going to pick time-varying distributions. What we
ask is that for every distribution the adversary picks, the expected value of the
optimal solution on that distribution -- meaning, if that distribution were to be
used throughout, the expected optimal value -- should be at least the target
given to you. This is like the minimum one would ask for: the optimal
algorithm should, in expectation, do at least as well as the target.
If that is true, then however skewed the distributions may be, as long as the
expectation condition holds, all our algorithms work and the 1 - ε guarantee
holds. So that is the adversarial stochastic input model. To recap: in this
adversarial stochastic input model, for the resource allocation framework, we
have this 1 - ε approximation with the best dependence on the parameter
gamma.
So I'm now going to go on to the second result in this paper. So far I've been
talking about problems where the bid-to-budget ratio gamma was really small;
that is, it depended quadratically on ε for a 1 - ε approximation. The question is
what happens when gamma is as big as one, meaning that bids are as large as
budgets.
This problem has remained largely open since it was coined. The
best competitive ratio known, even in stochastic settings -- not in the worst case,
but even stochastic -- was half. And this comes from a very trivial greedy
algorithm, and nothing better was known.
But special cases of this problem have seen a lot of progress recently. Online
bipartite matching is a special case of this problem. It's an
oversimplified special case, meaning that all the bids are the same and all the
budgets are the same; it's just a matching problem. For this problem, two years
ago Feldman [inaudible] gave an algorithm which beats 1 - 1/e, but for
known distributions: the right-hand side of the graph is basically being drawn
from some distribution which is known to the algorithm designer, and they beat
1 - 1/e. After that there were a series of results by [inaudible], and the
best is by Manshadi, Gharan and Saberi [phonetic], which gives a .702 for this
problem. And recently Mahdian and Yan, and Karande, Mehta and Tripathi
[phonetic], analyzed the same online bipartite matching problem but in a random
permutation setting.
That is, you could think of it as i.i.d. but with unknown distributions, or something
more powerful than that. In that setting they gave a .696 approximation, and
that's the current best known. For this work I'm going to focus on the more
general problem of AdWords in the unbounded-gamma setting, but in the
stochastic setting.
So the second result is that with i.i.d. unknown distributions, or even better, in
the adversarial stochastic input model, the simple greedy algorithm gets a
1 - 1/e approximation against the expected optimal fractional solution.
The greedy algorithm is really simple, in the sense that when a query arrives
you assign it to the advertiser who has the maximum effective bid
for that query. The effective bid is basically the minimum of the bid and the
remaining budget of the advertiser.
It's effectively what he can contribute at that point. You compute it for all
advertisers and choose the advertiser who has the maximum effective bid.
This gets 1 - 1/e. Previously the same greedy algorithm was analyzed
by Goel and Mehta in 2008, and they analyzed it for the same AdWords problem
in the random arrival model. They also get 1 - 1/e, except they make an
assumption on bids and budgets which almost boils down to saying that gamma
goes to zero. Right?
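The greedy rule just described can be sketched as follows (a minimal sketch; the data layout and variable names are mine, not from the talk):

```python
def greedy_adwords(queries, budgets):
    """Assign each arriving query to the advertiser with the largest
    effective bid = min(bid, remaining budget). Returns total revenue.

    queries: list of dicts mapping advertiser index -> bid for that query.
    budgets: list of advertiser budgets.
    """
    remaining = list(budgets)
    revenue = 0.0
    for bids in queries:
        # Effective bid: what the advertiser can actually still contribute.
        best_i, best_eff = None, 0.0
        for i, b in bids.items():
            eff = min(b, remaining[i])
            if eff > best_eff:
                best_i, best_eff = i, eff
        if best_i is not None:  # drop the query if nobody can pay anything
            remaining[best_i] -= best_eff
            revenue += best_eff
    return revenue

# Hypothetical example: two advertisers with budgets 10 and 5.
qs = [{0: 4.0, 1: 5.0}, {0: 4.0, 1: 5.0}, {0: 8.0}]
print(greedy_adwords(qs, [10.0, 5.0]))  # 15.0
```

Note that an advertiser whose budget is nearly exhausted can still win a query with a truncated effective bid, which is exactly the unbounded-gamma regime where bids are comparable to budgets.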
But here we have unbounded gamma. So that is the second result. The third
result in this paper -- now I'm going to move to offline instances.
So last year at EC, Charles, Nikhil, Kamal and others had a paper which gives
sampling-based algorithms, which are randomized algorithms, for problems like
matchings on huge graphs. These graphs are so huge that no classical
algorithm is useful; you want an algorithm which makes a single simple sweep
through the graph. What we do in this paper is generalize these kinds of
algorithms to more general problems. Basically we take mixed
packing and covering LPs and solve them approximately,
with pretty fast approximation algorithms.
So here is the precise problem. You have a bunch of packing constraints,
the first set of constraints, and a bunch of covering constraints, the
greater-than-or-equal-to inequalities. And you have a polytope constraint; here
it is the unit simplex. And the goal is basically to find out whether this
set of constraints has a feasible solution or not. So we distinguish between a
yes and a no case. Yes: there is a feasible solution. No: even if I slightly relax
the right-hand side, basically multiply by 1 + ε or 1 - ε,
depending on the sign of the inequality, even with that relaxation this LP doesn't
have any feasible solution. You can distinguish between these two cases with
very high probability, 1 - δ for example. That's what we show in this third result.
So before stating the result, here is some notation.
So let P_j be the unit simplex, basically. We assume that there is some oracle
which does the following job for us: if I give it a vector v, then the oracle should
return some vector x_j from the polytope that minimizes v · x_j. So this
might look strange, but it is not really strange.
For example, in the network routing problem, what this translates to is: if I give
you lengths on the edges, then out of the exponentially many paths you choose
the shortest path and give it to me. That's what this is. We know the shortest
path is easy to compute in polynomial time.
And for many problems you have natural oracles. If you have
these oracles, we say that you can solve this gap version of the mixed packing
problem, yes or no, with high probability, with so many oracle calls: γm/ε² times
some log factor of oracle calls. So that is the third result, for offline instances.
Of course, we're talking about mixed packing and covering LPs, and there's a
long line of previous work on this. Young in '95, and Plotkin, Shmoys and
Tardos in '91, solved a very general class of mixed packing and covering
LPs. The thing is, they require γ²m²/ε² times some log factor of oracle calls.
So what we say is: that is for a very general class of LPs, but if you add the
special polytope, namely the unit simplex constraint -- which is very
natural, because all the resource allocation problems fall squarely within this
polytope constraint, 'sum over k of x_jk is at most one', don't give a request
more than one option -- then you can cut down the γ²m² to γm.
The quadratic becomes linear. So that's the comparison
to the previous work on this.
As I said previously, the common idea really behind all these results is the
following two-stage approach. The first stage is to develop an omniscient or
knowledgeable algorithm. This algorithm has knowledge of the
distribution, and, based on the distribution, could have knowledge of the optimal
solution for the offline instance. By offline instance I mean: you construct
an expected instance of the problem, where each request arrives its expected
number of times; that is the offline instance.
Once you know the distribution, you can compute this
expected offline instance. Suppose you solve it optimally;
suppose even that is known to this knowledgeable algorithm. Then you can use
that knowledgeable algorithm to achieve the required competitive ratio; I'll
describe how. And then the analysis of this algorithm should satisfy some
properties.
Okay. And once it does, you can remove this dependence on knowledge: you
can give a distribution-oblivious algorithm that achieves the same
competitive ratio. So I'm now going to describe step one, and this is best
illustrated through a toy problem.
So in this toy problem, what's happening is that some items are arriving online,
and each item has a cost: item i has cost c_i. And you can do two things when
an item arrives: you can either put the item in a bag -- the bag has capacity G,
so only G items can go inside the bag -- or you pay the cost for the item, and
then you don't put the item in the bag.
The goal is to minimize the total cost you pay while respecting this capacity
constraint of the bag. And these items are drawn from some distribution -- the
costs are drawn at random from some distribution which is unknown to the
algorithm designer. And, as I said, minimize the total cost.
I'm going to assume for simplicity that the optimal cost turns out to be
equal to the total capacity G.
It doesn't really matter; I use this for simplicity. For the first step, suppose we
knew the distribution from which these costs arrive. Then what can we do?
Then we can use a very simple algorithm. You set some threshold alpha and
say: if the cost is more than alpha, then I'm going to put the item in the bag --
it's too much for me to pay -- and if the cost is less than alpha, then I pay for it.
And you choose this threshold alpha so that the probability p that the cost is
more than alpha is such that the expected number of items which are going into
the bag, mp, is equal to G. The bag doesn't spill in expectation.
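A sketch of this distribution-aware threshold rule (assuming, for illustration, costs drawn i.i.d. from a known discrete distribution; all names are mine):

```python
def threshold_alpha(cost_dist, m, G):
    """Pick the smallest threshold alpha such that m * P[cost > alpha] <= G,
    so the bag doesn't overflow in expectation over m steps.

    cost_dist: list of (cost, probability) pairs.
    """
    for cost, _ in sorted(cost_dist):
        p_above = sum(q for c, q in cost_dist if c > cost)
        if m * p_above <= G:
            return cost
    return max(c for c, _ in cost_dist)

def run(costs, alpha):
    """Online rule: cost > alpha -> put in bag; otherwise pay for it."""
    paid, in_bag = 0.0, 0
    for c in costs:
        if c > alpha:
            in_bag += 1
        else:
            paid += c
    return paid, in_bag

dist = [(1.0, 0.5), (10.0, 0.5)]   # half cheap items, half expensive
alpha = threshold_alpha(dist, m=100, G=50)
print(alpha)  # 1.0: expensive items (prob 0.5) go in the bag, 100 * 0.5 = 50 = G
```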
Now, you calculate this threshold and use it as the online algorithm, and
you can prove that this online algorithm does very well. That's what I'm going to
analyze now, and I'm going to use the analysis to drop knowledge of the
distribution.
So some simple notation before proceeding. X_t says whether at
step t you put the item into the bag or not: if you put it into the bag, it's
1, and 0 otherwise. Y_t is similarly how much cost you paid: if you paid for
item i at step t, Y_t is c_i, and if you didn't pay for any item, Y_t is zero. Now,
just note that the expected value of X_t is exactly p, which is G/m, by
construction.
Similarly for the expected value of Y_t: in total you're going to pay OPT, which is
equal to G, so for one step the value is OPT/m, which is G/m. These two
quantities have the same expectation.
>>: What's the difference between i and t?
>> Balu Sivan: So i is used to index the items, and t indexes the steps. The
algorithm proceeds in steps.
You could think of n items, and the algorithm proceeding in steps; the same
item can arrive many times, basically, according to the distribution.
So now a little bit more notation: B_t is the sum of the Y's up to step t, and S_t
is the sum of the X's up to step t. And the goal is to ask: what are these
quantities? After m steps, what is the probability that the total size you've
accumulated in the bag is more than G(1 + ε)?
That is, what's the probability that you're spilling over capacity? And you could
ask the same thing for cost: what's the probability that you paid more than
OPT(1 + ε)? OPT is G here. This is the probability of suboptimal cost,
paying more than what is necessary. And if these two probabilities are small,
then we're approximately good, right?
So the standard way to analyze this is the following. You take this probability
and you exponentiate both sides to the power of 1 + ε, and you apply
Markov's inequality. In this step, what I'm doing is just saying that
S_m is the sum of all the X_t's.
And this step is based on convexity -- the exponential function is convex --
so I can bring the exponent inside as a product here, basically.
Now I'm conditioning: this probability is conditioned on what has
happened in the first t steps, so X_1 to X_t are no longer random variables. The
remaining ones are random variables, and then I take an expectation. For each
X_t, I told you that the expectation is just G/m, so each factor is 1 + εG/m. So
this is basically an upper bound on the conditional failure probability of the
algorithm -- the failure probability that you're going to overspill. You can
compute the same quantity for the cost, but it's not interesting; you get the same
kind of upper bound on the conditional failure probability.
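For concreteness, the chain of inequalities being sketched here is the standard Chernoff-style recipe; in the talk's notation (a reconstruction, assuming i.i.d. arrivals so that the expectation factors across steps):

```latex
\Pr\big[S_m \ge (1+\epsilon)G\big]
  = \Pr\big[(1+\epsilon)^{S_m} \ge (1+\epsilon)^{(1+\epsilon)G}\big]
  \le \frac{\mathbb{E}\big[(1+\epsilon)^{S_m}\big]}{(1+\epsilon)^{(1+\epsilon)G}}
  = \frac{\prod_{t=1}^{m} \mathbb{E}\big[(1+\epsilon)^{X_t}\big]}{(1+\epsilon)^{(1+\epsilon)G}}
  \le \frac{\big(1 + \epsilon \tfrac{G}{m}\big)^{m}}{(1+\epsilon)^{(1+\epsilon)G}}
```

using Markov's inequality in the first inequality, and E[(1+ε)^{X_t}] = 1 + ε Pr[X_t = 1] = 1 + εG/m (since X_t is 0 or 1) in the last.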
So now we look at the sum of these two failure probabilities, the union bound
over the two failure probabilities. The point is that all these scaling terms are
common, so we can drop them off; we get a simple scaled version of the
probability, and we can just analyze what happens to this.
So I've written the same thing here: the failure probability after t steps is upper
bounded by the sum of those two quantities; it's a scaled version. And the
question really is to ask what happens to this quantity as the algorithm proceeds,
in expectation, in one step, namely the (t+1)-st step.
What happens basically is that FP_X is going to get multiplied by one more
(1 + ε)^X term, and FP_Y is going to get multiplied by one more (1 + ε)^Y term,
but these have the same expectation. So you have the same old quantity
multiplied by some constant; that's what happens in one step. And this is what
the proof basically is -- the main ingredient in the proof for the
knowledgeable algorithm -- that the failure probability estimate didn't increase by
more than a constant in every step.
And actually I scaled things for you; that's why you see an increase there. If we
didn't scale it, there would really be no increase. And if you just proceed through
the steps, this algorithm gets a very good approximation: the failure probability is
very small.
If only we could ensure that an algorithm that did not know the distribution also
had the same property -- that the failure probability did not increase by more
than a constant factor -- that would be great.
And that's really what we are going to do. The question is: did we require
knowledge of the distribution to ensure what I just said, the last step? Basically,
you could have explicitly minimized this upper bound on the failure probability.
You have a pessimistic estimate of the failure probability, which is an upper
bound; why don't we explicitly minimize that? By which I mean: you have two
options, put the item in the bag or pay its cost. If you put it into the bag, then in
the failure probability estimate after t+1 steps just this quantity gets disturbed: it
gets multiplied by 1 + ε. The other term remains the same because Y is zero.
Or you pay the cost for the item, and then this term doesn't get disturbed, and
there you get a (1 + ε)^{c_t} factor. You do whichever of those makes the
estimate smaller -- the quantity in the red rectangle. Basically, what this says is
that you choose the option, X or Y, based on which of these two is minimal:
FP_X or FP_Y after the cost. What this means -- I'm just dividing by the
product of the (1 + ε)^Y terms -- is that we are setting a threshold on the cost
here, and this doesn't require knowing the distribution. It is the same kind of
thing as the original algorithm, the same kind of threshold, except we have a
time-varying threshold now. It's a simple algorithm, and it's easy to update the
threshold -- a multiplicative update of the threshold.
And without knowing the distribution, you can do the same thing as the original
algorithm.
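A sketch of this distribution-oblivious version: keep the two running estimator factors and greedily take the action that leaves the smaller total estimate, which amounts to a multiplicatively updated threshold on the cost. (A minimal sketch under the talk's toy setup; the update rule is my simplification and the names are mine.)

```python
def oblivious_bag(costs, eps):
    """Toy-problem rule without knowledge of the cost distribution.

    Maintain pessimistic-estimator factors for the two failure events:
      fp_bag  picks up a (1+eps) factor each time an item enters the bag,
      fp_cost picks up a (1+eps)**c factor each time we pay a cost c.
    The deterministic per-step scaling terms are common to both options,
    so they cancel from the comparison, as noted in the talk.
    """
    fp_bag, fp_cost = 1.0, 1.0
    paid, in_bag = 0.0, 0
    for c in costs:
        bag_est = fp_bag * (1 + eps) + fp_cost       # if item goes in the bag
        pay_est = fp_bag + fp_cost * (1 + eps) ** c  # if we pay for the item
        if bag_est <= pay_est:                       # large costs go in the bag
            fp_bag *= (1 + eps)
            in_bag += 1
        else:
            fp_cost *= (1 + eps) ** c
            paid += c
    return paid, in_bag

# Cheap item gets paid for, expensive item goes in the bag.
print(oblivious_bag([0.5, 3.0], eps=0.5))  # (0.5, 1)
```

Dividing the comparison through by fp_cost shows it is equivalent to a cost threshold that drifts as the ratio fp_bag / fp_cost evolves, i.e. a time-varying, multiplicatively updated threshold.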
So to recap, this was a two-step process. First we developed an algorithm
which knew the distribution, and then we dropped the dependence on the
distribution and on the optimal solution.
And I just want to point out what the necessary steps were to develop these
kinds of algorithms. One is that the algorithm proceeds in steps. And if the
performance of the algorithm is measured by some random variable, call it Q,
then the proof should basically lie in the fact that a pessimistic estimate of Q, an
upper bound on Q, did not increase by too much in any step. These are the
requisite properties of your hypothetical algorithm. Once they are satisfied, you
need the fourth thing, which is the quality of the pessimistic estimate you
choose: you should be clever enough in choosing your pessimistic estimate
so that minimizing it doesn't require knowledge of the distribution.
That's what we did: we minimized an upper bound on the failure probability, and
that did not require knowledge of the distribution. Then you're done, basically;
you can remove knowledge of the distribution. And whenever you see a
potential-function-based argument, this is really what is going on in all of these
arguments. So Phi_t, the potential function at step t, is basically a pessimistic
estimate of Q conditioned on the first t steps. And people generally argue that at
step t + 1, in expectation, the potential function doesn't increase.
And they also argue that minimizing the potential function does not require
knowledge of the distribution. Therefore an algorithm can come and do this
without knowledge of the distribution, and it's done.
So this approach is quite general, as you can see. I mean, for this problem, the
metric Q which was used to measure performance was the failure probability;
for the greedy algorithm, for the AdWords problem with unbounded gamma, the
metric is going to be the unspent budget.
Same analysis: just substitute the unspent budget for Q instead of the failure
probability, and the analysis goes through.
And this machinery can be quite general. Previously, pessimistic estimators
were used for derandomizing randomized algorithms, but you can use them for
online algorithms with stochastic input, as I said. You can make the algorithm
distribution-oblivious, and of course it can be used for both offline and online
problems.
So that's it for the proof of the resource allocation problem. Now I'm going to
discuss a bunch of special cases which fall into the resource allocation
framework. This is just to remind you of the framework: requests and resources,
capacities for the resources, and the options.
So the first special case is combinatorial auctions. Here you have n items for
sale, there are c_i copies of each item, and buyers are arriving online. These
buyers have utilities over subsets of items; with n items there are 2^n subsets,
and your utility is over all these subsets. The goal is to do the following: you
post some prices on these items, and when the buyers arrive online they're
going to pick their favorite bundle, that is, their utility maximizing bundle based
on these prices.
And can you approximate the social welfare subject to the capacity constraints?
Social welfare means the sum total of utility obtained by all buyers.
Can you get a good approximation of social welfare through simple posted
prices -- that is the question. So we make two assumptions here. The first
assumption is that if you post prices, the buyers must be able to pick their utility
maximizing bundles. There are exponentially many bundles, and you could ask
why; but this is the bare minimum for solving a problem like this with posted
prices, that buyers can pick their favorite bundle.
We also assume that bidders, once they leave the mechanism, are going to
reveal their utility function. That's after participation, not before. So the
mapping to resource allocation is fairly straightforward.
So the items here correspond to the resources, the requests correspond to the
buyers, and the options correspond to the exponentially many bundles
available. And gamma, which was the bid-to-budget ratio, can here be thought
of as the ratio of resource consumption to capacity: whenever you buy a bundle
you consume one copy of every item in it, and the capacity is basically the
number of copies, so gamma is the maximum over items of 1/c_i.
Ignoring incentive constraints -- by that I mean, suppose these bidders were to
act however the algorithm tells them to act; they're not interested in maximizing
utility, if you assume that -- then you can just apply the previous resource
allocation algorithm here, and you get a 1 - ε approximation to the
social welfare, which is good.
This is under the assumption that gamma satisfies the earlier condition, which
means that the number of copies of every item is large enough. But, of course,
real bidders have incentive constraints; they're not going to act as we tell them.
So the question is whether the old algorithm respects the incentives of the
bidders.
So I didn't present the algorithm for the general problem -- I only presented it
for the toy problem -- because it is complicated to write down. But the algorithm,
at step k, chooses an option to minimize this expression. You can factor terms
out, and it basically looks something like this, where the term p_i is pulled out as
the multiplicative factor there in the denominator. So the term p_i is this. And
what does this term look like? It looks like utility for a buyer, right? It looks like
utility minus the sum of the prices I pay for the bundle. So if only I'm able to
post these prices p_i on the items, the buyer is going to do precisely what I want
him to do as the algorithm: the algorithm wants to pick the option k to maximize
that, and the buyer will do exactly that if we post these prices. Okay. So that
solves the combinatorial auction problem with incentives.
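The incentive point can be illustrated with a tiny sketch: if the algorithm's per-item quantities p_i are posted as prices, a utility-maximizing buyer picking their favorite bundle makes exactly the choice the algorithm would have made. (Hypothetical names and brute-force enumeration, for illustration only; this is only feasible for tiny n.)

```python
from itertools import chain, combinations

def favorite_bundle(utility, prices):
    """Return the bundle S of item indices maximizing utility(S) - price(S).

    utility: function from a frozenset of item indices to a value.
    prices: list of posted item prices p_i.
    Brute force over all 2^n bundles.
    """
    items = range(len(prices))
    bundles = chain.from_iterable(
        combinations(items, r) for r in range(len(prices) + 1)
    )
    return max(
        (frozenset(S) for S in bundles),
        key=lambda S: utility(S) - sum(prices[i] for i in S),
    )

# Hypothetical utility over two items.
def u(S):
    vals = {frozenset(): 0, frozenset({0}): 4,
            frozenset({1}): 3, frozenset({0, 1}): 6}
    return vals[S]

print(favorite_bundle(u, [2.0, 2.5]))  # buyer takes item 0 only
```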
And I want to point out that this requirement that bidders
reveal their utility after leaving can actually be relaxed if you assume that
there is some target given to you: a social welfare target like OPT, W-star, is
given to you, and you're asked to approximate W-star. Then there's no need for
bidders to reveal their utility function, and the prices actually remain the same
for all the buyers: you just post these prices and you're done.
>>: Worst case or for --
>> Balu Sivan: This is for the stochastic case; the bidders are drawn from some
distribution.
>>: [inaudible].
>> Balu Sivan: No, if we know the utility.
>>: [inaudible] is --
>> Balu Sivan: Oh, okay, so yeah, the prices are going to get updated over time.
Yeah. So we don't require buyers to reveal their utility, but the prices get
updated over time, yes.
So there are many other special cases. One is display ad allocation. In this
problem, advertisers get their advertisements shown on Web pages. Basically how
it works is advertisers come to Web pages and sign a contract saying that in
this month they require, say, one million impressions to be shown. The Web pages
then want to maximize their revenue while respecting the contracts they've
already signed. This fits squarely into the resource allocation framework. And
network routing, which I already discussed, and load balancing.
Another application: many algorithms, when they are dealing with distributions,
actually require a training phase. They train on a few samples, and then based
on that training the algorithm runs on the remaining samples. But if you use our
algorithm with some given target W star, you don't need to train at all; it
works straightaway. So it can be used as a replacement in all those algorithms,
and we point out some instances in the paper.
So then the natural question is: what if W star is unknown? The potential
function depended on W star, so what if it's not known? If it's not known, you
periodically compute increasingly better estimates of W star as the algorithm
runs. The initial estimates of W star are going to be very bad, because you have
only a few samples, but they get increasingly better. To offset the fact that
the initial estimates are erroneous, the phases of the algorithm are
exponentially increasing in length, so the later phases have much more effect on
the quality of the output than the initial phases. The inaccuracy is basically
offset by this, and you can get the same performance as before without
knowledge of W star.
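A minimal sketch of this doubling scheme (the function names and the estimation routine are hypothetical stand-ins; the actual algorithm and its guarantees are in the paper):

```python
def run_in_phases(requests, estimate_w_star, initial_phase=8):
    """Process requests in phases of exponentially increasing length,
    re-estimating W* from the history at the start of each phase.
    `estimate_w_star` is a hypothetical stand-in for the estimation step."""
    history = []
    w_star_est = None  # the first phase runs with a trivial estimate
    t = 0
    phase_len = initial_phase
    while t < len(requests):
        if history:
            # Later phases see more samples, so their estimates are better.
            w_star_est = estimate_w_star(history)
        end = min(t + phase_len, len(requests))
        for req in requests[t:end]:
            history.append(req)  # placeholder for "serve req using w_star_est"
        t = end
        phase_len *= 2  # doubling: the better-estimated late phases dominate
    return history
```

Because each phase is twice as long as the last, the early phases, whose estimates are poor, make up only a vanishing fraction of the input, which is why no constant factor is lost.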
>>: Excuse me, one minus epsilon?
>> Balu Sivan: One minus epsilon. Without knowledge of W star.
>>: Along the way, you don't lose a constant here?
>> Balu Sivan: We don't lose a constant because of the exponentially increasing
phase sizes: the initial phase has some length K, then the next is twice that,
then twice that again, and so on.
This is also an improvement over the previous algorithm. The previous algorithm
actually periodically required computing an entire optimal solution, while we
only require an estimate of the optimal value W star; if somebody were to give
us W star, we'd be done, with no estimation required.
Okay. So that was the first problem. Now for the second problem, with unbounded
gamma: the bid-to-budget ratio is unbounded, so a bid could be as large as a
whole budget. For this problem, as I said, the algorithm really follows the same
two-step process, first analyzing a hypothetical algorithm that knows the
distribution and then dropping the knowledge of the distribution, except that
the potential function is now the unspent budget.
Now you want to consume as much of the budget as possible from every advertiser,
which means your goal is to minimize the unspent budget of every advertiser. You
can prove that the hypothetical algorithm with knowledge of the distribution
brings down the unspent budget by a factor of one minus one by M in every round.
And as in the previous analysis, our goal is to choose the algorithm which
explicitly minimizes the unspent budget in any given step, and that is the
greedy algorithm, because the greedy algorithm consumes the maximum amount in a
particular step.
By the previous proof, the greedy algorithm does the same thing: it also brings
down the unspent budget by a factor of one minus one by M per round.
And this basically gives a one minus one by E approximation: one minus one by M,
raised to the power M, tends to one by E, so the unspent fraction is about one
by E and you get a one minus one by E approximation.
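A quick numerical sanity check of that arithmetic (the per-round 1/M decay rate is the quantity from the analysis; the loop below just iterates it):

```python
import math

def unspent_fraction(m):
    """Unspent budget fraction after m rounds, if every round removes
    a 1/m fraction of whatever is still unspent."""
    unspent = 1.0
    for _ in range(m):
        unspent *= 1.0 - 1.0 / m  # one round of the decay guarantee
    return unspent

# (1 - 1/m)^m approaches 1/e, so the spent fraction approaches 1 - 1/e.
for m in (10, 100, 10000):
    assert abs(unspent_fraction(m) - math.exp(-1)) < 1.0 / m
```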
So I'll briefly mention what we do for the offline problem, the mixed packing
and covering problem. We basically solve this offline problem as if it were an
online instance. What we do is sample the requests, that is, sample the
right-hand side of the LP, and deal with the samples as if they were arriving
online. Now it becomes just like the online problem: you minimize a pessimistic
estimate of the failure probability at each step, and you only require about
gamma over epsilon squared oracle calls. In particular, this improves on the
previous quadratic dependence in the number of oracle calls, thanks to the fact
that we assume the unit simplex as part of the requirement.
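The offline-as-online idea can be sketched as follows (everything here is a hypothetical stand-in: the real per-step update minimizes the pessimistic failure-probability estimate, and the number of samples would be on the order of gamma over epsilon squared):

```python
import random

def offline_as_online(requests, online_step, num_samples, seed=0):
    """Treat an offline instance as an online one: repeatedly sample a
    request uniformly at random (simulating online arrivals) and hand it
    to `online_step`, a stand-in for the update rule that greedily
    minimizes a pessimistic estimate of the failure probability."""
    rng = random.Random(seed)
    state = {}
    for _ in range(num_samples):
        req = rng.choice(requests)  # sampling the LP's right-hand side
        online_step(state, req)
    return state

# Trivial online step that just counts arrivals per request.
def counting_step(state, req):
    state[req] = state.get(req, 0) + 1

tallies = offline_as_online(["x", "y", "z"], counting_step, 300)
assert sum(tallies.values()) == 300
```

Each sampled request plays the role of one online arrival, so the online machinery, and its oracle-call count, transfers directly to the offline LP.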
So I'll mention only that much for mixed packing and covering.
>>: Is that for the simplex version or for the whole [inaudible] constraint?
What does that mean?
>> Balu Sivan: No, it's for all of them. For every request we assume that you
can give it to at most one option. So for all J, the sum over K of X J K is at
most one.
>>: Many.
>> Balu Sivan: Yeah, many, yes. Correct.
So I'll summarize and present some open problems. Basically, we saw this
resource allocation problem in the small resource consumption setting, with the
best possible dependence on gamma: a one minus epsilon approximation in the
unknown distribution model. And we improved the factor for the unbounded gamma
setting from one half to one minus one by E, via a simple greedy algorithm.
For the greedy algorithm the analysis is tight, so you cannot go beyond one
minus one by E with greedy, but it's not clear that this is the best you can do;
I believe the factor can be improved.
We also gave fast approximation algorithms for mixed packing and covering. And
all the previous results apply not just to i.i.d. input, but to a new model
which we introduced in this paper, the adversarial stochastic input model, where
the distributions change over time but no single distribution is terrible.
For open problems, the most interesting ones are two-fold. One: in general, any
result which held for the unknown distribution model also held for the random
permutation model. But for our problem we were not able to prove that it holds
for random permutations. We don't have counterexamples either. I definitely
believe that this algorithm works for the random permutation model, but we don't
know how to prove it; it would be good to prove it, and surprising if there were
a counterexample.
The second question is the following. For the worst case, adversarial setting, I
initially said that there is a one minus one by E approximation which cannot be
improved. For the stochastic setting there are various one minus epsilon
approximation algorithms. But these are two different algorithms. What I want is
a single algorithm which simultaneously gets a one minus one by E approximation
in the worst case and a one minus epsilon approximation in the stochastic case.
That would be really good.
Yeah, so recently there seems to be some progress towards this question, showing
that for the most general problem you cannot achieve these two simultaneously.
But probably you could do it for simpler settings. Okay. That's it. Thanks for
listening. [applause].
>> Nikhil Devanur: Any questions?
>>: Just clarifying that improvement: I mean, this seems constrained. How is it
significantly different from the [inaudible]?
>> Balu Sivan: No. The thing is, Yeng for mixed packing and covering had to use
[indiscernible] Chernoff bounds, and those are weaker than multiplicative
Chernoff bounds. And the reason for using those bounds was to handle arbitrary
polytopes. If you only have the unit simplex, you can use multiplicative
Chernoff bounds; that's the main difference. It's the same multiplicative-update
kind of algorithm, but the potential function is different; it's based on the
multiplicative [inaudible].
>>: So replacing it by some other constant?
>> Balu Sivan: That goes to --
>>: Distributed?
>> Balu Sivan: Yes.
>>: You have an arbitrary call?
>> Balu Sivan: Yeah.
>>: Everything is possible.
>> Balu Sivan: Yes.
>>: Actually, if you had an equality, we don't know?
>> Balu Sivan: If you have an equality, it's a problem, yes. With an equality
you can't use multiplicative Chernoff bounds; you have to go to [inaudible]
bounds. Basically, I think the equality case almost covers all the other cases;
it's the most difficult one. If you can solve the equality case, you can
basically solve all polytopes.
>>: You mean, prove large, small or equal at the same time?
>>: No, no, the assumption is that the C I's and B I's are large compared to
[inaudible], so those are easy constraints, in the sense that the coefficients
A I J K are small compared to the C I's and [inaudible]. These are the hard
constraints which you have to satisfy in every step, and they look like this.
Then you can get this gamma factor. But if this hard constraint is an equality,
then we need to resort to [inaudible].
>>: The dual is standard, so you can just use one [inaudible] that works for
certain --
>> Balu Sivan: Basically the dual variable for this constraint will be positive.
If it were an equality, it would not have any fixed sign, so you can't apply
multiplicative bounds to the duals.
>> Nikhil Devanur: So let's thank Balu.
[applause]