>> Nikhil Devanur Rangarajan: So it's my pleasure to introduce Balasubramanian Sivan. I call
him Balu for short. Balu is a PhD student at University of Wisconsin-Madison advised by Shuchi
Chawla. Balu has been an intern here; during his internship, he worked on the so-called [inaudible] algorithm, which has had a big impact. So over to Balu.
>> Balasubramanian Sivan: Thanks, Nikhil. Thanks for the invitation. I'm very happy to be
back. So I'll be talking about optimization with uncertain inputs in this talk. And in particular,
I'll be focusing on two kinds of uncertain inputs. One is online inputs, where the input comes piece by piece and the algorithm has to make a decision as soon as one piece arrives; it cannot wait for the rest of the input. So optimization subject to an uncertain future is the challenge there. The second is mechanism design, where the input is distributed across several selfish participants, each of whom may have their own well-defined goals, which often conflict with the optimization goal of the algorithm designer, and the challenge is to do optimization respecting the incentives of the [inaudible].
So I'll begin by asking how we formally model and analyze these problems in theory. There are several approaches to doing this; two main approaches have gained currency in the literature. One is competitive analysis, where the algorithm you design faces the input uncertainty, but the benchmark against which you compare yourself is omniscient: it knows the entire input to begin with. The performance metric is what's called the competitive ratio, which is the worst case over all inputs of the ratio of the performance of the algorithm to the performance of the benchmark. As you can see there, the OPT carries a subscript I, which means it is instance-wise optimal, whereas the algorithm is the same for every instance. So it's an explicitly more powerful benchmark than your algorithm, and for this reason, being such a robust benchmark, any positive result in competitive analysis is great. A good example of a robust [inaudible] is the celebrated VCG mechanism to maximize social welfare. For the same reason that it is such a robust benchmark, it often leads to, basically, [inaudible], and we'll see this in the two examples that I'm going to talk about.
Now, a frequent alternative, in particular to step around these [inaudible] in competitive analysis, is to perform stochastic analysis, where the idea is that the input is drawn from a known distribution. The algorithm knows the distribution and tries to optimize with respect to that distribution. The benchmark against which you compare is the expected optimal for the same distribution, and the performance metric is basically the ratio of the expected performance of the algorithm to that of the benchmark. As you can see, both of them have a subscript F, which means the OPT is not explicitly more powerful, and you can shoot for a one approximation. Because of this, there are several success stories in stochastic analysis. I'll just give one example: Myerson's revenue-optimal mechanism is a great example of stochastic analysis. You know the distribution.
But the biggest criticism of stochastic analysis is that you need to know the exact distribution in order to perform this optimization. Often you only have noisy data about the distribution, and that could render your algorithm really suboptimal if there is noise. So given these two extremes, a possible middle ground would be to say that the input may be drawn from some distribution, but I, the algorithm designer, do not know the exact distribution; I only know a possibly huge universe of distributions from which the input arrives. What I'm asking for is a single algorithm which, for every distribution in this universe, performs approximately as well as the optimal algorithm tailored specifically to that distribution. Okay. So basically we are being robust over the whole universe of distributions: even if you don't know the exact distribution, you can use the same algorithm for all the distributions. It is in this sense that the algorithms are prior-robust, because you're blind to the exact prior distribution on inputs. As you can see there, the OPT has a subscript F, which means it's distribution-wise optimal, tailored to the distribution, while the algorithm is the same for all the distributions.
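For concreteness, here is one way to write the three performance metrics just described, with ALG the algorithm, I an input instance, and F a distribution from a universe U (the notation is mine, purely illustrative):

```latex
% Competitive ratio: worst case over instances, against the instance-wise optimum.
\[
\min_{I} \frac{\mathrm{ALG}(I)}{\mathrm{OPT}_I(I)}
\]
% Stochastic ratio: both algorithm and benchmark are tailored to the known F.
\[
\frac{\mathbb{E}_{I \sim F}[\mathrm{ALG}_F(I)]}{\mathbb{E}_{I \sim F}[\mathrm{OPT}_F(I)]}
\]
% Prior-robust ratio: one algorithm, measured against the F-tailored optimum,
% for every distribution F in the universe U.
\[
\min_{F \in \mathcal{U}} \frac{\mathbb{E}_{I \sim F}[\mathrm{ALG}(I)]}{\mathbb{E}_{I \sim F}[\mathrm{OPT}_F(I)]}
\]
```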
>>: Does the algorithm see the whole [inaudible] at once? Or-
>> Balasubramanian Sivan: It depends. For online input, it's basically one by one; in mechanism design you have to elicit the inputs, and there, typically, you ask all the people at once.
Actually, for both examples I'm going to give in this talk, I use an even stronger metric: the OPT is not just distribution-wise optimal, it is instance-wise optimal. So we are even closer to competitive analysis; just the presence of the distribution allows us to step around the negative results of competitive analysis. Now, this can be pushed to both extremes. If you shrink the universe to size one, that is exactly stochastic analysis. If you allow the universe to include arbitrary distributions, every distribution possible, then that is competitive analysis; and the goal in prior-robust optimization is to develop algorithms which can handle as large a universe as possible. But typically the universe has some structure; it is not an arbitrary bag of distributions. For example, all possible product distributions over inputs, and things like that.
The question is: do we have any nontrivial prior-robust algorithms? There are some nice examples: for example, the classic mechanism design result of Bulow and Klemperer and its generalizations, later extended by Yan; and, on the algorithmic side, the result due to Devanur and Hayes. I'll talk more about this later. But these are examples of prior-robust optimization.
>>: So when you talk about optimization [inaudible] errors [inaudible] Or are you talking about
the performance of errors?
>> Balasubramanian Sivan: I'm talking about the performance of the algorithm. There is an
objective function which you have, and I'm asking how close you can get to that objective
function as compared to the optimal value of the objective function.
>>: How do you quantify it? Is performance [inaudible] computing speed?
>> Balasubramanian Sivan: I mean, it's not a question of runtime. You want it to be polynomial, but by performance I mean the [inaudible] objective function. Okay. So that is the prior-robust model. The plan for this talk is to present prior-robust algorithms for two [inaudible] problems, and the take-away message is that several interesting problems lend themselves to this kind of prior-robust algorithm, and you should look for them and design them. So in part one of the talk, I'll present a prior-robust algorithm for a resource allocation problem, and in part two, I'll do mechanism design: I'll present a prior-robust truthful machine scheduling mechanism. I'll conclude with some future research directions.
So part one is based on a couple of joint works: one with Nikhil Devanur, Kamal Jain, and Chris Wilkens, the other with Nikhil Devanur and Yossi Azar. The formal resource allocation framework I'll present in a bit, but this framework is very [inaudible] and captures a lot of problems motivated by internet advertising. I'll quickly go through a couple of examples. One is the display ads example: if you visit any website, any proper website, you'll see at least one advertisement like that. At a very high level, you can think of display ad serving as proceeding in four phases. In phase one, the advertiser, in this case the University of Phoenix, signs a contract with the publisher, MSN, saying I want so many impressions in this period, shown to people [inaudible], and so on. Now MSN signs contracts like this with several advertisers, and once those contracts are signed, user page views arrive, and MSN has to decide which advertisement to put in the vacant spot. Then you deliver these ads along with the content. So this third phase is basically a resource allocation phase: you have to decide which advertisement to put in the vacant spot. The next example is something you've all seen a ton of times: if you search for a query, apart from the organic results that you get, you also see some paid search results. This can also be thought of, at a very high level, as proceeding in four phases, except that there are no contracts here. Advertisers submit bids and budgets, then user queries arrive, the search engine has to decide which advertiser to match each query to, and again phase three is a resource allocation phase.
So now, the formal model of this phase three. The model we are going to talk about is due to MSVV; it's basically an online model of repeated auctions. There are no incentives here; it's purely algorithmic. There's a bunch of advertisers. Each specifies a budget; these budgets are the maximum amount you can extract from them over a day or some time period. They also specify bids on various queries, and as soon as a query arrives, the search engine has to decide whom to allocate it to; the allocated bid is charged against that advertiser's budget. And you keep proceeding like this. So the formal model is that you have n advertisers, advertiser i has budget B_i, and queries arrive as follows. There is this huge bipartite graph; the right-hand side of the graph is all possible queries, and at every step you draw a query independently and identically according to an unknown distribution. So the universe of distributions here is basically the universe of all possible i.i.d. distributions over queries. I, the algorithm designer, do not know the distribution. I'm going to call p_j the probability with which query j is drawn; I do not know p_j. And b_{i,j} is the bid of advertiser i for query j. Your goal is to design an algorithm which maximizes the revenue, which is the sum total of all the allocated bids, respecting the budget constraints. So that's the model. Is the model clear, how the queries are drawn?
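To make the interface concrete, here is a minimal Python sketch of this online model together with a naive greedy baseline (all names are illustrative; this is not the algorithm from the talk):

```python
import random

class GreedyAdwords:
    """Naive online baseline: give each query to the highest bidder with budget left.
    (Purely illustrative; this is not the algorithm from the talk.)"""
    def __init__(self, bids, budgets):
        self.bids = bids              # bids[i][j]: bid of advertiser i on query j
        self.budgets = budgets        # budgets[i]: budget B_i of advertiser i
        self.spent = [0.0] * len(budgets)

    def allocate(self, j):
        """Must decide immediately, knowing only the queries seen so far."""
        return max(range(len(self.budgets)),
                   key=lambda i: min(self.bids[i][j], self.budgets[i] - self.spent[i]))

def run_online(alg, query_probs, m, rng=random):
    """Draw m queries i.i.d. from a distribution the algorithm never sees."""
    revenue = 0.0
    for _ in range(m):
        j = rng.choices(range(len(query_probs)), weights=query_probs)[0]
        i = alg.allocate(j)
        pay = min(alg.bids[i][j], alg.budgets[i] - alg.spent[i])  # never exceed B_i
        alg.spent[i] += pay
        revenue += pay
    return revenue
```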
So it turns out that all the results for this problem are parameterized by a significant parameter called the maximum bid-to-budget ratio. It's not difficult to see why this is the case, and I'll illustrate it through a simple example. Consider a plain matching problem: all the bids are one, all the budgets are one. You have just two advertisers and three queries, and say two queries are going to be drawn online from the probability distribution shown. Suppose query one arrives first, or query two arrives first; irrespective of what you do, there is a constant probability that the second query cannot be allotted to anybody, right? You have already used up this guy's budget. So you get a constant-factor gap between the optimal online and offline revenue. The point is that if, instead of the bids being one, they were one over a thousand, then each mistake you make is not nearly as costly: with bids equal to one, a single mistake wastes a whole budget, whereas with tiny bids a single misallocated bid is a small mistake. So obviously, with a smaller and smaller bid-to-budget ratio, you can get better and better approximations. So all the approximation ratios here have to be parameterized by this bid-to-budget ratio, and I'm going to call it gamma hereafter. Okay?
So I'll present what is known. MSVV, who introduced this problem, studied it in the competitive setting. What they show is that you can get a one minus one over e approximation in the limit where the bid-to-budget ratio goes to zero, right? So this is the best case you can ask for: the bids are infinitesimally small compared to the budgets. Even then, you can only get one minus one over e; no randomized algorithm can go beyond one minus one over e. Okay. So that is the competitive setting. In a recent result, concurrent with our work, [inaudible] show that in the stochastic setting, which means you know the distribution, you can get a one minus square root gamma approximation, where gamma is the bid-to-budget ratio. What this means is that if gamma goes to zero, you can get arbitrarily close to one, circumventing the bound that you cannot go beyond one minus one over e. The question we asked in this work is: can you get the same one minus square root gamma approximation through a prior-robust algorithm, which means you use the same algorithm irrespective of what the distribution is?
Okay. So here are our results. This is the same i.i.d. model. We don't ask for the full distribution, but we do ask for n parameters of the distribution; I'll explain what these n parameters are later. The whole distribution itself could have infinite support, but we only ask for a few parameters of it, and given those, we get the same one minus square root gamma approximation. We also show that you cannot go beyond one minus square root gamma, even if you knew the distribution.
>>: [inaudible]?
>> Balasubramanian Sivan: n doesn't appear in the approximations at all.
>>: So whatever n is-
>> Balasubramanian Sivan: Oh, n is the number of advertisers, yes, and m is the number of queries, but the approximation ratio is independent of n. Yeah. So that's the result. Now, this same problem has been studied in the prior-robust model with a slightly different [inaudible] than the i.i.d. universe of distributions, and in order to put our results in context, I'll briefly go through that model before giving the proof of this result. The model is the same as this one, except that instead of queries being drawn i.i.d., an adversary initially picks the set of m queries to arrive; after that, these queries are presented in a uniformly random order. Now in i.i.d., if you condition on the set of queries that arrive, they arrive in uniformly random order, correct? So you can think of the i.i.d. model as a distribution over random permutation instances, because in i.i.d. nobody is conditioning on which queries arrive. So i.i.d. is a distribution over random permutations, and for that reason, any approximation ratio that holds for the random permutation model also holds for the i.i.d. model, but the reverse is not known. And there's no separation [inaudible] for this. The distributions are unknown, so basically the unknown i.i.d. model is really close to the random permutation model, though one of them is stronger than the other.
>>: [inaudible]? The queries, they must be [inaudible]?
>> Balasubramanian Sivan: It could be possible that there are duplications. So here is what I already showed. For the random permutation model, [inaudible] was actually the first to show that you could get arbitrarily close to one; then there is the result by Devanur and Hayes, and this is the dependence on n and m they show, and after this result there are several other results with different dependences. Basically, the point is that all of these depend on n and other parameters. For the i.i.d. model we completely get rid of the [inaudible] and get this one minus square root gamma approximation, and it's an open question whether our algorithms actually extend to the random permutation model also. The first question is whether random permutations even permit this kind of approximation without any dependence on n, and then you could ask whether this algorithm itself extends to random permutations.
>>: So there are no [inaudible] and lower bounds for the permutation model?
>> Balasubramanian Sivan: There is no known lower bound except for this [inaudible], which is for i.i.d. So that is the comparison between i.i.d. and random permutations. I'll now prove this result, the one minus square root gamma. Here I make some [inaudible] assumptions just for the purposes of this talk. First, the bids are binary: zero or one. Second, the distribution is the uniform distribution over queries, so every query arrives with the same probability 1/m, where m is the total number of queries. For the third assumption, I'll need a quick definition. I'm going to define what an expected instance is. For every distribution, this is a single offline instance in which everything happens as per expectation, okay? You have a query j which arrives with probability p_j in the online model, which means the expected number of times it arrives is m times p_j, right? So in this offline instance, the expected instance, you have exactly m·p_j units of query j. So for every query, you have the expected number of units of that query in this offline instance. For this example, m times p_j is exactly one, so you have one unit of every query in the support of the distribution. That is a single offline instance, and you cannot compute it if you do not know the distribution, right? If you know the distribution, that is the offline instance, and the third assumption is that you have a perfect matching in the expected instance: the optimal solution to the expected instance is a perfect B-matching, which means that every advertiser's budget is fully consumed. I'll examine all these assumptions later, but for now, let's say that every advertiser's full budget is consumed.
Okay. So given these assumptions, what can you do? If you knew the distribution, you could run the following algorithm. I'm calling it the hypothetical algorithm, because you do not know the distribution. Here's what the algorithm does. It first computes the expected instance, for which you need the distribution and the number of queries that will arrive. It finds the optimal matching for the expected instance, tabulates it as a table saying which query goes to which advertiser, stores this table, and uses that solution for the online problem. What does that mean? Say a sequence of queries arrives like this: when query three arrives, the algorithm goes to the table, sees it should be given to advertiser seven, and gives it there. When query one arrives, you do the same thing: go to the table and see. When query three arrives again, the table says advertiser seven, so it is given to advertiser seven, even if by this time advertiser seven's budget is exhausted. It could be that seven's budget is exhausted, but that is why it is an oblivious algorithm; it doesn't depend on what has happened earlier. It is a waste of money to give it to seven, but I'll still give it to seven, okay? It's a simple non-adaptive algorithm.
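A rough Python sketch of that oblivious, table-driven rule for the simplified setting (binary bids, one unit of each query in the expected instance); the helper names are mine, and computing the optimal B-matching of the expected instance is left abstract:

```python
def hypothetical_run(table, query_sequence, budgets):
    """Oblivious rule: always follow the precomputed table, even if the chosen
    advertiser's budget is already exhausted (such wasted allocations earn nothing).

    table[j] is the advertiser that the optimal B-matching of the expected instance
    assigns (the one unit of) query j to; computing that matching offline is the
    only step that needs the distribution. Bids are all 1 in this special case."""
    spent = {i: 0 for i in budgets}
    revenue = 0
    for j in query_sequence:
        i = table[j]                       # ignore history entirely
        if spent[i] < budgets[i]:
            spent[i] += 1
            revenue += 1
    return revenue
```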
>>: [inaudible]?
>> Balasubramanian Sivan: I mean, there is no feasibility issue. You can keep on giving queries to the advertiser, but he won't pay beyond B_i; you won't get the money. So it's a waste after B_i, that's all.
>>: Only a theorist would consider that feasible.
>>: [inaudible].
>> Balasubramanian Sivan: The claim is that this simple non-adaptive algorithm actually gets a one minus square root gamma approximation; for the assumptions I made, gamma is just bid over budget, and the bids are all one, so gamma is one over the minimum budget, right? So I'm shooting for a one minus one over square root of the minimum budget approximation. Okay. Here is the proof. Let's analyze this step by step. In any given step, what is the probability that advertiser i gets a query? Well, I said that in the expected instance you have full budget consumption, which means advertiser i had B_i queries going to him. So if any of those B_i queries arrives, then advertiser i is given the query. As there are m queries in total, the probability that advertiser i gets a query in a given step is B_i/m. So this algorithm is basically a balls-and-bins process: at every step it throws a ball to advertiser i with probability B_i/m. And what is the revenue at the end of all the steps? Because I've assumed binary bids, the revenue from advertiser i is basically the number of queries matched to him, truncated at B_i; as you asked, there is no use in giving anything beyond B_i. And this is a truncated binomial sum. We know that a truncated binomial sum has a square root loss. You sum this over all the advertisers, and that's the total revenue. So if you know the distribution, you can run this simple algorithm.
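To see the square-root loss concretely, here is a small self-contained Python computation of the truncated binomial expectation E[min(Binomial(m, B_i/m), B_i)], which is the expected revenue from advertiser i under the oblivious rule (a numerical illustration, not part of the formal proof):

```python
import math

def truncated_binomial_mean(m, p, cap):
    """E[min(X, cap)] where X ~ Binomial(m, p), computed exactly."""
    total, prob = 0.0, (1 - p) ** m      # prob starts at P[X = 0]
    for k in range(m + 1):
        total += min(k, cap) * prob
        if k < m:                        # advance P[X = k] -> P[X = k + 1]
            prob *= (m - k) / (k + 1) * p / (1 - p)
    return total

m, B_i = 10000, 100                      # advertiser with budget B_i, bids of 1
expected_revenue = truncated_binomial_mean(m, B_i / m, B_i)
print(B_i - expected_revenue, math.sqrt(B_i))
# The expected shortfall from the full budget B_i is on the order of sqrt(B_i),
# i.e. the oblivious rule recovers roughly a (1 - 1/sqrt(B_i)) fraction of B_i.
```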
I want to point out two things about this algorithm. One is that its computational burden increases with the size of the support: to compute the expected instance you need to know the full support, and as the support grows larger and larger the computational burden grows with it; if the support is infinite, this is rendered infeasible. Secondly, this could be a randomized algorithm, because while the table you get for the example I showed is integral, in general it could be fractional.
>>: So I'm a bit confused about something. This ratio you're computing is the ratio of what the algorithm achieves to what, exactly?
>> Balasubramanian Sivan: What the algorithm achieves to what you can achieve through an
off-line fractional solution. Optimal off-line fractional solution. So as you-
>>: And that's essentially times this thing that your computer [inaudible]?
>> Balasubramanian Sivan: Yes. Exactly.
>>: Expectation of the optimal-
>> Balasubramanian Sivan: It's basically the OPT of the expected instance, which is larger than the expectation of OPT, because the expectation of OPT would be feasible [inaudible] here, so the OPT of the expected instance can only be larger. Okay. That is why I said at the beginning that the benchmark is [inaudible] an even stronger metric: it is instance-wise optimal, since we compare against offline optimal solutions. Okay. So this is what you do if you know the distribution. When you do not know the distribution, what do you do? Consider running the following hybrid algorithm, which is a hybrid of two algorithms. One is the hypothetical algorithm, which I'll call B, that I just presented. The other is an algorithm A, which is going to be inductively defined now. H_i is the hybrid of A and B, which runs A for the first i steps and B for the remaining m minus i steps; A followed by B.
So assume that A has been defined for the first i steps; I'm now going to define what it does at step i plus one. Once the query arrives, it picks the following advertiser: the advertiser which maximizes the current step's revenue plus the expected residual revenue you would get by running the hypothetical algorithm for the remaining m minus i minus one steps. So you look ahead and decide the best thing to do now. And by definition, the hybrid H_{i+1} gets at least as much revenue as the hybrid H_i, because H_{i+1} and H_i differ only at step i plus one: there H_{i+1} runs algorithm A, which looks ahead and makes the best decision, whereas H_i runs the hypothetical algorithm, which just looks at the table and does something. Clearly, looking ahead and optimizing is at least as good as looking at the table and doing something.
>>: [inaudible] hypothetical algorithm, it’s basically that you know the distribution [inaudible]?
>> Balasubramanian Sivan: I mean, I want to show that you can do this without knowing the distribution. But for now, suppose you could do this; suppose you could look ahead and make this decision. Then I'm saying H_{i+1} is better than H_i; just step i plus one is different, but by definition it is better. Now you can slowly change this step by step, just like a hybrid argument for distinguishers in cryptography. At the end, the point is that running algorithm A all the way is better than running algorithm B all the way, by definition. And I already showed that B gets only this square root loss, which is what we wanted, so A also gets it. The question is whether you can actually run this algorithm A, because I said you have to look ahead into the future and compute this expected residual revenue. The key point is that implementing the hypothetical algorithm requires knowledge of the distribution, but just estimating its revenue doesn't, and all we need is to estimate the revenue. Why is that? Because, as I said, the hypothetical algorithm is a simple balls-and-bins process which throws a query to advertiser i with probability B_i/m. I know the budgets B_i, I know the number of queries m, so at any given point in time, given the number of remaining queries, I can compute the residual revenue from just these numbers: it is a sum of truncated binomial expectations, one for each i. So I can estimate this exactly, and the upshot is that you can run the hybrid algorithm without knowledge of the distribution. All I ask for is these numbers.
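Here is a rough Python sketch of that residual-revenue estimate and the resulting look-ahead rule, under the binary-bid assumptions above (function names are mine; the truncated binomial expectation is the same quantity computed in the earlier snippet):

```python
def truncated_binomial_mean(steps, p, cap):
    """E[min(X, cap)] where X ~ Binomial(steps, p); same quantity as before."""
    total, prob = 0.0, (1 - p) ** steps
    for k in range(steps + 1):
        total += min(k, cap) * prob
        if k < steps:
            prob *= (steps - k) / (k + 1) * p / (1 - p)
    return total

def residual_revenue(remaining_steps, budgets, spent, m):
    """Expected revenue of the hypothetical algorithm B over the remaining steps:
    it sends each remaining query to advertiser i with probability B_i/m, and only
    the unspent part of the budget can still earn money. Needs only the B_i's and m."""
    return sum(truncated_binomial_mean(remaining_steps, budgets[i] / m,
                                       budgets[i] - spent[i])
               for i in budgets)

def lookahead_choice(j, bids, budgets, spent, remaining_steps, m):
    """Algorithm A's rule for the current query j: pick the advertiser maximizing
    this step's revenue plus the estimated residual revenue of running B afterwards."""
    best_i, best_val = None, float("-inf")
    for i in budgets:
        gain = min(bids[i][j], budgets[i] - spent[i])        # revenue earned right now
        trial = dict(spent)
        trial[i] += gain
        value = gain + residual_revenue(remaining_steps, budgets, trial, m)
        if value > best_val:
            best_i, best_val = i, value
    return best_i
```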
One point is that it is not enough to recognize that B can be interpreted as a balls-and-bins process with probability B_i/m. Even after knowing that, you cannot implement it, because you're not given a bunch of balls and asked to play a balls-and-bins process here: as a query arrives, you have to decide whom to assign it to. So you can't run this algorithm B, but because you can estimate the revenue of B, you can [inaudible] some other algorithm which does as well as B. So that's the point. I will wrap up this analysis by examining some assumptions I made. One is that I said it's the uniform distribution. What if it is not? This turns out to be a rather mild assumption. The expected instance will now have a fractional optimal solution instead of an integral optimal solution. That is fine; it only changes what the hypothetical algorithm is, which may now be a randomized algorithm, but it doesn't change anything about how the algorithm itself is defined inductively.
>>: So in a sense, all these algorithms really depend on the [inaudible] distribution, right? I mean, if the distribution were very easy, say, had very low [inaudible], could you learn it [inaudible]?
>> Balasubramanian Sivan: Yeah. So the algorithm I presented doesn't depend on the complexity; it doesn't depend on-
>>: So you could imagine that for certain distributions you can do better than [inaudible]?
>> Balasubramanian Sivan: No, no. Even if you know the distribution, you can't go beyond
[inaudible] because I'm comparing it with the off-line optimal solution. It's not the online
optimal.
>>: I guess what I'm asking is whether the off-line solution, is it conceivable that off-line
solution can do better than the square root for some distributions?
>> Balasubramanian Sivan: What does it mean, you say off-line will do better than square root?
>>: You have, suppose you know the distribution.
>> Balasubramanian Sivan: Yes.
>>: It's, so is the square root, the one minus the square root-
>> Balasubramanian Sivan: Oh yeah. It doesn't hold for every distribution, yes.
>>: [inaudible].
>> Balasubramanian Sivan: Yeah. You could; it's only for the [inaudible] distributions that you can't go beyond [inaudible]. Okay. The second assumption I want to examine is that the bids are binary, and this is a more serious assumption. What happens if they are not binary? Well, they could be anywhere between zero and one. You can't interpret this hypothetical algorithm as a balls-and-bins process anymore, because you are throwing balls of different sizes; it's like a fractional balls-and-bins process. This complicates the hybrid argument, and you basically have to introduce a third algorithm F and do a two-level hybrid argument to give the proof, which I'm not going to go through. But you can get the same approximation for arbitrary bids. [inaudible].
The third assumption is that I said budgets are fully consumed in the expected instance. What if you don't get full budget consumption? You consume some amount C_i, which is less than B_i; the only difference is that the balls-and-bins probability is now C_i/m instead of B_i/m. Okay? You do not know the C_i's, and this is what our algorithm asks for: these are the n parameters. If you give me the optimal consumption in the expected instance for every advertiser, the C_i's, then I can use them to estimate the residual revenue and run the hybrid algorithm. So those are the n parameters we need, and if you give them to me, I will deliver what I promised. And here is an interesting open question that is immediate from this. What if you simply make the wrong assumption, just say that C_i is equal to B_i, and then run your algorithm? Does it perform well? We are able to prove that for some special cases this already does well. If you could prove that it does well in all cases, then you would have completely eliminated all dependence on the distribution; you would be doing as well as the [inaudible] case. I think it would be great to prove that.
>>: So I have a question. If you [inaudible], you wouldn't expect to actually know the C_i's, right? But maybe you could estimate them.
>> Balasubramanian Sivan: Sure, you can estimate a stand-in for the C_i's.
>>: But only if you know the C_i's within epsilon-
>> Balasubramanian Sivan: Yes, yes. If you know the C_i's within epsilon, these results will also be within epsilon. Yes. Okay. So that is the immediate open question from this.
Now I'm going to generalize this to a much more general resource allocation framework, which I said I'd present. This basically captures all the special cases I mentioned, online [inaudible] and so on. The model is that you have m requests, like the m queries, and n resources, like the n advertisers; resource i has capacity B_i. The major difference is that now every request, instead of consuming from just one resource, can simultaneously consume from all the resources available. I'm going to introduce a third index, the option used to serve a given request: if you serve request j with option k, you get a profit of w_{j,k}, and it consumes some amount of every resource; it consumes b_{i,j,k} of resource i. To have a concrete example in mind, think of a graph where each request asks to route from some source to some sink in the graph; the resources are the edge capacities in the graph, and the option is which path you choose from the source to the sink. So you have an exponential number of paths to choose from, the number of options is exponential in n, and obviously different paths consume different edges. That is the consumption information: in general, option k for request j consumes b_{i,j,k} of resource i. So this is the general resource allocation framework. And here are our results. Again, the requests are drawn i.i.d. We do not know the distribution in this case; it's a completely unknown distribution, and we don't even ask for any n parameters. In other words, you could use this algorithm for the previous problem I described. The only thing is that the results are slightly worse here: you have a dependence on n, unlike the one minus square root gamma I presented earlier; the approximation is one minus square root of gamma log n. We also show a matching lower bound, which says that you cannot do better than one minus square root of gamma log n for this general framework with completely unknown distributions. Also, these results hold with high probability, not just in expectation; they hold with probability one minus square root of gamma log n.
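For concreteness, the offline "expected instance" of this general framework can be written as the following linear program, where x_{j,k} is the fraction of request j served with option k (my notation, a sketch consistent with the quantities defined above):

```latex
\[
\max \; m \sum_{j,k} p_j \, w_{j,k} \, x_{j,k}
\quad \text{s.t.} \quad
m \sum_{j,k} p_j \, b_{i,j,k} \, x_{j,k} \le B_i \;\;\forall i,
\qquad
\sum_{k} x_{j,k} \le 1 \;\;\forall j,
\qquad
x_{j,k} \ge 0.
\]
```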
The high-level idea is to use a hybrid argument again, but the algorithm is fundamentally different. We use the hybrid argument not on the expected revenue, as I presented before, but on the exponential [inaudible] functions used in [inaudible] bounds. To compare with what's known, the previous best was of the form one minus square root of gamma n log(Kn), and K is typically exponential in n, like the exponential number of paths in the routing example. So if you plug that in, there is really an extra factor of n sitting inside the square root, and n is typically large in these examples. But that result is for random permutations, which, as I explained, is slightly stronger than the unknown i.i.d. model, [inaudible] so the question here is whether this algorithm extends to the random permutation model with the same approximation factor.
So that's the summary of part one. For the special case of adwords, we get matching upper and lower bounds, but we need the n parameters of the distribution. For the more general resource allocation framework, we again get matching upper and lower bounds, but with completely unknown distributions. This work was done while I was an intern here in 2010 with Nikhil, and thanks to Nikhil, we were able to speak to the product groups, and they basically use this algorithm now for MSN's display ad serving engine, for pretty much the exact display ads problem I described. It has been globally operational since the summer of 2011. These are the open questions, which I already presented, but I'll repeat them. One is to see whether the i.i.d. and random permutation models have any separation; no separation is known. My own guess is that [inaudible] our algorithms actually work for random permutations, but we don't know. The other is removing the dependence on the n parameters, the C_i's. So that's it for part one. If you have questions, you can ask them about part one now; then I'll move on to mechanism design.
So I'll now talk about a prior-robust truthful machine scheduling mechanism. This is joint work with Shuchi Chawla, Jason Hartline and David Malec. The problem is the very well-studied makespan minimization problem in computer science: you have n jobs and m machines, the n jobs are different and so are the m machines, and your goal is to find a schedule of jobs on machines that minimizes the completion time of the last job. So if you have these jobs and machines, machine two is basically the makespan-defining machine for this schedule. And this is called an unrelated machines instance, because the runtimes of any given job on different machines are completely unrelated. This is the problem we are studying, with the twist that we are going to study it in a strategic setting. The problem was introduced in the seminal paper on algorithmic mechanism design by Nisan and Ronen. They say that these machines are operated by selfish agents, and the runtimes of jobs on machines are privately held by the machines; we do not know these runtimes to begin with. What we ask for is a mechanism, which is not just a schedule of jobs on machines but, along with that, some payments which you transfer to these machines to incentivize them to truthfully report their runtimes. This pair of schedule plus payments is the mechanism, and the machines have their own selfish objective, which is to maximize the payment they receive from you minus the work they are forced to do, which is the sum total of the runtimes of all the jobs they're asked to run.
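In symbols (my notation, for concreteness), if machine i is paid P_i and is assigned the set S_i of jobs, with t_{i,j} the true runtime of job j on machine i, its selfish objective is the utility

```latex
\[
u_i \;=\; P_i \;-\; \sum_{j \in S_i} t_{i,j}.
\]
```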
The solution concept I ask for is dominant-strategy truthfulness, which means a mechanism in which, irrespective of how the other machines behave, each machine is incentivized to truthfully report its runtimes. This is the strongest notion of truthfulness you could ask for in mechanisms. Okay? So what's the motivation to study makespan in a strategic setting? There are several. One is that it is a central computer science problem, specifically in the context of resource allocation. Even from an economic point of view, you could think of makespan as enforcing some kind of [inaudible]; after all, makespan is about load balancing between machines, so it's like min-max fairness. What makes it really interesting to me is that it is a nonlinear objective, unlike the other traditionally studied objectives in mechanism design, like revenue and welfare, which are all linear, and for this reason, what is possible for makespan is very different from welfare or revenue. Here are some differences, for example. For the social welfare objective, which is the most well-studied objective in mechanism design, you can get truthfulness plus optimality through the celebrated VCG mechanism: if you didn't care about computational considerations for a minute, you could get a truthful mechanism which optimizes social welfare all the time. But truthfulness plus optimality is impossible for makespan, as Nisan and Ronen showed in their paper. This means that even if I give you unbounded computational time, you cannot give me a truthful scheduling mechanism which will optimize the makespan. There are some [inaudible], and I'll trace through the entire [inaudible] of results in the next slide.
Another striking difference is the kind of impressive reductions that are possible for social welfare, where you take an arbitrary approximation algorithm and inject truthfulness into it: you can morph the algorithm into a truthful mechanism that preserves the approximation guarantee of the original algorithm. Such impressive general reductions are not possible even in the simplest of settings for makespan. This, again, is a big difference. This problem was basically introduced as a challenge problem for mechanism design by Nisan and Ronen in their seminal paper, and since then it has become a hot area to work on. Let's see what is known. Nisan and Ronen gave a simple m approximation in their original paper, but much of the activity has been on lower bounds for the problem. The older papers showed that no deterministic mechanism can get anything better than a 2 approximation, even with unbounded computational time; there are no computational restrictions here. This was later extended to randomized mechanisms by [inaudible], and then the bound was improved several times by [inaudible]. Recently, Ashlagi, Dobzinski, and Lavi showed that the upper and lower bounds match, basically, so there is a very strong hardness [inaudible], although this is for a restricted class of mechanisms, anonymous mechanisms. Anonymous means that you should not use the name of a machine to make your decision, which means that if two machines swap their runtimes, then how you treat those two machines should also be swapped. Okay?
So given this backdrop of [inaudible] mechanisms, the question we ask in this work is: if you make stochastic assumptions, can you get a prior-robust truthful mechanism with a good approximation for the makespan, given all these hardness results in the competitive setting? That's the question in this work. I'll now formally present the distributional model for scheduling. You have n jobs and m machines, the runtime of job j on machine i is X_{j,i}, and the X_{j,i}'s are independent random variables. Here is the assumption on the distributions: the runtime distributions are identical across machines. For a given job, the runtimes of that job on the different machines are i.i.d. random variables, but the jobs themselves could be non-identical. So basically, the machines are a priori identical, but if you draw the runtimes from these distributions, any specific instantiation of the runtimes will give you an unrelated machines instance.
So I'm going to use a short form for machines being i.i.d. and jobs being non-i.i.d., but this is what I mean. For some results, I also need the jobs to be i.i.d., but I'll present that when I present the results. The goal is to minimize the expected makespan.
>>: And do you know the distribution?
>> Balasubramanian Sivan: No. I want a prior-robust mechanism, which means for all possible distributions; we just have to satisfy these conditions: independence and machines being i.i.d. So this is what I showed you already: the upper and lower bounds match in the competitive setting, for a restricted class of mechanisms in the case of the lower bounds. And the stochastic setting is completely open, which means that whether you can do something better if I gave you the distribution is not known. Here are our results. We give a prior-robust truthful mechanism which gets an O(n/m) approximation to OPT-half, where n is the number of jobs and m is the number of machines. In particular, when n is equal to m, this is a constant factor approximation. And the lower bound there also holds for n equal to m, so basically you can get a constant factor for this case. The benchmark is OPT-half: OPT is allowed to be non-truthful. It is the non-truthful expected optimal, but it is allowed to use at most half the machines, so it's like a resource augmentation result; I'll explain this benchmark OPT-half in a bit. But this is our first result. Later we show that you can improve this to sub-logarithmic factors if you assume that the jobs are also i.i.d. and make further distributional assumptions.
So those are our results. I'll now go through this OPT-half benchmark and explain it a bit. As I said, OPT-half means that OPT is allowed to use at most half the number of machines that you are allowed to use. So if you get a good approximation to OPT-half, it means that you can do well with resource augmentation: if you double the number of machines, you can perform approximately as well as OPT. In general, this OPT-half could be much larger than OPT, but we show that for a large family of distributions, OPT-half and OPT are within constant factors, which means there is really no resource augmentation needed for these distributions, because there is only a constant factor of loss for these [inaudible] distributions. Intuitively, these are distributions whose tails are no heavier than the exponential distribution's, like the uniform [inaudible] exponential. This is basically true even for a much larger class of distributions: OPT-half and OPT are within constant factors. I won't talk about that class now. So that is the benchmark OPT-half.
So I will now sketch the proof of how we get this O(n/m) approximation. Basically, I'm going to assume that n is equal to m, and then I'm going to shoot for a constant factor approximation in the proof sketch. To get a flavor of what is truthful and what is possible in this multi-parameter setting, let's begin with a very simple mechanism which is truthful; it was introduced by Nisan and Ronen and is called the Min-work mechanism. It just minimizes the total work done by all the machines, which means it tries to minimize the sum total of the runtimes over all the machines; this has nothing to do with makespan. So it takes each job and schedules it on the machine which has reported the smallest runtime for that job. That's what it means to minimize the sum total of all the runtimes. And it pays the machine the job's runtime on the second quickest machine. This results in a truthful mechanism; just take my word for it, I won't prove it.
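A minimal Python sketch of that Min-work mechanism, with its per-job second-price payments (the data layout is my own, purely illustrative):

```python
def min_work_mechanism(reported_runtimes):
    """Min-work: each job goes to the machine reporting the smallest runtime for it,
    and that machine is paid the second-smallest reported runtime for the job
    (a per-job second-price rule, which is what makes truthful reporting dominant).

    reported_runtimes[i][j] is the runtime machine i reports for job j; needs m >= 2."""
    m, n = len(reported_runtimes), len(reported_runtimes[0])
    schedule = {}                  # job j -> machine i
    payments = [0.0] * m           # total payment to each machine
    for j in range(n):
        order = sorted(range(m), key=lambda i: reported_runtimes[i][j])
        winner, runner_up = order[0], order[1]
        schedule[j] = winner
        payments[winner] += reported_runtimes[runner_up][j]
    return schedule, payments
```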
>>: [inaudible]?
>> Balasubramanian Sivan: Exactly. Yes. And it also minimizes the total global runtime.
>>: [inaudible]?
>> Balasubramanian Sivan: No, [inaudible] each job, but that's exactly what you'll do if you want to minimize the total runtime. You're not doing anything global, actually; you can act locally to minimize the total runtime.
>>: But if you were [inaudible]?
>> Balasubramanian Sivan: This is exactly VCG; basically it's VCG, but we are running a reverse auction here. So instead of maximizing welfare, you are minimizing the burden, basically; you're minimizing the total runtime.
>>: One more question. So the revelation principle doesn't apply to certain objective functions, that's why you can [inaudible]?
>> Balasubramanian Sivan: No. The revelation principle applies, but it just says that whatever you can do with a non-truthful mechanism you can also do with a truthful mechanism. Here, this is a different objective: I was minimizing the total work, while my goal is to minimize the makespan. I'm simply presenting a mechanism which is truthful and has some nontrivial approximation for makespan, that's all. You can use this mechanism for makespan, but it will be bad. That's what I'm going to talk about now.
So this simple mechanism already gives an m approximation, where m is the number of machines. The argument is simple, because the makespan is just a max, and the max is obviously at most the sum; the sum of the quickest runtimes is at most the sum of the runtimes of all jobs in OPT, which is at most m times the makespan of OPT. Okay?
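Written out, that chain of inequalities (with t_{i,j} the runtime of job j on machine i and OPT(j) the machine OPT uses for job j):

```latex
\[
\mathrm{makespan}(\text{Min-work})
\;\le\; \sum_{j} \min_i t_{i,j}
\;\le\; \sum_{j} t_{\mathrm{OPT}(j),\,j}
\;\le\; m \cdot \mathrm{makespan}(\mathrm{OPT}).
\]
```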
>>: This was an example?
>> Balasubramanian Sivan: This was an example. You get m approximation. It is also truthful.
So what else is truthful, to get a sense of what is possible? It turns out that if you minimize the total work over a constrained, restricted domain of schedules, that is also truthful, okay? You put a pre-specified restriction on what kinds of schedules are possible before you ask for the runtimes, then minimize the total work over this restricted domain of schedules, and that is also truthful. So the natural question is: why not try the simple Min-work mechanism first? Maybe it does better in a stochastic setting, because the m approximation of Nisan and Ronen I showed was for the worst case. So let's look at this simple example: you have m jobs and m machines, everything is i.i.d., and all runtimes lie between one and one plus epsilon. Now the quickest machine for a job is a uniformly random machine, right? Which means that the Min-work mechanism, by assigning each job to its quickest machine, is basically assigning it to a random machine. So you're basically throwing balls randomly into bins, and the fundamental fact in balls-and-bins analysis is that the heaviest loaded bin has about log m over log log m balls. This implies a logarithmic approximation, which is trivial, but we're looking for much better approximations, like constant factor or at least sub-logarithmic in m. Okay?
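A quick, purely illustrative simulation of that balls-and-bins fact (m balls thrown into m bins uniformly at random):

```python
import math
import random

def max_load(m, rng=random):
    """Throw m balls into m bins uniformly at random; return the heaviest bin's load."""
    loads = [0] * m
    for _ in range(m):
        loads[rng.randrange(m)] += 1
    return max(loads)

m = 100000
print(max_load(m), round(math.log(m) / math.log(math.log(m)), 2))
# The observed maximum load concentrates around log m / log log m.
```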
So what is the problem with this Min-work mechanism? It is basically overcrowding machines, like what is happening here. It gives too much importance to putting a job on the quickest machine possible. So a natural next step would be to explicitly prevent this overcrowding. What if you say that you should not schedule more than K jobs on any machine? That is a restricted domain of schedules. You run the same Min-work mechanism, trying to minimize the total work over this restricted domain of schedules, with at most K jobs per machine. That's what we call the Min-work(K) mechanism: minimize the total work with at most K jobs per machine. This is truthful because it minimizes work over a restricted domain of schedules, and it is also polynomial time because it's a minimum-cost matching problem where each machine has a capacity of K. The claim is that this Min-work(K) mechanism gives a constant factor approximation to OPT-half. The proof is roughly these two steps. First, the Min-work(K) mechanism results in a roughly balanced schedule, unlike the lopsided schedules that plain Min-work can produce. Secondly, in spite of the fact that I put a restriction on the number of jobs that go to each machine, so that a job no longer necessarily goes to its quickest machine, still a job roughly goes to the quickest machine.
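A rough sketch of the Min-work(K) allocation rule in Python, implemented by duplicating each machine into K slots and solving a min-cost assignment (uses scipy, assumes n <= m*K; the payments, which are what make the mechanism truthful, are omitted):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def min_work_k_schedule(reported_runtimes, K):
    """Min-work(K) allocation: minimize total reported work subject to at most K
    jobs per machine. reported_runtimes[i][j] is machine i's reported runtime
    for job j; returns a dict mapping each job to a machine."""
    times = np.asarray(reported_runtimes, dtype=float)   # shape (m, n)
    cost = np.repeat(times, K, axis=0)                   # K consecutive slots per machine
    job_idx, slot_idx = linear_sum_assignment(cost.T)    # rows = jobs, cols = slots
    return {int(j): int(s) // K for j, s in zip(job_idx, slot_idx)}
```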
>>: What is K, again?
>> Balasubramanian Sivan: K is the restriction I'm going to put [inaudible]: at most K jobs can go to any particular machine. You cannot crowd beyond that. You're optimizing-
>>: [inaudible]?
>> Balasubramanian Sivan: I'll choose K; make K ten, basically. I mean, K is a constant, and I can choose K to get a constant [inaudible] approximation; for concreteness, I'll just use K equal to ten.
>>: [inaudible]?
>> Balasubramanian Sivan: So here m is equal to n, but if you have more jobs than machines, then a constant times n/m jobs per machine is the restriction I put; it is the average load, basically, n/m. Here that is ten, and that's where the O(n/m) factor comes from. Okay. So the second point is that although a job no longer necessarily goes to its quickest machine, it still roughly goes to its quickest machine.
So here's a proof sketch; I'll just present the two key lemmas. The first lemma is that the probability that a job goes to its i-th favorite machine, ranking the machines according to runtimes, is basically exponentially decaying in the rank i of the machine; so a job doesn't go too far down its preference order. The second lemma basically complements the first: placing a job on its i-th quickest machine is no worse than placing roughly five-to-the-i independent copies of the job on their quickest machines. So the only advantage OPT has is that it puts every job on its quickest machine; that's the best thing possible for OPT, but it is the quickest among only m/2 machines, okay? So this can be phrased purely as a probabilistic result about order statistics: the i-th order statistic among m independent samples is almost stochastically dominated by five-to-the-i independent copies of the first order statistic, where the first order statistic is handicapped with only m/2 independent samples. That is the purely probabilistic version of what we're showing. And you can see that these two lemmas combine to give a constant approximation, because the probability of going to rank i is exponentially decaying in i, while here we have an exponential number of copies to charge against.
So I'll just summarize part two. When the machines are i.i.d. and the jobs are non-i.i.d., we give a prior-robust mechanism which, being blind to the distributions, gives an O(n/m) approximation, which is a constant factor when n equals m. And as I said, OPT-half and OPT are comparable for a large class of distributions, so it is not a resource augmentation result for those distributions at least. And when the jobs are also i.i.d., you can further improve this to sub-logarithmic approximations, again with a prior-robust mechanism.
So I'll wrap up with open questions. One obvious open question is how broad a class of distributions you can tackle with prior-robust mechanisms. In particular, can you relax the jobs being i.i.d. here, and still get the sub-logarithmic approximations with non-i.i.d. jobs? Another question, maybe beyond the scope of this work: what if you relax the i.i.d. assumption on the machines? Everything I said breaks down there, and you need completely fresh ideas.
The second open question concerns what is possible in the competitive setting. [inaudible] the upper and lower bounds match, at least for anonymous mechanisms. Can you slightly relax the model and get positive results? Here is one possible relaxation. Computer jobs, you know, computer programs, need not necessarily run on one machine; you can run the same program on multiple machines and take the completion time to be when the first copy of every job completes. It's not a [inaudible] constraint. This helps with incentive compatibility. You might ask what is the use of scheduling a job on [inaudible] machines, but it helps with incentive compatibility; for example, you can schedule all jobs on all machines. That is a possible schedule, and it is truthful; the only thing is it does too much work. If you also put a restriction on the maximum amount of work that can be done while allowing a job to be scheduled on multiple machines, can you do something better?
>>: You’d have to create each machine [inaudible]?
>> Balasubramanian Sivan: Yes. It could be possible that-
>>: [inaudible]?
>> Balasubramanian Sivan: But you want to put the restriction on the total work to be done. If you simply schedule all jobs on all machines, then it is possible to get the best makespan [inaudible]. That is trivially truthful for every machine, because it doesn't depend on the reported runtimes of the machines at all, but then every machine does too much work. So you put a restriction on the maximum work that can be done. That is one way to try to circumvent the lower bounds of the competitive setting.
So I'll just briefly mention a selection of other research I have done, before getting to future research directions. One problem that I recently worked on is the design of optimal crowdsourcing contests. In a crowdsourcing contest, a principal or a firm has a task to be completed; it advertises this task to a crowd and puts up a reward, and the users in the crowd make submissions for this task. You evaluate the submissions according to some pre-specified criterion and spread the reward among the winner or some set of winners. So the obvious question which arises is what contest format to use to incentivize users to give good submissions. In particular, the question we ask in this work is how to optimize the quality of the best submission you receive. A good example to have in mind is the contest, which many of you may know, that Netflix ran recently. Netflix ran a contest to improve the prediction accuracy of their algorithm: they wanted to predict how much a given user would like a particular movie, based on how much he liked a given set of previous movies. And they promised that the first user to improve the prediction accuracy of their current algorithm by ten percent would get one million dollars. That contest is over now. So that is one model. What is the best contest format? For the model we use, our result is basically that there is a very simple contest format which optimizes the quality of the best submission: you take all the submissions and segregate them into buckets. To go back to the Netflix example, you put all improvements between 10 and 12 percent in one bucket, improvements between 12 and 14 percent in another bucket, and all the users who fall in the highest bucket share the reward equally. That is the optimal format; that's what we prove.
>>: Why is it better to [inaudible] than handing it all out to the winner?
>> Balasubramanian Sivan: Okay. So there are two things. One is: what is the best thing to do if you run a static contest? A static contest means you do not get to decide how much to give to which person after you have received the submissions. This is what TopCoder does, for example; it runs these architecture competitions where it says two thirds of the reward will go to the best submission and one third of the reward to the second best submission. Among all such static contests, we prove that it is best to give everything to the winner; winner-take-all is the best. But obviously a dynamic contest, where you are allowed to see the submissions and then decide what to do, can do better, right? For example, when I said improvements of 10 to 12 percent and 12 to 14 percent go into different buckets, which bucket the very best of the submissions falls into, and how many users fall into it, is something you only learn after seeing the submissions; a dynamic contest of that kind is always at least as good as a static one.
>>: [inaudible] you're saying this is better than giving everything to the highest [inaudible]?
Why is that?
>> Balasubramanian Sivan: Well, I mean, the buckets we design are basically determined by the distribution, and for some distributions it will turn out that there are no buckets at all, which means that everything goes to the winner; that is also possible. For many so-called regular distributions it might turn out that there is no bucketing. But if the distribution isn't regular, there is a non-monotone allocation function, and we do what is called ironing of the allocation function; that results in bucketing. But it suffices to say that it will often happen that you give everything to the highest bidder. The highest-
>>: [inaudible]. There are some situations where you don't want to do that, and that's what we are more interested in. Can you explain [inaudible]-
>> Balasubramanian Sivan: It depends on the distribution, basically, why that arises-
>>: [inaudible]?
>> Balasubramanian Sivan: So the agents have skills, basically. You can think of the skill as the rate at which an agent converts effort into quality. The quality of the submission is the product of the skill and the effort of the agent. Okay, so it's a linear model: skill v times effort e gives the quality of the submission. And you have a distribution over these skills. So you can map it to auction theory; we basically extend auction theory to model this. Now, the only difference is that in auction theory, for revenue, you study the sum of the payments; here it is the maximum payment that matters. And you could implement the same thing differently: instead of dividing all the reward in the highest bucket equally, you can also break ties, but you should break ties in a consistent manner. Instead of dividing things equally, you can always say that if these people fall in this bucket, I'll always give everything to agent one if he falls in this bucket, and then to agent two. So that is a way to run these things without [inaudible].
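To make the bucketed reward rule concrete, here is a minimal sketch of how the payout could be computed once submissions have been scored. The participant names, bucket boundaries, and the equal-split tie-breaking are illustrative assumptions, not details from the paper.

```python
import bisect

def award_bucketed_contest(qualities, bucket_edges, reward):
    """Split a fixed reward among everyone in the highest occupied bucket.

    qualities:    dict mapping participant -> submission quality
    bucket_edges: ascending bucket boundaries, e.g. [0.10, 0.12, 0.14]
                  (qualities below the first edge win nothing)
    reward:       total prize money
    """
    # Assign each participant to a bucket index (-1 = below every edge).
    bucket_of = {p: bisect.bisect_right(bucket_edges, q) - 1
                 for p, q in qualities.items()}

    top = max(bucket_of.values())
    if top < 0:
        return {}  # no submission cleared the lowest bar

    winners = [p for p, b in bucket_of.items() if b == top]
    # Everyone in the highest occupied bucket shares the reward equally.
    return {p: reward / len(winners) for p in winners}


# Example with Netflix-style buckets: improvements of 10-12% and 12-14%
# land in different buckets; the two people in the top bucket split the prize.
print(award_bucketed_contest(
    {"alice": 0.11, "bob": 0.13, "carol": 0.135},
    bucket_edges=[0.10, 0.12, 0.14],
    reward=1_000_000))
```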
>>: [inaudible]?
>> Balasubramanian Sivan: No, even for these distributions, it could turn out that you have to split rewards equally between the players who fall in the highest band. [inaudible]. As for an example of why, I can't think of a good example for why this happens.
>>: [inaudible] than just giving it to the best?
>>: So even with your distribution, it could be that some people don't even want to buy [inaudible]; they don't have a real chance of winning. But you're giving [inaudible]?
>> Balasubramanian Sivan: Yeah. So I have a mathematical answer, but I don't have a good intuition to give. Let me describe, from auction theory, what is going on, basically. Truthfulness in single-parameter settings, where each agent has only one private parameter, can basically be characterized in terms of the allocation. So if this is v and this is x(v), then the allocation function has to be basically monotone. If the optimization results in a monotone allocation function, then it is truthful. But if you have a non-monotone allocation function like this, and the goal is to do optimization subject to the incentive compatibility condition, then you basically iron this allocation function, which means you flatten it out in this region, so that all values which fall in a particular region are treated equally. Whenever you iron an allocation function, there is going to be a discontinuity in how you allocate. The reason is: so far you are competing only with this set of people, people with values here. If your value slightly increases, then you're on par with all of these people. Okay? So clearly, the probability with which you're going to get served will increase. Similarly here, if your value slightly increases, then you suddenly bypass all these people, because you only have a [inaudible] competition here. You're not treated at par with these people, so there's going to be a discontinuity in the allocation function whenever you try to flatten it. This discontinuity in the allocation function will ultimately result in a discontinuity in payments. The question is how do you ensure this discontinuity in payments? The answer is you basically explicitly incentivize people to bid in certain ranges. Discontinuity in payments means that there are some forbidden zones of possible payments. So if I say that all improvements between 10 percent and 12 percent will be treated equally for Netflix, nobody will try to give 11 percent accuracy. Everybody will go to 10. Which means the region between 10 and 12 is a forbidden region of payment; nobody will land in that region. So to mimic that in this contest setting, you basically put in these restrictions; it flows out of this theory. But I'm not able to come up with an example where it's more intuitive to do this. The theory is there, though; the discontinuity has to be modeled. And I hope you can see that this basically ensures the discontinuity: nobody will give anywhere between 10 and 12. People will always keep to the lower end of any bucket. Right? Anything more is a waste of effort.
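As an illustration of the flattening step just described, here is a minimal sketch of ironing a discretized allocation curve using the pool-adjacent-violators idea: any stretch where the curve decreases is replaced by its average, so all values in that stretch are treated equally. This is a generic sketch of the technique on made-up numbers, not the specific construction from the paper.

```python
def iron(allocation):
    """Flatten a (possibly non-monotone) allocation curve into a
    non-decreasing one by pooling adjacent violating segments and
    replacing each pooled segment with its average value.

    allocation: list of allocation probabilities x(v) on an increasing
                grid of values v.
    """
    # Each block is (total, count); block means stay non-decreasing.
    blocks = []
    for x in allocation:
        total, count = x, 1
        # Merge backwards while the previous block's mean exceeds ours.
        while blocks and blocks[-1][0] / blocks[-1][1] > total / count:
            t, c = blocks.pop()
            total, count = total + t, count + c
        blocks.append((total, count))

    ironed = []
    for total, count in blocks:
        ironed.extend([total / count] * count)
    return ironed


# A non-monotone allocation: the dip after 0.7 gets flattened, so the
# values in that region are all served with the same probability.
print(iron([0.1, 0.3, 0.7, 0.4, 0.4, 0.9]))
# -> [0.1, 0.3, 0.5, 0.5, 0.5, 0.9]
```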
The other work is something I did, again in 2010, when I was an intern. The motivating question for this work is that computing truthful payments is often more difficult than computing the original allocation function itself. Oftentimes, truthful payments are only a secondary consideration: the original problem is the allocation problem, but this secondary thing turns out to be more difficult than the primary problem itself. This is a striking observation made in an [inaudible] seminal paper, where, for many problems, it seems that computing truthful payments is much harder than computing the original allocation. Can you compute truthful payments as fast as the original allocation? That's the question we asked. The answer is yes, you can, if you're willing to go ahead with a more relaxed notion of truthfulness, namely truthfulness in expectation. And you can do this both for single-parameter domains, which answers the question asked by [inaudible], and we also do it for multi-parameter settings. And this has applications to [inaudible] auctions where you do not know the parameters.
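For context on why this relaxation helps: in single-parameter settings the truthful payment is pinned down by Myerson's identity, p(v) = v*x(v) - (integral of x(z) dz from 0 to v), and one folklore way to avoid computing the integral exactly is to estimate it with a single extra allocation call at a random point, which yields truthfulness in expectation for risk-neutral agents. The sketch below illustrates that idea only; it is not necessarily the exact construction from the work being described, and the toy allocation rule is an assumption.

```python
import random

def payment_estimate(alloc, v):
    """Unbiased one-sample estimate of the Myerson payment
        p(v) = v * x(v) - integral_0^v x(z) dz
    for a monotone allocation rule x(.), other bids held fixed.

    alloc: function mapping this agent's reported value to its
           allocation probability x(v) in [0, 1]
    v:     the agent's reported value
    """
    z = random.uniform(0.0, v)          # one extra allocation call
    return v * alloc(v) - v * alloc(z)  # expectation equals the exact payment


# Toy monotone allocation rule (illustrative only): x(v) = min(v, 1).
x = lambda v: min(v, 1.0)
samples = [payment_estimate(x, 0.8) for _ in range(100_000)]
print(sum(samples) / len(samples))      # ~ 0.8*0.8 - 0.8**2/2 = 0.32
```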
And I'll just mention one other result. This has to do with revenue maximization in multi-parameter settings, which has been an open question for a long time in mechanism design. If you just have one parameter, then you know how to optimize revenue. But if agents are interested in multiple items and have different values for different items, then this question has been open for long. And we basically showed that you can get very simple mechanisms which approximate the optimal revenue by just posting prices on items and allowing agents to take their favorite items. Because it had been open for long, there have been several follow-ups to this work in several directions. For example, we make some interesting connections to prophet inequalities in statistics. There was an open question in our paper that depended on improving a prophet inequality; this created new interest in prophet inequalities, the state-of-the-art prophet inequalities have since been improved, and so the open question was answered.
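As a hedged illustration of the prophet-inequality connection, here is a small numerical sketch of the classic single-item threshold rule: posting a price equal to half the expected maximum of independent values recovers, in expectation, at least half of what an omniscient "prophet" would get. The distributions below are made up for illustration; this is not the multi-item mechanism from the paper.

```python
import random

def prophet_threshold_demo(dists, trials=100_000):
    """Compare a fixed posted price T = E[max]/2 against the expected
    maximum, for buyers arriving in a fixed order with independent values."""
    # Estimate E[max] (the prophet's value) by simulation.
    e_max = sum(max(d() for d in dists) for _ in range(trials)) / trials
    price = e_max / 2.0

    # Sell to the first arriving buyer whose value meets the price.
    obtained = 0.0
    for _ in range(trials):
        values = [d() for d in dists]
        obtained += next((val for val in values if val >= price), 0.0)
    return e_max, obtained / trials


# Illustrative distributions: three uniform buyers and one long-shot buyer.
dists = [lambda: random.uniform(0, 1)] * 3 + [lambda: 10.0 * (random.random() < 0.05)]
e_max, alg = prophet_threshold_demo(dists)
print(f"prophet: {e_max:.2f}, threshold rule: {alg:.2f}")  # alg >= e_max / 2 up to noise
```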
Okay. So I'll conclude with future research directions. One thing which is obvious, if you look at what has been going on in resource allocation settings, is that all these problems have been studied under [inaudible] assumptions [inaudible]. Game theory has been completely ignored in these problems, for the obvious reason that the non-game-theory problem was already nontrivial. But the picture is finally fairly complete now for the non-game-theory setting. So the goal is to bring game theory back into online algorithms, online [inaudible] problems. I'll leave you with one simple problem where you could try bringing game theory back in. You have the same display ad setting: m requests and n advertisers, and advertiser i wants at most B_i impressions. The only difference is that these B_i's are privately held. And let's assume that the queries are drawn i.i.d. from an unknown distribution. If you just look at a purely online setting without any game theory, the algorithms I presented basically solve this problem with arbitrarily-close-to-one approximations. If you look at a purely offline setting with the game theory in it, then the VCG mechanism gets a one approximation. But if you mix these two, online and game-theoretic constraints together, we don't know how this works. Can you get arbitrarily close to one? Or maybe not; then you have a hardness of being truthful. This is just like many problems where there are two constraints and either constraint alone can be handled, but if you mix them, you cannot. This has been the prime agenda in mechanism design, where computational and incentive constraints can each be handled well individually, but when they are combined, you don't know how to solve them. There's a tension. Similarly here, you have online and incentive constraints. Is there a separation here? Is there a hardness of being truthful? What can you do if the distribution is known versus unknown? Is there a separation between i.i.d. and [inaudible]? There's a whole list of questions here. That's it.
Thanks for visiting. [applause]
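To make the open problem concrete, here is a minimal sketch of the naive non-game-theoretic baseline for it: impressions arrive online, advertiser i can receive at most B_i of them, and a greedy rule gives each impression to the highest-value advertiser with remaining capacity. The per-impression values and the tiny instance are illustrative assumptions; this is only the trivial baseline, not the near-optimal online algorithms from the talk, and it ignores incentives entirely.

```python
def greedy_online_allocation(impressions, capacities, value):
    """Assign each arriving impression to the advertiser with the highest
    value for it among those that still have capacity left.

    impressions: iterable of impression identifiers, arriving online
    capacities:  dict advertiser -> max impressions B_i it can receive
    value:       function (advertiser, impression) -> value of the match
    """
    remaining = dict(capacities)
    assignment, total_value = [], 0.0

    for imp in impressions:
        candidates = [a for a, cap in remaining.items() if cap > 0]
        if not candidates:
            assignment.append((imp, None))  # nobody has capacity left
            continue
        best = max(candidates, key=lambda a: value(a, imp))
        remaining[best] -= 1
        assignment.append((imp, best))
        total_value += value(best, imp)

    return assignment, total_value


# Tiny illustrative instance: two advertisers, three impressions.
caps = {"adv1": 1, "adv2": 2}
vals = {("adv1", "q1"): 0.9, ("adv2", "q1"): 0.5,
        ("adv1", "q2"): 0.8, ("adv2", "q2"): 0.6,
        ("adv1", "q3"): 0.7, ("adv2", "q3"): 0.4}
print(greedy_online_allocation(["q1", "q2", "q3"], caps,
                               lambda a, q: vals[(a, q)]))
```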
>> Nikhil Devanur Rangarajan: Any more questions?
>>: So like from the standard i.i.d. model [inaudible] which cannot [inaudible]
>> Balasubramanian Sivan: No. It's not clear that you can get that in theory, because all the [inaudible] in the theory is purely algorithmic.
>>: [inaudible] main question is: in the [inaudible] setting, are you trying to optimize the expected value of the algorithm against the expected optimal value over the distribution? What if you take the expected value of the ratio instead of the ratio of expectations? If, for example, the optimum values of your instances can differ a lot, then you would take the ratio on every instance and only then take expectations [inaudible]
>> Balasubramanian Sivan: That's true. That's a stronger thing, yeah. There is, for example, a high-probability [inaudible] result, which is better than the ratio of expectations, you know: with high probability you get this one minus square root of gamma again, and that is a stronger benchmark. And yes, whether the one minus square root of gamma I presented depends on using the ratio of expectations, I'm not sure, yeah.
>>: In comparable [inaudible]
>> Balasubramanian Sivan: Yes, but I mean, at least for the kind of instances he described, where [inaudible] can be very different, it could be that that is stronger.
>> Nikhil Devanur Rangarajan: Anymore questions? Let's thank the speaker again. [applause]