>> Jin Li: Hello everyone. It's our great pleasure to have Professor
Baochun Li come and give another talk at Microsoft Research. Baochun
is, let me see, the Bell Canada Endowed Chair in Computer Engineering
at the University of Toronto. He has done a number of excellent works,
including putting network coding into the UUSee engine for
peer-to-peer video delivery; UUSee is a company in China. Baochun not
only writes a large number of papers, he himself writes a lot of code,
and his code is of excellent quality; I have seen some of it. It's
great to have an excellent research scientist who combines excellent
research skills and coding skills.
Without further ado, let's hear what Baochun has to say about
optimizing datacenter operations with practical complexity.
>> Baochun Li: Thank you, Jin, and it's always my pleasure to be here
and I visited Microsoft Research quite a few times. And today what I'm
going to do is I'm going to talk about some of my recent work, jointly
done with my Ph.D. student, and his name is Henry Hong Xu, and he just
graduated and is going to be joining City University of Hong Kong as
assistant professor. It's on optimizing datacenter operations with
practical complexity.
So what I'm going to talk about today is going a little bit beyond
optimality. And this is our recent effort of trying to do something
that is more than solving problems to achieve optimal solutions.
So the context of this particular talk and our work is related to data
centers. And, of course, these days there are lots of these mega data
centers. For example, this photo is from one of the Google data
centers, and each one of these is processing a lot of requests.
They're doing a lot of work. For example, from the New York Times we
have some numbers: the number of Web pages, the number of video views,
and the number of search queries are all on the order of billions
every single day. So it's a lot of work done by these data centers.
And what we care about in research is performance. Related to data
centers, there are two things we care about the most. One is
datacenter server utilization: we want to increase the utilization of
servers. The other is to reduce the consumption of energy, and in
general to reduce the operational costs of these data centers.
So if you actually take a look at the related work in the literature,
you see a lot of papers on optimizing costs and improving the
performance of data centers. Most of these are about optimizing some
of the operations in the data centers to achieve certain objectives,
and optimization theory is widely used in these papers. For example,
if we do a very simple Google Scholar search for the keyword
"datacenter optimization", we can see an increasing trend in the
number of papers related to datacenter operation, the optimization of
the actual operation of these data centers.
But we have to watch out here. We have to watch out for something:
complexity. What we care about is performance. However, as we try to
achieve additional performance, we move up in the operational points;
we achieve more performance by going for more complexity in our
algorithms. And the question that I have here today is: should we
operate where we get the maximum performance but with higher
complexity, or should we make a trade-off and try to reduce the
complexity, probably by a wide margin, without sacrificing much
performance?
So that's something that we'd like to do today. So here's an outline
of this particular talk. First I'm going to talk about the setting
within a single datacenter: the traditional problem of virtual machine
placement, that is, placing virtual machines onto physical servers.
This is a traditional problem, and basically the problem is that we
have a large number of virtual machines and we want to decide which
physical server is going to run each of these virtual machines. This
is in general a problem in the category of resource management,
related to the data centers.
And in the second part of the talk, probably about one-third of the
time at the end, I'm going to very briefly expand on that and talk
about multiple data centers, with users trying to make requests. These
requests are going to be satisfied by multiple data centers, and the
responses will be fed back to the users. This is typically related to
multiple geographically distributed data centers, and it is in the
category of workload management.
So, two parts: resource management and workload management. Let's get
started with resource management first. What we care about is the
traditional problem of virtual machine placement, and this is the
problem we have selected to show how we can provide a better trade-off
between complexity and performance.
The traditional problem of making decisions about VM placement
typically has a direct impact on the performance of applications
running in these data centers. For example, if the applications are
Web services, then CPU frequency and utilization affect the actual
performance; if they are databases, then disk I/O and memory
throughput affect the actual performance of the applications.
There's a lot of prior work in this particular area, and if you do,
for example, a very simple Google Scholar search for keywords related
to VM placement and optimality, you can see a lot of papers in this
area, and a lot of them use combinatorial optimization. They solve
optimization problems to achieve optimal solutions. And the problem
there is complexity.
So we take a look at one of these papers that summarizes the running
time of different algorithms in this particular area of VM placement.
For about a thousand VMs, the different algorithms take from more than
15 minutes up to almost 30 minutes to run, depending on the algorithm,
and to us this is not satisfactory. For a thousand VMs you have to run
the algorithm for half an hour. So the problem here is complexity. We
care about complexity.
So what we wanted to do is trade off complexity: we don't want to
sacrifice too much performance, but obviously we cannot achieve
optimality. So we want to use stable matching theory. Stable matching
theory was developed back in the 1960s by Gale and Shapley, and their
algorithm is a very simple algorithm for achieving a stable matching.
It's so simple that if we have N VMs and M servers and we run the
algorithm that achieves a stable matching, which I'm going to
illustrate briefly next with an example, the time complexity is on the
order of N times M. So this is a very simple algorithm.
The traditional model of stable matching is the college admissions
model. This is about admitting students to colleges: each college can
admit multiple students, and each student can only attend one college.
The input to the algorithm is the preferences: the colleges have
rankings of students, and the students have rankings of colleges. The
output of the algorithm is what we call a stable matching, which I'm
going to talk about next as a solution concept.
So VM placement is also a stable matching problem, in the sense that
each server can accommodate multiple VMs and each VM is placed at one
server. Here's a toy example of stable matching as a solution concept.
Suppose on the left-hand side we have the virtual machines, and on the
right-hand side we have the servers; each virtual machine has a
ranking of its preferences over servers, and each server has a ranking
of its preferences over virtual machines.
So the goal, in the college admissions problem and the seminal Gale
and Shapley paper back in the 1960s (they actually just won the Nobel
Prize in economics for this particular line of work), is to achieve a
stable matching. What do we mean by stable matching?
Let's consider one possible matching: match V1 to S1, V2 to S2, and V3
to S3.
We claim this matching is not stable. The reason is this: consider V2
and S3. V2 is currently matched to S2, but it prefers S3 to S2. S3 is
currently matched to V3, but it prefers V2 to V3. So if we establish a
matching between V2 and S3 instead, each of them is better off; each
one of them prefers the other to its current partner, which makes the
original matching unstable. Such a pair, V2 and S3 in this particular
example, is called a preference blocking pair. A matching is stable
only if there does not exist a preference blocking pair. So if we can
find a pair like this, it's not a stable matching.
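[Sketch, not part of the talk: a minimal Python check for a preference
blocking pair in the simple one-to-one setting. The preference lists
below are hypothetical, chosen to be consistent with the example as
described; the slides' actual values are not in the transcript.]

def rank(pref_list, item):
    # Position in a preference list; lower means more preferred.
    return pref_list.index(item)

def find_blocking_pair(match, vm_prefs, server_prefs):
    # match maps each VM to its server (one-to-one here); preference
    # lists are ordered most preferred first. Returns a preference
    # blocking pair, or None if the matching is stable.
    vm_of = {s: v for v, s in match.items()}
    for v, prefs in vm_prefs.items():
        for s in prefs:
            if s == match[v]:
                break  # v prefers its own server from here on
            # v prefers s to its current server; does s prefer v back?
            if rank(server_prefs[s], v) < rank(server_prefs[s], vm_of[s]):
                return (v, s)
    return None

# Hypothetical preference lists, consistent with the example:
vm_prefs = {"V1": ["S1", "S2", "S3"],
            "V2": ["S1", "S3", "S2"],
            "V3": ["S3", "S2", "S1"]}
server_prefs = {"S1": ["V1", "V2", "V3"],
                "S2": ["V2", "V3", "V1"],
                "S3": ["V2", "V3", "V1"]}
match = {"V1": "S1", "V2": "S2", "V3": "S3"}
print(find_blocking_pair(match, vm_prefs, server_prefs))  # ('V2', 'S3')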
Now, what the algorithm tries to compute is a stable matching, and the
algorithm is called deferred acceptance. It was proposed by Gale and
Shapley. Here's an example of running this algorithm. The idea is to
let the virtual machines propose to the servers first.
Each virtual machine in the first iteration proposes to its most
preferred server: V1 proposes to S1, V2 to S1, and V3 to S3. In this
first iteration, the servers choose according to their preferences.
S1, currently receiving two proposals, chooses V1 over V2, so S1
rejects the proposal from V2 and accepts V1. Then V2 crosses S1 out of
its list, and in the next iteration it proposes to S3. Now, S3
previously accepted the proposal from V3, but it now has two
proposals; it looks at its preference list and says, you know, V2 is
better than V3. So it rejects V3 and accepts the proposal from V2, and
V3 crosses S3 out of its list and proposes to the next server on its
list, which is S2.
Deferred acceptance is guaranteed to terminate, based on the seminal
paper of Gale and Shapley. In the simplest form of stable matching,
which is one-to-one, it is proven that a stable matching can always be
found using deferred acceptance. Of course, there is not only one
stable matching; there could be multiple. But one of them can always
be found using this deferred acceptance algorithm. And with
one-to-many college admissions, the model is essentially the same:
using deferred acceptance, a stable matching can always be found.
So in our situation, we claim that the classical model does not apply.
What's the problem in our situation, VM placement? Well, previously,
when we talked about the matching between VMs and servers, we kind of
assumed that the VMs consume the same amount of resources. In reality,
VMs require different amounts of resources, and in addition, the
servers can have different capacities as well. This size heterogeneity
of both VMs and servers is a new and difficult challenge for stable
matching theory. So we want to adapt the algorithms to achieve stable
matching; well, we first have to define what stable matching even
means when we have size heterogeneity of both VMs and servers, where
VMs require different amounts of resources and servers have different
capacities. So we want to develop a new stable matching theory with
size heterogeneity, and here's an example of what we have.
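[Sketch, not part of the talk: a minimal Python version of classic
one-to-many deferred acceptance for the equal-size case, as a
reference point before sizes enter the picture. The data layout
follows the hypothetical preference lists in the previous sketch.]

def deferred_acceptance(vm_prefs, server_prefs, capacity):
    # VMs propose in preference order; each server tentatively holds
    # its most preferred proposals up to its capacity (in equal-size
    # units) and rejects the rest, which then propose again.
    next_choice = {v: 0 for v in vm_prefs}
    held = {s: [] for s in server_prefs}
    free = list(vm_prefs)
    while free:
        v = free.pop()
        if next_choice[v] >= len(vm_prefs[v]):
            continue  # v has exhausted its list; it stays unmatched
        s = vm_prefs[v][next_choice[v]]
        next_choice[v] += 1
        held[s].append(v)
        held[s].sort(key=server_prefs[s].index)
        for rejected in held[s][capacity[s]:]:
            free.append(rejected)  # rejected VMs propose again later
        held[s] = held[s][:capacity[s]]
    return held

# With the hypothetical lists from the previous sketch and capacity one
# per server, this returns {'S1': ['V1'], 'S2': ['V3'], 'S3': ['V2']},
# matching the walkthrough above.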
>>: So a question. In the college admissions example, students have
preferences for schools, right?
>> Baochun Li: That's right.
>>: So VMs, why would we have preferences over servers?
>> Baochun Li: So the question is why the VMs have preferences over
different servers. Well, the VMs belong to tenants, and different
applications belonging to the same tenant could have co-location
preferences, tiered service, and so on, and we believe that we should
allow the applications or the tenants to express these preferences.
So not only do the servers have preferences over VMs, but the virtual
machines also have preferences over the servers.
>>: [inaudible] in the sense that you might have wanted some of them
together.
>> Baochun Li: Exactly. So the point is, the preferences can be quite
complicated. I'm going to very briefly talk about an engine we have
that converts policies into the lists of preferences. We have to be
able to do that in order to use this approach: we have to be able to
map the actual policies, for example load balancing or consolidation,
into the preferences. And it's pretty flexible to actually do that.
Now, what we have here is a new model, a model with size
heterogeneity. Different virtual machines have different sizes: here
V1 has size two, as compared to V2 and V3, both of which have size
one. And then we have the capacities: different servers have different
capacities, in this case capacities 2, 1, and 1. So in this new model,
again, we could consider one kind of matching: matching V1 to S1 (both
of them have size two), and V2 to S2 and V3 to S3. That's something we
could do. And then we could also think about the typical notion of a
preference blocking pair. Consider V3 and S1. V3 is currently matched
to S3, and V3 actually prefers S1 to S3. S1 is currently matched to
V1, but it prefers V3 to V1.
So this is the traditional sense of preference blocking: V3 and S1
form a preference blocking pair. However, we have to change the
definition slightly: a virtual machine V and a server S form a
blocking pair if V prefers S to its current server, and S prefers V to
some of its currently accepted VMs, because a server can accommodate
multiple VMs given the different sizes of VMs and the different
capacities of servers, and by rejecting some of them, the server can
run this particular virtual machine. That's the slightly revised
definition of preference blocking, and we use it to define a
preference blocking pair; if there is a preference blocking pair,
obviously the matching is not stable.
So the objective is to find a stable matching in this new model with
size heterogeneity. And unfortunately, the deferred acceptance
algorithm does not work in this particular situation.
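[Sketch, not part of the talk: one plausible way to state the revised
blocking condition in code; the exact definition in the paper may
differ in details. It assumes every VM in the matching is placed.]

def is_blocking(v, s, match, sizes, caps, vm_prefs, server_prefs):
    # v blocks with s if v prefers s to its current server, and s can
    # fit v after evicting only VMs that s likes less than v.
    cur = match[v]
    if vm_prefs[v].index(s) >= vm_prefs[v].index(cur):
        return False  # v does not prefer s
    hosted = [u for u, t in match.items() if t == s]
    keep = [u for u in hosted
            if server_prefs[s].index(u) < server_prefs[s].index(v)]
    spare = caps[s] - sum(sizes[u] for u in keep)
    return spare >= sizes[v]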
So let's think about one example here. In this example we have four
different virtual machines. The first one has size two; the next three
have size one. There are three servers: server one and server three
have capacity two, and server two has capacity one. And each of them
has its preferences.
So let's try to run the deferred acceptance algorithm on this
particular example. In the first round, each virtual machine proposes
to its most preferred server: V1 to S1, V2 to S1, V3 to S2, and V4 to
S2. Now S1 takes the proposal from V1 and rejects the proposal from
V2, because V1 has size two, so S1 has no additional capacity to run
more virtual machines, and V1 is more preferable than V2 for S1.
And after S2 rejects the proposal from V3 and tentatively accepts the
one from V4, V3 crosses S2 out of its list, and in the next round V3
proposes to S1. Let's see what happens when V3 proposes to S1.
So V3 is more preferable than V1 for S1. S1 currently has V1, with
size two, and S1 has a capacity of two. So S1 in this particular case
rejects V1 and accepts the proposal from V3: reject V1, accept V3. And
by doing so, S1's spare capacity increases. Before accepting V3 it had
no spare capacity; now, because V3 has size one, S1 has one unit of
spare capacity.
Meanwhile V2, rejected by S1, proposes to S2, which prefers V2 and
displaces V4. So after an additional round, V4 proposes to S1. When V4
proposes to S1, S1 has this spare capacity, so S1 accepts the proposal
from V4.
So S1 now holds both V3 and V4. And after S1 accepts V4, you can see
there's a problem here. The problem is that S1 had previously rejected
V2, but now it accepts V4. Previously it rejected V2 because it did
not have spare capacity; it prefers V1 to V2, but it could not accept
V2. Now it accepts V4, because it does have spare capacity after
rejecting V1 and taking V3. So in that sense we do have a preference
blocking pair, which is V2 and S1: V2 prefers S1 to S2, and S1 prefers
V2 to V4. So that's the preference blocking pair, and that's something
we don't want to see.
So it's not a stable matching after running the deferred acceptance
algorithm. This only happens with different virtual machine sizes,
when we have size heterogeneity. And this is no good; we don't want
this.
So how do we actually change this? Well, the intuition is to revise
the deferred acceptance algorithm: whenever a virtual machine is
rejected by a server, we remove any less preferred VMs from that
server's preference list.
So what we want to say is: after S1 rejects V2, it removes V4, which
is less preferred than V2, from its preference list. If that is done,
then after running all the iterations of this revised deferred
acceptance algorithm, we can see that V4 is left unmatched.
Even with V4 left unmatched, the result is still stable; it is still a
stable matching. So the revised deferred acceptance algorithm,
crossing out the less preferred VMs whenever we reject a particular
VM, reaches stability in its resulting matching of the VMs to the
servers.
But here it kind of feels strange, because V4 is left unmatched.
Still, the remaining matching is stable. So this is correct, which
means that the revised deferred acceptance is a correct algorithm for
achieving stable matching.
We have proved this theorem: the revised deferred acceptance always
finds a stable matching, in the same time complexity as the original
deferred acceptance. And a stable matching always exists in our new
model.
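[Sketch, not part of the talk: a compact Python rendering of how the
revision could look, building on the deferred acceptance sketch above.
Details such as how a server greedily fits proposals within its
capacity are my own simplifications, not the paper's exact algorithm.]

def revised_deferred_acceptance(vm_prefs, server_prefs, sizes, caps):
    # Deferred acceptance with sizes. On each rejection by server s,
    # every VM that s likes less than the rejected VM is struck from
    # s's preference list, so s can never accept it later.
    prefs = {s: list(p) for s, p in server_prefs.items()}  # mutable copies
    next_choice = {v: 0 for v in vm_prefs}
    held = {s: [] for s in server_prefs}
    free = list(vm_prefs)
    while free:
        v = free.pop()
        placed = False
        while not placed and next_choice[v] < len(vm_prefs[v]):
            s = vm_prefs[v][next_choice[v]]
            next_choice[v] += 1
            if v not in prefs[s]:
                continue  # v was struck from s's list earlier
            held[s].append(v)
            held[s].sort(key=prefs[s].index)
            accepted, used, cutoff = [], 0, None
            for u in held[s]:  # in s's preference order
                if cutoff is None and used + sizes[u] <= caps[s]:
                    accepted.append(u)
                    used += sizes[u]
                else:
                    if cutoff is None:
                        cutoff = u  # most preferred rejected VM
                    if u != v:
                        free.append(u)  # displaced VM proposes again
            if cutoff is not None:
                # Strike the rejected VM and everything below it.
                prefs[s] = prefs[s][:prefs[s].index(cutoff)]
            held[s] = accepted
            placed = v in accepted
    return held  # VMs that exhaust their lists stay unmatched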
Now, if we take a look at the previous example, we can see that there
exists a better stable matching. What do we mean by better? One stable
matching is better than another if every virtual machine is at least
as well off. Let's look at this particular example. Think about V2. V2
is currently matched to S2, but it actually prefers S1 to S2. V2 is
matched to S2 by the revised deferred acceptance algorithm, but if V2
were instead matched to S1, that would be better, because V2 is better
off and nobody else is worse off.
In that sense, because S1 is currently matched only to V3, it has
additional capacity to accommodate V2; we call this capacity blocking.
It's not preference blocking, it's capacity blocking, in the sense
that V2 can be better off because S1 does have additional capacity.
So if we use capacity blocking as the notion for improving a stable
matching, we can make the stable matching better, and eventually
achieve an optimal stable matching. The goal here is to find an
optimal stable matching, one that does not have capacity blocking
pairs.
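[Sketch, not part of the talk: the capacity blocking condition is
simpler than preference blocking, since no eviction is involved; this
check reflects my reading of the definition above.]

def is_capacity_blocking(v, s, match, sizes, caps, vm_prefs):
    # v and s form a capacity blocking pair if v prefers s to its
    # current server and s has enough spare capacity to take v
    # without evicting anyone.
    cur = match[v]
    if vm_prefs[v].index(s) >= vm_prefs[v].index(cur):
        return False  # v does not prefer s
    load = sum(sizes[u] for u, t in match.items() if t == s)
    return caps[s] - load >= sizes[v]  # v simply fits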
What we want is to devise another algorithm to achieve an optimal
stable matching, and this algorithm is multi-stage deferred
acceptance. The intuition is very simple: we iteratively improve the
stable matching by allowing the VMs to re-propose to the servers. So
we run multiple stages, and in each stage we run the revised deferred
acceptance with selected VMs and servers.
And the theorem that we can prove is that multi-stage deferred
acceptance actually finds an optimal stable matching. So this is
something we can actually get.
So what we're doing here is devising new algorithms to find stable
matchings, in this case an optimal stable matching, with size
heterogeneity in the VM placement problem.
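[Sketch, not part of the talk: a schematic outer loop, reusing the two
sketches above. The talk does not say how the VMs and servers are
selected in each stage; repeatedly moving a capacity-blocked VM is my
guess at the intent, not the paper's exact procedure, and this simple
version ignores VMs left unmatched.]

def multi_stage_da(vm_prefs, server_prefs, sizes, caps):
    # Stage 0: start from the revised deferred acceptance result.
    held = revised_deferred_acceptance(vm_prefs, server_prefs, sizes, caps)
    while True:
        match = {v: s for s, vms in held.items() for v in vms}
        movers = [(v, s) for v in match for s in held
                  if is_capacity_blocking(v, s, match, sizes, caps, vm_prefs)]
        if not movers:
            return held  # no capacity blocking pairs remain
        # Next stage: let one capacity-blocked VM move to the server
        # that has room for it; the VM is strictly better off.
        v, s = movers[0]
        held[match[v]].remove(v)
        held[s].append(v)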
And what we did is run some trace-driven simulations to evaluate the
performance of the results of running these algorithms.
What do we mean by performance? We evaluate the resulting matching and
see how good it is, by looking at the VM priority, the application
type, and the server attributes. We combine these into a single
performance score based on related work: there are a few papers, from
around 2006 to 2012, that combine multiple such attributes and
integrate them into one performance score.
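[Sketch, not part of the talk: the talk does not give the scoring
formula, so this tiny function is only an invented illustration of
combining VM priority and server/application fit into one number; the
weights and attributes are not the ones from those papers.]

def performance_score(match, priority, app_affinity):
    # Sum, over placed VMs, of priority-weighted server affinity.
    # priority[v]: importance of VM v; app_affinity[(v, s)]: how well
    # server s's attributes suit v's application type, in [0, 1].
    return sum(priority[v] * app_affinity[(v, s)] for v, s in match.items())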
What we wanted to do is compare our two versions of deferred
acceptance, the revised and the multi-stage, against the optimization,
the optimal solution.
We increase the number of VMs from about 300 to about a thousand and
look at the runtime. If you run the optimization algorithm, the
runtime increases dramatically. And because our stable matching
algorithms have time complexity linear in the number of VMs times the
number of servers, their running time is better; it scales much
better.
But what we really care about is the performance. As the number of VMs
increases, the actual performance is not that much worse compared to
optimality: the optimal solution is probably only about 20 percent
better than the result we get from stable matching. Obviously this is
just one set of simulations. We used traces from existing papers, with
a number of VM priorities and a small number of application types and
server attributes. It's just one instance of a trace-driven
simulation. However, it does give a glimpse of the performance we get
by using a stable matching algorithm rather than computing the optimal
solution.
>>: A question here. When you show the optimal algorithm, what's the
criterion? What is optimal --
>> Baochun Li: So in this particular case we're running the optimal
solution based on the performance score that we have. So we're trying
to optimize the performance score of combining the VM priorities, the
application types and the server attributes.
>>: The performance score is calculated on, let's say, basically the
server resource to execute this VM, the time?
>> Baochun Li: Exactly. So for any matching of VMs to servers, based
on the server attributes, the different kinds of servers, and based on
the application types preferring different kinds of servers,
preferring, for example, lower latency, where CPU utilization would be
one possibility to consider, the optimization algorithm is trying to
maximize the aggregate of all of these.
>>: When you're doing the optimization, basically trying to calculate
the optimal algorithm, what are you using? Because the reason I ask is
basically this: the general problem here is actually combinatorial,
right?
>> Baochun Li: Right.
>>: You cannot basically afford to run the actual combinatorial search
to find the optimal --
>> Baochun Li: The running time is going to increase. But because it's
a small number of VMs, we can still compute the optimal solution
within that running time.
>>: Absolute optimal.
>> Baochun Li: Yes, it's the optimal solution.
>>: Exhaustive search, basically --
>> Baochun Li: It's combinatorial optimization, yes, yes.
>>: Okay. Usually in combinatorial optimization there are other
approaches. In this approach, I can see you're extending a prior
algorithm used for [indiscernible]. Usually the general approach is
that first I get a good algorithm, let's say I use your algorithm;
then I look at pairs or triples and try to do swapping. That is, I
look at two pairs of servers and VMs and their current assignment, and
I try to see, if I flip the assignment, whether anyone is better off
or not. Or can I flip three pairs, and so on. I'm pretty sure you're
familiar with LDPC coding and that kind of literature; they do such
things.
>> Baochun Li: Those are more or less related to VM migration. So here
we're looking at a simpler problem, in the sense of placement. We're
not doing migration; we're doing initial placement. So we're not doing
migration in the sense that --
>>: I'm also talking about placement, not migration, because here
you're basically saying: okay, I look at your algorithm, let's say the
revised algorithm, and look at the result. And then I try to develop a
search algorithm and see, if I do swapping on this result, can I
further improve the performance or not.
>> Baochun Li: True.
>>: And usually there is this: you do a round of search, then another
round, and when you search through everything and basically cannot
find any more of these swaps, you terminate. So this is like using the
output of the algorithm as the starting point and doing gradient
descent on this complex surface. Basically it will go to a local
minimum, because that's the definition of not being able to find any
further swap on this result. I'm just wondering, I mean, have you
tried --
>> Baochun Li: I understand your point. You're not talking about
migration. We're not considering the cost of migration. So the cost
will be zero. We're just taking the stable matching result and try to
further improve on that.
So we haven't tried that particular direction of first getting to a
good enough solution and then seeing what the complexity of further
improving on it is. We haven't really studied that direction of
research. What we did is compare with the existing solutions in the
literature that directly solve the optimization.
If we actually tried to improve that way, it might be possible that
after a small number of steps you could increase the performance quite
a bit, quite dramatically.
>>: I think I also have questions on some practicalities, but that
would probably drag this point too far. Let me discuss with you after
the talk is over.
>> Baochun Li: Sure.
>>: So I have a quick question. On the stable matching side, are you
looking at one resource or at multiple resource dimensions, trying to
match all of them at the same time?
>> Baochun Li: In this particular case we're looking at just
preferences. So we convert the actual resource preferences including
multiple dimensions into a preference list. So we have to do that.
>>: [indiscernible] CPU capacity, memory capacity also.
>> Baochun Li: In this case we just have one dimension. It's the size
and that's it. And of course if you have multiple dimensions you're
going to have to convert, and conversion is not precise but we have to
do that.
>>: I'm just wondering on the literature side, whether that thing has
been looked at or not, multiple dimensions stable matching.
>> Baochun Li: No. No, I do not think so. So multiple-dimension stable
matching is not unique.
So we also did an implementation. I gave it a fancy name; it's called
Anchor. What it does is this: it has a resource monitoring system that
monitors the capacity of the servers and feeds the results into the
policy manager, and the policy manager is basically a conversion
mechanism that converts different policies of the servers, say
consolidation and load balancing, into the preference lists. And then
the matching engine just runs our stable matching algorithm, the
multi-stage deferred acceptance. We believe this is a nice approach,
because it's a unified system that supports different placement
policies with just one matching engine: the policy manager converts
different policies into preferences, but we can use just one
algorithm. That's the beauty of this particular system. For example,
if we have the policy of consolidation versus load balancing from the
datacenter operator's side, we can convert that into different
preferences and feed those preferences to the same matching engine to
get the result.
Here's the experimental result of allocating ten VMs every 30 seconds
using this real-world implementation, on the 20-node cluster that we
have in our lab. We ran it over time, for about 150 seconds. The
yellow dots use consolidation policies, which means we want to
consolidate VMs onto as few servers as we can. The blue dots use load
balancing policies: we don't care about the energy cost, we don't want
to turn off servers, and we care about the performance of the
applications, so we do load balancing. And we can see that over time
the number of active servers is higher with load balancing than with
consolidation, which means that the differences in policies really
make a difference in the result of the stable matching. So our claim
is that Anchor, as an instantiation of feeding different policies into
the deferred acceptance algorithm, is effective in realizing these
policies.
So in general, to summarize: existing work tries to provide optimal
solutions using combinatorial optimization. We want to move back a
little bit and ask whether we can reduce the complexity, because of
the nature of scalability (we have to run the algorithm in a
large-scale datacenter with more than a thousand VMs mapped to the
servers), and sacrifice some performance, while hopefully achieving
much better complexity.
So the second part of this particular talk is going to be related to
workload management. So workload management is concerned with multiple
data centers. And these data centers, they're geographically
distributed and we have multiple users trying to send requests to these
data centers. This is again a traditional problem.
So again, we assume that we have geographically distributed data
centers; in this case the figure shows the different locations of
Google data centers around the world. And we assume that we also have
a number of users trying to send their requests to these data centers.
So the simplifying assumption here is that the data that can be used
to satisfy these requests is replicated, in the simplest case, to all
of the data centers around the world.
In that sense, if we have a user trying to satisfy its requests, a
request can be satisfied by a combination of the responses from a
number of data centers around the world. So the question that we have
here is: how do we map the requests to these data centers?
This is typically called request mapping, and it is well studied,
quite extensively studied in the literature. There's another problem,
which we call response routing: after processing these requests, we
want to send the responses back to the user. How do we send responses
back to the user from a single datacenter? Let's zoom into one of
these data centers, say the one in Seattle or Oregon. What do we have
here? We have this one datacenter, and we have multiple ISPs handling
the actual Internet connection of this particular datacenter. We
assume multiple ISPs. They have different bandwidth, they have
different latency in terms of sending responses out, and they also
have different costs for bandwidth. In this particular work we only
consider the different costs for bandwidth; they have different
charging models in that sense. Of course, we want to optimize the cost
of using bandwidth, and this problem of response routing from a
multi-homed datacenter has also been studied in the literature.
So what we want to take a look at is both of these problems: request
mapping and response routing. These two problems have been studied
extensively, all the way from about 2004 to about 2011, in SIGCOMM
papers and the like, but those papers study the two problems
separately; they do not study the two problems in the same setup.
However, what we claim is that these problems are related to each
other; they're inherently coupled. The intuition is that the request
mapping from the users to the data centers determines the demand for
traffic, and the demand for traffic affects the response routing.
So we believe that these two problems should be studied jointly rather
than separately. If we study them separately, they will lead to
objectives that are misaligned, and the solution from solving the
problems separately will not be optimal compared to studying the
problems jointly.
So that's the motivation for studying both of these problems in the
same setup. We recently have a paper, joint work between Henry and
myself, published in INFOCOM, that studies the joint problem of
request mapping and response routing. So here are the system models.
First, what is a user? In our model, a user is a unique IP prefix.
This is common practice in reality; for example, it's used by Akamai.
Related to request traffic, we assume in our system model that the
requests are arbitrarily splittable among the data centers. This is
also common practice: we can split the requests using either DNS or
HTTP proxies, and we assume that's the case.
A little bit more problematic is the response traffic. We also assume
that the responses are arbitrarily splittable among the different ISP
links handling the Internet connection from the datacenter to the
Internet.
This is not commonly the case with BGP. However, we did find papers,
for example an INFOCOM 2000 paper on hashing-based traffic splitting,
that study the possibility of arbitrarily splitting the traffic among
different ISPs. So here we also assume that is the case, because if we
cannot assume it, we cannot jointly study the two problems.
And finally, in terms of the time scale of running this optimization,
we assume that we can run it hourly, and this is common practice in
the literature: lots of SIGCOMM papers that study request mapping run
the operation hourly, and they assume that near-future traffic is
predictable.
So these are some of the system models we assume in our work. Here's
the formulation that we have. First of all, we consider the actual
mapping between a user and a datacenter. A datacenter has multiple ISP
links, multiple ISPs handling its connection to the Internet, so we
consider a term called a datacenter stop, and that is a tuple of a
datacenter and an ISP link. So different ISP links give different
stops, and different data centers give different stops as well.
We also have the users. User i sends a proportion of its requests to
each of these datacenter stops; again, the datacenter stop is the
tuple of datacenter and ISP link.
And this proportion is between 0 and 1. So what we want to do is this
joint request mapping and response routing. We first consider our
optimization objective. First of all, the demand of requests: D_i is
the demand from user i, and we assume this demand is arbitrarily
splittable across the datacenter stops. Then, to minimize cost, we
consider the cost of the datacenter: in this case we assume that we
know the energy cost of each request at a particular datacenter stop,
and that's something we want to minimize. In terms of the actual
performance of using a datacenter to satisfy the requests, we consider
the performance with regard to latency: L_ij is the latency of
satisfying user i's requests using datacenter stop j, and we assume we
have a utility function of this latency, a concave, decreasing,
differentiable function. The longer the latency, the worse it is. So
we want to minimize the cost in terms of latency as well.
That first term is the request mapping term: the costs related to
request mapping. And we want to put the cost of response routing into
the same objective too. So we consider the bandwidth cost of the
datacenter using different ISPs: different stops have different
bandwidth costs per unit of responses satisfying the requests. This
second term is related to response routing. So if we take the joint
consideration of request mapping and response routing and minimize the
total cost, we have the entire formulation as a convex optimization
problem.
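[Sketch, not part of the talk: one plausible reading of the
formulation just described, written out in LaTeX. The symbols
alpha_ij, e_j, b_j and the capacity constraint C_j are my notation and
assumptions; the exact formulation in the INFOCOM 2013 paper may
differ.]

% alpha_{ij}: fraction of user i's demand D_i mapped to datacenter stop j
% e_j: energy cost per request at stop j; b_j: bandwidth cost per unit response
% L_{ij}: latency between user i and stop j; U: concave decreasing utility
\begin{align*}
\min_{\alpha}\quad & \sum_{i}\sum_{j} D_i \alpha_{ij}\left( e_j - U(L_{ij}) \right)
  + \sum_{j} b_j \sum_{i} D_i \alpha_{ij} \\
\text{s.t.}\quad & \sum_{j} \alpha_{ij} = 1 \quad \forall i, \qquad \alpha_{ij} \ge 0, \\
& \sum_{i} D_i \alpha_{ij} \le C_j \quad \forall j \quad \text{(assumed capacity of stop } j\text{)}
\end{align*}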
Typically we could solve this convex optimization problem using
traditional approaches, but the problem here is that it's large scale.
The number of users, since each user is basically an IP prefix, is on
the order of ten to the fifth; the number of variables is on the order
of ten to the seventh; and the number of constraints is on the order
of ten to the fifth, tied to the number of users. Typically we could
solve this using dual decomposition with subgradient methods, the
traditional way of solving such a problem in a distributed fashion,
but that has two drawbacks traditionally associated with it. The first
drawback is that we have to adjust the step size using an adaptive
algorithm, and adjusting the step size is tricky. And even if we do
so, the convergence is still pretty slow.
So we wanted to do even better. We want to solve this problem using a
different approach. We identified the structure of this particular
formulation, and we want to solve it using ADMM, the alternating
direction method of multipliers. This was actually studied back in the
1970s, but it was re-examined in Boyd's 2011 paper. Basically, it
applies to problems with a certain structure, and that structure is
nicely present in our problem formulation of joint request mapping and
response routing.
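[Sketch, not part of the talk: the standard two-block ADMM iterations
from Boyd et al. (2011), in scaled form, stated generically since the
talk does not give the exact splitting. In this setting, x would play
the role of the request mapping variables and z the response routing
variables, linked through the dual variable u in each iteration.]

% ADMM for: minimize f(x) + g(z) subject to Ax + Bz = c
\begin{align*}
x^{k+1} &= \operatorname*{arg\,min}_{x}\; f(x) + \tfrac{\rho}{2}\left\lVert Ax + Bz^{k} - c + u^{k} \right\rVert_2^2 \\
z^{k+1} &= \operatorname*{arg\,min}_{z}\; g(z) + \tfrac{\rho}{2}\left\lVert Ax^{k+1} + Bz - c + u^{k} \right\rVert_2^2 \\
u^{k+1} &= u^{k} + Ax^{k+1} + Bz^{k+1} - c
\end{align*}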
So the idea of ADMM here is that we consider the request mapping and
response routing subproblems separately, but in an alternating way: we
solve each of them separately, but we link them up through the dual
variables in each iteration.
So we first solve for request mapping, and we get per-user
subproblems; we solve those. Then we solve for response routing, and
those are the per-datacenter-stop subproblems. We then update the dual
variables, and go back to solve for request mapping and response
routing again.
So this is the basic idea of ADMM. It only works, it converges
successfully, for a certain structure of the problem, and that
structure is present in our problem.
So here's an evaluation of the performance. We studied this in
trace-driven simulation using Wikipedia request traces, some empirical
power and bandwidth prices that we collected over the Internet, and
latency data from iPlane, using PlanetLab nodes, also available from
the existing literature. In this trace-driven simulation, we ran two
different variations of ADMM. The first variation is the traditional
ADMM, run all the way to optimality: solve it to optimality. The
second one, the green one, is run for only 20 iterations rather than
going for optimality. What we consider here first is the cost of the
resulting solution. We can see that the green is very similar to the
blue, which means that if we run it for 20 iterations, we get about
the same performance in terms of reducing the cost per request as
solving for optimality.
And if we think about the latency for each one of the users, again as
a measure of performance, we compare them and see that they're very
similar as well. So running ADMM for only 20 iterations achieves about
the same result, the same performance, as solving for optimality with
ADMM. ADMM is already faster than the traditional subgradient method,
and if we run it for a fixed number of 20 iterations, it's even
faster, while achieving very similar performance.
If we use subgradient methods with a very simple adaptive step size
rule, a diminishing step size rule, and compare the two algorithms,
ADMM and subgradient, in terms of the number of iterations, the CDF
shows that ADMM converges much faster: typically within 60 iterations,
whereas the subgradient method takes more than 60 iterations to
converge. So this is much faster, and again, if we just run ADMM for
20 iterations, it obviously has not converged yet, but the result is
very close to solving for optimality. So the conclusion is that ADMM
has much faster convergence.
And interestingly, after this INFOCOM 2013 paper that we did last
year, this year we saw more papers done by other researchers: there
are two SIGMETRICS papers coming up next month related to joint
request mapping and response routing, which means that this is indeed
an interesting and important problem, with better solutions still to
be found. We ourselves have also studied request mapping as a problem
alone, but taking into account the cooling of data centers, using
external temperature from empirical traces; basically we consider the
energy consumption of the data centers, and we have a paper on that as
well, to appear in a conference called ICAC. So these are some of the
follow-up works after our work last year, related to either request
mapping by itself or, from other groups, joint request mapping and
response routing.
So to summarize the second part of the talk: existing work tries to
solve for optimality, but the performance is not good enough, because
it considers the different problems separately. In our work, if we
consider request mapping and response routing jointly, we can achieve
better performance; and if we use a better solution method for this
optimization problem, we can reduce the complexity, in terms of the
number of iterations we run the algorithm, compared to the subgradient
method.
So in general, just to recap the entire talk: we first talked about
existing work using combinatorial optimization algorithms to solve the
VM placement problem. Our work says: let's sacrifice a little bit of
performance to reduce the complexity, and hopefully the performance
trade-off is not that dramatic. In our simulations we showed that the
performance is not that much worse, probably 20 to 30 percent worse
than the optimal solution, in the instance of the VM placement
problem. And then we thought about the possibilities of doing workload
management using distributed optimization: if we consider request
mapping and response routing jointly, as in our INFOCOM 2013 paper, we
have shown that we can solve for optimality with less complexity and
better performance, compared to considering the two problems
separately, for example considering request mapping alone.
So that's kind of a recap of the entire talk. For any additional
questions, you can check out my papers by going to my website; Google
my first name to get to my website. Thank you. [applause]
>>: I like the first part, the resource [indiscernible].
>> Baochun Li: I like it better, too. That's why I spent about
two-thirds of the time on the first half and one-third of the time on
the second half. In total it's about 50 minutes.
>>: I actually have questions related to resource management. I think
it's a very relevant problem, but many of them we can probably discuss
afterwards, because it's more than a question, more like a discussion.
I'm just wondering if you guys have anything you want to discuss here;
otherwise maybe we can move out.
>> Baochun Li: Sure.
>>: Thank you so much for coming.
>> Baochun Li: No problem.