>> Jin Li: Hello everyone. It's our great pleasure to have Professor Baochun Li come and give another talk at Microsoft Research. Baochun is, let me see, the Bell Canada Endowed Chair in computer engineering at the University of Toronto. He has done a number of excellent works, including putting network coding into UUSee's engine for peer-to-peer video delivery; UUSee is a company in China. Baochun not only writes a large number of papers, he himself writes a lot of code, and his code is of excellent quality; I have seen some of it. It's great to have an excellent research scientist who combines excellent research skills and coding skills. Without further ado, let's hear what Baochun has to say about optimizing datacenter operations with practical complexity.

>> Baochun Li: Thank you, Jin. It's always my pleasure to be here, and I have visited Microsoft Research quite a few times. Today I'm going to talk about some of my recent work, jointly done with my Ph.D. student, Henry Hong Xu, who just graduated and is going to be joining City University of Hong Kong as an assistant professor. It's on optimizing datacenter operations with practical complexity. What I'm going to talk about today goes a little bit beyond optimality: this is our recent effort to do something that is more than solving problems to achieve optimal solutions.

The context of this talk and our work is data centers. Of course, these days there are lots of these mega data centers. For example, this photo is from one of the Google data centers, and each one of these is processing a lot of requests; they're doing a lot of work. For example, the New York Times reported numbers for Web pages, video views, and search queries that are all on the order of billions every single day. So a lot of work is done by the data centers.

What we care about in research is performance. Related to data centers, there are two things we care about the most. One is datacenter server utilization: we want to increase the utilization of servers. The other is to reduce the consumption of energy and, in general, the operational costs of these data centers. If you take a look at the related work in the literature, you see a lot of papers on optimizing costs and improving performance of data centers, and all or most of them optimize some of the operations in the data centers to achieve some objective. Optimization theory is widely used in these papers. For example, if we do a very simple Google Scholar search for the keyword "datacenter optimization," we can see an increasing trend in the number of papers on optimizing the actual operation of these data centers.

But we have to watch out for something here. We have to watch out for complexity. What we care about is performance, but as we achieve additional performance, we move up in the operational points: we try to achieve more performance by going for more complexity in our algorithms.
And the question that I have here today is: should we operate where we get the maximum performance but at a much higher complexity, or should we make a bit of a trade-off and reduce the complexity, perhaps by a wide margin, without sacrificing much performance? That's something we'd like to explore today.

So here's an outline of this talk. First I'm going to talk about a single datacenter, about the traditional problem of virtual machine placement: placing virtual machines onto physical servers. The problem is that we have a large number of virtual machines and we want to decide which physical server is going to run each of them. This is, in general, a problem in the category of resource management within data centers. In the second part of the talk, probably about one-third of the time at the end, I'm going to very briefly expand on that and talk about multiple data centers, with users making requests: these requests are satisfied by multiple data centers, and the responses are fed back to the users. This concerns multiple geographically distributed data centers and falls into the category of workload management. So, two parts: resource management and workload management.

Let's get started with resource management first. What we care about is the traditional problem of virtual machine placement. This is the problem we have selected to show how we can provide a better trade-off between complexity and performance. Decisions about VM placement typically have a direct impact on the performance of the applications running in these data centers. For example, if the applications are Web services, then the CPU frequency and utilization affect the actual performance; if they are databases, then the disk I/O and memory throughput affect the performance.

There's a lot of prior work in this area. If we do a very simple Google Scholar search for keywords related to VM placement and optimality, we see a lot of papers, and many of them use combinatorial optimization: they try to solve optimization problems to achieve optimal solutions. And the problem here is complexity. We took a look at one of these papers that summarizes the running time of different algorithms for VM placement. For about a thousand VMs, the algorithms run from more than 15 minutes up to almost 30 minutes, depending on the algorithm, and to us this is not satisfactory. For a thousand VMs you have to run the algorithm for half an hour; this is not satisfactory. So the problem here is complexity, and complexity is what we care about.

What we wanted to do is trade off a little bit: reduce the complexity without sacrificing too much performance, knowing that, obviously, we cannot achieve optimality. So we want to use stable matching theory. Stable matching theory was developed back in the 1960s by Gale and Shapley, and their algorithm is a very simple algorithm that achieves a stable matching.
It's so simple: if we have N VMs and M servers and we run the algorithm that achieves a stable matching, which I'm going to illustrate next with an example, the time complexity is on the order of N times M. So this is a very simple algorithm. The traditional model of stable matching is the college admissions model. This is about admitting students to colleges: each college can admit multiple students, and each student can only attend one college. The input to the algorithm is the preferences: the colleges have rankings of students, and the students have rankings of colleges. The output of the algorithm is what we call a stable matching, which I'm going to talk about next as the solution concept. VM placement is also a stable matching problem in the sense that each server can accommodate multiple VMs, and each VM is placed on one server.

So here's a toy example of stable matching as a solution concept. Suppose on the left-hand side we have the virtual machines and on the right-hand side we have the servers. Each virtual machine has a ranking of its preferences over servers, and each server has a ranking of its preferences over virtual machines. The goal in the college admissions problem and the seminal Gale and Shapley paper back in the 1960s (this line of work just won the Nobel Prize in economics) is to achieve a stable matching.

What do we mean by stable matching? Let's consider one possible matching: match V1 to S1, V2 to S2, and V3 to S3. We claim this matching is not stable. The reason is the pair V2 and S3. V2 is currently matched to S2, but it prefers S3 to S2. S3 is currently matched to V3, but it prefers V2 to V3. So if we establish a matching between V2 and S3, we see that the original matching is unstable, because each of them prefers the other to its current partner. This pair, V2 and S3, is called a preference blocking pair, and a matching is stable only if there does not exist a preference blocking pair. If we can find such a pair, it's not a stable matching.

Now, what the algorithm tries to compute is a stable matching, and the algorithm is called deferred acceptance, proposed by Gale and Shapley. Here's an example of running this algorithm. The algorithm lets the virtual machines propose to the servers first. In the first iteration, each virtual machine proposes to its most preferred server: V1 proposes to S1, V2 to S1, and V3 to S3. Then the servers choose according to their preferences. S1, currently receiving two proposals, chooses V1 over V2; so S1 rejects the proposal from V2 and accepts V1. V2 then crosses S1 off its list, and in the next iteration it proposes to S3.
Now, S3 had previously accepted the proposal from V3, but now it has two proposals. It evaluates its preference list and says, you know, V2 is better than V3. So it rejects V3 and accepts the proposal from V2, and now V3 crosses S3 off its list and proposes to the next server on its list, which is S2. Deferred acceptance is guaranteed to terminate, based on the seminal paper of Gale and Shapley. For the simplest form of stable matching, which is one-to-one, it is proven that using deferred acceptance a stable matching can always be found. Of course, the stable matching is not necessarily unique; there could be multiple. But one of them can always be found using this deferred acceptance algorithm. And for the one-to-many college admissions model, it is essentially the same: using deferred acceptance, a stable matching can always be found.

In our situation, we claim that the classical model does not apply. What's the problem in our situation, VM placement? Well, previously, when we talked about the matching between VMs and servers, we implicitly assumed that the VMs consume the same amount of resources. In reality, VMs require different amounts of resources. In addition, the servers can have different capacities as well. The size heterogeneity of both VMs and servers poses a new and difficult challenge to stable matching theory. So we want to adapt the algorithms to achieve stable matching, but first we have to define what stable matching means when we have size heterogeneity of both VMs and servers: VMs require different amounts of resources, and servers have different capacities. We want to develop a new stable matching theory with size heterogeneity, and here's an example of what we have.

>>: So a question: in the college admissions model, students have preferences over schools, right?

>> Baochun Li: That's right.

>>: So for VMs, why would we have preferences over servers?

>> Baochun Li: So the question is why the VMs would have preferences over different servers. Well, the VMs belong to tenants, and different applications belonging to the same tenant could have co-location preferences, tiered service, and so on. We believe we should allow the applications, or the tenants, to express these preferences. So not only do the servers have preferences over VMs, but the virtual machines also have preferences over the servers.

>>: [inaudible] in the sense that might have wanted some together.

>> Baochun Li: Exactly. So the point is that the preferences can be quite complicated. I'm going to very briefly talk about an engine we have that converts policies into the lists of preferences. We have to be able to do that in order to use this approach: we have to be able to map actual policies, for example load balancing or consolidation, into preferences, and it's pretty flexible to actually do that.

Now, what we have here is a new model, a model with size heterogeneity. Different virtual machines have different sizes: V1 has size two, as compared to V2 and V3, both of which have size one. And then we have the capacities: different servers have different capacities, in this case capacities 2, 1, and 1.
So in this new model, again, we could consider one kind of matching: match V1 to S1 (both of them have size two), V2 to S2, and V3 to S3. That's something we could do. And then we could also think about the typical notion of a preference blocking pair. Consider V3 and S1. V3 is currently matched to S3, and V3 prefers S1 to S3. S1 is currently matched to V1, but it prefers V3 to V1. So this is preference blocking in the traditional sense: V3 and S1 form a preference blocking pair. However, we have to change the definition slightly: a VM V and a server S form a preference blocking pair if V prefers S to its current server, and S prefers V to some subset of its currently accepted VMs, because a server can accommodate multiple VMs given the different sizes and capacities, and by rejecting some of them the server could run this particular virtual machine. That's the slightly revised definition of preference blocking, and we use it to define a preference blocking pair; if there is a preference blocking pair, the matching is obviously not stable. The objective is to find a stable matching in this new model with size heterogeneity.

Unfortunately, the deferred acceptance algorithm does not work in this situation. Let's think about one example. In this example we have four virtual machines: the first one has size two, and the next three have size one. We have three servers: server one and server three have capacity two, and server two has capacity one. Each of them has its preferences. Let's try to run the deferred acceptance algorithm on this example.

In the first round, each virtual machine proposes to its most preferred server: V1 to S1, V2 to S1, V3 to S2, and V4 to S2. Now S1 takes the proposal from V1 and rejects the proposal from V2, because V1 has size two: S1 has no additional capacity to run more virtual machines, and V1 is more preferable than V2 for S1. After S2 rejects the proposal from V3 and tentatively accepts the one from V4, V3 crosses S2 off its list, and in the next round V3 proposes to S1.

Let's see what happens when V3 proposes to S1. V3 is more preferable than V1 for S1, and S1 currently holds V1, with size two, against a capacity of two. So S1 rejects V1 and accepts the proposal from V3. By doing so, S1 frees up capacity: before accepting V3 it had no spare capacity, and now, because V3 only has size one, S1 has one unit of spare capacity. Because of that, in an additional round, V4 proposes to S1, and since S1 has this spare capacity, it accepts the proposal from V4. So S1 now holds both V3 and V4.

After S1 accepts V4, you can see there's a problem here. S1 had previously rejected V2, but now it has accepted V4. When it rejected V2, it was because it had no spare capacity, so it could not accept V2, and it preferred V1 to V2. Now it accepts V4 because it does have spare capacity after rejecting V1 and taking V3. So we do have a preference blocking pair, which is V2 and S1: V2 prefers S1 to S2, and S1 prefers V2 to V4 (though not to V3).
So that's the preference blocking pair, and that's something we don't want to see: after running the deferred acceptance algorithm, the result is not a stable matching. This only happens with different virtual machine sizes, when we have size heterogeneity. And this is no good; we don't want this.

So how do we change this? The intuition is to revise the deferred acceptance algorithm: whenever a virtual machine is rejected, we remove any less preferred VMs from the server's preference list. In other words, after S1 rejects V2, it also removes V4, which is less preferred than V2, from its preference list. If that is done, then after running all the iterations of this deferred acceptance algorithm, V4 is left unmatched, and the resulting matching is still stable. So the revised deferred acceptance algorithm, which crosses out the less preferred VMs whenever a particular VM is rejected, reaches stability in its resulting matching of VMs to servers. It feels a little strange that V4 is left unmatched, but the remaining matching is still stable. So the revised deferred acceptance is a correct algorithm for achieving stable matching. We have proved this theorem: revised deferred acceptance always finds a stable matching, in the same time complexity as the original deferred acceptance. So a stable matching always exists in our new model.

Now, if we take a look at the previous example, we can see that a better stable matching does exist. What do we mean by better? One stable matching is better than another if each virtual machine is at least as well off. Let's look at this particular example. Think about V2. V2 is currently matched to S2, out of the revised deferred acceptance algorithm, but it actually prefers S1 to S2. If V2 were instead matched to S1, the matching would be better, because V2 is better off and nobody else is worse off. Since S1 is currently matched only to V3, it has spare capacity to accommodate V2. We call this capacity blocking: it's not preference blocking but capacity blocking, in the sense that V2 can be better off because S1 has spare capacity. Using capacity blocking as the notion for improving a matching, we can aim for an optimal stable matching. So the goal is to find an optimal stable matching, one that has no capacity blocking pairs.

To achieve this, we devised another algorithm, a multi-stage deferred acceptance, and the intuition is very simple: we iteratively improve the stable matching by allowing the VMs to re-propose to the servers. We run multiple stages, and in each stage we run the revised deferred acceptance with selected VMs and servers. The theorem we can prove is that multi-stage deferred acceptance finds an optimal stable matching. So this is something we can actually get.
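To make the revision concrete, here is a minimal sketch of the revised deferred acceptance just described, in Python. This is an illustrative reconstruction, not code from our implementation: it assumes a single resource dimension, preference lists given as ordered lists of names, and one particular reading of the rejection rule (whenever a server rejects or evicts a VM, every VM the server prefers less is also crossed off its list).

```python
def revised_deferred_acceptance(vm_size, server_cap, vm_pref, server_pref):
    """vm_size: {vm: size}, server_cap: {server: capacity},
    vm_pref / server_pref: {name: [choices, most preferred first]}."""
    # rank[s][v] = position of v in s's list; lower is more preferred.
    rank = {s: {v: i for i, v in enumerate(p)} for s, p in server_pref.items()}
    vm_pref = {v: list(p) for v, p in vm_pref.items()}   # mutable copies
    match = {}                                           # vm -> server
    accepted = {s: set() for s in server_cap}

    def spare(s):
        return server_cap[s] - sum(vm_size[v] for v in accepted[s])

    def reject(s, v):
        # The revision: when s rejects v, cross v and every VM that s
        # prefers less than v off s's preference list for good.
        cut = rank[s][v]
        for u in list(rank[s]):
            if rank[s][u] >= cut:
                del rank[s][u]

    active = set(vm_size)                                # VMs still proposing
    while active:
        v = active.pop()
        while v not in match and vm_pref[v]:
            s = vm_pref[v].pop(0)                        # best remaining choice
            if v not in rank[s]:                         # crossed off: skip
                continue
            # Evict accepted VMs that s likes less than v, least preferred
            # first, until v fits or no such VM remains.
            worse = sorted((u for u in accepted[s] if rank[s][u] > rank[s][v]),
                           key=lambda u: -rank[s][u])
            while spare(s) < vm_size[v] and worse:
                u = worse.pop(0)
                accepted[s].remove(u)
                del match[u]
                reject(s, u)
                active.add(u)                            # u proposes again
            if spare(s) >= vm_size[v]:                   # v is accepted
                accepted[s].add(v)
                match[v] = s
            else:                                        # v is rejected
                reject(s, v)
    return match

# The four-VM example from the talk; preference lists are illustrative
# where the talk leaves them unspecified.
vm_size = {'V1': 2, 'V2': 1, 'V3': 1, 'V4': 1}
server_cap = {'S1': 2, 'S2': 1, 'S3': 2}
vm_pref = {'V1': ['S1', 'S2', 'S3'], 'V2': ['S1', 'S2', 'S3'],
           'V3': ['S2', 'S1', 'S3'], 'V4': ['S2', 'S1', 'S3']}
server_pref = {'S1': ['V3', 'V1', 'V2', 'V4'],
               'S2': ['V4', 'V3', 'V2', 'V1'],
               'S3': ['V1', 'V2', 'V3', 'V4']}
print(revised_deferred_acceptance(vm_size, server_cap, vm_pref, server_pref))
```

On small examples like this one, the sketch terminates with a matching that is stable in the revised sense, though some VMs may remain unmatched, exactly as in the V4 case above; capacity blocking pairs may also remain, which is what the multi-stage outer loop of re-proposals is designed to remove.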
So what we're doing here is devising new algorithms to find stable matchings, in this case optimal stable matchings, with size heterogeneity in the VM placement problem. And we ran some trace-driven simulations to evaluate the performance of the results of these algorithms. What do we mean by performance? We evaluate the resulting matching and see how good it is, by looking at the VM priority, the application type, and the server attributes. We combine these into a single performance score based on related work; between 2006 and 2012 you can find a few papers that combine multiple attributes and integrate them into one performance score. We wanted to compare our two versions of deferred acceptance, multi-stage and revised, against the optimal solution from optimization. We increased the number of VMs from about 300 to about a thousand and looked at the runtime. If you run the optimization algorithm, the runtime increases dramatically. Because our stable matching algorithms have time complexity linear in the number of VMs and the number of servers, their running time is better; it scales much better. But what we do care about is the performance. As the number of VMs increases, the actual performance is not that much worse compared to optimality: the optimal solution is probably only about 20 percent better than the result we get from stable matching. Obviously, this is just one set of simulations. We used traces from existing papers, with a number of VM priorities and a small number of application types and server attributes; it's just one instance of a trace-driven simulation. However, it does give a glimpse of the performance of using a stable matching algorithm rather than computing the optimal solution.

>>: A question here: when you show the optimal algorithm, what's the criterion? What is optimal --

>> Baochun Li: In this particular case we compute the optimal solution based on the performance score that we have. So we're optimizing the performance score that combines the VM priorities, the application types, and the server attributes.

>>: The performance score is calculated on, let's say, basically the server resources to execute this VM, the time.

>> Baochun Li: Exactly. For any matching of VMs to servers, based on the server attributes, the different kinds of servers, and based on the application types preferring different kinds of servers -- preferring, for example, low latency; the CPU utilization would be one possibility to consider -- the optimization algorithm tries to maximize the aggregate of all of these.

>>: When you're doing the optimization, basically trying to compute the optimal result, what do you actually run? Because the general problem here is actually combinatorial, right?

>> Baochun Li: Right.

>>: You cannot basically afford to run the actual combinatorial search to find the optimal --

>> Baochun Li: The running time is going to be increasing.
But we could still compute, because it's a small number of VMs, we could still compute the optimal solution within that running time.

>>: The absolute optimal.

>> Baochun Li: Yes, it's the optimal solution.

>>: Exhaustive search, basically --

>> Baochun Li: It's combinatorial optimization, yes, yes.

>>: Okay. Usually in combinatorial optimization there are other approaches. In this approach, you're basically extending a prior algorithm used for [indiscernible], right? Usually the general approach is that first I get a basically good algorithm, say I use your algorithm. Then I look at pairs or triples and try to do swapping. Basically it's this: I look at two pairs of servers and VMs and their current allocation, and I try to see, if I flip the assignment, is anyone better off or not. Or I can flip three pairs. I'm pretty sure you're familiar with LDPC coding and that kind of literature; they do such things.

>> Baochun Li: Those are more or less related to VM migration. Here we're looking at a simpler problem, in the sense of placement. We're not doing migration; we're doing initial placement.

>>: I'm not talking about migration either -- I'm still on placement. You're basically saying, okay, I look at your algorithm, say the revised algorithm, and look at the result. Then I try to develop a search algorithm, to see whether, if I do swapping on this result, I can further improve the performance or not.

>> Baochun Li: True.

>>: And usually there is this: you do a round of search, then another round, and after you search through everything and cannot find any more of these swaps, you terminate. So this is like using the output of your algorithm as the starting point and doing gradient descent on this complex surface. Basically it will go to a local minimum, because that's the definition: you cannot find any further improving swap. I'm just wondering, basically, have you tried --

>> Baochun Li: I understand your point. You're not talking about migration -- we're not considering the cost of migration, so the cost would be zero. We're just taking the stable matching result and trying to further improve on it. We haven't tried that particular direction of first getting to a good enough solution and then seeing what the complexity of further improving it is; we haven't really studied that direction of research. What we did is compare with the existing solutions in the literature that directly solve the optimization. If we tried to improve the result that way, it might be possible that after a small number of steps you could increase the performance quite a bit, quite dramatically.

>>: I think I also have questions on some practicalities, but that would probably drag this point too far. Let me discuss with you after the talk is over.

>> Baochun Li: Sure.

>>: So I have a quick question. On the stable matching side, are you looking at one resource or multiple resource dimensions, trying to match all of them at the same time?

>> Baochun Li: In this particular case we're looking at just preferences. So we convert the actual resource preferences, including multiple dimensions, into a preference list. We have to do that.
>>: [indiscernible] CPU capacity, memory capacity also.

>> Baochun Li: In this case we just have one dimension: the size, and that's it. Of course, if you have multiple dimensions you're going to have to convert, and the conversion is not precise, but we have to do it.

>>: I'm just wondering, on the literature side, whether that has been looked at or not: multi-dimensional stable matching.

>> Baochun Li: No, no, I do not think so. Multi-dimensional stable matching has not really been looked at, as far as I know.

So what we did is we also did an implementation. I gave it a fancy name: Anchor. What it does is this: it has a resource monitoring system that monitors the capacity of the servers and feeds that into the policy manager. The policy manager is basically a conversion mechanism that converts different policies, say consolidation and load balancing, into the preference lists. And then the matching engine just runs our stable matching algorithm, the multi-stage deferred acceptance. We believe this is a nice approach, because it's a unified system that supports different placement policies with just one matching engine: the policy manager can convert different policies into preferences, but we use just one algorithm. That's the beauty of this particular system. For example, if the datacenter operator has a policy of consolidation versus load balancing, we can convert that into different preferences, feed those preferences into the same matching engine, and get the result.

Here's an experimental result of allocating ten VMs every 30 seconds, using this real-world implementation, on the 20-node cluster we have in our lab. We ran it over about 150 seconds. The yellow dots use consolidation policies, which means we want to consolidate VMs onto servers as much as we can. The blue dots use load balancing policies: we don't care about the energy cost, we don't want to turn off servers, and we care about the performance of the applications, so we do load balancing. We can see that over time the number of active servers with load balancing is higher than with consolidation, which means the difference in policies really makes a difference in the resulting stable matching. Our claim is that Anchor, as an incarnation of feeding different policies into the deferred acceptance algorithm, is effective in realizing these policies.

So in general, to summarize: existing work tries to provide optimal solutions using combinatorial optimization. We want to move back a little bit and ask whether we can reduce the complexity, because of the need for scalability: we have to run the algorithm in a large-scale datacenter, with more than a thousand VMs mapped to it. We sacrifice some performance, but hopefully achieve much better complexity.

So the second part of this talk is related to workload management. Workload management is concerned with multiple data centers. These data centers are geographically distributed, and we have multiple users sending requests to them. This is again a traditional problem.
So again we assume that we have geographically distributed data centers; in this case, the figure shows the locations of Google's data centers around the world. And we assume that we have a number of users sending their requests to these data centers. The simplifying assumption here is that the data that can be used to satisfy these requests is replicated, in the simplest case, to all of the data centers around the world. In that sense, if a user tries to satisfy its requests, a request can be satisfied by a combination of the responses from a number of data centers around the world.

So the question we have here is: how do we map the requests to these data centers? This is typically called request mapping, and it is quite extensively studied in the literature. There's another problem, which we call response routing: after processing these requests, how do we send the responses back to the user from a single datacenter? Let's zoom into one of these data centers, say the one in Seattle or Oregon. We have this one datacenter, and we have multiple ISPs handling the Internet connection of this particular datacenter. We assume multiple ISPs: they have different bandwidth, they have different latency for sending the responses out, and they also have different costs for bandwidth. In this particular work we only consider the different costs for bandwidth; they have different charging models in that sense. Of course, we want to optimize the cost of using bandwidth, and this problem of response routing for a multi-homed datacenter has also been studied in the literature.

What we wanted to look at is both of these problems, request mapping and response routing, together. These two problems have been extensively studied, all the way back from about 2004 to about 2011, in SIGCOMM papers and the like, but these papers study the two problems separately; they do not study them in the same setup. However, we claim that these problems are inherently related to each other. The intuition is that the request mapping from the users to the data centers determines the demand for traffic, and the demand for traffic affects the response routing. So we believe these two problems should be studied jointly rather than separately. If we study them separately, they lead to objectives that are misaligned, and the solution from solving them separately will not be optimal compared to studying them jointly. That's the motivation for studying both problems in the same setup. So we recently have a paper, joint work between Henry and myself, that studies the joint problem of request mapping and response routing, published in INFOCOM.

So here are the system models. First, what is a user? In our model, a user is a unique IP prefix.
This is common practice in reality; for example, it's used by Akamai. For request traffic, we assume in our system model that requests are arbitrarily splittable among the data centers. This is also common practice: we can split requests using either DNS or HTTP proxies, and we assume that's the case. A little more problematic is the response traffic. We also assume that responses are arbitrarily splittable among the different ISP links handling the Internet connection from the datacenter. This is not commonly the case with BGP; however, we did find papers, for example an INFOCOM 2000 paper on hashing-based traffic splitting, that study the possibility of arbitrarily splitting traffic among different ISPs. So we assume that is the case as well, because if we cannot assume it, we cannot jointly study the two problems. Finally, in terms of the time scale of running this optimization, we assume that we can run it hourly; this is common practice in the literature. Lots of SIGCOMM papers on request mapping run the operation hourly and assume that near-future traffic is predictable. So these are the system models we assume in our work.

Here's the formulation that we have. First of all, we consider the mapping between a user and a datacenter, where the datacenter has multiple ISP links, multiple ISPs handling its connection to the Internet. So we consider a term called a datacenter stop, which is a tuple of a datacenter and an ISP link: different ISP links are different stops, and different data centers are different stops as well. We also have users, and user i sends a proportion of its requests to each of these datacenter stops; this proportion is between 0 and 1.

What we want to do is joint request mapping and response routing, so we first consider our optimization objective. First, the demand: D sub i is the demand of requests from user i, and we assume this demand is arbitrarily splittable across the datacenter stops. Then, to minimize cost, we consider the cost of the datacenter: we assume that we know the energy cost of serving a request at a particular datacenter stop, and that's something we want to minimize. For the performance of using a datacenter to satisfy a request, we consider latency: L sub ij is the latency for satisfying user i's requests at datacenter stop j, and we assume a utility function over latency that is concave, decreasing, and differentiable, so the longer the latency, the worse it is. We want to minimize the cost in terms of latency as well. So the first term is the cost related to request mapping. Then we want to put the cost of response routing into the same objective: we consider the bandwidth cost of the datacenter using different ISPs, so different stops have different bandwidth costs per unit of response traffic for satisfying the requests. This second term is related to response routing.
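To make the pieces concrete, here is one plausible way to write down the objective just described. This is a hedged reconstruction from the talk, not the exact formulation of the INFOCOM 2013 paper: the symbols for the mapped proportion, the per-request cost, the bandwidth price, and the stop capacity are illustrative names, not the paper's notation.

```latex
% A plausible rendering of the joint objective described above. Here
% \alpha_{ij} \in [0,1] is the proportion of user i's demand D_i mapped to
% datacenter stop j, L_{ij} is the latency, U(\cdot) is the concave
% decreasing utility, c_j is the per-request (e.g., energy) cost of stop j,
% p_j is its per-unit bandwidth price, and C_j is its capacity.
\begin{align*}
\min_{\alpha}\quad
  & \underbrace{\sum_i \sum_j \alpha_{ij} D_i \bigl(c_j - U(L_{ij})\bigr)}_{\text{request mapping}}
    \;+\;
    \underbrace{\sum_j p_j \sum_i \alpha_{ij} D_i}_{\text{response routing}} \\
\text{s.t.}\quad
  & \sum_j \alpha_{ij} = 1 \quad \forall i,
    \qquad
    \sum_i \alpha_{ij} D_i \le C_j \quad \forall j,
    \qquad
    \alpha_{ij} \ge 0 \quad \forall i, j.
\end{align*}
```

Any formulation of this shape is convex in the mapping variables, consistent with what the talk describes next.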
So if we take the joint consideration of request mapping and response routing and minimize the total cost, we have the entire formulation as a convex optimization problem. Typically we can solve a convex optimization problem using traditional approaches, but the problem here is that it's a large-scale problem: because each user is basically an IP prefix, the number of users is on the order of 10 to the power 5, the number of variables is on the order of 10 to the power 7, and the number of constraints is on the order of 10 to the power 5, related to the number of users. Typically we could solve this using dual decomposition with subgradient methods, the traditional way of solving such a problem in a distributed fashion, but there are two drawbacks traditionally associated with that. The first is that we have to adjust the step size using an adaptive algorithm, and adjusting the step size is tricky. The second is that even if we do so, convergence is still pretty slow.

So we wanted to do even better. We want to solve this problem using a different approach. We identified the structure of this particular formulation, and we solve it using ADMM, the alternating direction method of multipliers. This was actually studied back in the 1970s, but reexamined in Boyd's paper in 2011. Basically it applies to problems with a certain structure, and that structure is nicely present in our formulation of joint request mapping and response routing. The idea of ADMM is that we consider the request mapping and response routing subproblems separately, but in an alternating way, and we link them through the dual variables in each iteration. So we first solve for request mapping, which gives per-user subproblems. Then we solve for response routing, which gives per-datacenter-stop subproblems. Then we update the dual variables, and go back and solve for request mapping and response routing again. This is the basic idea of ADMM. It only converges successfully for problems with a certain structure, and that structure is present in our problem.

So here's an evaluation of the performance. We studied this in trace-driven simulation using Wikipedia request traces. We used empirical power and bandwidth prices that we collected over the Internet, and latency data from iPlane, using PlanetLab nodes, also available from the existing literature. In this trace-driven simulation we ran two different variations of ADMM. The first variation is the traditional ADMM that goes all the way to optimality, solving the problem to optimality. The second one, the green one, runs only 20 iterations rather than going for optimality. We consider the cost of the resulting solution, and we can see that the green is very similar to the blue, which means that if we run it for only 20 iterations, we get about the same performance in terms of reducing the cost per request as solving to optimality.
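To illustrate the alternating pattern just described, here is a minimal toy sketch in Python. This is not the paper's per-user and per-datacenter-stop decomposition; it is a generic two-block ADMM on a small problem whose subproblems have closed forms (a quadratic term plus an l1 term, split by the constraint x = z), which is enough to show the three steps: solve the first block, solve the second block, update the duals.

```python
# A toy two-block ADMM in scaled form, illustrating the alternating pattern
# from the talk. We minimize f(x) + g(z) subject to x = z, with
# f(x) = 0.5*||x - a||^2 and g(z) = lam*||z||_1, because both subproblems
# then have simple closed-form solutions.
import numpy as np

def admm_toy(a, lam=0.1, rho=1.0, iters=20):
    x = np.zeros_like(a, dtype=float)
    z = np.zeros_like(x)
    u = np.zeros_like(x)              # scaled dual variable
    for _ in range(iters):
        # Block 1 (cf. the per-user request-mapping subproblems):
        # argmin_x f(x) + (rho/2)*||x - z + u||^2 has a closed form.
        x = (a + rho * (z - u)) / (1.0 + rho)
        # Block 2 (cf. the per-datacenter-stop response-routing
        # subproblems): argmin_z g(z) + (rho/2)*||x - z + u||^2
        # is soft-thresholding.
        v = x + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        # Dual update: this is what links the two subproblems
        # from one iteration to the next.
        u = u + x - z
    return x, z

# Run for a fixed 20 iterations, as in the evaluation discussed above.
x, z = admm_toy(np.array([0.3, -2.0, 1.5]), iters=20)
```

The point mirrored from the talk is the stopping rule: rather than iterating to convergence, one can stop after a fixed small number of iterations (here iters=20) and, as in the evaluation above, already be close to the optimal objective.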
And if we think about the latency for each user, again as a measure of performance, we compare them and see that they're very similar as well. So running ADMM for only 20 iterations achieves about the same result, the same performance, as solving to optimality with ADMM. ADMM is already faster than the traditional subgradient method, and if we run it for a fixed 20 iterations, it's even faster, while achieving very similar performance. Comparing the two algorithms, ADMM and the subgradient method with a very simple diminishing step size rule, in terms of the number of iterations, the CDF shows that ADMM converges much faster: typically it converges within 60 iterations, while the subgradient method takes more than 60 iterations. So this is much faster, and again, if we just run ADMM for 20 iterations, it obviously has not converged yet, but the result is very close to solving to optimality. So the conclusion is that ADMM has much faster convergence.

Interestingly, after this INFOCOM 2013 paper that we did last year, this year we saw more papers by other researchers: there are two SIGMETRICS papers this year, coming up next month, related to joint request mapping and response routing, which means this is indeed an interesting and important problem for getting to better solutions. And we ourselves have also studied request mapping as a problem on its own, but taking into account the cooling of data centers, using external temperature from empirical traces; basically we consider the energy consumption of the data centers, and we have a paper to appear in a conference called ICAC. These are some of the follow-up works after our paper last year, on either request mapping itself or, from other groups, joint request mapping and response routing.

So, to summarize the second part of the talk: existing work solves for optimality, but the performance is not good enough because it considers the two problems separately. In our work, if we consider request mapping and response routing jointly, we can achieve better performance; and by using a better solution method for the optimization problem, we can reduce the complexity, in terms of the number of iterations we run the algorithm, compared to the subgradient method.

In general, just to recap the entire talk: we first talked about existing work using combinatorial optimization algorithms to solve the VM placement problem. Our work tries to say: let's reduce the complexity and accept a little less performance, and hopefully the performance trade-off is not that dramatic. In our simulations we showed the performance is not that much worse, probably 20 to 30 percent worse than the optimal solution, in the instance of the VM placement problem.
And then we think about the possibility of doing workload management using distributed optimization. If we consider request mapping and response routing jointly, as in our INFOCOM 2013 paper, we have shown that we can solve for optimality with less complexity and better performance compared to considering the two problems separately, for example considering request mapping alone. So that's a recap of the entire talk. For any additional questions, you can check out my papers by going to my website; Google my first name to get to my website. Thank you.

[applause]

>>: I like the first part, the resource [indiscernible].

>> Baochun Li: I like it better, too. That's why I spent about two-thirds of the time on the first half and one-third on the second half; in total it's about 50 minutes.

>>: I actually have questions related to resource management. I think it's a very relevant problem, but many of them we can probably discuss afterwards, because they're more like discussions than questions. I'm just wondering if you guys have anything you want to discuss here; otherwise maybe we can move out.

>> Baochun Li: Sure.

>>: Thank you so much for coming.

>> Baochun Li: No problem.