>> Yuval Peres: To continue, our next speaker is Ravishankar Krishnaswamy from MSR India, and he'll tell us about Online Buy-at-Bulk Network Design.

>> Ravishankar Krishnaswamy: Thank you Yuval, and it's a pleasure to be here. I'll be talking about a network design problem called Buy-at-Bulk Network Design. Hopefully by the end of this talk you will understand the model, the results we can get in this model, and some flavor of the techniques that we have. I won't go into any of the proofs; it will just be a very high-level overview of the work. This is joint work with Deeparnab Chakrabarty, also of

MSR India, Alina Ene, who is at Warwick, and Debmalya Panigrahi, who was a postdoc here and is currently faculty at Duke. I will start off just talking about multi-commodity flows in general.

It's a very classical problem in optimization. You are given a graph, an undirected graph let's say. Edges have capacities, and you are also given a collection of demands. For the simplicity of this slide let's assume all of the demands are one. The question that we ask is: does this graph admit a unit flow that you can simultaneously route from each source to the corresponding sink? This is a toy example here. We have this graph. Let's assume all capacities are also one, and we have these source-sink pairs, so the red vertices form a pair, the green vertices, the blue. And you ask: is there a routing of one unit of flow from all of the sources to the sinks concurrently? In this example the answer turns out to be yes, and you can find the routing and so on. So it's a very fundamental problem in optimization. It comes up in a variety of places, and people have designed a lot of algorithms for this problem: near-linear-time algorithms with optimized running times, various conditions under which you can find integral routings, and so on and so forth. It's been studied very extensively. What we are focused on is buy-at-bulk network design, and the question there is: how do you design the network which routes these flows? Given a network, finding the flow is the flow problem; here we are looking one level above this optimization. We are trying to design networks which can support good flows. This is the sort of problem that AT&T would be interested in. When it's building the infrastructure for the U.S.'s internet networks it would be interested in such a problem: estimate roughly how much demand there is that wants to send packets of data, and then build the cheapest network that can carry this flow. That is what is called the buy-at-bulk problem. It's been studied since the '90s, so let me just describe the problem.
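To make the concurrent-routing question concrete, here is a minimal brute-force sketch (my own toy instance, not the one on the slide): it enumerates integral path choices for each commodity and checks that no unit-capacity edge is overloaded.

```python
from itertools import product

def simple_paths(adj, s, t, path=None):
    """Yield every simple s-t path in an undirected graph (DFS)."""
    path = path or [s]
    if s == t:
        yield list(path)
        return
    for v in adj[s]:
        if v not in path:
            path.append(v)
            yield from simple_paths(adj, v, t, path)
            path.pop()

def concurrently_routable(edges, pairs):
    """Can every (s, t) pair route one unit simultaneously, with every
    edge having capacity 1?  Brute force over integral path choices,
    so only suitable for tiny toy instances."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    choices = [list(simple_paths(adj, s, t)) for s, t in pairs]
    for combo in product(*choices):
        load = {}
        for p in combo:
            for e in zip(p, p[1:]):
                load[frozenset(e)] = load.get(frozenset(e), 0) + 1
        if all(c <= 1 for c in load.values()):
            return True
    return False

# A 4-cycle.  Two "parallel" demands route fine; the two crossing
# demands (0,2) and (1,3) admit no integral routing, even though a
# half-half fractional split of each demand would be feasible.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(concurrently_routable(cycle, [(0, 1), (2, 3)]))  # -> True
print(concurrently_routable(cycle, [(0, 2), (1, 3)]))  # -> False
```

The second call also illustrates why the integral and fractional versions of the feasibility question can differ.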

Here you are given again a graph G and a collection of demand pairs. Let's say AT&T has somehow managed to estimate the total demand that wants to go between various cities, various sources to various sinks. Each si-ti pair also has a number di, which is the bandwidth that si wants to send to ti. You could think of your house as being si and the network server as being

ti, and I want a dedicated 10 Mbps [indiscernible] or something like that. And AT&T is also presented with cable types: for every edge in the graph there is a collection of possible cable types, so fiber or optical cable or copper wire and so on. Each of them can provide a certain capacity at a certain price. This is the optimization AT&T wants to do: it has to buy cable types so that it can actually provision bandwidth to support all the flow concurrently. So we have to find a cheap selection of cables that can route all of the required flow. And what's the objective function? We want to minimize the cost of all of the cables that we have bought.

Here is a simple toy example. Let's say there is one source and one sink, and there are two routes in this graph. You could directly take an orange cable, which has very high bandwidth, so one

thousand units of bandwidth, at a cost of 300. Or you can take two hops and use blue cables; each of them costs 50 units but gives me only 100 bandwidth. And let's say the total demand is 2250. In this case you could buy two cables of the first type and three cables of the other type on each hop, and you find there is a solution of cost 900. If the demand slightly exceeds this, then the better solution might be to buy three cables of the orange type: that has the same cost and can give me bandwidth of 3000. As you see, if the bandwidth demand increases over time, at some point you should just stop using the bottom blue cables and start using the orange cables. The overall problem is: you have a graph, and on each edge you have these cable types. They all have different costs, with economies of scale, and you have to figure out which cable types to buy so that you can route a given flow. An informal way to think of this is as jointly optimizing a collection of knapsack problems. On every edge there is a knapsack instance sitting there, and we want to figure out how much bandwidth to allocate on that edge and which cable types to buy on that edge so that the multi-commodity flow is feasible. Any questions about the problem?
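The cost arithmetic in that toy example can be checked in a few lines, following the slide's setup: orange cables on the direct edge, blue cables on each of the two hop edges (the small search ranges are just for this instance).

```python
from itertools import product

# Toy instance from the slide: demand 2250 from s to t, one direct
# edge, and one two-hop path.  Orange cable: 1000 capacity, cost 300;
# blue cable: 100 capacity, cost 50.  Cables are bought per edge.
ORANGE = (1000, 300)
BLUE = (100, 50)
DEMAND = 2250

def route_cost(oranges, blues_per_hop):
    """Cost of `oranges` orange cables on the direct edge plus
    `blues_per_hop` blue cables on EACH of the two hop edges."""
    return oranges * ORANGE[1] + 2 * blues_per_hop * BLUE[1]

def route_capacity(oranges, blues_per_hop):
    # The two-hop path's capacity is the bottleneck of its two edges,
    # which are provisioned identically here.
    return oranges * ORANGE[0] + blues_per_hop * BLUE[0]

best = min(
    (route_cost(a, b), a, b)
    for a, b in product(range(5), range(30))
    if route_capacity(a, b) >= DEMAND
)
print(best)  # -> (900, 2, 3): two orange plus three blue per hop, as on the slide
```

Raising `DEMAND` above 2300 makes three orange cables (also cost 900, capacity 3000) the winner, which is exactly the switch-over the speaker describes.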

>>: Could you be more explicit about the link between your toy example and the knapsack problem?

>> Ravishankar Krishnaswamy: In the simplest case there is only one edge and there are many cable types. So this is one edge s-t, there is a demand of, say, 100, and there are multiple cable types; each of them has a cost and a certain capacity. I need to figure out the cheapest subset of these cable types that can provide me the capacity.
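The single-edge subproblem he describes is an unbounded covering knapsack, solvable by a simple dynamic program. The cable numbers below reuse the earlier toy figures, though here both cable types are hypothetically available on the same edge.

```python
def min_cost_cover(cables, demand):
    """Unbounded covering knapsack: cheapest multiset of cables whose
    total capacity is at least `demand`.  cables: list of
    (capacity, cost) pairs; each type may be bought any number of times."""
    INF = float("inf")
    # dp[d] = min cost to provide at least d units of capacity
    dp = [0] + [INF] * demand
    for d in range(1, demand + 1):
        for cap, price in cables:
            dp[d] = min(dp[d], price + dp[max(0, d - cap)])
    return dp[demand]

# Reusing the toy numbers: orange (1000 units, cost 300) and blue
# (100 units, cost 50), both available on one edge, demand 2250.
print(min_cost_cover([(1000, 300), (100, 50)], 2250))  # -> 750 (2 orange + 3 blue)
```

The answer differs from the graph example's 900 because there is only one edge here, so the blue cables are not paid for twice.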

>>: So it's a covering knapsack?

>> Ravishankar Krishnaswamy: A covering knapsack. And overall I have a graph, and I have these many covering knapsacks that I am jointly optimizing. As an example, this was, I think in the early 2000s, AT&T's actual physical network. They were using four types of cables, and each of them, depending on how long the cable was, would have a different cost. This presumably was built to satisfy the demand AT&T had then. You can see a lot of demand in the Northeast and in these various places. The question that we are solving is how to build such a network. Moreover, consider the types of cost functions that we can model using this framework. This is a very general model and can capture a lot of different kinds of cost functions. For instance, if the cost were equal to the bandwidth, then it would be a linear cost function; there are really no economies of scale, and you can just independently find shortest paths and then find the routing and so on. Or you could have the other extreme, which is infinite economies of scale: there is an initial start-up cost, and after that you get infinite bandwidth. This is exactly a very classical problem called the Steiner tree problem, where you just want to find the minimum-cost tree that connects up all the terminals. Again, a very fundamental, well-studied problem. And then you could have an arbitrary concave function, which can also be approximated by these cable types. So you can model a variety of problems in this framework. That's what we're going to be looking at. In fact, we're going to be looking at an online version of this problem. Everything is as stated; the only thing is that these demands arrive one at a time. They arrive incrementally. This captures the fact that the entity doesn't

know ahead of time all the demands that it's ever going to have to provision for, so there is uncertainty in provisioning. This is one way in which uncertainty has been modeled in these problems, and this is the model that we are going to be looking at. You could think of emerging markets: a new service could set up a data center and a lot of demand could be going to it.

Or you could have a new city that emerges, and a lot of demand originates in that city for various things. In fact, there is a new city being built in India as we speak, so that's one example. What are the constraints that the online algorithm has? It cannot un-buy cables that it has already bought. If it has already paid money and bought a certain link, the money spent is money used up. This is the standard notion, so this is what we expect the algorithm to do. How do we measure the performance of an algorithm? The way we measure it for online problems is the notion of competitive ratio. We look at the total money that the algorithm has spent over time until satisfying a certain demand, and compare it with the total cost that the optimal solution would have spent if it knew in advance that this was the demand it had to serve. The denominator is the optimal off-line solution for the current input, and the numerator is the total money that the online algorithm has spent. We want this ratio to be as small as possible. That is the most commonly used measure of the performance of online algorithms, and it's called the competitive ratio. This problem has been studied extensively since the 1990s. In fact, one of the earliest papers that introduced this problem also motivated the online version. They said that they motivated the problem in an online framework, but most of the results that we knew until now

-- I mean, the off-line problem is very well understood, but the online problem was not as well understood. The focus of this talk is going to be on the online problem. Demands arrive over time, and we have to provision these links over time as well and be competitive with the off-line optimal solution. For a very special case, when all the demands have a common sink -- think of all of us wanting to connect to Netflix -- good online algorithms are known for this problem. But for the multi-commodity version, we didn't know online algorithms. And like I already explained, this framework captures a lot of other network design problems as well, like Steiner tree and group Steiner tree and so on. A common theme across all of these problems, which is also true for our problem, is that nontrivial single-commodity algorithms were designed before nontrivial multi-commodity algorithms; we always understood the single-commodity case before we understood the multi-commodity case. Similarly for online algorithms: online Steiner tree was understood before online Steiner forest, online group Steiner tree before online group Steiner forest, and likewise online single-commodity buy-at-bulk came before online multi-commodity buy-at-bulk. So one question is whether there is a real theme going on here. What we show is a generic reduction for online problems from the multi-commodity version to the single-commodity version. We show that if you can solve the single-commodity version, then you can use our reduction to solve the multi-commodity version essentially for free, in the sense that we lose polylogarithmic factors in the competitive ratio, but that's as much as we lose. As a byproduct we get the first online multi-commodity buy-at-bulk algorithm. Very important previous work for us, which we extensively build on, is a very nice algorithm by Chekuri et al.
It gives a similar reduction for the off-line problem. It says that for off-line problems, multi-commodity problems are almost as easy as single-commodity problems. You can think of our work as an online analog of that result.

That's basically the take away from this.
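As an aside, the competitive-ratio measure defined earlier can be illustrated on the simplest special case of buying capacity online: classic deterministic rent-or-buy (ski rental). This is not the talk's algorithm, just the smallest worked example of the measure, with made-up rent and buy prices.

```python
def online_spend(days, rent=1, buy=10):
    """Deterministic rent-or-buy: keep paying the small (rent) cost
    until the money already spent would reach the big (buy) cost,
    then pay the big cost once.  Money spent is never recovered."""
    spent = 0
    for _ in range(days):
        if spent + rent >= buy:
            spent += buy      # commit to the big purchase
            break
        spent += rent         # keep renting
    return spent

def opt_spend(days, rent=1, buy=10):
    """Off-line optimum: knowing the horizon, rent throughout or buy outright."""
    return min(days * rent, buy)

# Competitive ratio over many possible demand horizons.
ratios = [online_spend(d) / opt_spend(d) for d in range(1, 50)]
print(max(ratios))  # -> 1.9, i.e. 2 - rent/buy
```

The worst case is a demand sequence that stops right after the algorithm commits to the big purchase, which is why the ratio approaches 2.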

>>: Am I correct in remembering that for the single-commodity online problems the competitive ratios are usually at least logarithmic to begin with? In that sense the fact that your reduction incurs polylog factors is fine, since it started with an algorithm that's already [indiscernible].

>> Ravishankar Krishnaswamy: True. It's like comparing log squared with log to the seventh or something like that. If they are all polylog then it's fine, but if your [indiscernible] constants then you do lose. That's right. Any other questions? Okay. Essentially our algorithm is sort of an online analog of this reduction theorem: in an online manner we will reduce the multi-commodity problem to single-commodity instances. Since this extensively builds on that work, let me go over the high-level approach that they follow. What they show is a very nice decomposition: I can decompose a multi-commodity instance into a collection of single-commodity instances such that two properties are true. Firstly, in this decomposition every source and sink are connected through the same hub, in a sense. So they basically find many hub instances, single-commodity instances, and for every pair si, ti, both si and ti connect to the same hub.

>>: Do you mean they are neighbors in the graph?

>> Ravishankar Krishnaswamy: They are not neighbors. You can think of it like this: let's say I want to connect to Netflix. Then they find a certain hub, say Chicago, and I connect to

Chicago using a path, and Netflix also connects to Chicago using a path. The next slide should make it clear. Furthermore, they show that if you independently solve these single-commodity instances and add up the costs, the overall cost is small: only a logarithmic factor worse than the actual optimal solution. Let's say the original optimal solution is something like this. The blue pair is connected using some path, the green pair is connected using some path, and so on. It's not necessarily structured, and you don't know it. They show I can decompose this instance into a collection of these hub-based instances, where you can see s1 and t1 connect to this hub, the orange pairs connect to this hub, the pink pairs connect to this hub, and so on. So they not only show the existence of such a decomposition, they also partition the sources and sinks into the correct buckets.

>>: When you say they connect to a hub you mean the only connection is going here?

>> Ravishankar Krishnaswamy: Yes. So once you know this, you can use the partition and solve the single-commodity problem for each of these parts separately.

>>: Does it kind of follow from tree [indiscernible]?

>> Ravishankar Krishnaswamy: Almost, because there are two things, the cost and the capacity. It's not quite a tree in and of itself, but it's almost [indiscernible]. What they show is that you can always decompose an optimal solution into this sort of a partition. They say that if you add up these

costs separately, it's at most not too much more than the original optimal cost. So given this existence result, how do they find the partitioning? They actually use a greedy-like algorithm to find the partition. They iteratively find the best hub and the best subset of sources and sinks that each connect to this hub, remove that subset of demands, and find the next best hub, and so on, in a greedy fashion [indiscernible]. You can think of each tree as a set, and you want to cover all of the terminals using these trees, so you can use a greedy set-cover-like algorithm to solve this problem.
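The density-greedy flavor of that off-line partitioning step might be sketched like this, with the buy-at-bulk connection costs abstracted into a plain cost matrix and a per-hub "opening" cost; both the rule and the numbers are illustrative stand-ins, not the paper's actual subroutine.

```python
def greedy_hubs(cost, opening):
    """Off-line density greedy: cost[i][r] abstracts the cost of
    routing pair i through hub r; opening[r] is a one-time cost for
    using hub r.  Repeatedly pick the hub and subset of uncovered
    pairs minimizing (opening + connection costs) / |subset|."""
    n, m = len(cost), len(cost[0])
    uncovered = set(range(n))
    assignment = {}
    while uncovered:
        best = None  # (density, hub, chosen pairs)
        for r in range(m):
            # For a fixed hub, the best subset of a given size is a
            # prefix of the pairs sorted by connection cost.
            ranked = sorted(uncovered, key=lambda i: cost[i][r])
            running = opening[r]
            for k, i in enumerate(ranked, start=1):
                running += cost[i][r]
                if best is None or running / k < best[0]:
                    best = (running / k, r, ranked[:k])
        _, r, chosen = best
        for i in chosen:
            assignment[i] = r
        uncovered.difference_update(chosen)
    return assignment

# Made-up instance: three pairs, two candidate hubs.
print(greedy_hubs([[1, 10], [2, 10], [10, 1]], [5, 5]))  # -> {0: 0, 1: 0, 2: 1}
```

Note that the loop consumes the whole demand set up front, which is exactly the "inherently off-line" aspect the speaker points out next.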

So it's a very nice analysis, but inherently off-line, because you need to know all the sources and the sinks to run a greedy algorithm: you find the best subset, remove it, find the next best subset, and so on. Okay. To motivate our algorithm, let's look at a modular view of the Chekuri et al. algorithm. What they show is that they can partition the demands among these common hubs, so they can assign each demand pair to a certain hub, and then solve the single-commodity instances independently: find the best cables for connecting this bucket, find the best cables for that one, and put them all together. That's not a bad solution, and that's what they show. What we are going to do is make both of these steps online. We make the assignment online: when a terminal pair arrives, we decide which hub they should both connect to. If I want to connect to Netflix, then the algorithm will say, let me connect to Chicago, and Netflix also connects to Chicago; the choice of Chicago is what the algorithm determines. So for each terminal pair, both endpoints go to some hub, and this we do online. If our partitioning is good, then the rest can follow from previous work: all we need is online algorithms for each of these common hubs, and we put them all together for our solution. What we show as our main result is that this assignment can be done online in a reasonably good way: there is an online assignment scheme such that if you add up the costs of these independent solutions, it's not too much worse, and there's only a polylog factor that you lose. After this, the multi-commodity algorithm follows. So how do we solve the online assignment problem?

Essentially we use linear programming. We first try to find a fractional assignment.

Then we randomly round it to make it integral. To understand the next slide, let z_ir denote a fractional variable that captures the extent to which the pair si, ti is assigned to hub r. For every possible hub r we have such a variable; it says how much of this pair I want to connect to this hub. That is the main variable in our LP. So what is the LP? First, we want every si, ti pair to be connected to some hub, so the sum of z_ir over all possible hubs r is at least one. And then we have several single-commodity LPs inside this overall outer LP: each LP_r tries to satisfy all the terminals that are sent to it, to the extent z_ir that the outer LP decides. So this is what I call the outer LP, and then there are several inner LPs, one for each hub. The objective is just to minimize the sum of the costs of all these hub solutions. First, from the Chekuri et al. off-line decomposition, we know that this composite LP admits a good solution: there is a solution which is at most log n times worse than opt. So we have split up the overall multi-commodity problem into a collection of single-commodity instances, and a good solution exists; whether we can find it online is the main question. So two questions. First, can we solve a fractional version of this LP online? Secondly, can we round it? These are the main questions that always come up with online LP-based approaches. For the first question you have to take my word for it, and I won't go over the details, but you can solve this LP online fractionally. The rough idea is an algorithm inspired by the online set cover algorithm, which is sort of like the basic algorithm that many

online algorithms use. We roughly increase an assignment variable at a rate inversely proportional to the cost increase at the corresponding hub. So si and ti are assigned to a hub r at a rate inversely proportional to the increase in cost if I send them to hub r. If I assign a pair to Chicago and find the cost increases a lot, then that assignment is not favored over some other assignment where the cost increase is less. Notice that this is not a static [indiscernible]; it's how much the LP cost dynamically increases. So this is slightly messy, but at a fundamental level it's pretty simple, and spelling it out would not be very illuminating. Take my word that such LPs can be solved online fractionally: for a large class of these LPs you can find fractional solutions using a multiplicative-update-style algorithm. Therefore, we can find a fractional solution to this LP online that is only polylog-competitive compared to the optimal solution, so this can be done. The second question that remains is how to round this LP. And here there is a problem, or at least at this moment there seems to be a problem: we don't necessarily know how to round these LPs online. Online rounding is already difficult, and this LP furthermore is very complex. Online [indiscernible] has a nice rounding, but for many online problems we don't know how to round things well. So what do we do with just a fractional solution? The idea is that we don't need to round the whole LP; we just round the assignment variables. We can leave all the other variables completely fractional, and we argue that that's still okay. So the only thing to do is round the assignment variables, and that rounding is very simple. What's the rounding? Every hub r chooses a random number theta_r and scales its LP solution up by 1/theta_r. Essentially, each of the inner LPs is randomly scaled by some random quantity.
And how does the si-ti pair get assigned to a hub? You just do a simple thresholding: if z_ir is at least this random number theta_r, then assign the pair to hub r. A simple calculation says that every si-ti pair is assigned to some hub with constant probability, because the sum of z_ir over hubs is at least one. Furthermore, each scaled LP can fractionally support its assigned pairs: we have scaled by 1/theta_r, and z_ir was at least theta_r, so fractionally the scaled inner LP can satisfy the assigned terminals. Finally, the expected cost of every scaled LP is small, because we are scaling by 1/x where x is uniformly random between one and one over n, so the expected scaling factor is only about log n. So what we have done is find a fractional solution with the restriction that the z_ir are integral; everything else can be fractional.
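The fractional online step that was taken on faith above has the flavor of online fractional set cover; here is a sketch of that simpler setting, with a multiplicative boost inversely proportional to cost (the update constants are illustrative, not tuned to match any particular competitive bound).

```python
def online_fractional_set_cover(sets, costs, elements):
    """Online fractional set cover sketch: when an element arrives and
    is fractionally uncovered, every set containing it is boosted
    multiplicatively, with the increment inversely proportional to the
    set's cost, so cheap sets grow fastest."""
    x = [0.0] * len(sets)
    for e in elements:
        containing = [j for j, s in enumerate(sets) if e in s]
        while sum(x[j] for j in containing) < 1.0:
            d = len(containing)
            for j in containing:
                x[j] = x[j] * (1 + 1 / costs[j]) + 1 / (d * costs[j])
    return x

sets = [{"a", "b"}, {"b", "c"}, {"c"}]
costs = [1.0, 2.0, 1.0]
x = online_fractional_set_cover(sets, costs, ["a", "c"])
print(x)  # -> [1.0, 0.625, 1.5]: every arrived element is fractionally covered
```

The key property, mirrored in the talk, is that updates depend on how much cost an assignment would incur, and variables only ever grow as demands arrive.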

Cables are still bought fractionally, but the z_ir alone are integral, and the overall cost is polylog-competitive with respect to opt. After this, the talk is almost done. The reason is that even though we don't know how to round the inner LPs online, we know that the inner LPs are good: the integrality gap of each inner LP is small. What do I mean? After the previous slide, we have shown that the sum of the fractional costs of these inner LPs is at most polylog times opt. And the integrality gap of each inner LP is small, because there are off-line rounding algorithms for these inner LPs. Therefore the sum of the optimal integral solutions of the inner LPs is also small. So this gives us a certificate that our assignment did not screw up, and that is all that we care about: our online assignment scheme is good. That is pretty much all that I wanted to show in outline. Once we know that the online assignment is good, we can just run the existing online algorithms and put the two together. So what is the overall summary? We had a multi-commodity instance; we used the existence of this decomposition result to formulate a composite LP and solved it online. We only partially rounded the outer variables, and then used the existence of small integrality gaps for the inner LPs to get confidence that our assignment was okay.
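The two probabilistic claims behind the theta-scaling rounding described above, constant assignment probability and only logarithmic expected scaling, can be sanity-checked numerically (the fractional values below are hypothetical, and theta is drawn uniformly from [1/n, 1] as on the slide).

```python
import random

def theta_rounding(z, n, trials=100_000, seed=0):
    """z[r]: fractional assignment of one terminal pair to hub r
    (values sum to at least 1).  Each hub r draws theta_r uniformly
    from [1/n, 1]; the pair is assigned if some hub has z[r] >= theta_r,
    and each hub's inner LP is scaled up by 1/theta_r.  Returns the
    empirical assignment probability and the average scaling factor."""
    rng = random.Random(seed)
    assigned = 0
    scale_sum = 0.0
    for _ in range(trials):
        thetas = [rng.uniform(1 / n, 1) for _ in z]
        if any(zr >= th for zr, th in zip(z, thetas)):
            assigned += 1
        scale_sum += 1 / thetas[0]  # scaling factor of the first hub
    return assigned / trials, scale_sum / trials

p, scale = theta_rounding([0.5, 0.5], n=1000)
# Assignment succeeds with constant probability (around 3/4 here),
# and the expected scaling is about ln(1000) ~ 6.9, not anywhere near n.
print(p, scale)
```

Had theta been drawn from (0, 1] instead, the expected value of 1/theta would diverge, which is why the lower cutoff at 1/n matters.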

After that, we just feed each pair to the appropriate online single-commodity algorithm; each hub will be running a single-commodity algorithm, so you feed each pair to its hub, put all the solutions together, and recover a solution. To conclude, we studied this online buy-at-bulk problem, and there are two open questions: a broad, loose one and a very concrete one. The loose open question is: where else can we use such a partial rounding approach, where we round only certain variables, keep the other variables fractional, and then do something with them? The concrete open question is: here I allowed buying multiple copies of a cable, so I could buy the same cable many times; but there are applications where each cable can be bought only once. Even off-line, with one source and one sink, that problem is wide open.

We don't know how to approximate the following network design problem: I have a graph and I have these cables; I can buy each cable at most once; there is a single source and a single sink wanting to send a certain flow; and I want the cheapest network which can support that flow. We don't know how to solve that problem.

>>: You mean [indiscernible].

>> Ravishankar Krishnaswamy: There is no known nontrivial approximation, I think. A very bad approximation, maybe like n or something. Yes, so that's pretty much it. Thanks. [applause].

>> Yuval Peres: Any questions?

>>: You had a tantalizing question about whether multi-commodity is inherently harder than single commodity online [indiscernible]. You should have said most [indiscernible]. Is there any super-constant separation known in terms of competitive ratio?

>> Ravishankar Krishnaswamy: I would think so but I wouldn't bet any money on it.

>>: [indiscernible]

>>: That's for off-line.

>> Yuval Peres: So now we have a longer break. Let's thank the speaker again.
