>>: Thank you for coming. It's wonderful that you are here at lunchtime. Appreciate that. I am also very happy to introduce to you Jennifer. Professor Rexford is a professor at Princeton University, where she also got her bachelor's, and then she got her PhD from the University of Michigan. She's unique in the sense that she is in academia now and she's teaching, but she has spent a substantial amount of time in industry. She was at AT&T for about eight and a half to nine years before deciding to move to academia. So she has a good sense of the priorities of research labs as well as, you know, academia. And in fact the fact that her students are such, you know, hot commodities for us to get is a good thing. But she's visiting us for a few weeks, and as part of that I sort of gently suggested that maybe she should tell us a little bit about things that she's been working on. She's very accomplished. She was the chair of SIGCOMM for four years, and she has also received some awards, the Grace Hopper Award, et cetera, but you can read that in the bio. So without further ado, Jennifer. >> Jennifer Rexford: Thank you. Thanks. It's been a real pleasure to be here the past few weeks. And actually Victor's intro was perfect, because this particular piece of work was born out of some issues that I grappled with when I was at AT&T. I had been wondering about them for a long time, and I hooked up at Princeton with Mung Chiang, who is an optimization theory guy in our electrical engineering department, and this work is done in conjunction with him, looking at some theoretical techniques that he has for creating network protocols that are implicit solutions to optimization problems. And what I want to talk about first, though, is that in the networking research community the past few years there has been a kind of buzzword that, if you're in the networking community, you've heard more times than you can count: clean slate network architecture. And I wanted to dwell on this for a moment because this talk is really about that topic. So what does that mean? Network architecture is more than tuning a particular protocol in the Internet or tweaking its performance. It's really about the basic definition and placement of function: what things should be done by the routers and switches, what should be done by the end hosts, what should be done by automated management systems or human operators? So in this talk I want to revisit the traditional definition and placement of function for a particular network task that I'm going to call traffic management. That's what I mean by network architecture. So what does clean slate network architecture mean? Well, it means thinking about this problem without the constraints of today's artifacts. So what does that mean? It doesn't mean ignoring the speed of light or the cost of memory, but it means not worrying about some niggling little detail in some existing IETF standard that often, when I was at AT&T, would constrain the way I was allowed to think about a problem if I wanted it to be used in practice. So why is that useful? Well, part of the goal is to build a stronger intellectual foundation for designing, deploying and managing networks by being able to think from scratch about how protocols should be designed and used.
And there's some belief that to make the Internet better we may even have to change the Internet architecture in somewhat fundamental ways, so there's still an open question about whether this is more than just an intellectual exercise but might in some cases actually help us make the Internet better in a more practical way. So that's all well and good, and people have been talking about this for a while. But one of the things that I found perplexing about it is that it's not really clear how one does this. How does one do it? Okay, I've decided I want to be a clean slate network architect. What do I do? What do I do when I start my day? And so I've been talking a lot with Mung Chiang about what to do, and he -- he and a number of others in his area, Frank Kelly, Steven Low, et cetera -- have a really interesting body of techniques for viewing distributed algorithms and protocols as implicit solutions to optimization problems. And these are algorithms that in fact the math tells us how to derive. And so you could do this talk really two ways, and I'm going to do it primarily the first way, which is I'm going to walk you through the design process that we went through in solving a problem that I had worked on from a much more bottom-up perspective for a number of years when I was at AT&T. If you're someone that likes the journey more than the outcome, this is a talk for you. If you're really interested in the outcome, I will get to that, but it's going to be somewhat superficially treated, because I'm really trying to walk you through a design process more than I am preoccupied with the particular outcome we get. So we're going to look at traffic management. What do I mean by that? I mean whatever it is that allows us to compute, for each path through the network between every pair of nodes, the bit rate at which I'm going to send traffic on that path. And I define it that way because it encompasses most of the major resource allocation issues that the networking community thinks about: routing, the computing of the paths; congestion control, the adapting of the sending rates of the sources; and traffic engineering, the tuning of routing by the management systems to decide which paths the traffic should go on. And so what I'm going to talk about is going to encompass those three topics, but it's going to revisit what division of labor we should have between the routers, the end hosts and the network operators in solving that problem. So why do I want to look at this? It's a pretty broad topic that is pretty basic in networking. But frankly, more importantly, it's at least something I think there's some hope of getting traction on mathematically. And so it seemed like a good place for us to start our activity. And why can we get traction on this mathematically? Well, there's been really lovely work by Steven Low and Frank Kelly on reverse engineering TCP congestion control that shows it's implicitly maximizing user utility, summed over all the users on the Internet. People have recently made some great progress on doing forward engineering using these same techniques, designing new variants of TCP, like FAST TCP in Steven Low's group. And also people have used optimization theory to tune the existing routing protocols. In fact, work I was involved in at AT&T took this approach: we assumed the protocols are given, but they offer us knobs that we can use optimization theory to tune. So it's an important problem.
It's pretty broad in networking, and there's at least some hope that we'll make progress mathematically. And yet it's not a problem that's so solved that it's a waste of time for us to work on it; it's still an area where there's not really a good holistic view. And part of the reason for that is that we arrived at traffic management in the Internet in a fairly ad hoc fashion, which is true for most protocols in the Internet. So what do we do today? Well, the routers talk to one another and they compute paths through the network dynamically based on the current topology. The end hosts are running congestion control algorithms that increase and decrease their sending rates in response to congestion on the paths they're using. And the network administrator is sitting on high, or automated systems are acting on his behalf, to tinker under the hood with the configuration of the individual routers to coax them into computing different paths, so that when the users do their thing, the traffic flows more effectively through the network. And as I mentioned before, this evolved really organically over time without much conscious design. So why is that? Well, you always needed the routers to compute paths. That's been pretty basic from the very beginning. Congestion control was added to the Internet in the late '80s when there were starting to be fears of congestion collapse as the Internet became more popular. And traffic engineering became important in the sort of early to mid '90s when the Internet became commercial and ISPs started to see significant growth in Internet traffic with the emergence of the web, and had commercial motivations to have performance be good and the links be used efficiently. So these three things are running completely independently. The vendors and the IETF dictate the routing protocols, the IETF and the end hosts dictate congestion control, and the network administrators in each ISP dictate the traffic engineering. So as you might expect, this works -- you're using the network right now, probably several of you in fact -- but it doesn't necessarily work well. So what's wrong with it? Well, the interactions between these protocols and practices are not captured. Congestion control assumes routing isn't changing. Traffic engineering assumes the offered load is inelastic, when in fact we know it's fluctuating in response to congestion. Traffic engineering itself, given today's routing protocols, is complicated: it's an NP-hard problem to tune the link weights that the routers use to compute shortest paths. And every time an adjustment is made in the network, the routers spend a bit of time talking amongst themselves, leading to transient disruption. So people tend not to adapt the routing of the traffic on all that fine a time scale, out of fear of causing transient disruptions. And finally, even though the topologies have multiple paths, they're used only in very primitive ways. Most of the Internet protocols make very limited, at best equal-cost multipath, use of multiple paths. So in light of all these things, the question that Mung and I were curious about is: if you could start over, knowing all these things we know now, what would you do differently? And that's essentially what this talk is about. And I'm going to be fairly deliberate about process here, because the process we went through for designing the protocol is really, I think, the part that's more interesting than the protocol itself.
So the first thing we're going to do is formulate the problem we think the protocol should be solving. Sounds basic, but you'd be amazed how many protocols in the Internet we still don't know what problem they're actually trying to solve, and we're not even sure they solve it. So we're going to formulate a problem. I'm going to draw inspiration from today's traffic engineering practices and congestion control protocols. I'm going to use Mung's magic optimization theory stuff to derive distributed algorithms that are provably stable and optimal realizations of that optimization problem. So at that point Mung is very happy, but I'm not, because the theory only tells us that the thing is stable and optimal, not how quickly it's going to converge or how sensitive the protocol is to tunable parameters. So I'm not happy. So we're going to run a bunch of simulations to understand how well these four protocols, all of which are provably stable and optimal, actually perform. Because some are better than others in practice. Simulations will tell us that some are better than others, and in fact we don't like any of them. But each of them has some ingredients that we'd like to cherry-pick. And so I'm going to use a little bit of human intuition to pull the best features of each of these protocols and design a new protocol that is not provably stable or optimal -- or rather, is only provably stable and optimal under a narrower set of conditions. So it's not an optimization decomposition, although we can at least say some things about it. But it actually is much simpler and much better in practice. So you know when you watch a movie and it says it was inspired by real events? This is the place in the talk where that happens: the theory inspired us to do what we did, but we started to deviate from the theory. Finally, even that algorithm is extremely abstract. Traffic is a fluid, feedback delays are constant. So we're going to translate that into a packet-based protocol that we can actually simulate in detail on a packet-level simulator and make sure that the properties of the protocol that we think we've inherited from the theory are still there when it becomes a real protocol. That's the story of the talk. So feel free, by the way, to interrupt me any time if you have questions. Okay. So the most important thing that we're going to do is the beginning, which is: what problem do we want this protocol to solve? So let's take a moment to look at what TCP does and what network operators do, to get inspiration for what things we'd like the combined system to do. So in TCP there are a bunch of source-destination pairs in the Internet -- index them by I -- and they implicitly solve, as shown by Frank Kelly and Steven Low and others, a utility maximization problem. So essentially TCP is maximizing, in a distributed fashion, the aggregate utility, subject to the constraint that the routes, combined with the traffic that goes on those routes, don't push any link above capacity. And the variable is the sending rate at which the source is sending its traffic. All the different variants of TCP have a slightly different definition of utility, but they all have the same basic property that as you send more you're happier, but there are diminishing returns. And there are a whole bunch of families of fairness definitions that come from economics, the sort of log utility functions. All the major TCP variants have been traced back to some definition of the utility function that looks like this.
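[A minimal rendering of the problem described here, in the notation used throughout the talk: XI is the sending rate of source I, RLI indicates whether flow I traverses link L, and CL is the capacity of link L. The exact typesetting is an editorial paraphrase of the slide:]

```latex
\max_{x \ge 0} \; \sum_i U_i(x_i)
\quad \text{subject to} \quad \sum_i R_{li}\, x_i \;\le\; c_l \quad \text{for every link } l.
```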
And essentially what that utility function represents is some notion of user satisfaction, where more is better, but it's really important to get more when you don't have much to begin with. And in fact these applications we run on the Internet are elastic: we can take advantage of the extra bandwidth when it's available, but we can still function when it's not. Okay. So that's the problem TCP solves. Yeah? >>: Are you looking at throughput as the metric that you care about in some sense? >> Jennifer Rexford: Yeah. That's a great question. Yes. And in fact in separate work that we did as a follow-on to this, we looked at delay instead and tried to have a minimization of aggregate delay. And we end up with a different protocol, but it goes through exactly the same design process to get to that protocol. >>: [inaudible] users, right? How does that -- >> Jennifer Rexford: Yeah. And then we looked at, what do you do if you have a weighted sum? And essentially the same theory can be used to help us solve that problem, too. So I'll touch on that really briefly at the end. But you could view this as just an example. And you're totally right, we chose throughput here, and if you chose something else, you'd end up with a completely different protocol with a similar structure, but the details would be totally different. Any other questions? So that is the congestion control piece. So what do network operators do today? Well, they look at their network. And they don't want the links to be congested. And so they're trying to minimize some sum of congestion over all the links. And typically that congestion function looks something like a queuing delay formula, where they're pretty happy as long as the links are below 30, 40, 50 percent utilized and get increasingly unhappy when the link approaches capacity or exceeds it. And they need to do that in such a way that the link's utilization is the sum of all the traffic imposed on routes that traverse that link, divided by the capacity. So unlike in the previous problem, where the source rate was our variable, here it's a given. The traffic has been measured; it's the offered traffic as far as this network knows. And in fact the routing is the variable, whereas in the other problem the reverse was true. Okay? So the goal here, then, is to have a cost function that's the penalty for approaching link capacity. So why do we care about this? Because the network is not terribly robust when its links are operating very near capacity. A small burst of new traffic arriving from a new source will cause the network to go over a cliff and drop tons of packets on the floor. And so while the user is driving the network into being as heavily loaded as possible by sending as much as he can, the network operator is saying whoa and holding back, trying to keep the links from being over[inaudible]. Yeah? >>: [inaudible] same one from the previous slide then in that [inaudible] right? >> Jennifer Rexford: You want what, I'm sorry? >>: [inaudible] you want YL, and from the previous slide then you want at most one, right? >> Jennifer Rexford: In theory, yes, but if your routing has mismatched the traffic, you might have utilization over capacity. >>: So [inaudible] explain quickly what is the meaning of CL and YL? >> Jennifer Rexford: So CL is the capacity of link L, and RLI times XI is the load put on link L by just that one flow I that's traversing link L. And then I'm going to sum up over all flows that traverse link L.
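[A minimal rendering of the operator's problem as described: the routing R is now the variable, the measured rates x are fixed, and f is an increasing, convex penalty that blows up as utilization approaches 1:]

```latex
\min_{R} \; \sum_l f(u_l)
\qquad \text{where} \quad u_l \;=\; \frac{\sum_i R_{li}\, x_i}{c_l}.
```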
And in theory it can't be above 100 percent, but I'm going to allow it to be over 100 percent. You could sort of think of the part above 100 percent as getting dropped on the floor, because the link can't carry it. And that's why I allowed this function to shoot up to near infinity, but to still have a meaning when it is in excess of 1. >>: So [inaudible]. >> Jennifer Rexford: So L is a link, and every flow I will traverse a sequence of links; if RLI is one, that link is on the path, and if it's zero, that link isn't on the path. >>: So you sum over all I, but only those that relate to L? >> Jennifer Rexford: Well, exactly. But you could think of RLI as 0 for the links that don't -- that are not on the path traversed by L. >>: By I? >> Jennifer Rexford: By I. Exactly. And if it's multipath, that RLI might be 50-50, you know, on the two paths, for example, that carry traffic for flow I. Yeah. So that's the problem that the operator is solving. And this essentially avoids bottlenecks in the network. Yeah. >>: They are also optimizing for latency and a couple other factors, right, or is that cost? >> Jennifer Rexford: So that's true. We don't capture that here, but you're right, it kind of implicitly gets captured in the sense that if you take really circuitous paths, it imposes load on so many links that you tend to be biased towards paths that don't have high delay. But you're right, we're not explicitly capturing that here. But you could imagine extending the problem to have it. >>: [inaudible] first order [inaudible] and the second order is latency, then the third order might be cost or [inaudible]. >> Jennifer Rexford: Right. So here we're just capturing the congestion piece, but you're right, the delay will matter to you. Yeah. >>: [inaudible] essentially can [inaudible] in terms of looking at the average -- >> Jennifer Rexford: Right. So the limitation of the theory is that we're thinking really of everything as a fluid, so that there isn't burstiness. But you could view it as indirectly capturing the burstiness by caring about this, because in theory there's no reason we couldn't run the link at almost 100 percent utilization, but our reluctance to do so is based on the fact that that's going to be bad when traffic is bursty. So the theory didn't capture it, but we're including this piece in the problem because we're worried about it. And these are exactly the issues that, you know -- we're going to end up with a protocol that we think handles this the right way, but in practice we don't really know how it performs when the traffic is bursty; we just have a hunch, because we've designed it with those constraints in mind, and when we simulate, we'll capture the actual burstiness of the traffic. Yeah. Good question. So then what do we do? Essentially we combine these two objectives together. We say we want to maximize user utility minus the congestion penalty that we're imposing on the network, with some weight between the two. So one way to think about that is we're trying to balance the goal of the end hosts to maximize throughput, which tends to push the network towards being bottlenecked, against the network operator's desire to minimize queuing delay and avoid congestion in the network. And if W is zero, we're essentially doing very much the same objective function that TCP has.
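[Putting those together -- a minimal rendering of the combined objective with the penalty weight W, as described:]

```latex
\max \;\; \sum_i U_i(x_i) \;-\; w \sum_l f(u_l)
```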
If W is very large, we're essentially just doing traffic engineering and being incredibly conservative about what traffic we even let in the network in the first place. And we'll see that having W equal to 0 is a fairly fragile place to be, but having W just a tiny bit larger than 0 is actually the sweet spot for us: we almost maximize throughput without making the network quite so fragile as it would be if throughput was all we cared about. So I'll come back to this penalty weight again a little later when we do the numerical experiments. Okay. So what do we do? Now we've got this optimization problem that we formulated, and now we need to find a distributed protocol that solves that problem. So we're going to use the mathematical techniques that Mung Chiang and his brethren have come up with. But one problem we're going to run into is there's a sort of watershed in the optimization community between problems that are convex and problems that aren't. And we don't really know how to deal with the ones that aren't terribly well. And so the first thing I'm going to do is force the problem that Mung solves to be convex. So this is the dynamic that we have: I tell him a problem; he says, that problem's really hard, I'm going to work on it, it's cool; I say, no, it's hard, that's my problem, I'm going to make it easy. And so my job is to make Mung's life easy and his is to make mine hard. So convex problems are great because there's a local minimum that's also the global minimum, you can use gradient techniques to find it, and those are amenable to distributed implementation. Non-convex problems don't have that property; you can easily get stuck in a local minimum, and so you have a hard time making a distributed algorithm that will find the global minimum. Okay. So we want to have convex problems. We know how to handle them. And we know how to derive distributed solutions that are computationally simple and that provably converge. Okay. So why is our problem not convex the way I've talked about it so far? Well, the main reason is I'm assuming single-path routing, or at least I haven't stated what those routes are. Getting back to the question that you asked a moment ago, what is this R? It's the set of links that flow I traverses in the network. And it is restricted in some way: a hundred percent of the traffic has to go on one path, or a hundred percent on the other, as would be the case with today's single-path routing. And even if I use ECMP, where I allow only even divisions of 1 over N across the paths, I don't have a convex problem, because I have these kind of weird constraints imposed by the way I can split over multiple paths. So I'm going to rephrase my problem in terms of multipath routing, where I'm going to assume now that I have a set of paths between every ingress-egress pair. And this gets back, I think, to a question somebody asked about delay. Yeah, it was you. So I'm going to pick these paths smartly, so that I pick paths that don't have high delay. Okay. So I'm going to introduce a little practical stuff under the hood here to pick not the exponential set of all possible paths, but just a few that are reasonably disjoint and don't have really huge delay.
And then I'm going to say the problem I'm actually solving is to maximize the utility I get from the aggregate rate that I'm sending, where that rate is split over multiple paths, such that the link loads individually can't be more than the capacities. So essentially I've just converted what was originally a single-sending-rate question into a multi-rate sending problem of how much I'm going to send on each of multiple paths, and I'm going to assume I can do this split completely arbitrarily finely. It could be 50-50, it could be 49-51 if I need it to be. And that's possible today using a variety of techniques that keep multiple packets of the same flow on the same path, using hashing to do pseudo-random splitting rather than truly arbitrary weighted random splitting. Yeah? >>: So because of the convexity, do you have to pick the paths carefully, or -- >> Jennifer Rexford: No. No. The main thing that I need is the splitting. Yeah, really good question. I don't actually really care. The main thing about how I pick the paths is that because I'm not choosing to represent every possible path on the graph, I'm stepping away from optimality, because I originally was allowing any path potentially to be used. But now I'm essentially not allowing the path 9-8-7-3-2-1-5-4-6, and it could have been that the optimal solution would have used it and I'm not. So it's more that I need to pick a reasonable set of paths such that the optimal solution doesn't lie far away from what I'm letting get expressed here. But the good thing is that picking the K shortest paths, or the K shortest paths that are at least somewhat disjoint from one another, seems to work pretty well for most of the graphs I've looked at, which don't have so much path diversity that that's hard to do. I know data centers probably don't look like this, and in a data center that might not actually be the case. >>: So you're assuming you're varying the rate along each of the paths, so one path might [inaudible] traffic at a greater rate than another? >> Jennifer Rexford: Exactly. So you could think of that two ways. You could think that the component that's sitting here, which could be an end host or an edge switch, either is computing those rates and sending at them, or is computing an aggregate rate, so that the host is told to send at whatever the sum turns out to be and the network is just doing proportional splitting based on the relative ratio between those weights. Or you could think of a single component doing both those functions. Yeah. But the user utility is defined just in terms of the aggregate, because the user is sort of oblivious to which path it's using. >>: The [inaudible] because I didn't get that. Because in one case you have [inaudible] just choosing the path, and now you're saying instead of [inaudible] being integer, I'm going to use some floating point to define a split, right? >> Jennifer Rexford: Right. So one way to think about it is, the way I formulated the problem before I changed it, I was requiring this split to be 1-0-0 or 0-1-0 or 0-0-1. And so I actually have discontinuities that are introduced by that constraint, which essentially leave me with situations that are a bit like this picture on the right. Now I've got continuity, so I can make progress by just imperceptibly shifting a portion of traffic from one path to another. And all the solutions in between are feasible ones in my model, whereas in the earlier model I was forcing things to be discontinuous.
I could only have integer solutions, in a sense. >>: Right. >> Jennifer Rexford: Before I made this change. >>: So it's just the fact that you move from an integer to just a linear program. >> Jennifer Rexford: Exactly. >>: The solution is what makes it -- >> Jennifer Rexford: Exactly. Yeah. Exactly. It's not the set of paths, it's purely the fact that I now can have a continuous solution rather than a discrete one. >>: It's basically going from [inaudible] linear program to standard [inaudible]. >> Jennifer Rexford: Exactly. So essentially I'm in the realm of multi-commodity flow now. Exactly. It's the same reason MPLS can achieve optimal traffic engineering but OSPF can't: the ability to do arbitrary splitting is the key that keeps the optimization problem tractable and optimal. >>: [inaudible]. >> Jennifer Rexford: I'm only restricting myself to that for computational simplicity, in the sense that I don't really want this node in practice to have to deal with an exponential number of paths, because I'm going to keep state in the end hosts and routers in proportion to that. So it's purely -- the theory didn't really require me to be so restrictive. It's more that I was thinking it's not realistic in practice to enumerate all the paths. And I also don't think it helps much beyond a certain point. So there's no reason I couldn't conceivably do that, it just didn't make practical sense. Yeah? >>: If you are -- if you were to fix paths up front, even fixing one path [inaudible] I mean, what you mean by single-path routing is letting the problem choose the route. That's what's not -- >> Jennifer Rexford: Even in this setting, if I picked three routes and I said pick one of the three, I'm still going to have a problem. >>: Yeah. But if you just pick, you say, you know, you have to go on this path, what [inaudible]. >> Jennifer Rexford: Yeah. That's right. Exactly. If there was only one path here I would be fine; it's more that if I have multiple and I restrict you to using only one of them. You're totally right. And in fact, we'll see in practice most of the solutions put all the traffic on one path. There are just a few node pairs that in practice take advantage of the ability to split. So part of it is a computational problem I have to get through to be able to get the math to work. And part of it is that occasionally there really is a pair and a place in the network where you need that flexibility for efficiency reasons. Yeah? >>: Well, [inaudible] if you are -- if there is just one edge, just two nodes. >> Jennifer Rexford: Right. >>: If your [inaudible] complicated, you can still [inaudible] right? >> Jennifer Rexford: In terms of the number of paths, you could. >>: No. >> Jennifer Rexford: Sorry, I didn't -- >>: Just if one path, one edge. But your functions [inaudible] sophisticated. >> Jennifer Rexford: Yeah. Yeah. Yeah, that's fair. >>: Then it's still -- >> Jennifer Rexford: Well, the nice thing here is my U function is concave and my F function is convex. Because essentially I've got this diminishing-returns utility function -- utility increases, but with diminishing returns -- and if you think of the congestion function like an M/M/1 queue, it's essentially increasing and convex. And that's what's letting us make that decomposition. In fact, what this is fixing is the constraint: I had a non-convex constraint that I'm fixing by moving to this flexible splitting over multiple paths.
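[A minimal rendering of the reformulated multipath problem, writing z_i^j for the rate flow I sends on its j-th path (the path-rate notation here is an editorial choice):]

```latex
\max_{z \ge 0} \;\; \sum_i U_i\!\Big(\sum_j z_i^{j}\Big) \;-\; w \sum_l f(u_l),
\qquad
u_l \;=\; \frac{1}{c_l} \sum_i \;\sum_{j:\; l \,\in\, \text{path } j \text{ of flow } i} z_i^{j}.
```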
I already had a convex objective function before I started, because the U function and the F function have the properties I need to make that true. Now, as you can imagine, if I try to handle other classes of [inaudible] whose utility functions are not as friendly as aggregate throughput's, that might not be true anymore. But for throughput it is. And for certain delay formulations, like minimizing aggregate delay, that's also true. It's not true for, you know, delay bounds, where you actually have a sharp discontinuity in the curve. So in our delay work that was asked about earlier, we actually assume that we're trying to minimize aggregate delay rather than maximizing the percentage of traffic that satisfies a delay bound, which actually wouldn't fit in this kind of framework. Okay. So here is our new problem now. So what do we do? The distributed solutions that pop out of the math are going to have this basic flavor to them. The routers have multiple paths; they may be computed dynamically, or they may be set up in advance by the network administrator. They're going to monitor the load on their incident links and compute a so-called price that's going to be reflective of the congestion on the link. They're going to either feed that back to the ingress node, which could be a router or a host, or the prices will be picked up as the packets flow through the network -- for folks that are familiar with ATM networks, similar to the RM cells that ATM had. And that source, based on all of these prices, is going to update its path rates, so it's going to both change the aggregate rate at which it's sending and the percentage of that traffic it sends on each of the K paths. The network administrator is tuning some parameters that get tuned on a very, very coarse time scale, and otherwise can really take the rest of the day off. Okay. The network is doing most of the work here. These other parameters, which I cryptically refer to here, are going to be some tuning parameters that pop out of the math, and they are going to be the bane of our existence; they're going to be why the so-called optimal protocols are not going to be the final answer to the question. Okay. So what do we do? I'm not going to go into a lot of detail about exactly the theoretical techniques. I'll just say that the decomposition techniques we use are pretty standard. You know, there's a little bit of work to turn the crank and make them pop out, but we didn't really do any significant innovation there. I'm just going to try to give you a flavor for what that process looks like. So essentially what we have are prices that are penalties for getting close to violating a constraint, like the capacity constraint on each link. We're going to punish a link that's very, very close to overloading. The path rates are going to be updated based on those penalties to shy away from paths that are congested. And you can see an example of this in a single-path setting: think about what TCP congestion control does, which can be expressed in a similar kind of way, just a little simpler than the problem that we have. The link prices are things like packet loss or end-to-end packet delay. And the sources are additively increasing or multiplicatively decreasing based on the prices they get back from the network. In the case of TCP, those prices are implicit in the observations the sources themselves make about loss and delay.
In our case, they're going to be explicit and provided by the network, by marking the packets or by providing feedback to the end hosts. So our problem is going to be very similar in spirit, just more complicated, because our objective has two terms in it rather than just aggregate utility, and we're using multiple paths instead of one. But the basic technique is pretty much the same. So just to give you an illustration of the kind of transformation we'll make to the problem in order to make it distributed. One constraint we have -- and it's really the main one -- is we have to stay below link capacity. Okay? So we're going to do things like this. We're going to say, okay, let's suppose the link load is YL. I want to make sure YL stays below C. And one way I can do that would be some sort of subgradient feedback update, where I essentially look at the link load and see how I'm doing. And if I'm in fact at a very high load -- if the link load is higher than YL, the target load I'm trying to get to -- I'll actually increase the price, and that will make the paths that traverse this link look less attractive, which will ultimately lead the source to send less traffic on them. So here we've already introduced one of these so-called other parameters that I'm going to come back to later: the step size is going to be something the network operator has to tune. So that's unfortunate. And we'll see later it is problematic in practice as well. But that's the kind of thing we're doing. We're taking each of the constraints, and now you can see this is decomposed in the sense that a link alone can do this computation by itself. So each of the links in the network is doing a simple kind of update like this. And other parts of the problem, other constraints and other parts of our objective function, get decomposed in similar ways into things either the edge node is able to do on its own or the links are able to do on their own. Yeah? >>: [inaudible]. >> Jennifer Rexford: So in fact I didn't go through this here, but you can end up decomposing making link load match YL into its own update. It's a made-up variable. So it's made up. So another constraint I now have is link load equals YL. That I can solve by saying YL minus link load has to be zero. And I can measure how well I'm doing. >>: [inaudible]. >> Jennifer Rexford: It has no real meaning; it's an artificial construction just to allow me to decompose the problem. And it's just one way of decomposing the problem. And in fact, we use four different decomposition techniques that vary in terms of which constraint they go after first and in what way they decompose it. But they're all pretty standard ways of taking linear constraints and decomposing them into updates. Okay. So just to give another example: this one we call effective capacity; it essentially gives us an early warning of impending congestion. We're doing something to get a sense of how much load the link should have. We're noticing that we're doing worse than that, and so we're going to panic a bit by penalizing traffic for using this link. And there's another parameter that allows us to overshoot the actual load on the link, to go a little faster -- to send more traffic than the network can handle in order to converge more quickly. And I'm not going to go into that.
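[As a concrete illustration -- an editorial paraphrase of the kind of subgradient price update described, with beta the step size the operator has to tune and the bracket denoting projection onto non-negative prices:]

```latex
p_l(t+1) \;=\; \Big[\, p_l(t) \;+\; \beta \,\big( \mathrm{load}_l(t) - y_l(t) \big) \Big]^{+}
```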
But essentially each of the decompositions we do has a very specific physical meaning in helping the links or the sources understand the congestion state they're in, based on local computations they can do. So, skipping some of the details -- because again, as I mentioned, my goal here is to convey the journey more than the outcome -- essentially we end up with four different decompositions. We take all the techniques the optimization community has come up with for slicing and dicing these kinds of complex optimization problems, and they all have the same flavor: the links update a price, and the sources update their splitting rates based on that. And they all just differ in which order they try to address each of the constraints in the problem. And so we end up with four different algorithms with different numbers of tuning parameters and different degrees of aggressiveness in responding to congestion. So some of them are essentially going to have different dynamics than others. They're all provably optimal. They're all provably stable. They are, by the way they were derived, distributed algorithms for solving the optimization problem, where the components doing the computation are the links and the edge nodes. Okay. So that's all well and good. The theory tells us that we're stable and optimal. But we're really not done, because we don't actually know how they perform in practice. And in fact, practice is going to deviate quite a bit from the theory in some of the cases. So again, we know they're going to converge. But they're only going to converge for diminishing step sizes. They're only going to converge on some time scale that we have only extremely, extremely weak bounds on -- not strong enough bounds for us to be confident that they're going to perform well in practice. And these tunable parameters -- beyond telling us that they have to be diminishing with time, which isn't really practical -- we have no guidance from the theory on how to tune. Okay? So we have something that is provably optimal and stable, but still not very useful. So what we did -- because at this point our so-called protocols are really numerical computations, right? They're not even distributed algorithms in any really meaningful sense; they're really just function evaluations that we're doing. So essentially we went into MATLAB and we did a huge multi-factor experiment where we varied every parameter in all these protocols and made a ton of graphs to study the rate of convergence and the sensitivity of each protocol to its tuning parameters. So the first thing we looked at is how much we care about maximizing the users' utility and nothing else -- W equal to zero -- versus being robust by being a little bit conservative in what we put on the network. The Y axis equal to 100 means we've maximized aggregate utility. As it starts to drop off, we're being so conservative that the conservatism is costing the users performance. What we find in a lot of our experiments is that if we operate at W equals zero, the system is extremely fragile, hard to make stable; but fortunately we've got a relatively wide band of values of W where we get pretty close to optimal behavior despite not being exactly optimal. So we can make W equal to a half or a sixth or something and still pretty much be in a regime where we're going to be able to nearly maximize user utility. And you'll find you can be much, much more robust by making that tradeoff.
And these are three different network topologies that we studied this on. So that's W. We have a sense of how to set it; it's topology dependent. It relates a bit to the number of diverse paths in the network. But for the most part, as long as W is a half or so, we're not in bad shape. So these are the kind of graphs we looked at for a whole bunch of the different protocols. A protocol might have a particular tunable parameter like a step size. We'll sweep its value and we'll look at how long it takes to converge to within an epsilon of optimal. And the Y axis is the number of iterations. Think of them as round-trip times: how many round-trip times do we have to keep iterating and updating before we converge? And so in a graph like this we would look at it and say, well, we converge to near optimal within a few dozen iterations. That's not bad. And we look at the width of these curves to get a sense of how sensitive it is to our parameters; so if we end up setting the step size over here, we're in big trouble, because it's going to take 300 round-trip times to converge. So we're reasonably happy -- this could be worse, it could be narrower -- but it's still not great: if we don't get the value right, we're not going to converge all that quickly. And so we compared all these protocols based on graphs like this. And what we ended up seeing was that small values of W, below sort of a sixth to a half, are pretty dangerous, but once you have a larger value of W, things are pretty good. And the schemes that had a larger number of tuning parameters really didn't do better; even if you could find the best tuning parameters to use for them, they didn't really converge much faster than the simpler schemes. As you might expect, schemes that allow a little bit more aggressive transmission, overshooting occasionally, converge more quickly; and in some sense doing direct updates from current observations of network conditions was more effective than relying on past history. Okay. And so we took away from this that all four of the protocols -- well, distributed algorithms is probably more appropriate at this point -- had some interesting flavor to them, but they each were somewhat unsatisfying in one respect or another. And so we took away these very basic observations, and we cherry-picked aspects of the computations that each of these algorithms is doing, to take the direct update rules and the techniques that had fewer tunable parameters, and we constructed a new protocol that's simpler. So just to show you what that looks like: essentially what we do on every link is we keep track of the capacity of the link and how much the load is either in excess of or underneath that, and we update the loss price, if you will, based on that. So think of the loss price as saying, well, if link load is higher than link capacity, I'm going to lose the traffic that's in excess of the link capacity. So logically you could think of this P price as the fraction of traffic that's going to get lost. And the other price is essentially the increase in queuing delay; this is an F prime, the derivative of that F function that looks like queuing delay. So this is a little bit like the link measuring how much queuing delay has been increasing in the recent past. And each of these equations appears in one or another of the protocols we derived theoretically, so that's where we got them from. And essentially we accumulate a price along the path by summing those up.
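[For illustration only -- a minimal Python sketch of that per-link bookkeeping as described, not the authors' code; the function and variable names, the fixed step size beta, and the M/M/1-style penalty are editorial assumptions:]

```python
def update_loss_price(p, load, capacity, beta):
    """Loss price: grows when load exceeds capacity, shrinks when there
    is headroom; roughly the fraction of traffic that would be lost.
    beta is an assumed fixed step size; the price is kept non-negative."""
    return max(0.0, p + beta * (load - capacity) / capacity)

def queuing_price(load, capacity):
    """Delay price: the derivative f'(u) of an assumed M/M/1-like
    penalty f(u) = u / (1 - u), which blows up as utilization u -> 1."""
    u = min(load / capacity, 0.99)  # clip so f'(u) stays finite
    return 1.0 / (1.0 - u) ** 2

def path_price(path_links, loss_price, load, capacity, w):
    """Accumulate the price along a path: each link contributes its loss
    price plus the w-weighted congestion (delay) price."""
    return sum(loss_price[l] + w * queuing_price(load[l], capacity[l])
               for l in path_links)
```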
And so the source looks at each path in terms of the sum of these loss- and delay-based prices. Note that versions of TCP do something kind of similar to this, they just do it implicitly, right? They look at loss and they look at delay. We're not so different in the end in doing that as well, although the details are a little different. Yeah? >>: [inaudible]. >> Jennifer Rexford: So we do the computation roughly on sort of the round-trip-time time scale. You could do it -- so you could imagine feeding the stuff back in the packets themselves as they're flowing. >>: But isn't that [inaudible]. >> Jennifer Rexford: It could be. I mean, you could choose to do it more slowly. It's just that if you want to converge more quickly when things change, you might choose to do it more quickly. >>: [inaudible] to the close capacity and TCP trying to react and [inaudible] right. >> Jennifer Rexford: Right. So what we have going for us that TCP doesn't is that the feedback here is explicit. And so we know immediately what the congestion state of a link is, rather than waiting for packets to get dropped before we know. So we're getting in some sense a much earlier warning -- in exchange, in fairness, for the overhead of making that information explicitly available. Yeah? >>: But are you multiplexing many, many TCPs onto this network? >> Jennifer Rexford: Yeah. Yes, you could think of this sort of flow I as really all flows between a particular ingress-egress pair. >>: So you are reacting, like Victor said, you have little TCPs [inaudible] reacting -- >> Jennifer Rexford: There is no TCP here. >>: Yeah, but I thought you were multiplexing many flows. >> Jennifer Rexford: Yeah, but they're not doing TCP. >>: Okay. >> Jennifer Rexford: Yeah. They're being told to send at the rate that we're telling them to send at. Yeah. So, yeah, TCP would interact in ways that might be hard to predict. But, yeah, you could think of us as telling the ingress router how to shape the flows that come in, or you could think of the ingress node as being the host, which is being explicitly told the rate at which it should send. Yeah. And that's actually critical. We get our faster adaptation than TCP because we're being so heavy-handed. Yeah? >>: How do you compare this to MPLS's auto-bandwidth? How would that compare to the [inaudible]. >> Jennifer Rexford: So my sense of MPLS auto-bandwidth -- for folks who don't know, the basic idea in auto-bandwidth is if you encounter congestion you can essentially dynamically set up a new path. >>: [inaudible] dynamically grow the bandwidth of an LSP, and it will set up a path for itself, right? >> Jennifer Rexford: Yeah, yeah. Exactly. The main difference here is we're continuously doing updates. You could think of auto-bandwidth as a, you know, pull-the-parachute-because-things-have-gotten-bad mechanism. Here, we're going to be continuously adapting the sending rates and splitting ratios over the multiple paths. And we don't have a target capacity for a path at the beginning; we're just essentially always letting the network tell us what to do. So auto-bandwidth is conceptually similar, but it's a bit more of a pull-the-parachute-when-the-plane's-going-down kind of thing. Yeah? >>: Price is [inaudible] or just a -- >> Jennifer Rexford: Price is what? >>: Is that a [inaudible] feedback or is it -- >> Jennifer Rexford: It is. It's a number. It's a number corresponding to the current price.
So if you were to think of it in ECN terms, it would be a multi-bit variant of ECN, if you wanted to discretize it and carry it in the packets. Yeah? >>: On average how many source-destination paths do you have? >> Jennifer Rexford: In most of our experiments we had about three to five. And that was primarily because the topologies we were working with were backbone network topologies, where between city pairs you would really have something in that ballpark. And we thought that was a reasonable overhead to put on the ingress switch. I'm curious, in the data center context, what would be the right thing to do. I assume that number would be too small to make effective use of a data center network that has more paths of comparable latency available. Okay. So that's roughly what TRUMP does. And then it does a little local optimization at the ingress node, which could either be an end host or a switch, to figure out essentially the best way to decide how much traffic to send along those paths, subject to the price of each of those paths, in order to maximize utility subject to some aggregate price you're willing to pay. Anyway, I'm kind of glossing over details here in the interest of getting through everything, but that's the basic gist. And I should stress again, TRUMP is not a decomposition. It is not yet another decomposition. It does not pop out of the theory; it popped out of our intuition, applied to the experiments, applied to the protocols we got from the theory. We can prove a few things about it. We know under certain conditions that it's stable and optimal, but we can't prove as strong results as we're able to prove for the protocols we started with. When we did this, you know, massive multi-parameter sweep in MATLAB, we found it converged substantially faster, and it has only one parameter, which is pretty easy to tune. So we're happy with the tradeoff here, even though we don't have as rigorous results from the theory. Okay. So, as you can probably tell from the fact that we're evaluating it in MATLAB, it's still mostly a numerical experiment; it's not really a simulation. It's just a bunch of equations that we're iterating through, with assumptions about traffic being a fluid and about feedback delay being constant, all of which are really not true in practice. So the next thing we do is we translate that into a packet-based protocol. And nothing here will surprise you; this is pretty easy stuff. We're now going to have time intervals T, we're going to count load as the number of bytes that go across a link during that time period, and we're going to update the link price every time period. And we're going to define that time period to be the max of the RTTs of the different paths that are being used. We could do it less often than this. This is something that seemed reasonable to us, because you can't reasonably get the feedback from all the paths any quicker than that. We haven't yet understood deeply how much larger we can make it than this and still have the stability properties that we see in the experiments, but with this update we do see stability. Okay. So what does it look like in the end? Oh, I should say also, when a new flow arrives or departs, it knows from the ingress node what the prices of the paths are, and so its appropriate sending rate can also be determined directly, because we know our current view of the state of the network. Okay.
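[A minimal rendering of that local optimization at the ingress, writing q_i^j for the accumulated price of flow I's j-th path (an editorial notation):]

```latex
z_i \;=\; \arg\max_{z \ge 0} \;\; U_i\!\Big(\sum_j z^{j}\Big) \;-\; \sum_j q_i^{j}\, z^{j}
```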
So unlike TCP, which does, you know, sort of a slow start, we know the appropriate rate that this flow can send at amongst the group of flows it's part of, because we already know what the conditions on the network are. Okay. Yeah? >>: I'm sorry, a new [inaudible] doesn't really talk about the [inaudible] flows, right? >> Jennifer Rexford: So that's an interesting question. TCP actually does have a notion of fairness when it does that. We don't necessarily inherit that notion of fairness. So we looked empirically at that, and we do see that we tend to do a pretty good job of being even -- you know, essentially not starving anybody. But the property that TCP has, where it actually does give a certain fair share -- we don't inherit that property, because our objective function is different. >>: What kind of fairness do you end up with? >> Jennifer Rexford: It's not actually obvious what we end up with. >>: I mean, using the log function automatically implicitly gets you some kind of fairness, right? Because if you separate out people too much, the benefit you get from giving somebody more is far less than the penalty you get for -- >> Jennifer Rexford: Definitely. So we expect we're doing something reasonable, but because we're also subtracting this F term, we can't be 100 percent sure what we're getting. But the intuition is, especially if we keep W -- that sort of weight between user utility and network utility -- pretty small, the hope is we inherit similar fairness properties to TCP. We can't prove that in the general case. Other questions? Okay. So that's essentially what the protocol looks like. So then we went to NS-2 and did more realistic experiments, with real delays, with real paths, and with an on-off heavy-tailed traffic model mimicking what we kind of expect web traffic to look like. And essentially what we wanted to know is: all these multi-factor experiments we did in MATLAB to be able to stumble toward the right answer to our problem -- were those misguided, because they took such an abstract view of what the network looks like, or did they give us a reasonably good view? And in fact we did NS-2 experiments for the other protocols too, to get a sense of whether our MATLAB results were accurate. And actually they were pretty accurate in practice. So we feel like the MATLAB results, which we used as sort of a signpost along the way, gave us at least some insight into which way to turn at different stages in the work. Yeah? >>: [inaudible] in terms of [inaudible]. >> Jennifer Rexford: Relatively. We tended to pick the K shortest paths, except that if the K shortest paths shared too many links in common, we would essentially try to have paths that were a little bit more diverse. So you could think of it as a tradeoff between K shortest paths and K link-disjoint paths, something a little in between -- see the sketch after this exchange. >>: If you had [inaudible] would you still include them in your -- >> Jennifer Rexford: No. I mean, you could. But they're not going to help you much, because in the end you're probably not going to use them very much. So we chose to limit the number of paths to sort of the three-to-six range. Again, the protocol would still work, and in fact would provably work at least as well, if not better, if we included more. It's just that we chose not to include them, because we assumed they would end up carrying an infinitesimal amount of traffic anyway.
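[For illustration only -- a minimal Python sketch of one way to trade off shortness and disjointness when picking such a path set; this is an editorial reconstruction under stated assumptions, not the authors' algorithm, and the networkx usage, parameter names and threshold are assumptions:]

```python
import networkx as nx

def diverse_short_paths(G, src, dst, k=4, max_overlap=0.5):
    """Pick up to k short paths from src to dst, keeping a candidate
    only if it shares at most max_overlap of its edges with the paths
    already chosen (a crude shortness/disjointness tradeoff)."""
    chosen, used = [], set()
    # shortest_simple_paths yields paths in order of increasing length
    for path in nx.shortest_simple_paths(G, src, dst):
        edges = {frozenset(e) for e in zip(path, path[1:])}  # undirected edges
        if not chosen or len(edges & used) / len(edges) <= max_overlap:
            chosen.append(path)
            used |= edges
        if len(chosen) == k:
            break
    return chosen
```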
And in fact we found that as we did experiments where we increased the number of paths, once we got above four, we didn't really see additional benefits from having more. >>: What was the [inaudible] increasing the number of paths, but actually the -- diversity in the [inaudible]. >> Jennifer Rexford: Yeah. There's some trickiness to that, in the sense that if your K shortest paths have too many links in common, you don't get enough of a benefit from the diversity. And yet if you pick completely disjoint paths, the K paths you pick might have widely varying lengths. And so I think there's some art to picking the paths. And we did some experimentation with that. And certainly if we pick only the K shortest paths, we don't do as well as if we pick kind of a mix of disjointness and shortness. I don't know what the right answer is there, but I think picking a handful of paths that are not much longer than the shortest path and have sort of maximal disjointness subject to that -- something in that ballpark we think will work well. But that's certainly in the realm of human engineering, not something we can say something [inaudible] about. So -- I'm running low on time. Yes? >>: Just some common way of [inaudible] similar [inaudible]. So what we did was [inaudible] where of course that's an [inaudible] algorithm. So [inaudible] matrix. And I couldn't make the problem convex by using the [inaudible] the information about [inaudible] right. So therefore what you can do is you can use some kind of maximum coverage. >> Jennifer Rexford: That's interesting. >>: And you essentially can flow model and [inaudible] for example the [inaudible]. >> Jennifer Rexford: Okay. >>: I can reach some kind of cover, like 90 percent, then of course you can do this for [inaudible] and go back and now [inaudible]. >> Jennifer Rexford: Okay. Very cute. That's nice. Yeah, that's exactly what we don't have. I mean, we're essentially doing things like multi-commodity flow and then picking the paths that tended to carry traffic, but we weren't really formalizing it quite that much. That's really cool. Yeah. So now, given I'm low on time, I'm just going to really briefly run through a few graphs. What we're seeing on the left here is a time series plot of TRUMP running, and the Y axis is the aggregate throughput. Keep in mind that aggregate throughput here includes stuff that's getting lost, so these little spikes you're seeing on the upper left look like a good thing but are actually bad, because that traffic is getting dropped on the floor. So what we see at steady state is what load the network is carrying in aggregate. So you can see if we make W -- this parameter where we are conservative -- too high, we end up being so conservative that we hurt the aggregate throughput. But in fact we can actually do pretty well, with protocols that are much more stable, by having W just be kind of in the one-third to one-half range. So that's just sort of a quick confirmation of that. If you look at the graph on the right, that was one of the earlier protocols that we evaluated, optimized for the step size -- the best value we could possibly get for it after sweeping MATLAB like crazy. And as you can see, the way it behaves under two different settings of its step size is pretty sensitive; those two step sizes, a little hard to see, are pretty close to one another.
The black curve is doing as well as we could possibly do if we let MATLAB search for a long time to find exactly the best value. The blue curve is not much different in step size, and yet you can see things just go nuts. So again, we have a protocol that's provably stable and optimal, but we have to have the right step size to get it to work, whereas we found with TRUMP that it was actually pretty insensitive to the tuning parameters. We did some experiments with failures. So here we look at three different paths carrying traffic between a particular ingress-egress pair. The blue, the red, and the green are the loads on the three different paths. What you can see from this, first off, is that the vast majority of the traffic is going on one path, and in fact this is a recurring theme. In other words, as somebody asked earlier: do we really need multiple paths, or do we just need flexible splitting over whatever paths we have? And the latter is the point. In fact, for the vast majority of source-destination pairs we do see 100 percent, or nearly 100 percent, of the traffic going on just one of the paths. When we fail a link on the green path, we see the system adjust pretty quickly, and it puts more of the traffic now on the red path -- although less in total, because now the network has less capacity and is not as able to handle the load. But it responds pretty quickly. And here the time is in seconds, based on -- I think it's the Sprint topology, looking at the link between New Jersey and, gosh, Indiana, I think it's Indiana. And finally, we also did some experiments where we varied the average file size. So we looked at web objects; they have a Pareto or exponential distribution of file sizes, and we changed the mean file size. We're looking here at a Y axis where 100 percent is good, where we actually maximize the aggregate throughput. And as you can see, if the file sizes are really tiny, we're not going to do that great. But if the average file size is reasonably big, where we have at least a few round-trip times for the flow, we actually get pretty quickly up to a reasonable use of the network. And it's fairly robust across a range of different distributions; the mean seems to matter more here than how heavy the tail actually is. Now, as you might expect, small flows are also extremely problematic for TCP, and in fact they're even more problematic there than they are for us, largely because we essentially have information about the state of the network when the flow starts. So we're able to jump in right away, but it's still going to take us a little bit of time; the flow has to be around at least for a little while so the traffic isn't so bursty that we never get to the right place before the flow actually ends. And finally, this gets at a question that's come up in a few guises here already: how many paths do we need? So this is looking at time on the X axis and aggregate throughput on the Y axis, depending on how many paths we let the network have. The black curve is where we force everybody to use one path, the blue is two, yellow is three, green is four. So essentially, by three or four paths we're seeing extremely diminishing returns from adding more. This is for -- I think it's the Sprint topology, or some ISP-level topology. Now, again, if you were to look at a topology with greater natural path diversity, you might need more than this.
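For context on the step-size sensitivity in that right-hand graph: protocols derived by dual decomposition typically revolve around a subgradient price update of roughly the following form. This is a generic sketch, not the exact update from this work; `beta` is the step size being swept in MATLAB.

```python
def update_prices(prices, loads, capacities, beta):
    """One generic dual (subgradient) iteration: raise a link's price when
    its load exceeds capacity, lower it when there is slack, and keep
    prices nonnegative. All arguments are dicts keyed by link id."""
    return {l: max(0.0, prices[l] + beta * (loads[l] - capacities[l]))
            for l in prices}
```

With `beta` slightly too large, the prices -- and the source rates reacting to them -- overshoot and oscillate, which is the going-nuts behavior of the blue curve; too small, and convergence crawls. TRUMP's practical advantage, as described here, is avoiding this kind of per-topology tuning.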
But what we found for the backbone networks is that there were so few paths that were substantially close in latency and number of hops that the fourth, fifth, and sixth paths were so long you wouldn't really have wanted to put traffic on them anyway. So for the most part the additional paths don't buy you much in this context. >>: In this context you have the [inaudible] why you getting integers of [inaudible]. >> Jennifer Rexford: That's a great question. That could very well be. I don't know. >>: I mean, just as an experiment, if you have like a fully [inaudible] graph, do you [inaudible] going to see if getting -- >> Jennifer Rexford: Yeah, that's a really good point. Yeah, even when we have four paths, we really tended to see 100 percent rather than splitting. >>: No, but I [inaudible]. >> Jennifer Rexford: They're largely disjoint. Not completely. >>: Because if the two [inaudible] flows are interacting at any given point, then you might just say, you know, I'm just better off sending everything on one path. And what we're going to [inaudible]. >> Jennifer Rexford: Yeah. That's a really good point. Yes. So we didn't do that. But that's a really good point. So anyway, to conclude: TRUMP in the end has one easy-to-tune parameter. It only needs to be tuned when this W value is set really small, which we tend not to want to do anyway, because it makes the system not terribly robust. It's pretty quick in responding to link failures and recoveries. And it seems to perform in a way that, although it depends on file size, is not terribly dependent on the variance of file sizes. And we call it TRUMP because it trumps the other algorithms. We had an earlier protocol, which I didn't subject you to, called DUMP. And anyway, we dumped DUMP because TRUMP trumped it. Anyway, TRUMP is better than DUMP. And we think, although we don't know for sure -- it's something we're still working on -- that it might be possible to design a variant of TRUMP that works with implicit feedback. Note that here we are passing back prices on the links, information that kind of maps to loss and delay variation. We think that intuition could be applied to build a variant of TRUMP where the end host or edge router implicitly infers the loss and delay variation. But we haven't yet gotten traction on that problem. So another question you might have is, okay, so far I've still been pretty abstract. Yeah? >>: How many [inaudible]. >> Jennifer Rexford: I think we're just representing them as floating-point numbers. That's a good question, though: what level of granularity do we really need? I suspect a handful of bits is probably enough. But we definitely need more than one. >>: Because, I mean, like I said, you can sort of reverse engineer TCP and get the same thing, but usually there's like a differential equation model of TCP, and they assume like many flows, each TCP source is [inaudible], and something I was always wondering is, is there any sense of how much mixing you need to actually be able to, you know, have that model represent anything. >> Jennifer Rexford: Yeah, that's a really good question. I don't know if anything we do here sheds light on that, but it's a good question. And we definitely are taking advantage of some of the dynamic range of the price information here, too, so it would be interesting to know, if we represented it logarithmically, how many bits we would really need.
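One way to explore the how-many-bits question raised here is to quantize the prices onto logarithmically spaced levels and measure how performance degrades as the bit budget shrinks. A sketch, with an assumed (made-up) dynamic range:

```python
import math

def quantize_price(price, bits=4, p_min=1e-6, p_max=1e2):
    """Map a nonnegative price onto 2**bits logarithmically spaced levels
    spanning [p_min, p_max], returning the reconstructed value. The
    dynamic-range bounds here are placeholders, not measured values."""
    if price <= 0.0:
        return 0.0
    levels = 2 ** bits - 1
    clamped = min(max(price, p_min), p_max)
    frac = math.log(clamped / p_min) / math.log(p_max / p_min)
    return p_min * (p_max / p_min) ** (round(frac * levels) / levels)
```

Re-running the NS-2 experiments with the feedback passed through something like this would show directly how many bits the price dynamic range actually demands.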
I suspect that we'd need a handful, but hopefully not a huge number. So one thing you might be wondering is, what are these components? I've talked pretty abstractly about the edge and the link and so on. So what is the new architecture, if you will? If we look at the picture I had at the beginning of the talk, the operators are tuning link weights and setting penalty functions and such. Under TRUMP, they're at most, on a very coarse time scale, computing the paths on behalf of the routers, if we don't want the routers to do that themselves, and tuning some offline tunable parameters. So they're not doing very much. The sources in today's TCP are adapting their sending rate; here we're adapting the actual rate on each path. And today's routers are doing some form of shortest-path routing; here they're not doing anything except computing prices on the links and feeding them back to the sources, so the sources can appropriately split the traffic over multiple paths. And something I've talked about with a number of people in this room is whether in fact this would allow the routers to operate without a control plane. Right? In particular, you see the things that are in parentheses here -- who is doing this? The multiple paths can be set up by the routers or by the management system. The prices could be computed by SNMP polling of the link loads or by the routers themselves. And so really what's nice here is that you could vary quite a bit what is actually done by the routers, and in fact you could even imagine an implementation of TRUMP where the routers do almost nothing except collect the measurement data about link load on, you know, some sort of round-trip time scale. Okay. So I started off by saying the math was going to give us an architecture at the end, and that was a little bit optimistic. What we end up with is a division of functionality, but exactly what is the source and what is a router is still left a bit open. So the sources here could be the end hosts, could be the edge routers, or could even be a mix of the two, where maybe the end host is computing or enforcing an aggregate throughput and the edge routers are doing the splitting over the multiple paths. So you could imagine an end-host variant of this, which might be appropriate in a data center network, and you could imagine an edge-router variant, which might be more appropriate in an ISP. The feedback in everything we've assumed so far is explicit, about the link loads and the prices computed from them, but we suspect -- though we're not 100 percent sure yet -- that an implicit version might be feasible. And of course that would be really nice if we wanted an interdomain version of TRUMP, right, because right now we're relying on quite a lot of cooperation from the network elements, which would be extremely problematic in an interdomain setting. And finally, the computation of the paths, of the prices, and of the path rates could be done at the links and the sources, or could be done by the management system itself and pushed down into the network elements and collected up from the network elements if one wanted to. So in fact the network management system could implement almost this entire scheme just by monitoring link load and pushing the sending rates down to the end nodes -- or be not involved at all, if you put all that function in the routers.
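As one concrete reading of this division of labor, the entire job of a "source" -- wherever it lives -- reduces to a small periodic update like the following. This is a simplified gradient-style sketch under my own assumptions (log utility, per-path prices already summed over each path's links); the actual TRUMP update rule differs in its details.

```python
def update_path_rates(rates, path_prices, gamma=0.1, floor=1e-3):
    """One source-side step: compare the marginal utility of sending more
    (1/x for log utility of the total rate x) against each path's price,
    then nudge each path's rate up where utility exceeds price and down
    where it doesn't. `rates` and `path_prices` are dicts keyed by path."""
    total = max(sum(rates.values()), floor)
    marginal = 1.0 / total
    return {p: max(floor, r + gamma * (marginal - path_prices[p]))
            for p, r in rates.items()}
```

Whether this loop runs on the end host, the edge router, or inside the management system -- and whether the path prices arrive by explicit feedback or SNMP-style polling -- is exactly the choice left open above.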
And we view it as a plus that those questions are in fact left open, because the actual choice between these things might be driven by completely different issues, like trust and security -- whether you trust the hosts or you trust the routers. Maybe if you're Microsoft you trust the hosts, and if you're AT&T you trust the routers, and so you'd each have a variant of TRUMP that makes sense, depending on who you are. >>: [inaudible] when we asked the question you actually said not TCP, so you were thinking edge routers [inaudible] explicit so [inaudible] in your own mind it's not really [inaudible]. >> Jennifer Rexford: Yes and no. You're right at some level, but you could imagine that what the edge router is doing is shaping the sending rate, in which case the host could send at whatever rate it wants, but it's going to get rate-limited as soon as it tries to exceed the shaper. In which case you could either say, well, it should rewrite its code so it doesn't hurt itself by exceeding the limit only to have its packets dropped, or you could look at how well somebody implementing TCP would do subject to a rate limit that's been computed by the network. We haven't studied that, but one could. It would just be clumsy, I think, to do it that way, but it's certainly feasible. Yeah? >>: I think it would be really helpful to compare this to RSVP-TE and auto-bandwidth to [inaudible] I'm wondering if the utility of going out and having to poll the device [inaudible] pretty much [inaudible]. >> Jennifer Rexford: Oh, that's a really good idea. >>: So it might be a very interesting adjunct -- >> Jennifer Rexford: Yeah, that's a great idea. Certainly MPLS could be used here for the multipath establishment, and we've looked at how the splitting-ratio business can also be done using existing MPLS features. But we haven't done the rest of the comparison you mentioned. So I'll just conclude by saying we've been looking at implicit-feedback-based extensions to TRUMP, in particular because if we wanted to do this in an interdomain setting we view the reliance on explicit feedback as a non-starter there, so we want to try to resolve that. And I'll just end by touching on a question that came up at the very beginning of the talk -- and excuse me for running over. I talked purely about throughput here. But you could imagine applications that would prefer a utility function that captured delay. And so we've gone through this exact same exercise for delay and have a different protocol that is structurally very similar -- multipath, feedback of prices, and adaptation of splitting ratios -- for inelastic traffic that's delay sensitive. And in a similar kind of way we can actually find the provably optimal, stable, blah, blah, blah version, and a variant of it that's practical. So pretty much the same story, just with a different utility function. Now, what if you have both in your network -- both delay-sensitive and throughput-sensitive traffic? Well, then essentially your objective is a weighted sum of the two objectives in these two earlier studies. And what's actually a really interesting gift from optimization theory is that that optimization problem can be decomposed into two largely independent optimization problems that correspond to the two protocols we just arrived at.
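The decomposition being invoked here can be sketched as follows, in my own notation: s_l in [0, 1] is the share of link l's capacity c_l given to the throughput-sensitive class, and U_thru and U_delay are the two classes' (weight-absorbed) objectives.

```latex
\max_{0 \le s_l \le 1}\;
\Big[\, \max_{x:\; \mathrm{load}_l(x) \le s_l c_l} U_{\mathrm{thru}}(x) \Big]
\;+\;
\Big[\, \max_{y:\; \mathrm{load}_l(y) \le (1 - s_l)\, c_l} U_{\mathrm{delay}}(y) \Big]
```

For any fixed shares s_l, the two inner problems share no variables and so decouple completely.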
So essentially that optimization problem tells us that the right thing to do is to run the optimal protocol for each of the two classes and have an adaptive splitting ratio at each link to use in allocating bandwidth to those two different classes of traffic. And so we think this is an interesting framework for what we call adaptive network virtualization, where the network is running two customized protocols in parallel but dynamically adapting what share of the bandwidth each link gives to those two classes of traffic, to maximize their aggregate utility in the end. And so we think, in general, this body of techniques that Mung and Steven and Frank have been developing over the years is really quite exciting; it provides a way for us to think much more methodically about network protocol design, and although there's certainly a place for engineering judgment and human intuition, the theory at least points us in the right direction so that we hopefully make better decisions. And I'm sorry for running over. I'd be happy to take questions if I'm not already over too long. Thanks. [applause]. >> Jennifer Rexford: If you need to leave, feel free to [inaudible]