>> PHIL CHOU: So I'm very pleased to be able to introduce Professor Mung Chiang, an electrical engineer from the Engineering Department at Princeton. Mung received his Bachelor's, Master's and Ph.D. degrees from Stanford and, despite the fact that he got his Ph.D. as recently as 2003, he has already won numerous awards: not only the NSF CAREER Award and the ONR Young Investigator Award, but the Howard Wentz Junior Faculty Award from Princeton, the Engineering School Teaching Commendation, the Terman Award from Stanford, the Hertz Foundation Fellowship, the Stanford Graduate Fellowship, the TR35 Award, and others I need not go through here. He is well known for his work in optimization and networking, and he's going to share some of his perspectives on that with us today. Thanks.

>> MUNG CHIANG: Thanks, Phil. My pleasure to be here, and thanks for sparing me the embarrassment of going through a laundry list of boring things. Indeed, this is a talk about perspectives. I decided to give a talk here and at several other places on something that is hard to say in any one paper, and I find venues like this the best places to describe those viewpoints. The viewpoint today is what I call "beyond optimality." When the word optimization is applied to networking, people think the standard answer is: I'm going to use it to compute some local or global optimum. But it turns out there's a lot more you can do with optimization for communication networks. We are going to see today things ranging from modeling, to quantifying architectures, to building robustness against stochastic dynamics, to using it as a feedback mechanism, not inside a network but in the human engineering process, and finally to complex performance tradeoffs. In a lot of these applications, computing the exact optimum is not the Holy Grail. Instead you use optimization as a language, just like English or C, and use that language to describe, to talk about, how you think about network engineering. You've seen a lot of these papers, and a lot of them are becoming more and more boring today, because in the end they prove things like convergence subject to step-size tuning and so on. Let's try to move beyond that to more challenging, fun stuff.

This talk itself is an optimization problem, where I try to minimize the amount of material you could get just by downloading a paper, subject to the constraint of being somewhat self-contained, and the variable is the content of the talk. Fortunately, I got to do the optimization; you are going to suffer through the outcome. I would like to acknowledge many wonderful collaborators in academia and in industry.

So let's set the stage first: using optimization as a language to talk about resource allocation. Any constrained decision-making problem can be thought of as an optimization. It's a data structure with four fields. One is the variables, the degrees of freedom you have; the constants are the things you cannot vary; and then you've got some objectives and some constraints. Any time you have this four-field data structure, you are talking about an optimization problem. This sounds quite abstract in general, and we are going to illustrate in this talk three successful technology transfers into commercial systems, through our industry collaborators, on things stretching from broadband access to Internet backbone design.
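In symbols, that four-field data structure reads roughly as follows (a generic statement of my own, not tied to any particular network problem):

```latex
\begin{aligned}
\underset{x}{\text{maximize}}\quad & U(x;\,c) && \text{objective: utility (or negative cost)}\\
\text{subject to}\quad & f_i(x;\,c) \le 0,\; i=1,\dots,m && \text{constraints}\\
\text{variables:}\quad & x && \text{the degrees of freedom}\\
\text{constants:}\quad & c && \text{the parameters you cannot vary}
\end{aligned}
```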
So what kind of objective functions are we talking about? Usually two types: cost and utility. They don't have to be additive, but let's say they are. The cost can depend on all kinds of degrees of freedom, like the sum of powers. Utility is a funny thing. It comes from the economics literature and is usually assumed to be increasing, concave, and smooth. It doesn't have to be; increasing is pretty much true in all cases, but concavity and smoothness are there mainly to make the math more tractable. Usually you think of utility as a function of throughput, but it doesn't have to be; it could be a function of delay, energy, and so on.

Utility can be used to model a variety of things, ranging from efficiency, to traffic elasticity (for example, more concave means more elastic), to user satisfaction, like MOS scores for voice and video based on user perception experiments, and even fairness. For those of you who don't know, this is a kind of interesting thing: you can quantify the notion of fairness among competing users in a feasible resource allocation vector x, through the definition of so-called alpha-fairness in the economics literature. We say x is alpha-fair if, for any other feasible vector y, the sum over users of (y_i - x_i) / x_i^alpha is non-positive. When alpha is zero, that is actually a very unfair allocation, one that can lead to starvation. Alpha equal to one is the famous case of proportional fairness. As alpha goes to infinity, this tends toward a max-min fair allocation. So far that's just a definition. It turns out that if you maximize a certain parameterized utility function, the resulting optimizer satisfies this definition, and therefore people call those utility functions alpha-fair functions. They have the form x^(1-alpha)/(1-alpha) when alpha is not one, and log(x) when alpha is one. This is by now very famous, thanks especially to Frank Kelly's '97-'98 papers on proportionally fair allocation.

There is a belief that bigger alpha means more fair, and that is true if you compare alpha equal to one with alpha going to infinity. But anywhere in between, I haven't seen a paper saying why alpha 3.5 is more fair than alpha 3.1. So there needs to be some more axiomatic study showing certain widely agreeable notions of fairness to be monotonic functions of this parameter alpha. I haven't seen that. I also haven't seen a rigorous study of the effect of suboptimal solutions on fairness. You can well define an optimality gap, but what would it mean to say there is a fairness gap? That connection is still missing. Nonetheless, you have seen utility used to model efficiency of allocation and fairness. There are other things beyond performance in the standard sense, for example availability of service under attack, various aspects of security, and various fuzzy concepts that are as important as performance metrics like throughput and delay. Can utility be used for those? That is still ongoing work by various groups.

But leaving objectives for a moment and coming to constraints: there are usually three types of constraints we face. One is inelastic, individual QoS constraints; these are hard constraints tied to user needs. Two is technological and regulatory constraints, ranging from simple ones such as maximum power to much more interesting ones like net neutrality constraints. Three is feasibility, and this is the tricky part. Coming from an information theory background, it could be the capacity region, if you know that region.
From queueing theory, it could be the stability region of the queues. Or it could be the achievability region under particular physical phenomena and particular physical-layer constructs. For example, three typical phenomena in a network are congestion, collision, and interference, and they are typically represented by additive, Boolean, and multiplicative constraints, respectively, with different degrees of freedom. In some cases there is pioneering work in industry that defined a whole sector, followed by analysis, or there is inspiration from seminal work: Kelly's '98 paper analyzing congestion control; Tassiulas and Ephremides' '92 paper analyzing constrained queueing networks, with maximum weight matching for throughput optimality; or Foschini and Miljanic's work extending Qualcomm's solution to the near-far problem for power control. In all three cases there is a very prototypical optimization problem involved, whether it is linearly constrained utility maximization, convex-constrained weighted rate maximization, or SIR-constrained power minimization. These three classical problems have been studied by thousands of papers over the years, and interestingly, they all use feedback in the network, either implicit or explicit.

So you can start by modeling, using optimization as a language, specifying objectives, constraints, degrees of freedom, and constants. Or you can start from the other direction and say: give me a protocol, and I will try to reverse engineer the underlying problem being solved by that protocol. Sometimes it is a single optimization, like a social welfare maximization; sometimes it is individual selfish optimizations interacting as a game. This is the mentality of reverse engineering. In contrast to optimization of the network, this is optimization by the network: the elements of the network together act as a distributed optimization machine. You may think, I already have the solution, what do I care about knowing the problem? That's because often these solutions came from ad hoc, engineering-intuition-based hacking, and they certainly have had a lot of impact; sometimes they work very well, sometimes they don't. How can we understand them in a systematic way? How can we design new ones on a rigorous foundation? That's where forward engineering can follow reverse engineering.

Here is a quick summary of our current understanding, as a field, of reverse engineering. What you don't see here is the physical layer, because thanks to Shannon people pretty much knew what they were talking about from day one, if day one is 1948: capacity and rate-distortion limits. The upper layers tend to be designed based more on engineering intuition. However, most of them have been reverse engineered over the last five to ten years and shown to be implicitly solving an underlying problem. That is, you can view the trajectory of a given protocol as the trajectory of certain algorithmic steps solving an underlying problem: basic network utility maximization, the stable paths problem for inter-autonomous-system routing, and game-theoretic reverse engineering for contention resolution. I'm skipping a couple of slides because I'm trying not to overrun by too much; I just wanted to lay the foundation and show you some of the typical modeling frameworks and languages. Now I want to say something that is a little more idea-driven, driven by the quest to understand architecture.
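Before moving on to architecture, one quick aside on the alpha-fair utilities from a moment ago. Here is a minimal sketch, with function names and the toy two-user example being my own, of the parameterized family and the fairness definition it encodes:

```python
import numpy as np

def alpha_fair_utility(x, alpha):
    """Alpha-fair utility of positive rates x: log(x) when alpha = 1,
    x**(1 - alpha) / (1 - alpha) otherwise (alpha = 0 is plain sum-rate;
    alpha -> infinity approaches max-min fairness)."""
    x = np.asarray(x, dtype=float)
    if np.isclose(alpha, 1.0):
        return np.log(x)
    return x ** (1.0 - alpha) / (1.0 - alpha)

def is_alpha_fair(x, other_feasible, alpha):
    """The definition: x is alpha-fair if, for every other feasible allocation y,
    sum_i (y_i - x_i) / x_i**alpha <= 0."""
    x = np.asarray(x, dtype=float)
    return all(np.sum((np.asarray(y) - x) / x ** alpha) <= 1e-9 for y in other_feasible)

# Two users sharing one unit of capacity: the allocation maximizing the sum of
# log utilities (alpha = 1, proportional fairness) is the 50/50 split.
grid = np.linspace(0.01, 0.99, 99)
best = max(grid, key=lambda a: alpha_fair_utility([a, 1 - a], 1.0).sum())
print(best, is_alpha_fair([0.5, 0.5], [[0.9, 0.1], [0.6, 0.3]], alpha=1.0))
```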
Now, architecture is a big word, a fuzzy one. By that we mean, more specifically here, functionality allocation: who should be doing what, how to connect them, and at what time scale. For example, how would you contain and correct errors? You can do that at the physical layer with forward error correction codes. You can do it with ARQ, hop-by-hop feedback. You can do it with multipath routing. You can do it end to end with CRC checks. You can do it with network coding, or at the application level. So if there are five people capable of doing the same job, it is not clear how you should allocate the job, or whether to divide it and let each person do part of it. Or, how do you resolve a resource bottleneck in a network? You can do congestion control: let everybody back off their transmission source rates. You can do alternative routing: don't use this congested path, use a different one. You can build a better pipe, say with power control, maybe at the expense of hurting some other links in the network. Or you can do contention resolution in the local neighborhood so that links don't contend too frequently. Again: who should be doing what, at what time scale, and how do you glue them back together?

Another way to ask the question: suppose you are AT&T, and Microsoft, Cisco, or Qualcomm all come to you. I thought this would be just a private, casual conversation; it turns out I had to sign some release, and I didn't even read what I was signing, so I should probably be careful what I say now. But say you are an operator, and some big companies, each specialized in its own domain, come to you and say: you know what, I can solve your problem for you. You don't need to pay the others. Maybe you pay for connectivity at the physical layer, but don't pay any premium for their value added, because I add all the value that you need. In fact, this does happen a lot in the industry. And in some sense they are all correct: a lot of problems can be solved at this layer or at that layer, by this functional module or by that functional module. The question is, what is the most cost-effective, evolvable, stable, robust way; in other words, which stock to buy here. For example, say Qualcomm. Maybe the right answer in today's market is don't buy any, but I guess a better answer is to buy Microsoft stock, for very obvious reasons.

So in general there are many tasks you want to accomplish and many degrees of freedom possibly under your control, and this bipartite graph basically shows which degree of freedom can impact which task. Now we say: I want all these tasks done, and I have some subset of these degrees of freedom, traditionally residing in different so-called layers. How should I find a matching in this conceptual bipartite graph?

If you think about it, architectural questions have been raised everywhere. For example, in related disciplines such as information theory you see this kind of picture on page one of the textbook. Shannon showed that for certain models, point to point with no feedback and no complexity concerns, the job of data compression and the job of transmission can be separated without loss of optimality, and that is a very powerful architectural statement. It means that you can have a digital interface here and here, going from analog to digital, and that will not hurt you in those models.
It might hurt you in other models, but it is still an inspirational architectural statement. In control theory, and in the theory of computation, you see these pictures on page one or two of the textbook, and they mean that people in these related disciplines have thought long and hard about the kind of architectural statements that can be quantified. It has become part of their blood; it is so natural to them now. Now, what do we have in networking? This is the standard picture you see, but it is a fuzzy one, because it is not clear what exactly the boundaries mean. A boundary means that you don't get to see everything and you don't get to control everything: you rely on the service provided to you from the layer below, and you provide a service to the layer above you. But what exactly is this boundary? What is the best way to do the division? It is not always clear. You can do things end to end, or within the network. Sometimes you just let the end hosts be very intelligent; sometimes you help with shaping and policing at the edge; sometimes you start dropping packets here; sometimes you let intelligence reside within the network. That's the wireline Internet; a wireless Internet invites even more intriguing questions. And between the control plane and the data plane, with the control plane making measurements, probing the data plane, doing some computation, and putting control signals back into the data plane, what jobs should reside in the data plane and what in the control plane? In all these cases it is not exactly clear how we should design this division of labor, this functionality allocation.

So something a lot of people have been working on is called "layering as optimization decomposition": trying to understand layering not as an ad hoc engineering artifact, but as something that pops out of a top-down, principle-based design flow, namely a decomposition of an optimization model. It is not the exact value of the optimal solution that we care about that much; it is the decomposition that matters most, because that gives you the boundaries of the functionality allocation. The flow roughly goes like this. You start by modeling the whole network as a generalized network utility maximization. The utility doesn't have to be a function of throughput; it could be, as I just mentioned, a variety of different objective functions. And the constraint set could be integer-valued or non-convex, with a lot of different degrees of freedom. Given that problem, you want to find decompositions that break it down into smaller problems, each with a smaller number of variables, observing a smaller number of constants, possibly with a different objective function than the original gigantic mother problem. A given decomposition scheme, and there are many schemes possible, we call a layering architecture. The decomposed subproblems correspond to the layers, and functions of the primal and dual variables are identified as the interfaces coordinating the layers, in terms of what they can see and what they can do. This decomposition happens both horizontally, meaning over geographically disparate network elements, and vertically, meaning across the functional hierarchy such as the seven-or-however-many-layer OSI stack. And then the gluing back together can happen at different time scales, using either implicit or explicit message passing.
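To make the decomposition idea concrete, here is a minimal sketch, a toy example of my own rather than any specific protocol, of the standard dual decomposition of basic NUM: link prices are the dual variables, each source solves a purely local problem given the price of its path, and the price update is the only message passing between the resulting "layers."

```python
import numpy as np

# Toy network: 2 links, 3 sources; R[l, s] = 1 if source s uses link l.
R = np.array([[1, 1, 0],
              [1, 0, 1]], dtype=float)
c = np.array([1.0, 2.0])       # link capacities
step = 0.05                    # price (dual) step size

prices = np.ones(2)            # dual variables = per-link congestion prices
for _ in range(2000):
    # Each source solves its own local problem: maximize log(x_s) - x_s * (path price).
    # For log utility the answer is simply x_s = 1 / (path price).
    path_price = np.maximum(R.T @ prices, 0.1)   # small floor keeps the toy iteration tame
    x = 1.0 / path_price
    # Each link updates its price using only its own load: projected subgradient ascent.
    prices = np.maximum(prices + step * (R @ x - c), 0.0)

print(np.round(x, 3), np.round(prices, 3))   # rates ~[0.42, 0.58, 1.58], prices ~[1.73, 0.63]
```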
Of course, you hope that the message passing can be simplified all the way down to only implicit message passing on a very infrequent basis. Sometimes you can do that while still retaining optimality. There are basically three steps in this design flow, which is very much a first-principles, top-down approach: you declare what you want and what you can vary; you search for a solution architecture based on a particular decomposition; and finally, if time permits, you look for alternative architectures coming from alternative decompositions.

This framework has produced many, many papers that do different flavors of "joint this and that." In fact, if you look at the combinations of the seven layers, the sum of seven-choose-i for i from 1 to 7, it is a pretty big number. Then for each such cross-layer combination you can have, what, five different meanings of what you are crossing, in terms of what you can do and see. For each idea, there are about five different groups solving it at about the same time, independently. And each group will publish somewhere between three and ten papers, say three. You add up the numbers, it is a pretty big number, and that is a very disturbing sign. It means you should get out of that field as soon as you can. (Chuckles.)

>> MUNG CHIANG: So what we tried to do is to say that it is not complicated at all. There are technical challenges, and I am coming to those in a minute, true hard-core challenges. But a lot of it is just becoming a template you can fill in very easily, and underlying the template there are only two conceptually very simple ideas: one is to view the network as an optimizer, as I just said; the other is to view the process of layering as decomposition. If you strip away all the complexity and the detailed variations, it boils down to something conceptually very simple.

Now you wonder: this decomposition thing, how do I actually do it? I am not going to detail that here; there are many nice articles already in print. The standard technique is so-called dual decomposition, or Lagrangian relaxation; that is the most widely used. There is also primal decomposition, and penalty functions that can lead to other decompositions, and there are different combinations you can do at different hierarchical levels: you can decompose partially and deal with the rest of the coupling later, and you can pick different time-scale choices. It turns out there are so many different combinations that we wanted to construct a user manual. This is a collaboration with some other folks in Asia and Europe. It is not just intellectual curiosity, because different decompositions lead to different algorithms, each with different engineering implications as to what the resulting protocol stack does for you. More interestingly, you can represent the same generalized NUM problem by an alternative representation, which certainly does not change the optimized value, but it does change the structure of the problem. For example, adding a redundant constraint: seemingly innocent, but it may open the door to even more decomposition possibilities. You probably can't parse this from back there, but basically this is a snapshot of the user manual we are constructing. This is a pretty general template where we break the procedure down into three blocks. In the first block, you start with an engineering description.
You end up with one generalized NUM representation, one problem. In the second block of the procedure, you break that one problem, by decomposition methods, into N smaller problems. Still no solutions, but now you have N problems. In the third block you find distributed algorithms to solve them; hopefully the way you decomposed them is conducive to finding a distributed algorithm for each of the N subproblems. So you go from an engineering description of the problem to a particular protocol stack. This just shows one of the many papers that go through essentially this procedure along a particular path, the one highlighted in yellow here. We are constructing a Web site, and we'll let you know when it is done; you can click on many, many papers, and each paper is just one yellow path through the flow chart.

So can we automate this? Automate what? First, the enumeration of possible decompositions. Second, the comparison among the ones you've generated. It turns out this is very tough. It's good to have a dream like that; it keeps people going. But it is very tough. First, enumeration: doing it exhaustively is very challenging, because alternative representations of the same problem can lead to further decompositions, and some of them degenerate into equivalence classes. Second, comparison: this is even tougher, because compare against what metric? Speed of convergence is one. Robustness against many things. The locality and symmetry of the message passing, the trade-off between message passing and local computation, and even fuzzier metrics. For a lot of these metrics of interest we don't have the mathematical machinery to analyze them, or even to tightly bound them. So it is very hard even to get each individual axis done, let alone do a trade-off analysis to compare alternatives. So far it is all done manually: you manually enumerate, you manually try to compare, and you see if you come up with an alternative architecture that makes more sense for your application.

Now, decoupling is not just difficult because it is hard to enumerate and compare. Sometimes the optimization problem you have is coupled in such a way that it is just hard to decouple. I'll give you two very quick examples. Each of them started with industry collaborators, and we were fortunate to have industry people take on the analysis and then, in some cases, put it into products. I will not go through the details; that would be two separate talks. I'll just highlight one aspect, which is that decoupling is not always easy.

So this is our first case: DSL spectrum management. This is a broadband access environment using a combination of fiber, running closer to the neighborhood, followed by copper wires; if you live close enough to the central office, the copper wire runs directly to you. This could be, say, ADSL or VDSL, which use discrete multi-tone transmission, a kind of OFDM. One issue is that the copper lines are not perfectly twisted, so they still radiate electromagnetically into each other, and at high frequency bands that interference becomes substantial. People in that community call it crosstalk. Your job is to decide what power spectral density you should use to shape your transmission so that the crosstalk doesn't hurt too much, and, system-wide, you want to find out, over all the users, what rate region can be obtained. For given fixed weights w, you maximize the weighted sum of rates; by varying the weights you trace out the whole boundary of the rate region.
User n's rate is the sum over the tones, or carriers, indexed by k, of, assuming interference is treated as noise, log one plus the signal-to-interference ratio: R_n = sum over k of log(1 + p_n^k / (sum over m != n of alpha_nm^k p_m^k + sigma_n^k)). Here p_n^k is user n's transmit power on tone k, the sum in the denominator is over the interfering neighbors m, alpha_nm^k is the crosstalk channel gain from user m to user n on tone k, and sigma_n^k is an additive noise term. The variables are the p_n^k. This turns out to be a very challenging optimization problem: it is non-convex, and it is coupled across both users and tones. Those are two different challenges. Convexity is the key to tractability, toward proving global optimality, whereas the coupling is the bottleneck for distributed solutions. You have both bottlenecks sitting in front of you. This has been well studied for many years, with a more recent revival around the so-called dynamic spectrum management problem, and there are many different variants and acronyms floating around.

I'm going to show you a version, in collaboration with Huang, Rafael Cendrillon, and Marc Moonen, called ASB, Autonomous Spectrum Balancing. Of course, when I show a comparison table like this I always make sure it highlights the best parts of our algorithm, so no surprise that this one comes out best from our viewpoint: autonomous, low complexity, and near-optimal performance. We do not have a theoretical proof of a worst-case optimality gap, but empirically we are running on very realistic, industry-grade simulators with very realistic channel conditions, and it always comes up very near the optimum.

What is the basic idea? Ignore all the rest; I will spend only one minute on the idea and move on. If you know the congestion control literature, you know the idea of pricing, of congestion signaling: dynamic, real-time exchange of information from the network to the sources, so as to align selfish interests with social welfare maximization. Can we do that here? Not really, because the users cannot easily pass messages among themselves, and they cannot easily do it through the central office either, for traditional reasons. Within the same user, across all these 256 or 4,096 tones, you can do dynamic pricing; but across users you can't. And if you only allow selfish maximization, you end up with something called iterative water-filling, which is often quite suboptimal. So the idea in ASB is to add a virtual line, a reference line, sitting right next to you: when you do your selfish rate maximization, you imagine somebody you need to protect. What parameters describe this imaginary somebody I am protecting? That is part of the details, but it turns out there are pretty robust ways to choose them. You do a one-time handshake with the central office to get the reference line parameters, and that serves as what we call static pricing. Static pricing would normally work very badly, except this is a DSL case: the wireline channel conditions do not vary too frequently in time. So for static coupling, it turns out static pricing works pretty well. Let me just show you one representative plot. You are looking at the rate tradeoff region for two out of the 14 users. This, in red, is the best rate region you can get. Iterative water-filling, totally selfish, gives you this. Autonomous spectrum balancing, the one I just described, turns out to give you something very close to the optimal solution.
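For reference, here is a rough sketch of the purely selfish baseline just mentioned, iterative water-filling, on made-up channel numbers; ASB itself additionally protects a virtual reference line inside each user's local problem, which is not shown here.

```python
import numpy as np

np.random.seed(0)
N, K = 3, 8                                   # users (lines) and tones
direct = np.ones((N, K))                      # direct channel gains (made up)
cross = 0.1 * np.random.rand(N, N, K)         # crosstalk gains alpha[n, m, k] (made up)
for n in range(N):
    cross[n, n, :] = 0.0
noise = 0.01 * np.ones((N, K))
budget = np.ones(N)                           # per-user total power budgets

def waterfill(floor, budget):
    """Single-user water-filling against a fixed noise-plus-interference floor."""
    lo, hi = 0.0, floor.max() + budget
    for _ in range(60):                       # bisection on the water level
        mu = 0.5 * (lo + hi)
        lo, hi = (mu, hi) if np.maximum(mu - floor, 0.0).sum() < budget else (lo, mu)
    return np.maximum(0.5 * (lo + hi) - floor, 0.0)

P = np.zeros((N, K))
for _ in range(50):                           # iterate the selfish responses to a fixed point
    for n in range(N):
        interf = noise[n] + sum(cross[n, m] * P[m] for m in range(N) if m != n)
        P[n] = waterfill(interf / direct[n], budget[n])

rates = [np.sum(np.log2(1 + direct[n] * P[n] /
                        (noise[n] + sum(cross[n, m] * P[m] for m in range(N) if m != n))))
         for n in range(N)]
print(np.round(rates, 2))
```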
Now, if your reference line parameters are way off, you will do worse; but, with the many sensitivity analyses you can see in the paper, it still works in a remarkably robust way. So that is one sort of newer technique: static pricing for static coupling.

Here is another, different story: power control, single carrier, but now a wireless network, where the channel conditions fluctuate quite fast. Say you look at two users. This is the achievable rate region; again, it doesn't have to be convex, and if it is non-convex that is another challenge I'll come back to toward the end of the talk. For now let's assume it is convex, but coupled. It is coupled in the following way. Your objective, a utility function, depends on the powers as well as on the signal-to-interference ratios, which are themselves functions of the powers. The constraint is that the SIR assignment must be feasible: there are certain points in the SIR plane that are not feasible, meaning there is no power allocation that can give you that point, because of interference. You can vary both the powers and the SIR assignment so that you get to the best point on the boundary, defined by the intersection of the feasibility boundary with the red dotted curves, which are the contour lines of the utility function.

How do I get to that point? If you have a centralized solver monitoring all the cells in the neighborhood and controlling all the base stations, then you can solve a convex optimization here, if the utility is concave, and get to that point. But the challenge is that each base station would like to mind its own business, and you cannot easily coordinate across base stations. You can allow a base station to talk to its cell phones, but you can't let all the base stations be centrally coordinated by one entity. Then how can you get onto the boundary and move to the point defined by the utility function as the best boundary point? This problem has long been studied in special cases. A famous special case: if you remove the SIR assignment from the variables, so it is a constant given to you, fixed and assumed feasible, then the problem of power control to achieve given SIR targets is well studied, by, say, Foschini and Miljanic. When you bring the two together it becomes much more challenging, and people have thought about how to extend Foschini-Miljanic power control. There, a key insight is that the whole problem can be transformed into a linear algebra problem where you are essentially computing the right Perron-Frobenius eigenvector, the eigenvector corresponding to the Perron-Frobenius eigenvalue of a matrix describing the topology and gains of the network. But if you want to do a mobile-based, local update of these eigenvectors, it becomes very hard. You can try the standard decompositions; at least we tried, and they didn't work. It turns out that if you do a change of variables, changing the description of this boundary from the algebraic description based on right eigenvectors to one based on left eigenvectors, the left eigenvectors do have an ascent direction that is locally computable. Why that happens, I don't know; we just ran into it. It shows that a change of variables can reveal hidden decomposability structure that the original, standard representation of the optimization problem will not give you. So let me just wrap up this part.
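Before leaving this example: for the fixed-SIR-target special case just mentioned, here is a rough sketch, with invented gains and targets, of the classical Perron-Frobenius feasibility test and the Foschini-Miljanic iteration it certifies. The left/right-eigenvector reformulation for jointly optimizing the targets is beyond this little snippet.

```python
import numpy as np

G = np.array([[1.0, 0.1, 0.2],     # G[i, j]: gain from transmitter j to receiver i (made up)
              [0.2, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
noise = np.array([0.1, 0.1, 0.1])
gamma = np.array([2.0, 2.0, 2.0])  # fixed SIR targets (made up)

F = gamma[:, None] * G / np.diag(G)[:, None]   # normalized interference matrix
np.fill_diagonal(F, 0.0)
v = gamma * noise / np.diag(G)

rho = max(abs(np.linalg.eigvals(F)))           # Perron-Frobenius eigenvalue of F
print("targets feasible:", rho < 1)

if rho < 1:
    p_star = np.linalg.solve(np.eye(3) - F, v)  # minimal power vector meeting the targets
    # Foschini-Miljanic iteration: each transmitter scales its power by
    # (target SIR) / (measured SIR), using only its own measurement; it converges to p_star.
    p = np.ones(3)
    for _ in range(200):
        sir = np.diag(G) * p / (G @ p - np.diag(G) * p + noise)
        p = gamma / sir * p
    print(np.round(p, 3), np.round(p_star, 3))
```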
Those two examples serve one purpose in this talk: decoupling is not always easy, but with case-by-case hard work you can put something in and hope that it helps. I'm leaving this architecture story behind, because it is kind of an old story by now; three years ago it was a bit fresher, but it has been talked about enough that it has become a little less interesting.

Instead, let me step back and look at what exactly happened in 1998. Among other things, Frank Kelly published this seminal paper on congestion control, understanding distributed rate allocation through an economic model, through this congestion pricing model. Compare that with Shannon fifty years before. Before Shannon, people worked on communications as well; they tried to design finite block-length codewords that protected the signal through the channel. But Shannon said: for one moment, let's turn our attention away from constructing codes and look at what would happen if we had asymptotically, infinitely long codes. Then the law of large numbers and large deviations theory kick in, and Shannon was able to provide both fundamental limits, like capacity, and architectural statements, like the source-channel separation principle. Of course, later people had to design finite block-length codewords and build practical systems. Now, before '98, people had been looking at the coupled queueing dynamics: how do I analyze many routers cascaded with each other, where the arrivals of the packets are shaped by the previous routers, and how does the queueing system evolve? That becomes very intractable. But Kelly said: for one moment, assume these are deterministic fluids flowing through the network. Then what happens? It turns out the optimization and decomposition view kicks in, as well as control-theoretic, game-theoretic, and economic viewpoints, and now people understand network protocols not as ad hoc designs but as trajectories of certain dynamic control systems. That is a profound realization, a profound viewpoint. Of course, later the stochastics have to come back, and that is the subject of this third part of the talk: what if we are facing a real-world network where we don't have deterministic fluids flowing through it?

Indeed, there are many levels of stochastic dynamics. For example, session level: users, logical users, arrive with finite workloads; they don't sit there forever as a static population. They finish their jobs and leave. Packet level: packets within the same session arrive in bursts and suffer probabilistic events along the way. Channels fluctuate; the topology may even vary. For each of these levels of stochastic dynamics, we would like to understand many things. For example: is the deterministic formulation still valid, in the sense of giving the right engineering intuition? If we follow it, do the queues remain stable? What about the probability distribution of performance metrics like delay? Maybe you don't know the distribution, but you may know some statistics, like the average or the outage. What happens to fairness if short flows coexist with long flows with different lifetimes, and so on? It turns out this big table has many blank spots, on a somewhat arbitrary grading scale, I guess. Three stars would mean completely done: don't even try to work on that anymore. Blank means nobody has a really conclusive starting point. Two stars: there are a couple of bright spots.
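To make the session-level dynamics concrete before zooming in, here is a toy simulation of my own (parameters made up): flows arrive as a Poisson process, bring exponentially distributed work, share one fixed-capacity link by processor sharing, and leave when done; with load below capacity, the number in the system stays stable.

```python
import random

random.seed(1)
capacity, arrival_rate, mean_size = 1.0, 0.8, 1.0   # offered load 0.8 < capacity 1.0
flows = []                                          # remaining work of each active session
area = elapsed = 0.0

for _ in range(100000):                             # event-driven simulation
    next_arrival = random.expovariate(arrival_rate)
    share = capacity / len(flows) if flows else 0.0             # processor sharing
    next_departure = min(flows) / share if flows else float("inf")
    dt = min(next_arrival, next_departure)
    area += len(flows) * dt                         # time-integral of the number in system
    elapsed += dt
    flows = [w - dt * share for w in flows if w - dt * share > 1e-12]
    if next_arrival <= next_departure:
        flows.append(random.expovariate(1.0 / mean_size))       # exponential file size

print(area / elapsed)   # time-average number of active flows; finite (about 4) => stable
```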
Let me zoom into one bright spot and ignore the rest of the table today. This is the bright spot of session-level stability. What does that mean? Look at the utility maximization problem, now with a dynamic user population with arrivals and departures. Here s denotes the class of users, n_s(t) denotes the number of active flows, or active sessions, in class s at time t, and phi is the resource allocation, which belongs to a certain constraint set. That set could be a time-invariant polytope, for example, or a time-varying non-convex set; anything from something simple to something very horrible. You do processor sharing among the flows that belong to the same class, and this is the problem you solve at each time instant t. Departures, clearly, are governed by the service rates, which come out of this constrained maximization problem. Arrivals are assumed to be independent of the network state. Suppose they are Poisson arrivals with exponentially distributed file sizes, which is not true, but what is true turns out not to be tractable; we have to say the things we can say in a paper, so let's say exponential. All right, then we've got a Markov chain: the state goes up with this rate and down with that rate. Now, the question is, what is the distribution of the buffer occupancy? That turns out to be too difficult a problem; forget about it. Just ask about stability, for example queue stability or rate stability, depending on which one you pick, of this queueing network: Markovian input, but a state-dependent service rate, where the state dependence comes precisely through the solution of this optimization problem, which in turn depends on the state of the queues. So there is a coupling between the deterministic optimization at each time instant and the evolution of the queueing dynamics.

So the answer is: the queues will be stable if and only if... something. Well, we know the necessary condition: the average intensity of the arrivals must lie within the constraint set of the deterministic optimization. The real question is whether that is also a sufficient condition for the stability of this stochastic network. The answer is yes in many, many, many cases. This is a snapshot, not a comprehensive list, of the papers: the answer is yes for this kind of arrival process, this kind of topology, utilities the same across users or not, and this shape of utility function. However, the general question remains open: general arrivals, meaning not necessarily Poisson with exponential file sizes, general topology, and all kinds of concave utility functions; what is the sufficient condition for stability?

Here is something else recently worked out by our group and collaborators. Instead of making the arrivals more general, we make the constraint set more general. The standard assumption is time-invariant and convex, but in many cases, especially in wireless, you have a non-convex rate region, for example if you are doing random-access-based allocation, and you can have time variation of the rate region. In the benign case, people have proved, under those mild arrival assumptions, that the stability region of the deterministic-NUM-based resource allocation coincides with the rate region, the constraint set of the deterministic problem.
Furthermore, that is the largest possible region you can ever get with any resource allocation: it is the maximum stability region. And furthermore, it does not depend on the curvature of your utility function. It turns out that none of those three statements remains true if you have a non-convex or time-varying rate region: the resulting stability region may be smaller than the actual rate region of the deterministic problem; it does depend on the curvature of the utility function; and it is not necessarily the maximum stability region anymore. In fact, there may even be a trade-off between how fair you are, if you believe bigger alpha is more fair, and how large your stability region is. There may be a strict trade-off there. So I just want to illustrate that even in this one corner of this gigantic table on stochastic network utility maximization, there are either surprises or big open problems not yet solved.

Let me skip a couple of slides and move on to the last part, where I want to talk about another major challenge that we face, and it is a big, big challenge. I talked about the challenge of decoupling, where you have to do case-by-case hard work. We talked about the challenge of putting stochastics back into the deterministic fluid model, where much work remains to be done for the theory to have a direct impact on real dynamic networks. And this one is the non-convexity of your optimization model. I'm sure most of you know very well that the watershed between easy and hard optimization problems, quoting Rockafellar, is not linearity but convexity. So what happens when it is a non-convex problem? I hate to invent more acronyms, but this one stands for "design for optimizability," and it is such a cool one that it is hard to resist putting it up there. What does that mean?

Again, let's take a step back and look at what kinds of non-convex problems you can encounter. They arise in all kinds of corners, for different reasons. You can have real-time applications that give you a sigmoidal utility function. You can have power control in the low-SIR regime. You can have many cases where the constraint set is non-convex. You can have integer constraints, or a constraint set that is convex but takes an exponentially long description to write down, which is still very hard, like certain scheduling problems where the number of activation sets is exponential in the number of nodes. Whenever these happen, you know you are in big trouble. So what can you do? Mathematically, one good thing is that convexity, unlike, say, uniqueness of KKT points, is not an invariant property: you can do a nonlinear change of variables, for example the log change of variables in geometric programming, or embed the problem in a higher dimension, like the sum-of-squares method, and obtain convex geometry even though the problem started out non-convex. But I am not going to talk about that today. Instead, I'll advocate a more engineering approach. The fancy name is "design for optimizability"; the more straightforward name is: ask your neighbor to change his assumptions, or to change his design, so that you work under different assumptions. I'll illustrate that with an example from Internet traffic engineering. But first I still want to give you some pictures, some perspective. Say this is your non-convex optimization problem. There are three broad approaches to attacking it.
One is what I call going around non-convexity: you change variables, or you find conditions under which the problem is convex or under which the KKT point is unique, and so on. The second is to be brave and just sail through it: things like signomial programming, in general some kind of successive convex approximation to solve the problem; or branch and bound, maybe in a very smart way; or using structure like difference of convex functions, generalized quasi-convexity and quasi-concavity, and so on. And here is the third way, what I call going above non-convexity: you step outside the problem and then come back in. It says: don't solve the difficult optimization problem; try to redraw the architecture or the protocol to make the problem easy to solve. Of course, you may have to pay a price for that, and that is the trade-off to be aware of. In this case optimization is not being used as a computational tool, not even as a modeling language or for architectural understanding; it is used as a flag, saying that a certain design decision may be flawed, because if you can help it, you shouldn't be tackling a difficult problem. If you are the instructor of a course and the student at the same time, you shouldn't give yourself a hard midterm.

Let me illustrate this with the last case, Internet traffic engineering within an autonomous system. Like some people, before collaborating with those working on this subject I had the impression: how bad could that be, shortest path routing? It turns out it is really not just shortest path routing; it is sort of the reverse of that. The operator, say AT&T, every now and then updates a number per link called the link weight and, from the control plane to the data plane, gives these link weights to the routers, and the routers use the link weights to construct the forwarding paths. Then when a packet comes in, you look at the destination address in the header and forward the packet. This is what they call link-weight-based, hop-by-hop, destination-based packet forwarding, a long phrase. And they want to do this so as to load-balance the traffic, in a procedure called traffic engineering. So there is a centralized computation to set the link weights, and a distributed usage of the link weights to forward packets based on destination only. OSPF is a major member of this class of routing protocols. But people know that to do the best traffic engineering with OSPF requires exponential complexity.

What is the best traffic engineering? The operator assigns a metric, the cost of a link, which is a function of its utilization percentage, usually modeled after some queueing delay formula. You want to minimize the sum of these costs across all the links; or you want to minimize the maximum link utilization; or you want to push the appearance of the first congested link as late as possible. Whichever metric you pick, there is a certain objective function. But the variables are the link weights. So this is weird, right? The knobs you tweak are the link weights. Tweaking the knobs forces the routers to allocate traffic differently, by forwarding packets differently, which results in a different distribution of traffic, which changes the value of your objective function. But the link weights do not change the objective function directly: your metric is over here, your knob is over there, and you have to influence it indirectly.
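Just to fix ideas on the metric side, here is a tiny sketch of mine of the objective only; the M/M/1-style cost used here is one common choice, the numbers are made up, and operators typically use piecewise-linear approximations of such a curve.

```python
def te_cost(loads, capacities):
    """Sum over links of a convex congestion cost of the utilization u = load / capacity.
    An M/M/1-style delay curve u / (1 - u) is used here purely for illustration."""
    total = 0.0
    for load, cap in zip(loads, capacities):
        u = load / cap
        total += u / (1.0 - u) if u < 1.0 else float("inf")
    return total

# The knobs, though, are link weights, not loads: changing a weight changes the paths,
# which changes the loads, which changes this cost only indirectly.
print(te_cost([4.0, 7.0], [10.0, 10.0]))
```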
That indirection is kind of strange, but that is how Internet intra-AS routing is done today. So we wondered: how can we get around this problem of OSPF not being able to deliver the best traffic engineering efficiently? Don't even try to read this slide; it is a long, and actually very partial, list of outstanding work on the subject. An influential paper in 2000 by Fortz and Thorup showed that it is NP-hard to do optimal traffic engineering with OSPF, so you have to resort to local search methods, which set the industry standard for many years. Around the same time, actually before that, people started looking at something called MPLS, which as a forwarding mechanism lets you do multi-commodity flow routing, mathematically a nicer, tractable, convex optimization. That is good, but MPLS forwarding requires end-to-end tunneling, which requires keeping extra state memory, and that is not the philosophy of hop-by-hop, destination-only forwarding. That may be one reason why people in industry who actually run networks still love OSPF more than MPLS.

Now the question is: can we get MPLS-like performance without MPLS complexity, without end-to-end tunneling? You still do local, hop-by-hop forwarding of packets, and yet achieve optimal traffic engineering. Is that possible or not? We recently showed that it is possible; there is an '07 paper that has been superseded by an '08 paper. The idea of the protocol is extremely simple; the idea of the proof is actually quite interesting. The protocol says the following: you still use link weights to compute paths and decide the traffic splitting pattern, but unlike OSPF, you do not construct only the shortest paths and evenly split the traffic among them. Instead, use all the paths, but put an exponential penalty on longer paths. Why an exponential penalty? I'll come back to that in a minute. This doesn't change the way you do destination-based packet forwarding. It does change how AT&T, or any operator, does the central computation of the link weights, because if the link weights are now used differently by the routers, they have to be computed differently by the operator.

So the questions are: could this new protocol really be good, and if so, how can the computation of the link weights be done? Let's answer the first question first. One way to measure whether a routing protocol is good is to compare against the benchmark of optimal traffic engineering, say achieved by solving the multi-commodity flow problem and realized by MPLS-style forwarding. The y-axis is normalized to one. What does that mean? You increase the traffic load until, for a given routing protocol, some link becomes congested and you cannot increase anybody further. Stop there and calculate how much you managed to carry. For optimal traffic engineering, set that as the benchmark of 1. Now run a different routing protocol, presumably not as good; by the time you have to stop, that level will be lower. Indeed, if you run OSPF you take a hit, sometimes big, sometimes small, on different kinds of networks. This is the Abilene network with a traffic matrix recorded on a certain day, and these are pretty standard artificially generated network topologies that this community often uses.
But if you run what we call DEFT, or a sister version called PEFT, whatever the acronyms stand for, essentially this exponential penalty idea, numerically you see it is almost always one. In fact, we can now prove that it is exactly one: it achieves optimal traffic engineering exactly. Another way to measure is to look at the optimality gap in the cost, the sum over all links of the cost function, usually modeled after the M/M/1 queueing delay. These pictures look more impressive, because DEFT is almost zero and OSPF shoots up to a very high value, but that is really more an artifact of the M/M/1-delay-based penalty function than anything else. If we wanted funding from a certain agency, we would show this picture first: you see, the network will explode if you don't do it our way. But really, the more meaningful statement is this one, which says that on the Abilene topology you get about a 15 percent improvement just by being smarter about how you use the link weights.

So the theorem is that link-state routing with hop-by-hop, destination-based forwarding can achieve optimal traffic engineering, and furthermore the associated optimal link weights can be computed by the operator in polynomial time. In practice, it turns out to be about 2000 times faster than the existing local search algorithms for OSPF link weights. So not only is it better, it is provably optimal; and not only is it polynomial time in theory, it is much faster in practice than the local search heuristics for OSPF.

How do we prove it? One picture summarizes it. Look at the set of all feasible flow routings; one point is one flow routing vector. A subset of these is optimal with respect to your traffic engineering objective. But not all members of this subset can be realized by link-state routing with hop-by-hop forwarding. Our job is to pick out those that are not only optimal with respect to the traffic engineering objective but also realizable with link-state routing. How can we pick out those configurations? It turns out we can construct another optimization problem, purely as a proof technique, over the same constraint set, where the objective function is an entropy function: the entropy of a probability vector describing the probability of going from a node in the network to the final destination through the different paths, normalized by the total traffic intensity. Maximizing that normalized entropy function over the possible configurations, subject to a certain constraint set, turns out to give you exactly the ones in this subset. And the resulting cancellation of terms is what leads to the exponential penalty: the exponential penalty is a direct consequence of this proof technique. It turns out we can also prove the converse: the entropy function is, up to scaling and shifting, the only function that gives you this picking-out ability, and the exponential penalty is the only such penalty. You can try other monotonic penalty functions; they will not give you provably optimal traffic engineering. The details are all in papers you can download.

If you remember the DSL study, we talked about the coupling side; there is another side, which is the non-convexity side. It turned out that somehow it performed very well, right?
You probably should have asked yourself the question at that point: all right, I can see the decoupling using static pricing, but how come we achieve rate regions so close to optimal? Isn't that a non-convex problem? It turns out that somehow the hard problems in that instance aren't hard in reality. What do I mean by reality? You can give me the DSL spectrum management problem as a mathematical problem, and you can choose the constant parameters, the crosstalk channel coefficients, such that any suboptimal heuristic works very poorly. That is mathematically doable. But these crosstalk coefficients are not mathematical constructs; they are a representation of the physical reality of the laws of electromagnetic radiation. So somehow, in this tough landscape of non-convex problems, and imagine a non-convex problem whose landscape has local maxima, a global maximum, local minima, and so on, the instances that are physically meaningful have parameters leading to a terrain where the locations, number, and values of the local maxima are not that bad compared to the global maximum. As to why that is the case: can we prove a statement that, for a certain subset of these crosstalk coefficients, not only can we prove convergence, which we have done, but we can also prove optimality altogether? We wish we could; we are not there yet.

And this routing story shows that sometimes hard problems don't deserve to exist at all. In this case, it is not the coupling of electromagnetic radiation that is bothering you. It is the assumption that you have to pick only the shortest paths and split traffic equally among them. Given the link weights, there is no reason you have to use them in that particular manner. That engineering assumption simply does not match optimal traffic engineering; no wonder it couldn't be done in polynomial time before. So instead of going from a restrictive assumption to an intractable formulation, and then trying to be brave with a non-scalable solution, maybe we should take a minute and recognize that when an optimization coming out of an engineering assumption becomes intractable, that is a signal, a flag, telling me that maybe the assumption can be perturbed. For example, in this case, relax it to allow an arbitrary monotonic penalty function, and it turns out the exponential penalty works great, such that I end up with a tractable formulation and a scalable solution.

This is what we call feedback, not in the network but in the human thinking procedure, in the engineering process, and this feedback is very important. The next time you are looking at a hard problem, that should alert you to knock on your neighbor's door and say: ten years ago you made a decision, for no good reason or for some good reason, but now there are other reasons that may supersede that one, and maybe we can talk about changing your decision now. Most likely you will have to pay a price for revisiting the assumptions, because they were often made with good engineering intuition, not for zero reason. But in this case it turns out you don't really pay much of a price at all: you provably get optimality, and it remains as simple as OSPF. You have to do more computation at the operator and at the routers, but those computations are cheap. The key thing is that you don't need tunneling; you don't need the extra state memory.
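Here is a minimal, end-to-end sketch of the exponential-penalty splitting rule itself, a simplification of mine: the actual DEFT/PEFT protocols realize the equivalent split hop by hop toward the destination, keeping per-destination rather than per-path state.

```python
import math

def exp_penalty_split(path_weights):
    """Split traffic over all paths in proportion to exp(-path weight), instead of
    putting everything on the shortest path(s). This is the end-to-end view only;
    DEFT/PEFT make the equivalent decision hop by hop toward the destination."""
    shortest = min(path_weights)
    raw = [math.exp(-(w - shortest)) for w in path_weights]   # shift for numerical safety
    total = sum(raw)
    return [r / total for r in raw]

# Example: two paths with weights 10 and 20 (the case that comes up again in the Q&A).
print(exp_penalty_split([10, 20]))   # roughly [0.99995, 0.00005]
```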
A small price you do have to pay is out-of-order packet arrival, though that can also be dealt with in various ways. I am going to skip this complexity part; it turns out there is an interesting story recently developing on scheduling, where we look at the trade-off between complexity and optimality in a three-dimensional space. Let me conclude the talk here. Back to the first slide, where I said that when you look at optimization for networking, please do not think of it only as a computational tool to help you solve a minimization or maximization problem, even though that is still a useful thing to have and sometimes it does happen as well. Think of it more as a way to do modeling, whether forward or reverse engineering; as a way to quantify architectural decisions and statements mathematically; as a way to connect to robustness; and as a way of giving feedback on engineering assumptions. If you take away optimality, what you are left with in optimization is, I think, something much more open-minded and much more powerful: optimization for networking as a language to think about choices and decisions in network engineering. With that I'll conclude. Thanks for your attention. (Applause.)

>> MUNG CHIANG: Questions? Yes?

>>: I have a question. You mentioned that you challenge the conventional assumption of shortest path routing. In moving away from the prescription that you want to find the shortest path, what is the reason?

>> MUNG CHIANG: Because it does not match the definition of optimal traffic engineering, which is to minimize -- I didn't show it, I perhaps should have -- the sum of the cost functions over all the links. The objective is not to compute shortest paths; it is to minimize the total congestion cost function across the entire network. What you really want to do is load balancing. Shortest path is a very nice heuristic for load balancing, but will it give you optimal load balancing? People had hoped so, before the Fortz and Thorup paper in 2000 showed: not really, unless you are willing to spend exponential complexity computing the link weights. With polynomial-time computation of the link weights, the resulting weights, if used only to do shortest path routing, cannot give you load balancing that is 100 percent optimal with respect to the congestion cost function. That is what they showed in 2000. Then people went to the other extreme and started feeling that it is hopeless, impossible. True, the answer is that OSPF cannot give you that; but that doesn't mean other link-state routing protocols cannot give it to you either. In fact there were papers, for example a paper in 2005 that inspired our work, asking: what if you try other penalty functions? Not just the Boolean penalty, because shortest path is a Boolean penalty, right? Either you are a shortest path or you are not; if you are, you count how many shortest paths there are, divide by that number, and that is the traffic split; if you are not, you get zero load. People had started thinking about, say, a penalty function like one over the path cost, the sum of the link weights along the path, or maybe an exponential penalty, or other functions. What we did was refine those ideas to the point where it is actually provable that one of these alternative penalty functions gives you exactly the optimal traffic engineering.
So to answer your question in short: because load balancing, in the sense of optimal traffic engineering, is not equivalent to shortest path. Yes? >>: -- the cost function -- is it the sum of the delays, or of the congestion on each link, or the maximum? >> MUNG CHIANG: So it turns out it can be any convex function composed with these: on each link you look at a convex function of the link utilization percentage, for example a piecewise-linear approximation of the M/M/1 delay as a function of the utilization, and then you look across all the links. You can pick any convex function -- sum is one, max is another -- of those convex functions, composed with the per-link functions, and that is the cost function. >>: But with this exponential penalty function you can optimize any -- >> MUNG CHIANG: -- any of these, as long as it's a convex function composed with the individual convex functions. In practice, what operators use is the sum of the convex cost functions. >>: What do you mean by this exponential penalty function? So let's say you know the weight of the path. >> MUNG CHIANG: So I've got two paths. One path has weight ten, the other has weight 20. Then one gets e to the minus 10 divided by e to the minus 10 plus e to the minus 20, and the other gets e to the minus 20 divided by e to the minus 10 plus e to the minus 20. >>: If you want to do the next-hop problem -- so, true, if it's MPLS, you can choose the paths, you can choose what fraction to send on each path. >> MUNG CHIANG: Yeah. >>: But if you want to do it per next hop, you have to aggregate these per next hop. >> MUNG CHIANG: So it turns out that you can make a local decision looking only toward the destination, without looking back to the source. And then you can make this local decision about path splitting in the DEFT or PEFT schemes that we proposed. That is one trick that I didn't go into in detail. But indeed, if you needed an end-to-end enumeration of all the paths, that would have been just like MPLS. No, you don't need to do that. You only need to look forward from you to the destination. In fact you don't even need to enumerate all the possible paths; there is a recursive computation in our paper that does that. >>: You can simulate like all these different paths by -- >> MUNG CHIANG: Essentially, yes. You can simulate the solution to this, what we call network entropy maximization, which leads to all these cancellations of terms, based on a recursive counting of the alternative paths, looking only forward to the destination without spatial memory. Yes? >>: So I'm thinking about this work on computing OSPF weights to do traffic engineering -- >> MUNG CHIANG: Yes. >>: That started with -- it seems where they were coming from, they had some legacy routers and MPLS was emerging as the new technology, so this was a way to do MPLS-type traffic engineering. But what I'm trying to understand here: MPLS buys you a lot of flexibility. You can do traffic engineering per class of traffic. You can do things like oblivious routing when your traffic is variable. So is there an easy answer to say I can do away with MPLS because I can solve a certain class of problems using this approach? >> MUNG CHIANG: Sure. Two things.
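(The arithmetic in the answer above, plus a hypothetical destination-based version of the split, as a sketch. The hop-by-hop rule here -- splitting among next hops in proportion to exp(-(w_ij + d_j - d_i)), where d_x is the shortest distance from x to the destination -- is one reading of the "looking only forward to the destination" idea, not the exact DEFT/PEFT rule from the paper; all names and numbers are illustrative.)

```python
import math

# Worked example from the answer: two paths with weights 10 and 20.
weights = [10.0, 20.0]
denom = sum(math.exp(-w) for w in weights)
print([math.exp(-w) / denom for w in weights])  # roughly [0.99995, 0.00005]

def next_hop_split(d_i, neighbors):
    """Hypothetical per-next-hop split at node i for one destination.

    neighbors: list of (w_ij, d_j) pairs, one per candidate next hop j, where
    w_ij is the link weight and d_j is j's shortest distance to the destination.
    Traffic is split in proportion to exp(-(w_ij + d_j - d_i)); only the
    destination matters, so no per-source or per-path state is kept.
    """
    scores = [math.exp(-(w_ij + d_j - d_i)) for w_ij, d_j in neighbors]
    total = sum(scores)
    return [s / total for s in scores]

print(next_hop_split(d_i=10.0, neighbors=[(2.0, 8.0), (5.0, 9.0)]))  # ~[0.98, 0.02]
```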
First of all, my understanding, and I could be totally wrong, is that MPLS in the '90s generated a lot of excitement, but then people started feeling that end-to-end memory is not the way to go forward with legacy equipment, and Fortz and Thorup showed that while OSPF can't do it exactly right, here is a local search method that gets very close. Now, second, does this mean, therefore, that with DEFT and PEFT, MPLS will never be useful again? No, because DEFT and PEFT can provably get you the optimal traffic engineering -- go back to that objective function. Now, if your objective function changes, for example to this ideal or notion of being flexible, however that might or might not be mathematically represented, all right, then clearly you are dealing with a different optimization problem. The flexibility, whether in oblivious routing, QoS routing, routing with delay, and so on and so forth -- MPLS, being an end-to-end tunneling mechanism, gives you a lot more power, a lot more freedom. Some of that power may not be captured by any link-state routing protocol, including PEFT, DEFT, or OSPF. Indeed, if you say, I don't care about optimal traffic engineering alone, I care about many other things as well, there are also some other objective functions -- then there has been no study, because this is very recent, of whether DEFT or PEFT can do that, and I suspect they cannot do them all, because they rely on a very simple description -- each link has one number -- and a very simple decision rule: only destination-based hop-by-hop forwarding. I really doubt that with such limited firepower you can kill all these beasts, you know, of oblivious routing, QoS routing. I have big doubts about that, but nobody has looked at it. So the answer is: indeed, MPLS still has its merits. Whether it will be widely used or not will depend on factors outside technology. >> PHIL CHOU: Any other questions? (There is no response.) >> MUNG CHIANG: Okay. Thank you. (Applause.)