>> PHIL CHOU: So I'm very pleased to be able to introduce Professor Mung Chiang, electrical engineer
from the Engineering Department at Princeton. Mung received his Bachelor's, Master's and Ph.D. degrees
from Stanford and, despite the fact that he got his Ph.D. as late as 2003, he has already won numerous
awards: not only the NSF CAREER Award and ONR Young Investigator Award, but also the Howard Wentz Junior
Faculty Award from Princeton and the Engineering School Teaching Commendation, the Terman Award
from Stanford, the Hertz Foundation Fellowship, the Stanford Graduate Fellowship, the TR35 Award, and others;
I need not go through the entire list here.
But he's well-known for his work in optimization and networking. And he's going to share with us some of his
perspectives on that today. Thanks.
>> MUNG CHIANG: Thanks, Phil. My pleasure to be here. And thanks for sparing me the embarrassment
of going through, you know, a laundry list of boring things.
Indeed, this is a talk about perspectives. I decided to give a talk here and at several other places on something
that, you know, is hard to say in one paper. And I find these kinds of venues the best places to describe
those viewpoints. And the viewpoint today is what I call beyond optimality. If you think about optimization,
that word, applied to networking, people think: well, the standard answer is that I'm going to use it to compute
some local or global optimum. But it turns out there's a lot more you can do with optimization for communication
networks.
So we are going to see today things ranging from modeling, to quantifying architectures, to building robustness
against stochastic dynamics, even to using optimization as a feedback mechanism, not inside a network but in human
engineering procedures.
And finally also complex performance tradeoffs. So in a lot of these applications you don't see computation
to the exact optimal being the Holy Grail of the subject. Instead you are going to do something like using
optimization as a language, just like English or C, and use that language to describe, to talk about how you
think about network engineering.
You've seen a lot of these papers, and a lot of these papers are becoming more and more boring today,
because in the end they prove things subject to step-size tuning and so on. So let's try to move
beyond these and move on to more challenging, fun stuff.
This talk itself is an optimization problem, where I try to minimize the amount of material you could just get
by downloading a paper, subject to the constraint of being somewhat self-contained, and the variable is the
content of the talk.
So fortunately, I got to do the optimization. You are going to suffer through the outcome of that optimization.
I would like to acknowledge many wonderful collaborators within academia and in industry.
So let's set the stage first: using optimization as a language to talk about resource allocation. Any
constrained decision-making problem can be thought of as an optimization. It's a data structure with four fields.
One is the variables, the degrees of freedom you have; constants are those you cannot vary. You've got
some objectives and some constraints. So any time you have this four-field data structure, you are talking about an
optimization problem. This sounds quite abstract in general, and we are going to illustrate in this talk three
successful technology transfers into commercial systems, through our industry collaborators, on things
stretching from broadband access to Internet backbone design.
So what kind of objective functions are we talking about? Usually two types: cost and utility. And they don't
have to be additive; let's say they are, okay? The cost can depend on all kinds of degrees of freedom, like the sum
of powers. And utility is a funny thing. It comes from the economics literature, and it is usually assumed to be
increasing, concave and smooth. It doesn't have to be.
Although increasing is pretty much true in all cases, it doesn't have to be concave or smooth;
those assumptions just make the math more tractable. We usually think of utility as a function of throughput. It doesn't
have to be; it could be a function of delay, energy and so on.
Utility can be used to model a variety of things, ranging from efficiency to traffic elasticity (for example, the more
concave, the more elastic).
It can model user satisfaction, like MOS scores for voice based on user-perception experiments, and even
fairness. For those of you who don't know, it is an interesting fact that you can quantify the notion of
fairness among competing users. A feasible resource allocation vector x is called alpha-fair, in the economics
literature, if for any other feasible vector y, the sum over users n of the deviation normalized
by x to the power alpha, that is, the sum of (y_n - x_n) / x_n^alpha, is non-positive.
So when alpha is zero, that's actually a very unfair allocation that can lead to starvation. Alpha equal to one is the
famous case of proportional fairness. As alpha goes to infinity, this tends toward a max-min fair allocation.
So far that's the definition. It turns out that if you maximize a certain parameterized utility function, then the
resulting optimizer satisfies this definition, and therefore people call those utility functions alpha-fair
functions. They are of the shape x^(1-alpha)/(1-alpha) when alpha is not one, and log(x) when alpha equals one.
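As a toy numerical illustration of the definition and the utility functions just described (the one-link, two-user setup and all numbers here are made up for the example), the alpha-fair allocation can be found by maximizing the alpha-fair utility and then checked against the definition directly:

```python
import numpy as np

def alpha_utility(x, alpha):
    """Alpha-fair utility: sum of log(x) when alpha = 1, else sum of x^(1-alpha)/(1-alpha)."""
    x = np.asarray(x, dtype=float)
    if alpha == 1.0:
        return np.sum(np.log(x))
    return np.sum(x ** (1.0 - alpha) / (1.0 - alpha))

def is_alpha_fair(x, feasible, alpha):
    """Definition from the talk: x is alpha-fair if for every feasible y,
    sum_n (y_n - x_n) / x_n**alpha <= 0."""
    x = np.asarray(x, dtype=float)
    return all(np.sum((np.asarray(y) - x) / x ** alpha) <= 1e-9
               for y in feasible)

# Two users sharing one unit of capacity: allocations of the form (t, 1 - t).
grid = [(t, 1.0 - t) for t in np.linspace(0.01, 0.99, 99)]

# alpha = 1 (proportional fairness) picks the equal split on this symmetric instance.
best = max(grid, key=lambda y: alpha_utility(y, 1.0))
print(best)                             # close to (0.5, 0.5)
print(is_alpha_fair(best, grid, 1.0))   # True
```

The same search with a large alpha approximates the max-min fair point, which on this symmetric example coincides with the equal split.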
So this is by now very famous, thanks especially to Frank Kelly's '97-'98 papers on proportionally fair allocation.
It turns out there is a belief that bigger alpha is more fair, and that is true if you compare alpha equal to zero, one, and
infinity. But anywhere in between? Actually, I haven't seen a paper saying why alpha 3.5 is more fair than
alpha 3.1.
So there needs to be some more rigorous study showing that certain widely agreeable notions of fairness become a
monotonic function of this parameter alpha. I haven't seen that. I also haven't seen a rigorous study of the
effect of suboptimal solutions on your fairness gap. You can well define an optimality gap, but, you know, what
would it mean to say there's a fairness gap? That connection is still missing.
But nonetheless, you have seen utility being used to model efficiency of allocation and fairness.
But there are other things beyond performance in the standard sense. For example, availability of service under
attack, various aspects of security, and various fuzzy concepts that are as important as performance metrics like
throughput and delay. Can utility be used for those? That is still ongoing work by various groups.
But leaving objectives for a moment and coming to constraints: there are usually three
types of constraints we face. One is inelastic individual QoS constraints; these are hard constraints tied to
user needs.
Two is technological and regulatory constraints, ranging from simple ones such as maximum power to much
more interesting ones like net neutrality constraints.
Three is feasibility, and this is the tricky part. If you are coming from an information theory background, it could be the
capacity region, if you know that region. In queueing theory, it could be the stability region of the queues.
Or it could be the achievability region under particular physical phenomena and particular physical-layer
constructs. For example, three typical phenomena in a network are congestion, collision and interference,
and they are typically represented by additive, Boolean and multiplicative constraints, respectively.
They have different degrees of freedom. And in some cases there is pioneering work in the industry that
defined a whole sector, followed by analysis or inspiration from seminal work: Kelly's '98 paper
analyzing congestion control; Tassiulas and Ephremides' '92 paper analyzing constrained queueing
networks and maximum weight matching for throughput optimality; or Foschini and Miljanic's work on power
control, extending Qualcomm's solution to the near-far problem.
In all three cases there is a very prototypical optimization problem involved, whether it's
linearly constrained utility maximization, convexly constrained weighted rate maximization, or SIR-constrained
power minimization. These three classical problems have been studied, you know, by thousands
of papers over the years, and interestingly, they all use feedback in the network, either implicit or explicit.
So you can start by modeling, using optimization as a language, by specifying objectives, constraints,
variables and constants. Or you can start from the other direction and say: give me a protocol, and I
am going to try to reverse engineer the underlying problem being solved by that protocol. Sometimes it is a
social welfare optimization. Sometimes it's individual selfish optimizations interacting as a
game.
So this is the mentality of reverse engineering. In contrast to optimization of the network, this is optimization
by the network. The elements in the network act together as a distributed optimization machine.
You may think: I already have the solution, so why do I care about knowing the problem? That's because
often these solutions came from ad hoc, engineering-intuition-based hacking, and they certainly have had a lot of
impact; sometimes they work very well, sometimes they don't.
So how can we understand them in a systematic way? How can we design future ones on a rigorous
foundation? And that's where forward engineering can follow reverse engineering.
Here is a quick summary of our current understanding, as a field, of reverse engineering. What you don't see
here is the physical layer, because, thanks to Shannon, people pretty much knew what they were talking about from day
one, if day one is 1948. Okay?
Capacity and rate-distortion limits.
But upper layers tend to be designed based more on engineering intuition. However, most of them have
been reverse engineered over the last five to ten years and shown to be implicitly solving an underlying
problem. That is, you can view the trajectory of a given protocol as the trajectory of certain algorithmic
steps for solving an underlying problem: basic network utility maximization for congestion control, the stable
paths problem for inter-autonomous-system routing, and game-theoretic reverse engineering for
contention resolution protocols. So I'm skipping a couple of the slides because I'm trying to make sure I don't overrun by too much.
So I just wanted to lay the foundation and show you some of the typical modeling frameworks and languages.
Now I want to say something that is a little more idea-driven, by the quest to understand architecture.
Now, architecture is a big word, a fuzzy one. By that we mean, more specifically here, functionality
allocation. That is, who should be doing what, how to connect them, and at what time scale.
For example, how would you contain and correct errors? You can do that at the physical layer
with forward error correction codes. You can do that with ARQ, hop-by-hop feedback. You can do that with
multipath routing. You can do that at the end-to-end level with a CRC check. You can do that with network coding at the
application level. So if there are five people capable of doing the same job, it is not clear exactly how
you would allocate the job: divide it up and let each person do part of it?
Or how do you resolve a resource bottleneck in the network? You can do congestion control:
let everybody back off their transmission source rates. You can do alternative routing: don't use
this congested path, use a different path. You can build a better pipe, say with power control, maybe at the
expense of hurting some other links in the network. Or you can just do local neighborhood
contention resolution so that links don't contend too frequently.
Again: who should be doing what, at what time scale, and how to glue them back together?
Another way to ask the question is: say you are AT&T, and Microsoft and Cisco and Qualcomm all come to you. I
thought this would be just a private, casual conversation. It turns out I had to sign some release; I didn't
even read what I was signing. So I should probably be very careful talking about things now.
But say, you know, you are an operator, and some big companies, each with a different domain of
specialization, come to you and say: you know what, I can solve your problem for you. You don't need to pay the others.
Maybe you pay for connectivity at just the physical layer, but don't pay any premium for their value
added, because I add all the value that you need, right?
In fact, this does happen a lot in the industry. And in some sense they are all correct: a lot of
problems you can solve at this layer or at that layer, you know, by this functional module or by that
functional module. The question is, what is the most cost-effective, evolvable, stable, robust way?
In other words, which stock to buy here, for example, say Qualcomm. Maybe the right answer in today's
market is don't buy any, but I guess a better answer is to buy Microsoft stock, for very obvious reasons.
So I think in general there are many tasks you want done, and many degrees of freedom possibly
under control, and this bipartite graph shows basically which degree of freedom can impact which task.
So now we say: oh, I want all these tasks done, and I have some subset of these degrees of freedom,
traditionally residing in different so-called layers. Now, how should I find a matching in this conceptual
bipartite graph? This kind of architectural question has been raised everywhere. For example, in
related disciplines such as information theory you see this kind of picture on page one of the textbook.
And Shannon showed that for certain models, like point-to-point with no feedback and no complexity
concerns, the job of data compression and the job of transmission can be separated without loss of optimality,
and that is a very powerful architectural statement. It means that you can have a digital interface here
and here, going from analog to digital, and that would not hurt you in those models. It might hurt you in other
models, but still it's an inspirational architectural statement.
In control theory and in serial computation theory you see these pictures on page one or two of the textbook, and
they mean that people in these related disciplines have thought long and hard about the kinds of
architectural statements that can be quantified. And those statements have become part of their blood; it's so natural to
them now.
Now, what do we have in networking? This is the standard picture you see, but it is a fuzzy picture, because
it's not clear what exactly the boundaries here mean. I mean, a boundary means that you don't get to
see and you don't get to control everything. You rely on the service provided to you by the layer below,
and you provide a service to the layer above you. But what exactly is this boundary, right? What is the best way
to do the division? It is not always clear.
You can do things end-to-end. You can do things within the network. You know, sometimes you can just let
the end hosts be very intelligent. Sometimes you help with shaping and policing at the edge. Sometimes you
start dropping packets here. Sometimes you let intelligence reside within the network. That is the wireline
Internet. The wireless Internet raises even more intriguing questions.
And between the control plane and the data plane: the control plane makes measurements, probes the data plane, does
some computation, and puts control signals back into the data plane. What jobs should reside in the data plane,
and what in the control plane?
In all these cases it's not exactly clear how to design this job division, how to divide up the jobs in the
functionality allocation. So something a lot of people have been working on is called layering as
optimization decomposition: trying to understand layering not as an ad hoc engineering artifact, but as
something that pops out of a top-down, principle-based design flow, a decomposition of an optimization model.
It is not the exact value of the optimal solution that we really care about that much. It's the decomposition
that matters most, because that gives you the boundaries of the functionality allocation.
So the flow roughly goes like this. You start by modeling the whole network as a generalized network utility
maximization. The utility doesn't have to be exactly a utility as a function of throughput. It could be, as I
just mentioned, a variety of different objective functions. And the constraint sets could be, in general,
non-convex, with a lot of different degrees of freedom.
Given that problem, you want to find decompositions that break it down into smaller problems with smaller
numbers of variables, observing smaller numbers of constants, possibly solving for different objective
functions than the original gigantic mother problem.
Now, for a given decomposition scheme, and there are many schemes possible, we call it a layering
architecture, and the decomposed subproblems correspond to different layers. The functions of primal and dual
variables are identified as the interfaces coordinating the layers, in terms of what they can see and
what they can do. And this decomposition happens both horizontally, meaning over geographically
disparate network elements, as well as vertically, meaning across the functional hierarchy such as the seven,
or however many, layer OSI stack.
And then the gluing back together can happen at different time scales, using either implicit or
explicit message passing. Of course, you hope that the message passing can be simplified all the way down
to only implicit message passing on a very infrequent basis. Sometimes you can do that while still maintaining
optimality.
There are basically three steps in this design flow, which is very much a first-principles, top-down
approach. You declare what you want and what you can vary. Then you search for a solution
architecture based on a particular decomposition. And finally, if time permits, you look for alternative
architectures coming from alternative decompositions.
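The standard workhorse in this design flow, dual decomposition of a basic network utility maximization, can be sketched in a few lines. Everything below (the two-link topology, capacities, step size) is invented for illustration: each source selfishly picks its rate given the path price, and each link updates its price by a gradient step on the dual variable.

```python
import numpy as np

# Basic NUM: maximize sum_s log(x_s) subject to R @ x <= c.
# Routing matrix R[l, s] = 1 if source s uses link l (made-up topology).
R = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
c = np.array([1.0, 2.0])            # link capacities
lam = np.ones(2)                    # link prices (dual variables)
step = 0.01

for _ in range(20000):
    q = R.T @ lam                   # total path price seen by each source
    x = 1.0 / q                     # each source's selfish best response for U = log
    lam = np.maximum(lam + step * (R @ x - c), 1e-6)  # link price gradient update

print(np.round(x, 3))               # approximately [0.423 0.577 1.577]
```

The decomposition is the architectural point: sources only need their own path price, links only need their own load, and the prices are the interface between the "layers".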
So this framework has many, many papers that do different kinds of joint this-and-that, that-and-this. In
fact, if you look at the combinations among the seven layers, the sum of seven-choose-i for i from 1 to 7, it's a pretty big
number. Then for each such cross-layer combination you can have, what, five different meanings of what you are
crossing, in terms of what you can do and see. For each idea, there are about five different groups working on
about the same thing independently.
And each group is going to publish somewhere between three and ten papers; say three. So you add
up the numbers, it's a pretty big number, and that's a very disturbing sign. It means you should get out of that
field as soon as you can.
(Chuckles.)
>> MUNG CHIANG: So what we tried to do is to say: it's not complicated at all. There are indeed
challenges, and I am coming to those in a minute, that are truly hard-core. But a lot of this has also just
become a template you can start filling in very easily, and underlying the template there are only two very,
very simple things, two conceptually very simple ideas. One is to view the network as an optimizer, as I just
said. The other is to view the process of layering as decomposition. So if you strip away all the complexity
and, you know, the detailed variations, it boils down to something that is conceptually very simple.
Now, you wonder, this decomposition thing: how do I actually do the decomposition? I am not going to detail
that; there are many nice articles already in print. The standard technique is so-called dual decomposition, or
Lagrangian relaxation. That's the most widely used. There is also primal decomposition, and penalty
functions that lead to further possible decompositions, and there are different combinations you can do at
different hierarchical levels. You can partially decompose now and deal with the rest of the coupling later, and
you can pick different time-scale choices.
And it turns out that there are so many different combinations that we want to construct a user manual. So
this is a collaboration with some other folks in Asia and Europe, trying to come up with a user manual.
It's not just intellectual curiosity, because it turns out that different decompositions can lead to
different algorithms, each with different engineering implications as to what the resulting protocol stack would
do for you. And more interestingly, you can represent the same given generalized NUM problem by an
alternative representation, which certainly doesn't change the optimized value but does change the
structure of the problem. For example, adding a redundant constraint: seemingly innocent, but it may
open the door to even more decomposition possibilities.
You probably can't parse it from back there, but basically this is a snapshot of the user manual under construction. This is
a pretty general template where we break down the procedure into three parts. In the first part, you start with
engineering descriptions and end up with one generalized NUM representation: one problem.
In the second block of the procedure, you break that one problem, by decomposition methods, into N smaller
problems. Still no solutions, but you have N problems.
In the third block of the procedure, you find distributed algorithms to solve each of the N subproblems;
hopefully, the way you decomposed them is conducive to finding such distributed algorithms.
So you go from an engineering description of a problem to a particular protocol stack. And this just shows one of
the many papers that go through essentially this procedure along a particular path, the one highlighted in
yellow here. We are constructing a Web site (we'll let you know when it's done) where you can
click on many, many papers, and each paper is just one yellow path through the flow chart. So can we
automate this? Automate what? First, enumeration of possible decompositions. Second, comparison among
the ones you've generated.
Turns out this is very tough, okay? It's good to have a dream like that; it keeps people going. But it is very
tough.
First, enumeration. Doing it exhaustively is very, very challenging, because alternative representations of the same
problem can lead to further decompositions, and some of them degenerate into equivalence classes.
Second, comparison. This is even tougher, because compare against what metric? You know, speed of
convergence is one. Robustness against many things. The amount, locality and symmetry of message
passing; the trade-off between message passing and local computation; and even fuzzier metrics to
compare with.
For a lot of these metrics of interest, we don't have the mathematical machinery to analyze or even to tightly
bound them. So it's very hard to even get each individual axis done, let alone do a trade-off analysis to
compare among alternatives.
So far it's all just manually done, okay? You manually enumerate. You manually try to compare and see if
you come up with one alternative architecture that makes more sense for your application.
Now, decoupling is not just difficult because it's hard to enumerate and compare. Sometimes what you have,
this optimization problem, is coupled in such a way that it's just hard to decouple. I'll give you two very
quick examples. Each of these two examples was started with industry collaborators, and we were fortunate
to have industry people take on the analysis and, in some cases, put it into products.
But I will not go through the details; those would be two separate talks. I'll just highlight one aspect, which is
that decoupling is not always easy. So this is our first case: DSL spectrum management. This is broadband
access using a combination of fiber, closer to the neighborhood, followed by copper wires; if
you live close enough to the central office, copper wire runs directly to you. This could be, say, ADSL or VDSL, which
uses discrete multitone transmission, a kind of OFDM.
And here is the issue: the copper lines are not fully twisted, so they still emit electromagnetic radiation
into each other, and at high frequency bands that interference becomes substantial. People in that
community call it crosstalk.
And your job is to decide how to shape the power spectral density of your transmission so that the crosstalk
doesn't hurt too much, and, system-wide, you want to find, over all the users, the best rate region
that can be obtained.
So for given fixed weights w, you maximize the weighted sum of rates; by varying the weights you trace out
the whole boundary of the rate region.
The rate of user n is the sum of the rates on each tone or carrier, indexed by k, which, assuming
interference is treated as noise, is log one plus the signal-to-interference ratio:
R_n = sum_k log(1 + P_n^k / (sum_{m != n} alpha_nm^k P_m^k + sigma_n^k)),
where P_n^k is user n's transmit power on tone k, the sum is over the interfering neighbors m, alpha_nm^k is the
crosstalk channel gain from user m to user n on tone k, and sigma_n^k is an additive noise term.
And the variables are the P_n^k.
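That rate expression can be transcribed directly, on a made-up toy instance (the flat powers, crosstalk gains and noise values below are invented for illustration, not realistic DSL numbers):

```python
import numpy as np

def rates(P, alpha, noise):
    """R_n = sum_k log2(1 + P[n,k] / (sum_{m != n} alpha[n,m,k] * P[m,k] + noise[n,k]))."""
    N, K = P.shape
    R = np.zeros(N)
    for n in range(N):
        for k in range(K):
            # Crosstalk from every other user, treated as noise.
            interference = sum(alpha[n, m, k] * P[m, k] for m in range(N) if m != n)
            R[n] += np.log2(1.0 + P[n, k] / (interference + noise[n, k]))
    return R

# Toy instance: 2 users, 4 tones, symmetric made-up gains.
N, K = 2, 4
P = np.full((N, K), 0.25)            # flat power allocation per tone
alpha = np.full((N, N, K), 0.1)      # crosstalk gain from user m into user n on tone k
noise = np.full((N, K), 0.01)
R = rates(P, alpha, noise)
print(np.round(R, 2))                # the two users get identical rates here
```

The coupling the talk describes is visible in the code: each user's rate depends on every other user's powers on every tone, across both the user index and the tone index.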
This turns out to be a very challenging optimization problem. It's non-convex, and it's coupled
across both users and tones. These are two different challenges. Convexity is the key to tractability,
toward proving global optimality, whereas coupling is the bottleneck for distributed solutions. You have
both bottlenecks sitting in front of you, on this so-called dynamic spectrum management problem, well studied
for many years with a revival in recent years.
So there are many different variants, many acronyms going on here.
I'm going to show you a version that is a collaboration with Jianwei Huang, Rafael Cendrillon and Marc
Moonen, called ASB, Autonomous Spectrum Balancing.
Of course, when I show a comparison table like this, I always make sure it highlights the best parts of our
algorithm.
So no surprise that this is the best in our view: autonomous, low complexity, and near-optimal
performance.
We do not have a theoretical proof of a worst-case optimality gap, but empirically we are running on very
realistic industry-grade simulators with very realistic channel conditions, and it always comes up very near
optimal.
What is the basic idea here? Ignore all the rest, okay? I will only spend one minute on the idea and move
on.
The idea is the following. If you know the congestion control literature, you know there's
the idea of pricing, of congestion signaling: dynamic, real-time exchange of information from the network to
the sources so as to align the selfish interests into a social welfare maximization.
Can we do that here? Not really, because the users cannot easily do message passing among themselves, and
they cannot easily do it through the central office either, for practical reasons.
So within the same user, across all these 256 or 4096 tones, you can do dynamic pricing. But across the
users you can't. And if you only allow selfish maximization, you end up with something called iterative
water-filling, and that is often quite suboptimal. So the idea in ASB is to add a virtual line, a reference line,
sitting right next to you. And when you do your selfish rate maximization, you imagine somebody you
need to protect. What are the parameters describing this imaginary somebody I am protecting? That is, you
know, part of the details here, but it turns out there are pretty robust ways to set them. So you do
a one-time handshake with the central office to get the reference line parameters, and that serves as
what we call static pricing.
And static pricing would normally work very badly, except this is the DSL case, so the wireline channel conditions
do not vary too frequently in time. So for static coupling, it turns out static pricing works out pretty well. Okay?
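The selfish iterative water-filling baseline mentioned above is easy to sketch. This is a toy version with invented noise and crosstalk numbers, not the ASB algorithm itself (ASB would add the reference-line protection term to each user's objective): each user repeatedly water-fills its power budget against the interference currently created by the others.

```python
import numpy as np

def waterfill(levels, budget):
    """Single-user water-filling: p_k = max(0, mu - levels[k]) with sum(p) = budget,
    found by bisection on the water level mu."""
    lo, hi = 0.0, max(levels) + budget
    for _ in range(100):
        mu = 0.5 * (lo + hi)
        p = np.maximum(0.0, mu - levels)
        lo, hi = (mu, hi) if p.sum() < budget else (lo, mu)
    return np.maximum(0.0, 0.5 * (lo + hi) - levels)

def iterative_waterfilling(alpha, noise, budgets, iters=50):
    """Each user water-fills against the others' current interference (Gauss-Seidel sweeps)."""
    N, K = noise.shape
    P = np.zeros((N, K))
    for _ in range(iters):
        for n in range(N):
            levels = noise[n] + sum(alpha[n, m] * P[m] for m in range(N) if m != n)
            P[n] = waterfill(levels, budgets[n])
    return P

# Toy instance: 2 users, 4 tones, made-up noise floors and flat crosstalk gains.
N, K = 2, 4
noise = np.array([[0.1, 0.2, 0.4, 0.8]] * N)
alpha = np.full((N, N), 0.2)
P = iterative_waterfilling(alpha, noise, budgets=[1.0, 1.0])
print(np.round(P, 3))               # each row spends (about) its full power budget
```

On this symmetric weak-crosstalk toy, the selfish fixed point is symmetric; the suboptimality the talk refers to shows up when crosstalk is strong and asymmetric.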
And let me just show you one of the representative plots here. You're looking at the rate trade-off region for
two out of 14 users. And the best rate region you can get is this one in red. Iterative water-filling, totally selfish,
gives you this.
Autonomous spectrum balancing, the one I described, turns out to give you something very close to the optimal solution.
Now, if your reference line parameters are way off, you'll do worse, but with many sensitivity
analyses, which you can see in the paper as well, it still works in a remarkably robust way. So that's one sort of
newer technique being developed: static pricing for static coupling.
Here is another, different story: power control, single carrier, but in a wireless network, where the channel conditions
fluctuate quite fast. Say you look at two users. This is the achievable rate region; again, it doesn't have to
be convex, and if it's non-convex, that's another challenge I'll come back to towards the end of the talk.
For now let's assume it's convex, but coupled. It's coupled in the following way. Your objective, a utility
function, depends on the powers as well as the signal-to-interference ratios (SIRs), which are functions of the powers.
The constraint is that the SIR assignment must be feasible. There are certain points in the SIR plane that are not
feasible: there is no power allocation that can give you that point, because of interference.
And you can vary both the powers and the SIR assignment so that you get to the best point on the boundary,
defined by the intersection of the rate region boundary with the red dotted curves, which are the
contour lines of the utility function.
So how do I get to this point? If you have a centralized solver monitoring all the cells
in the neighborhood and controlling all the base stations, then you can solve a convex optimization here, if
the utility is concave, and get to that point.
But the challenge here is that each base station would like to mind its own business, and you cannot easily
coordinate across the base stations. You can allow a base station to talk to its cell phones, but you can't let
all the base stations become centrally coordinated by one entity.
Then how can you get onto the boundary, and then move to the point defined by the utility function as the best
boundary point? Now, this problem has long been studied in special cases. For example, a famous
special case: if you remove the SIR assignment from the variables, so it is a constant given to you, fixed and
assumed to be feasible,
then the problem of power control to achieve this SIR target is well studied, by, say, Foschini and Miljanic.
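The Foschini-Miljanic iteration for that special case is essentially one line: each user scales its power by the ratio of its SIR target to its currently measured SIR, using only local measurements. A toy sketch with made-up channel gains (the target is chosen to be feasible for these numbers):

```python
import numpy as np

# Channel gains G[i, j]: from transmitter j to receiver i (made-up numbers).
G = np.array([[1.0, 0.1],
              [0.2, 1.0]])
noise = np.array([0.1, 0.1])
target = np.array([2.0, 2.0])       # SIR targets, assumed feasible here

def sir(p):
    """SIR of each user under power vector p."""
    signal = np.diag(G) * p
    interference = G @ p - signal + noise
    return signal / interference

# Foschini-Miljanic update: p_i <- (target_i / SIR_i(p)) * p_i.
p = np.ones(2)
for _ in range(100):
    p = target / sir(p) * p

print(np.round(sir(p), 6))          # converges to the targets [2. 2.]
```

The update is fully distributed (each user needs only its own measured SIR), and when the targets are feasible it converges to the minimal power vector meeting them, which is exactly the fixed-SIR power minimization the talk mentions.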
When you bring them together, it becomes much more challenging, and people thought about how to
extend the Foschini-Miljanic power control. A key insight there is that the whole problem can be
transformed into a linear algebra problem, where you are essentially computing the right Perron-Frobenius
eigenvector, the eigenvector corresponding to the Perron-Frobenius eigenvalue of a matrix describing
the topology of the network.
But if you want mobile-based, local updates of these eigenvectors, it becomes very hard. For example,
you can try the standard decompositions; at least we tried, and they didn't work. But it turns out that if you do a change of
variables, changing the description of this boundary from the algebraic description based on the right eigenvector
to one based on the left eigenvector, then the left eigenvectors do admit an ascent direction that is locally
computable.
Why that happens, I don't know. You know, we just ran into it. It shows that a change of variables can
reveal a hidden decomposability structure that the original, standard representation of the optimization
problem would not give you.
So let me just skip ahead here.
Those two examples serve one purpose in this talk, which is that decoupling is not always easy, but there is
case-by-case hard work that you can put in and hope that it helps.
So I'm leaving behind this architecture story, because it's kind of an old story by now. I mean, three years ago it
was a bit fresher. But by now it has been talked about enough that it becomes a little bit less
interesting.
So instead, let me step back and take a look at what exactly happened in 1998.
Among other things, you know, Frank Kelly published his seminal paper on congestion control,
understanding distributed rate allocation through an economic model, or through this congestion pricing model.
Now, compare that with Shannon 50 years before that. Before Shannon, people worked on communications
as well, and they tried to design finite block-length codewords that protected the signals through the channel.
But Shannon said: for one moment, let's turn our attention away from constructing codes and look at what
would happen if we had asymptotically, infinitely long codes.
Then the law of large numbers and large deviations theory kicked in, and Shannon was able to provide both
fundamental limits, like capacity, as well as architectural statements, like the source-channel separation
principle.
Of course, later people had to design finite block-length codewords and build practical systems. Now, before
'98, people had been looking at the coupled queueing dynamics: how do I look at many routers cascaded
with each other, where the arrivals of packets are shaped by the previous routers? How does the queueing
system evolve? That becomes very intractable.
But Kelly said: for one moment, assume these were deterministic fluids flowing through the network. Then
what would happen? Well, it turns out the optimization and decomposition view kicks in, as well as the
control-theoretic, game-theoretic, and economic viewpoints.
And now people understand network protocols not as ad hoc designs but as trajectories of certain dynamic
control systems. And that's a profound realization. It's a profound viewpoint.
Of course, later the stochastics have to come back. That's the subject of this third part of the talk:
what if we are facing a real-world network where we don't have deterministic fluids flowing through it?
Indeed, there are many levels of stochastic dynamics. For example, the session level: users, logical users,
arrive with finite workloads. They don't stay there forever as a static population. They finish their job and
leave. The packet level: packets within the same session arrive in bursts, and they suffer probabilistic events
along the way.
Channels fluctuate; the topology may even vary. For each of these levels of stochastic dynamics, we would like to
understand many things. For example, is the deterministic formulation still valid in the sense of giving the
right engineering intuition? If we use it, can the queues remain stable? What about the probability distribution
of performance metrics like delay? Maybe you don't know the distribution, but you may know some statistics,
like the average or the outage. And what would happen to fairness if short flows coexist
with long flows with different lifetimes, and so on?
It turns out that this big table has many blank spots. It's a somewhat arbitrary grading scale, I guess. Three
stars would mean completely done: don't even try to work on that anymore. Blank means nobody has a
really conclusive starting point. Two stars, there are a couple of bright spots. Let me zoom into one bright
spot and ignore the rest of the table today. This is the bright spot of session-level stability.
What does that mean? So look at the utility maximization problem. Now there's a
dynamic user population with arrivals and departures, so S denotes the class of the users, N denotes,
at time T, the number of active flows or active sessions in that class, and phi is the resource allocation, which
belongs to a certain constraint set. It could be a time-invariant polytope, for example. It could be a
time-varying non-convex set. It could be anything from something simple to something very horrible.
And you do processor sharing among the flows that belong to the same class, and this is the problem
you are trying to solve per time instant T.
So departures clearly are governed by the service rate, which comes out of this constrained maximization problem.
Arrivals are assumed to be independent of the network state. Suppose they are Poisson arrivals with an exponential
file size distribution, which is not true; but what is true turns out to be intractable.
We have to say things that we can say in a paper, so let's say exponential distribution. All right.
Then we've got a Markov chain. The state evolves like this: it goes up with this rate, it goes down with this
rate. Now, the question is: what is the distribution of the buffer occupancy? It turns out that's too difficult a
problem. Forget about that. Just ask about stability, for example queue or rate stability, depending on which
one you pick, of this queueing network. This is a Markovian-input but state-dependent-service-rate queueing
network, where the state dependence is precisely through the solution of this optimization problem, which in turn
depends on the state of the queues. So there is a coupling between the deterministic optimization at
each time instant and the evolution of the queueing dynamics.
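To make the flow-level model concrete, here is a toy simulation, my own illustrative sketch rather than anything from the talk, of the simplest instance: one link of capacity C with processor sharing, Poisson flow arrivals, and exponential file sizes. In that case the number of active flows evolves exactly like an M/M/1 queue, and it is stable iff the load rho = lambda / (mu * C) is below 1.

```python
import random

random.seed(0)
lam, mu, C = 0.8, 1.0, 1.0   # arrival rate, 1/(mean file size), link capacity
n, t, area = 0, 0.0, 0.0     # active flows, simulated clock, time-integral of n

for _ in range(200_000):
    # Total event rate: arrivals always occur at rate lam; with processor sharing,
    # n flows each get C/n, so the total completion rate is C * mu whenever n > 0.
    rate = lam + (C * mu if n > 0 else 0.0)
    dt = random.expovariate(rate)
    area += n * dt
    t += dt
    if random.random() < lam / rate:
        n += 1               # a new flow (session) arrives
    else:
        n -= 1               # some flow finishes its workload and leaves

avg = area / t
rho = lam / (mu * C)
print(f"rho = {rho}, simulated mean #flows = {avg:.2f}, M/M/1 prediction = {rho/(1-rho):.2f}")
```

With rho = 0.8 the simulated time-average number of flows should come out near the M/M/1 value of 4; the interesting questions in the talk are about what survives of this picture with general topologies, constraint sets, and arrival processes.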
So the answer is: the queues will be stable if and only if something, right? Well, we know the necessary
condition: the average intensity of the arrivals must be within the constraint set of the deterministic
optimization. So the real question is: is that also a sufficient condition for the stability of this queueing
network?
And the answer is yes in many, many, many cases. So this is a snapshot; it's not a comprehensive
list of the papers. The answer is yes for this kind of arrival, this kind of topology, the utilities
being the same across users or not, and this shape of the utility functions.
Okay? However, the general question remains: with general arrivals, meaning not necessarily Poisson with
exponential file sizes, general topology, and possibly all kinds of concave utility functions, what is the sufficient
condition for stability? That is open.
Here is something else that was recently solved by my group and collaborators. Instead of looking at
more general arrivals, we look at a more general constraint set. The standard assumption is a
time-invariant convex set.
But in many cases, especially in wireless, you have a non-convex rate region, for example if you are doing
random-access-based allocation. You can also have time variation of the rate region. Now, in the standard case,
people have proved, under those benign, mild arrival assumptions, that the stability region of the deterministic
NUM-based approach to resource allocation coincides with the rate region, the constraint set of the
deterministic problem.
Furthermore, that is the largest possible region you can ever get by any resource allocation. It's the
maximum stability region.
Furthermore, it doesn't depend on the curvature of your utility shape. It turns out that none of those three
statements remains true if you have a non-convex or time-varying rate region: the resulting
stability region may be smaller than the actual rate region of the deterministic problem; it depends on
the curvature of the utility function; and it is not necessarily the maximum stability region anymore.
In fact, there may even be a trade-off between how fair you are, if you believe bigger alpha is more fair,
and how large your stability region is.
There may be a strict trade-off there.
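The "alpha" in "bigger alpha is more fair" refers to the standard alpha-fair utility family of Mo and Walrand, which is the usual way fairness is parameterized in network utility maximization. A minimal sketch of the family (my own illustration, not from the talk):

```python
import math

def alpha_fair_utility(x, alpha):
    """Mo-Walrand alpha-fair utility: U(x) = log(x) for alpha = 1,
    else U(x) = x^(1 - alpha) / (1 - alpha)."""
    if alpha == 1.0:
        return math.log(x)
    return x ** (1.0 - alpha) / (1.0 - alpha)

# alpha = 0 is pure throughput maximization, alpha = 1 gives proportional fairness,
# and alpha -> infinity approaches max-min fairness.
for a in (0.0, 1.0, 2.0):
    print(f"alpha = {a}: U(2) = {alpha_fair_utility(2.0, a):.4f}")
```

Larger alpha flattens the marginal utility at high rates, so the optimizer is pushed toward equalizing rates across users; the surprise mentioned above is that this fairness knob can also shrink the stability region.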
So I just want to illustrate that even in this one corner of this gigantic table on stochastic network utility
maximization, there are either surprises or big open problems not solved.
Let me skip these couple of slides and move on to the last part, where I want to talk about another
major challenge that we face. And it's a big, big challenge. I talked about the challenge of decoupling, where
you have to do case-by-case hard work. We talked about the challenge of putting the stochastics back into the
deterministic fluid model, where much work remains to be done before the theory has a
direct impact on real dynamic networks.
And this is on non-convexity of your optimization model.
And I'm sure most of you know very well that the watershed between easy and hard optimization problems,
quoting Rockafellar, is not linearity but convexity.
So what happens if it is a non-convex problem? I hate to invent more acronyms. This one stands for design for
optimizability, but it's such a cool one, it's hard to resist putting it there.
What does that mean here? So again, let's take a step back and look at what kinds of non-convex problems
you can encounter. They arise in all kinds of corners for different reasons. You can have
real-time applications that give you a sigmoidal utility function. You can have power control at low SIR. You can
have many cases where the constraint set is non-convex.
You can have integer constraints, or a convex constraint set that takes an exponentially long
description length to describe, which is still very hard, like certain scheduling problems where the number
of activation sets is exponential in the number of nodes.
Whenever any of these happens, you know you are in big trouble. So now what can you do?
Mathematically, one thing that is good is that convexity, unlike, say, uniqueness of KKT points, is not an
invariant property. So you can do a nonlinear change of variables, for example, like the log change of
variables in geometric programming, or an embedding in a higher dimension, like the sum-of-squares method,
that gives you convex geometry even though the problem started out non-convex.
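A toy one-dimensional illustration of the log change of variables in geometric programming (my own example, not from the talk): f(x) = sqrt(x) + 1/x is a posynomial that is not convex on all of x > 0 (its second derivative changes sign at x = 4), but substituting x = exp(y) gives g(y) = exp(0.5 y) + exp(-y), which is convex in y, so plain gradient descent finds the global minimum.

```python
import math

def g(y):
    # The objective after the GP substitution x = exp(y): convex in y.
    return math.exp(0.5 * y) + math.exp(-y)

def g_prime(y):
    return 0.5 * math.exp(0.5 * y) - math.exp(-y)

y = 3.0                      # arbitrary starting point
for _ in range(300):
    y -= 0.3 * g_prime(y)    # small fixed step size suffices on this convex g

x_star = math.exp(y)         # map back to the original variable
print(f"x* = {x_star:.4f}")  # the analytic minimizer is 2**(2/3) ≈ 1.5874
```

The point is exactly the one in the talk: convexity is a property of the representation, not of the problem, so a change of variables can turn a non-convex landscape into one where local methods are globally correct.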
But I am not going to talk about that today. Instead, I'll advocate a more engineering approach. The fancy
name is design for optimizability; the more straightforward name is: ask your neighbor to change his
assumption, or to change his design, so that you can work under a different assumption.
And I'll illustrate that in an example of Internet traffic engineering.
But I still want to give you some pictures about perspectives, okay? Say this is a non-convex optimization
problem, and there are three broad approaches to attacking it. One is what I call going around
non-convexity: you change variables, you find conditions under which it is convex or under which the KKT
point is unique, and so on.
The other is to be brave and just sail through it: things like signomial programming, in general
some kind of successive convex approximation to solve the problem; or branch-and-bound, maybe in a very
smart way; or using structures like difference of convex functions and generalized quasi-convexity and
quasi-concavity, and so on.
Here is the third way, what I call going above non-convexity. You step outside the problem and then go back
in. It says: don't solve the difficult optimization problem. Try to redraw the architecture or protocol to make the
problem easy to solve. Of course, you may have to pay a price for that, and that's the trade-off to be aware of.
So in this case, optimization is not being used as a computational tool. It's not even being used as a modeling
language or for architectural understanding. It is actually used as a flag to say that certain design decisions may
be flawed, because if you can help it, you shouldn't be tackling a difficult problem. Like, if you are the instructor
of the course and the student at the same time, you shouldn't give yourself a hard midterm.
Let me illustrate this with the last case, which is on Internet traffic engineering within an autonomous
system. Like some people, before collaborating with the people working on this subject I had the
impression: oh, how bad could shortest-path routing be?
It turns out it's really not shortest-path routing. It's sort of the reverse of that. The operator, say AT&T,
would every now and then update a number per link called the link weight, and, from control plane to data
plane, give these link weights to the routers. The routers would use the link weights to construct the forwarding
paths. And then when a packet comes, you look at the destination address in its header and forward the
packet.
So this is what they call link-weight-based, hop-by-hop, destination-based packet forwarding.
It's a long phrase. And they want to do this so as to load-balance the traffic, in a procedure called traffic
engineering. So there's a centralized computation to set the link weights and a distributed usage of the link
weights to do packet forwarding based on destination only.
OSPF is a major member of the class of such routing protocols. But people know that if you
want to do the best traffic engineering, OSPF requires exponential complexity.
What is the best traffic engineering? The operator assigns a metric, like the cost of a link as a function of
its utilization percentage, usually modeled after some queueing delay formula, and you want to
minimize the sum across all the links. Or you want to minimize the maximum link utilization percentage. Or
you want to delay the appearance of the first congested link as long as possible.
So whatever metric you pick, there's a certain objective function there. But the variables are the link weights.
So this is weird, right? The knobs you tweak are the link weights. Tweaking the knobs forces
the routers to allocate traffic differently by forwarding packets in different ways, which results in a different
distribution of traffic, which changes your objective function's value.
But the link weights do not directly appear in the objective function. So your monitor is here, your knob is there,
and you have to influence it indirectly.
And that's kind of weird, but that is how Internet intra-AS routing is being done today.
So we wondered about this problem of OSPF not being able to give you the best traffic engineering
in an efficient way. Don't even try to read this; it's a long, and actually very partial, list
of outstanding work on the subject. An influential paper in 2000 by Fortz and Thorup
showed that it's NP-hard to do the best traffic engineering with OSPF, so you have to resort to local search
methods, which set the industry standard for many years.
Around the same time, actually before that, people started looking at something called MPLS, which as a
forwarding mechanism enables you to do multicommodity-flow routing, which is mathematically a nicer,
tractable convex optimization problem.
So that is good, but MPLS forwarding requires end-to-end tunneling. And that requires keeping per-flow
state, which is not the philosophy of hop-by-hop, destination-only forwarding.
That may be one of the reasons why the people in the industry actually running the networks still love OSPF
more than MPLS.
Now the question is: can we get MPLS-like performance without MPLS complexity, without end-to-end
tunneling? You still do local hop-by-hop forwarding of packets, and yet you get optimal traffic
engineering. Is that possible or not?
And we recently showed that it is possible. In fact, it is coming out in '08; there is an '07 paper that has been
superseded by the '08 paper.
The idea of the protocol is extremely simple. The idea of the proof is actually quite interesting.
The protocol says the following: all right, you are going to use link weights to compute paths and decide the
traffic splitting pattern, but unlike OSPF, do not construct only the shortest paths and evenly split the traffic
among them. Instead, use all of the paths, but put an exponential penalty on longer paths. Why an exponential
penalty? I'll come back to that in a minute.
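The splitting rule just described can be sketched in a few lines. This is in the spirit of the DEFT/PEFT idea as presented in the talk; the real protocols compute splits per next hop with recursion, so treat this per-path version as a simplified illustration.

```python
import math

def exp_penalty_split(path_weights):
    """Split traffic over all paths, penalizing longer paths exponentially:
    fraction_i proportional to exp(-w_i). Subtracting the shortest weight
    first keeps the exponentials numerically well scaled."""
    w_min = min(path_weights)
    scores = [math.exp(-(w - w_min)) for w in path_weights]
    total = sum(scores)
    return [s / total for s in scores]

def ospf_ecmp_split(path_weights):
    """OSPF/ECMP baseline: even split over shortest paths only, zero elsewhere."""
    w_min = min(path_weights)
    k = sum(1 for w in path_weights if w == w_min)
    return [1.0 / k if w == w_min else 0.0 for w in path_weights]

# Two paths with weights 10 and 20 (the example used later in the Q&A):
print(exp_penalty_split([10, 20]))  # ≈ [0.99995, 0.0000454]
print(ospf_ecmp_split([10, 20]))    # [1.0, 0.0]
```

Unlike the Boolean shortest-path rule, the exponential rule keeps every path alive with a weight that decays smoothly in path length, and that smoothness is what the entropy-based proof exploits.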
Now, this doesn't change in any way how you do the destination-based packet forwarding. However, it does
change how AT&T or any operator does the central computation of the link weights, because the link weights
are now being used differently by the routers, which means they have to be computed in a different way by the
operator.
So the questions are: could this new protocol really be good, and if so, how can the computation of the link
weights be done? Let's answer the first question first. One way to measure whether this routing protocol is good
is to compare it against the benchmark of optimal traffic engineering, say achieved by solving the multicommodity
flow problem and realized by MPLS-style forwarding. The Y axis is normalized to one. What does that mean? You
increase the traffic load up to the point where, for a given routing protocol, some link becomes congested and you
cannot increase anything further.
Stop there and calculate the link utilization. For optimal traffic engineering, set that as the benchmark of 1,
okay? Now you run a different routing protocol, presumably not as good. Then by the time you stop, the link
utilization will be lower. And indeed, if you run OSPF you are going to suffer some hit, sometimes big,
sometimes small, on different kinds of networks. This is the Abilene network with a certain traffic matrix
recorded on a certain day and year; the others are pretty standard artificially generated network topologies that
this community often uses.
But if you run what we call DEFT, or a sister version called PEFT, whatever they stand for, essentially this
exponential penalty idea, numerically you see it's almost always one. In fact, we can now prove that it is
one; it achieves optimal traffic engineering exactly. Another way to measure is to look at the
optimality gap, which is the sum of the cost function, usually modeled after M/M/1 queueing delay, over all
the links in the network. These pictures look more impressive because DEFT is almost zero and
OSPF shoots to a very high value. But this is really more an artifact of the M/M/1-delay-based
penalty function than anything else.
So, you know, if we wanted to get funding from a certain agency, we would show this picture first: you see,
the network will explode if you don't do it our way. But really, this is the more meaningful one, which
says that on the Abilene topology, it gives you about 15 percent improvement just by being smarter in how
you use the link weights.
So the theorem is that link-state routing with hop-by-hop, destination-based forwarding can
achieve optimal traffic engineering, and furthermore the associated optimal weights can be computed by the
operator in polynomial time.
In practice, it turns out to be about 2000 times faster than the existing local-search algorithms for OSPF link
weights. That means not only is it better, it's provably optimal; not only is it faster in theory, it's much
faster in practice compared to the local search heuristics for OSPF.
How do we prove that? One picture summarizes it. Look at the set of all feasible flow routings, okay?
One point is one flow routing vector. A subset of that is optimal with respect to your traffic engineering
objective. But not all members of this subset can be realized by link-state routing with hop-by-hop
forwarding.
Our job is to pick out those that are not only optimal with respect to the objective function of your traffic
engineering but also realizable with link-state routing. So how can we pick out those configurations? It turns
out that we can construct another optimization problem, purely as a proof technique, over the same constraint
set, where the objective function is an entropy function: the entropy of a probability vector that describes
the probability of going from one node in the network to the final destination through different paths.
You normalize by the total traffic intensity to get a probability vector, and look at a certain normalized
entropy function of that. Maximizing that normalized entropy function over possible configurations, subject to
a certain constraint set, turns out to pick exactly the elements in this subset. And the
resulting cancellation of terms is what leads to this exponential penalty.
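Why a maximum-entropy objective produces exponential weights can be sketched with a standard Lagrangian calculation. This is a generic textbook-style illustration, not the exact derivation in the paper: maximize the entropy of the path-splitting probabilities subject to normalization and a constraint on the average path weight.

```latex
\max_{p \ge 0} \;\; -\sum_i p_i \log p_i
\qquad \text{s.t.} \qquad \sum_i p_i = 1, \quad \sum_i p_i w_i = W .

% Lagrangian with multipliers \nu and \lambda:
L(p, \nu, \lambda) = -\sum_i p_i \log p_i
  + \nu \Big( 1 - \sum_i p_i \Big)
  + \lambda \Big( W - \sum_i p_i w_i \Big) .

% Stationarity in each p_i:
\frac{\partial L}{\partial p_i} = -\log p_i - 1 - \nu - \lambda w_i = 0
\quad \Longrightarrow \quad
p_i = e^{-1-\nu} \, e^{-\lambda w_i} \;\propto\; e^{-\lambda w_i} .
```

So the optimizer places exponentially decaying probability on paths with larger weight, which is the same functional form as the protocol's penalty; the talk's claim is the stronger converse, that entropy (up to scaling and shifting) is the only objective with this picking property.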
So the exponential penalty is a direct consequence of this proof technique. It turns out we can also
prove the converse: the entropy function is the only function, up to scaling and shifting, that can
give you this picking ability, and the exponential penalty is the only penalty that works. You can try other
monotonic functions; they will not give you provably optimal traffic engineering.
So the details again are all in papers you can download.
If you remember the DSL study, we talked about the coupling side. There is another side, which is the
non-convexity side. It turns out that somehow it performs very well, right? You probably should have asked
the question, at least in your own mind, at that point: all right, I can see the decoupling using static
pricing, but how come we achieved rate regions so close to optimal? Isn't that a non-convex problem?
It turns out that somehow the hard problems in that instance aren't hard in reality. What do I mean by reality?
I mean that you can give me the DSL spectrum management problem as a mathematical problem, with some
description of the channel parameters, the crosstalk coefficients, such that any suboptimal heuristic works
very poorly. That's mathematically doable.
But these crosstalk coefficients are not mathematical constructs. They are a representation of the physical
reality of the laws of electromagnetic radiation. So somehow, in this tough landscape of non-convex
problems, imagine a non-convex problem whose landscape has local maxima, a global
maximum, local minima, a global minimum, and so on.
The instances that are meaningful have parameters that lead to a terrain where the location,
number, and values of the local maxima are not that bad compared to the global maximum. As to why that is
the case: can we prove a statement that for a certain subset of these crosstalk coefficients, not only can
we prove convergence, which we have done, but we can also prove optimality altogether? We wish we will
be able to do that. We are not there yet.
And this routing story just shows that sometimes hard problems don't deserve to exist at all. In this
case, it's not the coupling of electromagnetic radiation that is bothering you. It's the assumption that you have
to pick only shortest paths and equally split traffic among them.
Given the link weights, there's no reason why you have to use them in that particular manner.
So that engineering assumption actually does not match optimal traffic engineering. No wonder it couldn't be
done in polynomial time before. So instead of going from a restrictive assumption to an intractable
formulation, or being brave with a non-scalable solution, maybe we should just take a
minute and say: the fact that this optimization coming out of an engineering assumption becomes intractable is
a signal, a flag, that tells me maybe the assumption can be perturbed. For example, in this case, relax it to
allow any monotonic penalty function (it turns out the exponential penalty function works great), such
that I end up with a tractable formulation and a scalable solution.
So this is what we call feedback, not in the network but in the human thinking procedure, in the engineering
process. This feedback is very important. The next time you look at a hard problem, that should
alert you to knock on your neighbor's door and say: ten years ago you did something, you made a
decision for no good reason, or for some good reason. But now there are other reasons that may supersede
that reason, and maybe we can talk about changing your decision now.
Most likely you have to pay a price for revisiting the assumptions, because they were often made with good
engineering intuition, not for zero reason. But in this case it turns out you are not really paying much of a price
at all. You provably get optimality, and it remains as simple as OSPF. You have to do more computation
at the operator and the routers, but those computations are cheap. The key thing is that you don't need
tunneling; you don't need per-flow state.
The little price you have to pay is out-of-order packet arrival, though that can also be dealt with in
different ways.
So I am going to skip this complexity part. There is an interesting story recently developing on
scheduling, where we look at the trade-off between complexity and optimality in a three-dimensional
space, but let me conclude the talk here.
Back to the first slide, where I said that when you look at optimization for networking, please do not just think
of it as a computational tool to help you solve a minimization or maximization problem, even
though that is still a useful thing to have, and sometimes it does happen as well. Think of it more as a
way to do modeling, whether forward or reverse engineering; as a way to quantify architectural
decisions and statements mathematically; as a way to connect to robustness; and as a way of giving feedback
to the engineering assumptions.
So if you take away optimality, what are you left with in optimization? I think you have something that is much
more open-minded and much more powerful: optimization for networking as a language to think about
choices and decisions in network engineering.
With that, I'll conclude. Thanks for your attention.
(Applause.)
>> MUNG CHIANG: Questions? Yes?
>>: I have a question. So you mentioned that you challenge the conventional assumption of shortest-path
routing. In trying to move away from the prescription that you want to find the shortest path, what is the
reason?
>> MUNG CHIANG: Because it does not match the definition of optimal traffic engineering, which is to
minimize -- I didn't show you, I perhaps should have -- the sum of the cost function over all the
links. The objective is not to compute the shortest path. It is to minimize the total congestion cost
function across the entire network.
So really, what you want to do is load balancing. Shortest path is a very nice load balancing heuristic, but will
it give you optimal load balancing? People had hoped so, until the Fortz and Thorup paper in 2000
showed: not really, unless you are willing to have exponential complexity in computing the link weights.
So with polynomial-time computation of the link weights, the resulting link weights, if used only
to do shortest-path routing, cannot give you load balancing that is 100 percent perfect with respect to the
congestion cost function.
And that's what they showed in 2000. So then people went to the other extreme and started feeling that it is
hopeless, impossible. True, the answer is that OSPF cannot give you that. But that doesn't mean that
no other link-state routing protocol can give you that either.
In fact there were papers, for example a paper in 2005 which inspired our work, that asked: what
if you try other penalty functions? Not just the Boolean penalty, because shortest path is a Boolean penalty, right?
Either you are a shortest path or you are not. If you are, you count how many shortest paths there are, divide by that
number, and that's the traffic splitting.
If you are not, you get zero load. People had started thinking about penalty functions like
one over the sum of the link weights along the path, or maybe an exponential penalty, or other functions.
And what we did is show that, based on those ideas, we can refine them to the degree that it is actually
provable that one of these alternative penalty functions gives you exactly optimal traffic
engineering.
So, to answer your question in short: because load balancing in the sense of optimal traffic engineering is
not equivalent to shortest path.
Yes?
>>: -- the cost function -- is it the sum of the delays, or the congestion on each link, or the maximum?
>> MUNG CHIANG: So, it turns out that it can be any convex function composed with per-link costs: on each
link you look at a convex function of the link utilization percentage, for example a piecewise-linear
approximation of the M/M/1 delay as a function of utilization, and then you look across all the links. You can
pick any convex function; sum is one, max is another of those convex functions, composed with the per-link
functions, and that is the cost function.
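A sketch of this kind of piecewise-linear convex per-link cost, in the style of Fortz and Thorup's approximation of M/M/1 delay. The exact breakpoints and slopes below are quoted from memory and should be treated as an assumption; the structural point is only that the slope grows steeply as utilization approaches and exceeds capacity.

```python
# (breakpoint, slope) pairs: the marginal cost jumps up as utilization grows.
SEGMENTS = [(0.0, 1.0), (1/3, 3.0), (2/3, 10.0), (0.9, 70.0), (1.0, 500.0), (1.1, 5000.0)]

def link_cost(u):
    """Convex piecewise-linear cost of one link as a function of its utilization u,
    computed by integrating the per-segment slopes up to u."""
    cost = 0.0
    for i, (start, slope) in enumerate(SEGMENTS):
        end = SEGMENTS[i + 1][0] if i + 1 < len(SEGMENTS) else float("inf")
        if u <= start:
            break
        cost += slope * (min(u, end) - start)
    return cost

# The traffic-engineering objective is then a convex combination over links,
# most commonly the plain sum:
total = sum(link_cost(u) for u in [0.2, 0.5, 0.95])
print(f"total cost = {total:.4f}")
```

The steep final segments are why the optimality-gap plots mentioned earlier look so dramatic: one nearly saturated link dominates the whole objective.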
>>: But with this exponential penalty function you can optimize any --
>> MUNG CHIANG: -- any of these, as long as it's a convex function composed with the individual
convex functions. In practice, what operators use is the sum of the convex cost functions.
>>: What is this, what do you mean by this exponential penalty function? So let's say you know the
weight of the path.
>> MUNG CHIANG: So I've got two paths, you know. One path's weight is 10, the other path's
weight is 20. So one split is e to the minus 10 divided by e to the minus 10 plus e to the minus 20, and the
other is e to the minus 20 divided by e to the minus 10 plus e to the minus 20.
>>: If it's MPLS, you can choose the paths, and you can choose what fraction to send on each path.
>> MUNG CHIANG: Yeah.
>>: But if you want to do it per next hop, you have to aggregate these for each next hop.
>> MUNG CHIANG: So it turns out that you can make a local decision looking only toward the destination,
without looking back to the source. And then you can make this local decision about path splitting in the DEFT
scheme that we proposed. That is one trick that I didn't go into in detail. But indeed, if you needed an
end-to-end enumeration of all the paths, that would have been just like MPLS. No, you don't need to do
that. You only look forward, toward the destination.
In fact, you don't even need to enumerate all the possible paths. There is a recursive computation in our paper
that does that.
>>: You can simulate all these different paths by --
>> MUNG CHIANG: Essentially, yes. You can compute the solution to this, what we call network entropy
maximization, which leads to all these cancellations of terms, based on a recursive counting of the
alternative paths, looking only forward toward the destination, without per-flow state.
Yes?
>>: So I'm thinking, this work on computing OSPF weights to do traffic engineering --
>> MUNG CHIANG: Yes.
>>: That started with -- it seems where they were coming from, they had some legacy routers and MPLS
was arriving as the new technology. So they wanted a way to do MPLS-type
traffic engineering.
But what I'm trying to understand here: MPLS buys you a lot of flexibility. You can do traffic engineering for
classes of traffic. You can do things like oblivious routing when your traffic is variable. So is there an
easy answer, to say I can do away with MPLS because I can solve a certain class of problems with this
approach?
>> MUNG CHIANG: Sure, two things. First of all, my understanding, and I could be totally wrong, is that
MPLS in the '90s generated a lot of excitement, but then people started feeling that end-to-end tunneling state
is not the way to go forward with the legacy infrastructure, and Fortz and Thorup showed that while OSPF
can't do it exactly right, here is a local search method that gets very close.
Now, second, does this mean, therefore, that with DEFT and PEFT, MPLS will never be useful again? No.
DEFT and PEFT can provably get you optimal traffic engineering with respect to that
objective function. Now suppose your objective function changes, for example, to some notion of
being flexible, however that might be mathematically represented or not, all right? Then clearly you are
dealing with a different optimization problem. For flexibility, whether in oblivious routing, QoS routing, routing
with delay constraints, and so on and so forth, MPLS, being an end-to-end tunneling mechanism, gives you a
lot more power, a lot more freedom. Some of that power may not be captured by any link-state routing protocol,
including DEFT and PEFT. Indeed, if you say: I don't care only about optimal traffic engineering, I care about
many other things as well, with some other objective function, then there has been no study, because this is
very recent, of whether DEFT and PEFT can do that, and I suspect they cannot do them all, because they rely on
a very simple description, each link has one number, and a very simple decision rule, only destination-based
hop-by-hop forwarding.
I really doubt that with such limited firepower you can kill all these beasts of oblivious routing and
QoS routing. I have big doubts about that, but nobody has looked at it. So the answer is: indeed, MPLS still
has its merits. Whether it will be widely used or not will depend on factors outside technology.
>> PHIL CHOU: Any other questions?
(There is no response.)
>> MUNG CHIANG: Okay. Thank you.
(Applause.)