>> Dave Maltz: Hi. It's my pleasure to introduce Hitesh Ballani who is visiting us
here from Cornell. He's a student of Paul Francis. He'll be talking to us today
about his work on routing table space exhaustion and what can be done to fix
that, his extensive work in Internet measurement, working with the
DNS and anycast problems, as well as work in top-down management of IT
infrastructure and networks. So thank you very much for joining us. And Hitesh,
please take it away.
>> Hitesh Ballani: Thanks, Dave. Thank you for having me here. Before we
jump into the technical details, I wanted to give you a brief overview of the kind of
work I do. And I figured what better way than a [inaudible] of my research
statement. Now, I'm sure you've all heard this argument in one form or the other
that the explosive growth of the Internet over the past decade has meant that
many of the original assumptions underlying Internet design no longer hold true.
This has led to a plethora of problems which is good for us, because there's work
to be done. However, there's a flip side of the story. The success of the Internet
has meant that you can't really go out and say well, let's take this protocol out
and put this new one in and voilà, the problem goes away.
Throughout my graduate career I've tried to stay cognizant of this ground truth.
And I've really strived to strike a balance between two complete [inaudible]
regarding my research. On one hand is the freedom of doing blue sky research.
Openly questioning Internet design in the face of new needs and challenges. For
instance, as Dave mentioned, as part of my dissertation I look at the problem of
network management. And I argue that networks today are so difficult to
manage because the Internet was not designed with manageability in mind.
So what I mean by that is protocols today tend to expose their internal bells and
whistles. And humans and management applications are sort of expected
to understand these low-level parameters in order to be able to manage the
network. Hence, we proposed a new network management architecture called
complexity oblivious network management or CONMan for short. As part of
CONMan we precisely defined the kind of information that should be exposed by
protocols in order to make them amenable to management.
We implemented and deployed all these protocols on a real test bed, and what
we found was that the resulting test bed was much easier to configure. When
things went wrong you could debug and diagnose the network problems in a
structured fashion. So overall it was a lot of fun showing that this complex
problem of network management can be tackled through such an invasive
approach. Question?
>>: Are you going to give us detail later in the talk about how you demonstrate
this?
>> Hitesh Ballani: On network management? No. The talk is generally about
my routing work, but if you're interested I'll be more than happy to speak to you
about the network management stuff and my vision there. Okay?
On the other hand, I do want my research to have real impact through solutions
that are immediately deployable. Now, this may seem to contradict what I said
earlier when I claimed that a lot of these network problems arise due to a
mismatch between design and use. And I still stand by that. However, what I
have found is that in many cases simply by focussing on a subset of the given
problem space, you can come up with a solution that does not require wholesale
change.
And this is the recurring theme in my work, a trick that I use over and over again.
And these are not just point solutions. As I'll explain later, this can actually serve
as small incremental steps towards the eventual solution that you would like
anyway, so there's a clear progress path between what can be done today
and where you'd like to be 10, 15 years from now, which I think is pretty neat.
To give you a few examples, I have worked on how to deploy IP anycast which is
essentially a network layer service discovery mechanism in a practical and
scalable fashion. So effectively I've built the service, I've deployed it, I have seven
servers across the Internet which included BGP peerings with various ISPs, and
[inaudible] this test bed for the past four years and it has also been used by other
researchers for active routing experiments.
On a similar note, I came up with a minor modification to the caching behavior of
DNS servers and showed that this can substantially mitigate the impact of denial of
service attacks on DNS. Now this was a very, very trivial idea. But I'm still very
proud of it because at the time when these papers were published, a lot of
people were proposing novel peer-to-peer architectures for DNS. Now, that's
all great work because DNS does suffer from severe problems.
What I showed was that if you focus on the specific problem of flooding attacks,
you can come up with a solution that does not require infrastructural change.
And as it turns out, there is a connection between the simple idea and the other
clean-slate proposals that I just mentioned.
And finally, there is my more recent work on routing scalability which is what the
rest of this talk is about. So we'll be switching gears here a little bit and moving
on to the technical component of the talk.
Now, I'm sure many of you must have heard about the growth in the Internet
routing table. Then you must have seen this graph. And you can see that the
growth has been rather steep over the past few years. What is more [inaudible]
is that this is probably going to get worse in the near future. This is because as
the IPv4 address space runs out, more and more small prefixes will be
advertised, resulting in a larger routing table. And if, lo and behold, the IPv6
deployment takes off the ground, we could end up with a very bloated routing
table.
Now, this routing table specifies how a given router should forward data packets.
And so routers need to maintain this in fast memory as part of something called
the forwarding information base or the FIB for short. And I'll explain how that
works later.
Hence, a larger routing table means that ISPs need routers with more and more
FIB space. Given this, there are two questions that come to mind. The first of
which being why is the routing table growing so rapidly? And I'll comment on that
very briefly. The scalability of the Internet routing system relies on hierarchy,
which in turn requires that the addressing of ISPs on the Internet be in line with
the actual physical topology.
So if you had a nice figure like this wherein the address space of any given node
is a subset of the address space of its parent, you'll get good scalability. For
instance, if you look at the routing state at this ISP, it only comprises other top level
ISPs and its immediate children, a small amount of state. However, the growth in
the Internet has meant that this nice picture is no longer true. In the figure I've
shown how [inaudible] wherein a given site connects to multiple upstream ISPs
can lead to a mismatch between addressing and topology. And there are
multiple factors that can lead to this. But the point here is that this mismatch
between addressing and topology is the root cause for the rapid growth in the
Internet routing table.
The second question that I'm often asked is: so what? [inaudible] throw more
RAM at the problem. And that's exactly what has been happening. Every few
years or so router vendors come out with a new generation of routers that have
more memory and more processing power. And I'm not arguing that they can't
keep on doing this forever, however, the problem is the scalability properties of
this FIB memory. On the technical side, there's a concern about the amount of
power these massive chips consume. Not to mention the concomitant heat
dissipation problems which are especially relevant in the kinds of locations where
these routers are housed. Further, off-chip SRAM, which is commonly used for
router FIBs, is a low volume component that has been shown to not track Moore's
law. So the cost curves are not in our favor. And what I mean by that is a larger
routing table reduces the cost effectiveness of ISPs because the price per byte
forwarded increases. And of course there's the cost of actually upgrading the
router because it ran out of memory.
However, in spite of all these reasons, I have to admit that FIB size is a
contentious issue. There are people who think FIB size is an important problem,
and then there are the non-believers. As a matter of fact, in one of my own
papers a few years ago, I had argued that it is technologically feasible to add
more state to routers and I still stand by that. And while my thinking has evolved
over time, the stand I'm going to take in this talk is that given the technical and
the business implications of FIB growth, you can't be sure of the criticality of the
FIB growth problem given existing data.
Looking ahead there are several very good reasons why you would want a
smaller routing table. And of course these are speculative in nature. But what
really sealed the deal for me were recent events and discussions with ISPs and
operators. And I'll give you a quick example. ISPs in Australia have already
started filtering out small prefixes, mostly slash 24s, which is the smallest
address block that you can advertise into Internet routing. This is because these
ISPs don't want to upgrade their installed router base, which is scary because it
implies that parts of the Internet may not have reachability to each other.
So hopefully this quick anecdote convinces you that ISPs are concerned about
the FIB on the routers, and they're actually willing to undergo some pain to
extend the lifetime of the installed router base. Which is good for me because
there's a specific, real problem to solve. Question?
>>: When the ISPs do that kind of filtering on the slash 24s, does that really mean you
have [inaudible] reachability or does it mean you just get inefficient routes?
>> Hitesh Ballani: It could mean inefficient routes for the most part, but it could
also lead to non-reachability.
>>: [inaudible].
>> Hitesh Ballani: I don't think there has been active study. I have done some
random hacks or measurements and it seems that in some cases there's a
decent amount of inefficiency because you're not just throwing the slash 24
away, you're assuming there is some super prefix, say a slash 16,
and you're throwing the underlying slash 24s away. But because of [inaudible] in
some cases it might be the case that you don't even get reachability. And
I've seen a couple of instances of that. But I don't think there has been any
measurement, systematic measurement.
>>: I'm curious. Who is the Australian ISP?
>> Hitesh Ballani: This was a Pacnar [phonetic] mailing list, and they had
mentioned several Australian ISPs. I don't have specific names. All right?
Okay. So this is good because there's a specific problem. So motivated by this,
we propose Virtual Aggregation, or ViAggre, a configuration-only approach to
shrinking the FIB on ISP routers. By configuration-only what I mean is that
ViAggre does not require changes to routers and routing protocols, and it can be
deployed independently by any ISP on the Internet today.
In case you were wondering, I did not come up with that name. The credit or the
discredit for that goes to Brad Karp at UCL. Also, it's ViAggre with a V, not with a
W as it will sound when I say it because my Vs and Ws are messed up and it
doesn't really come out with my pronunciation. But anyway.
This talk focuses mostly on the research component of ViAggre. Now, I do want to
point out that these features have allowed ViAggre to have real world impact.
For instance, there's an ongoing standardization effort in the IETF for ViAggre. This is
spearheaded by Huawei, the largest router manufacturer in China, which provides
equipment to more than 70 percent of the top telcos. As a matter of fact,
Huawei is also implementing ViAggre natively into its routers. And I'll explain
later what the advantage of such an approach would be.
But the point here is that this possibility of real impact is what is so exciting about
all this research, at least from my point of view. The basic idea behind ViAggre
is really simple. Today every router needs to maintain the entire routing table.
In ViAggre, we allow the ISP to divide the routing table into parts such that
individual routers only need to maintain routes to a few of these parts, and so you
get a shrinkage in FIB size. And I'll explain that as we go along the talk.
So the way I have set up the rest of the talk is that I'll begin with some basic
information on Internet routing, which I'm sure most of you know anyway, and will
help me come up with a crisp problem statement, and it will help me place this
work in context.
So Internet routing. Internet routing is domain based wherein the domains are
independent, autonomous entities. This could be ISPs like AT&T and Sprint and
enterprises like Microsoft and [inaudible] and so on. Such domains lend
themselves nicely to a two-tiered routing architecture wherein you have
intra-domain routing to establish routes within a domain and inter-domain routing
to establish routes between domains.
On the Internet BGP serves as the de facto inter-domain routing protocol and
BGP is what the rest of this talk is about. Moving from this very high level view to
the innards of actual routers, what I've shown here is a router that is connected
to two other routers. At the top there's a route processor, which is responsible for
the routers [inaudible] tasks. This route processor maintains what is effectively a
database of routes that it obtains from other routers. This is known as the routing
information base or the RIB for short. And it's generally maintained on slower
memory; for instance, DRAM might be used.
On the other end we have the line cards which are responsible for actually
sending and receiving the packets. These line cards maintain a table of routes
based on which to forward these packets. And as I mentioned earlier, this is
known as the forwarding information base or forwarding table or routing table or
FIB for short. Obviously the FIB needs to be accessible at line rates and so it's
generally maintained on fast memory; for instance, off-chip SRAM may be used.
And here I've shown how packets might come in through one line card
and be switched on to another line card. So two important things to note: the RIB resides
on slow memory, the FIB resides on fast memory.
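To make the FIB side of that concrete, here is a minimal, hypothetical Python sketch of the longest-prefix-match lookup a forwarding table performs; real FIBs use specialized structures such as tries in SRAM or TCAM, and the prefixes and next hops below are invented for illustration.

    # Hypothetical sketch (not from the talk): longest-prefix-match lookup of the
    # kind a line card's FIB performs. A dict scan is only for illustration.
    import ipaddress

    # The FIB maps prefixes to next hops; in a real router this lives in fast memory.
    fib = {
        ipaddress.ip_network("10.0.0.0/8"): "line-card-1",
        ipaddress.ip_network("10.1.0.0/16"): "line-card-2",
    }

    def lookup(dst_ip: str) -> str:
        """Return the next hop for the most specific matching prefix."""
        addr = ipaddress.ip_address(dst_ip)
        matches = [p for p in fib if addr in p]
        if not matches:
            raise LookupError("no route")                  # no default route here
        best = max(matches, key=lambda p: p.prefixlen)     # longest prefix wins
        return fib[best]

    print(lookup("10.1.2.3"))    # -> line-card-2 (the /16 is more specific)
    print(lookup("10.200.0.1"))  # -> line-card-1 (only the /8 matches)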
Given this basic routing information, we can take a look at the scalability
problems afflicting the Internet routing. As I mentioned earlier, a larger routing
table has led to concerns regarding FIB growth. Beyond this, there are concerns
about the RIB size problem and routing convergence. All this has been known for a
while, which shows up in the massive amount of work done in this area. And at a
very high level, there are approaches that argue for separating edge networks
from core networks, there is geographical routing. There's been a whole lot of
work especially in the theory community on compact routing. There are
elimination approaches and without going into any details at all, the common
thing here is that all these approaches require architectural change. I don't mean
that in a bad way because I do realize that a lot of these problems arise from the
way Internet routing works, and so some change is in order.
However, this very need for change has meant that none of these approaches
have seen any deployment. Frustrated by this lack of deployment, I started
wondering if you can do away with the need for change by focussing on some of
these problems. Motivated by this, in ViAggre we tackle the FIB size problem in an
incremental fashion. And as it turns out, our current techniques can also help
with both RIB size and routing convergence. So again we have this notion of
achieving deployability by narrowing down our focus.
So in this talk, I'm going to focus mostly on FIB size and if time permits I'll
comment on how ViAggre can be applied in a more invasive fashion to
[inaudible] the Internet routing scalability problem space.
And with that big picture in mind, we can delve into the ViAggre design.
Question?
>>: So you showed us how the [inaudible] over time. Can you tell us how the
FIB grows compared to that?
>> Hitesh Ballani: Say that again, please.
>>: How does the FIB grow compared to the growth in the routing table?
>> Hitesh Ballani: So the routing table sizes are the [inaudible] the actual FIB.
So it's the FIB size, the global FIB of -- there are routers in the default free zone,
which are routers that can't use default routes, and when I said Internet routing
table I essentially meant the FIB size in the core of the Internet. I didn't want to
use the term FIB at that point in time because I hadn't introduced it. Obviously
the RIB can be much bigger because it depends on the number of peers you
have and other factors. So that graph was for the FIB on the core of the Internet.
Okay?
So ViAggre design. And in the figure here, I've shown an ISP with three PoPs or
points of presence. These are routing centers in different geographical areas
and a few routers in each PoP. Today, as I mentioned, each of these routers
needs to maintain the entire routing table. In ViAggre, we allow the ISP to divide
the IPv4 address space into parts. Here I have divided it into four parts which are
color coded.
Each of these parts can be represented by a slash 2 prefix, which I'm going to
refer to as a virtual prefix. Now, I'm assuming that people are aware of the
slash notation. Since I have four parts, I only need two bits to represent them,
and so only the two high-order bits of the IP address are relevant, everything else is
masked off and that's what slash 2 means.
Beyond this, the ISP assigns its routers to these virtual prefixes. So the two
green routers are assigned to the green virtual prefix, or they are aggregation points
for the green virtual prefix. What this means is that these routers are responsible
for routes to any prefix in the green part of the address space. And you can
imagine how this will ensure that on average every router maintains a quarter
of the Internet routing table. Of course this assumes that the prefixes are
distributed uniformly across the address space, which is obviously not true for the
Internet. However, you can choose these divisions such that the distribution of
prefixes across them is relatively uniform.
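As a rough illustration of this division, here is a minimal, hypothetical Python sketch of four /2 virtual prefixes and the mapping from a real routing-table prefix to the virtual prefix, or color, that covers it; the toy prefix list and router assignment are invented.

    # Hypothetical sketch: dividing the IPv4 space into four /2 virtual prefixes
    # ("colors") and mapping each real prefix to the virtual prefix covering it.
    import ipaddress

    VIRTUAL_PREFIXES = [ipaddress.ip_network(f"{i << 6}.0.0.0/2") for i in range(4)]
    # -> 0.0.0.0/2, 64.0.0.0/2, 128.0.0.0/2, 192.0.0.0/2

    def color_of(prefix: str) -> ipaddress.IPv4Network:
        """Return the virtual prefix (color) that covers a routing-table prefix."""
        p = ipaddress.ip_network(prefix)
        for vp in VIRTUAL_PREFIXES:
            if p.subnet_of(vp):
                return vp
        raise ValueError(prefix)

    # An aggregation point for a color keeps routes only for prefixes of that color.
    routing_table = ["12.0.0.0/8", "65.10.0.0/16", "131.107.0.0/16", "203.0.113.0/24"]
    green = VIRTUAL_PREFIXES[1]   # say the two "green" routers aggregate 64.0.0.0/2
    green_fib = [p for p in routing_table if color_of(p) == green]
    print(green_fib)              # -> ['65.10.0.0/16'], a quarter-ish of the table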
Now, this basic idea of dividing the routing table can be achieved in a
number of ways. For instance, you could change the routing protocol to do
this. However, we wanted a deployable system. And so a key design challenge
was how do you achieve this without changes to the routers and without requiring
external cooperation? Further, once you divide the routing table, how do packets
flow through the ISP's network? Because routers now only have partial
information. So in the next part of the talk, I'm going to elaborate on ViAggre
design by answering these questions in order.
First up, the control plane design. In ViAggre, we need to make sure that the
forwarding table on any given router -- question?
>>: Yes. So is there only one router per point of presence in this picture?
>> Hitesh Ballani: This is a simplified picture and that's what is misleading here.
So in a typical deployment, and I'll get to that later, a typical route, a typical PoP
has 20, 30 routers, and so you want to make sure that there are at
least two routers responsible for every color in each PoP for things like stretch
and [inaudible] and I'll get to that in like four or five slides from now. Okay?
So in ViAggre, you need to make sure that the forwarding table in any given router
contains only routes for the prefixes it aggregates. So the blue router should only
contain blue prefixes, which is a problem when you consider interaction with
external ISPs. In this case, the external router is advertising the complete routing
table to the blue edge router, even though it's responsible only for blue prefixes.
And so we need some mechanism of selectively inserting routes into a given
router's forwarding table.
We came up with two mechanisms to achieve this. As it turns out, one of them
doesn't really work in practice, the performance overhead is too high. So what
I'm going to do is explain both the mechanisms at this point in time,
and once we go on to the performance evaluation section I'll explain why one
works while the other doesn't.
The first mechanism, or design one, takes advantage of a router feature that we call
FIB suppression. With FIB suppression you can configure a router to only load a
subset of its RIB into its FIB so in this case the blue router gets the entire routing
table, loads this in its RIB and only loads the blue routes into its FIB. The
advantage of this scheme is that the ISP does not need to modify its internal
routing setup at all. Everything remains the same as today [inaudible] the entire
routing table and selectively inserts into its forwarding table. The disadvantage is
that there's no RIB shrinkage at all because every router still needs to maintain
the complete routing table in its RIB, which is the slow memory. So that's the
disadvantage. Question?
>>: [inaudible] change your hardware for this?
>> Hitesh Ballani: Yes.
>>: Is that true?
>> Hitesh Ballani: We don't change the hardware or the software for this. As it
turns out, existing routers have this capability of putting entries, selectively
putting entries from the RIB into the FIB. Whether that's practical or not, I'll show
you later that it turns out to be not practical. That's the catch here. But there is a
second approach that does end up working. Okay?
>>: [inaudible] I mean the RIB is in some external DRAM, right?
>> Hitesh Ballani: Yes.
>>: Presumably only a few gigabytes [inaudible] right?
>> Hitesh Ballani: Yes.
>>: And so we're talking about gigabytes of DRAM that costs hundreds of dollars.
So is that a problem?
>> Hitesh Ballani: No, the DRAM on the RIB side is not a problem, per se. As
it turns out, and I'll show this later, reducing the RIB, which the second
design does, not this design, actually helps out with internal routing convergence.
What ends up happening is that you are sending a smaller number of routes to your
routers because the RIB size is smaller and it helps with internal routing
convergence. So the memory cost on the RIB side is not important. I should
have clarified that.
The second design is slightly more invasive. And it offloads the task of
maintaining this complete routing table onto machines that are off the data
path, machines that don't forward data packets. As it turns out, ISPs
already use such machines for scalable internal distribution of BGP routes,
something called BGP route reflectors. So without going into any design details
of how route reflectors work, in this design the external router instead of
advertising routes to a blue edge router, it has a peering to a PoP route reflector,
and it is the route reflector that selectively forwards these advertisements to the
ISP service. So the blue router gets only the blue prefixes, the green router gets
only the green prefixes. And these routes are propagated to route reflectors and
other PoPs and so on. Question?
>>: Since you have route reflectors why don't you run another routing algorithm
with the third level of [inaudible] BGP [inaudible]?
>> Hitesh Ballani: So a lot of people have proposed things like RCP and 4D that
separate this logic from the actual data plane routers, and you can take advantage
of that. You'll notice that the route reflector here is similar to a
decision element in 4D and IRSCP. The solutions I come up
with -- for instance, RCP was focussing on routing convergence, 4D was focussing
on network management -- are pretty similar and
compatible with those models, so you can use ViAggre in the context of those
models to achieve a reduction in routing table size. So I'll answer the question once I
explain the process; I'll explain why ViAggre fits into those models and
what would happen if you had the flexibility of changing routing protocols.
Question? Yes.
>>: Do [inaudible] how many iBGP configurations do [inaudible].
>> Hitesh Ballani: How many iBGP configurations do these route reflectors have?
>>: Yes, how many iBGP configurations inside the [inaudible].
>> Hitesh Ballani: Oh, how many iBGP configurations inside ASes don't use route
reflectors? As far as I know, and maybe people here might be able to correct me,
most tier one ISPs and tier two ISPs use route reflector based
configurations and so we should be safe in that aspect. Okay.
>>: So are you saying the route reflector does need [inaudible].
>> Hitesh Ballani: DRAM, because --
>>: [inaudible].
>> Hitesh Ballani: Yeah, because it's not on the forwarding path. Yes. Yeah,
the only point was that it's not on the forwarding path, so it doesn't need fast
memory, it only needs slow memory. Question?
>>: [inaudible] ISPs that use route reflectors are not running route reflectors like
traditional routers that are in the forwarding path?
>> Hitesh Ballani: I know at least a couple of ISPs that are not -- don't have the
route reflectors on their forwarding path. I also know of an ISP where the route
reflectors is on the forwarding path. So, yes, that's not completely compatible
with our model.
>>: [inaudible].
>> Hitesh Ballani: Okay. But that being said, this could be actual PCs. So you
could shift away from these machines being on the data path. So, yes, there
might be slight changes for some ISPs. Okay. And I guess the disadvantage here
is that it's slightly more invasive: apart from the use of
route reflectors, you need to reconfigure your external peerings, which is
a shame because I wanted it to be completely transparent to neighboring ISPs.
On the other hand, the advantage here is that apart from shrinking the FIB, you
shrink the RIB on all data plane routers. And again the main benefit there is not
memory but routing convergence, which I'll present in the evaluation [inaudible].
>>: Could you back up? I'm missing something.
>> Hitesh Ballani: Sure.
>>: Fundamental here. So you divide the Internet up into subset of colors. You
stick the routes for one particular color in a router.
>> Hitesh Ballani: Yes.
>>: Okay. Now it's external facing. It gets a packet that's in some other section.
What does it do with it?
>> Hitesh Ballani: Excellent question, which is my next slide. I'll explain that. So
I'll explain. Because that's the thing I wanted to answer: now that routers only have
partial information, how is a packet going to flow?
In this case, packets destined to a prefix in the red part of the address space come in to a
blue edge router, and it doesn't know what to do with them. These packets are
routed from ingress router I to aggregation point A, which is segment 1 and then
from aggregation point A to external router X which is segment 2. And I'll explain
how both these segments work.
When the packets come in, the blue ingress router doesn't have a route to the
destination prefix and so it somehow needs to know that these packets need to
be sent to a close by aggregation point which in this case happens to be router
A. To achieve this in ViAggre, routers advertise the virtual prefix they are
aggregating into the ISP's internal routing. So the two red routers advertise the red
virtual prefix. And if you look at the forwarding table of the blue router, it contains
routes for all the blue prefixes and it contains one entry for every virtual prefix
that it is not aggregating. So this is how the blue router knows that
these packets need to be sent to router A.
Once the packets get there router A has a route to their destination, and so it
knows that packets need to be forwarded to external router X. However, the
packets can't be forwarded in the normal hop-by-hop fashion because the routers
along the path, the blue, the green, and orange router don't have a route to the
destination prefix, so they'll probably end up sending the packets back to router
A, resulting in a routing loop.
To avoid this, router A tunnels the packets, that is, it encapsulates the packets in an
extra header such that intermediate routers only need to forward packets
destined to external router X, which they can do normally. However, you can't
tunnel the packets directly to the external router because that would require
cooperation from the neighboring ISP, because the external router X is going to
get these packets with an extra header and it won't know what to do with
them. Hence it is the egress router E that detunnels the packets, that is, it strips
off the outer header before forwarding them on to the external router X.
And all this can be achieved with standard router configuration.
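As a rough sketch of the forwarding walk just described, here is a hypothetical Python illustration of the two segments: the ingress only matches the virtual prefix and hands the packet to an aggregation point, which tunnels it toward the egress, which strips the outer header. All router names and prefixes are invented.

    # Hypothetical sketch of the ViAggre data path; not actual router behavior.
    import ipaddress

    VP_RED = ipaddress.ip_network("192.0.0.0/2")

    # Ingress router I: full routes only for its own color, plus one route per
    # virtual prefix it does not aggregate (pointing at an aggregation point).
    ingress_fib = {VP_RED: "router-A"}

    # Aggregation point A: full routes for every prefix inside its virtual prefix.
    agg_fib = {ipaddress.ip_network("203.0.113.0/24"): ("egress-E", "external-X")}

    def forward(dst: str) -> list:
        addr = ipaddress.ip_address(dst)
        hops = []
        # Segment 1: the ingress matches only the virtual prefix and defers to A.
        agg = next(ingress_fib[vp] for vp in ingress_fib if addr in vp)
        hops.append(("I", "forward to aggregation point", agg))
        # Segment 2: A looks up the real prefix and tunnels toward the egress,
        # with the eventual external next hop carried in the tunnel label.
        egress, external = next(v for p, v in agg_fib.items() if addr in p)
        hops.append((agg, "encapsulate, tunnel to", egress))
        hops.append((egress, "strip header, hand to", external))
        return hops

    for hop in forward("203.0.113.7"):
        print(hop)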
As a matter of fact, those of you doing VPN research might realize that the behavior
of the egress router here is similar to the [inaudible] in an MPLS VPN scenario.
And the important point to take away here is that we figured out a way to achieve
all this with standard router configuration today. Question?
>>: [inaudible] the header includes the next hop so that when he strips it -- when
he strips off the header he knows that it's --
>> Hitesh Ballani: That is the next hop, and that's the trick. That's why E doesn't
need to have the entire routing table even though it's orange.
>>: [inaudible] operation on the line card?
>> Hitesh Ballani: Yes, it's [inaudible] on the line card.
>>: It's not an exception case that goes to general [inaudible].
>> Hitesh Ballani: No. That's what saves us. What saves us is the fact that
tunnels have been adopted in mainstream networks for things like traffic
engineering, and so most routers are equipped with line cards that can do
tunnelling at line card rates. And that's why the computational [inaudible] is not
high and --
>>: This is [inaudible] tunnel plus this extra bit of [inaudible] like it's [inaudible].
>> Hitesh Ballani: So as it turns out existing MPLS tunnels have that sort of
technology built in. So I'm just using standard MPLS technology wherein the label
that we use to identify a tunnel encodes this information of what the next hop is, and so
there is no exception case, it's all on the data path, fast path. Okay? Question?
>>: So basically you change the -- although you cannot reduce the routing table
size you change the network topology [inaudible] to evidence prefix, right?
>> Hitesh Ballani: Sure.
>>: So how does it deal with [inaudible] inside of the ISP?
>> Hitesh Ballani: Excellent question. You are two slides ahead of me and I'll
address that point in a second. I'll get to that. Okay. Question.
>>: [inaudible].
>> Hitesh Ballani: Excellent question. So tunnels represent two kinds of
overhead. There is an overhead in terms of computational cost, which is
encapsulation, and there is a storage overhead. As it turns out, and this is again a
cop-out, ISPs already use tunnels for existing purposes. It turns out
that layer two technologies and [inaudible] technologies have been designed in
such a fashion that you don't have MTU issues. For instance, MPLS can work as a
layer two technology, and even [inaudible], the MTUs have been designed in
such a fashion that there is no IP fragmentation. If there was IP fragmentation,
this would not work. Okay. Yes?
>>: An alternative would be to beef up the memory in some of the [inaudible] so
you have to tunnel to the red one to tunnel to ones that know everything and
have [inaudible] memory [inaudible].
>> Hitesh Ballani: Yes. So there is an entire design space wherein this is -- what
I'm going to explain is under the assumption that you don't want a lot of SRAM on
any router. There are cases where you might say, oh, this router is just a new
router and so it can keep all the routes, and then you have these other routers
that rely on that specific router, and we have explored that in a little bit of detail with
engineers [inaudible]. I'll comment on that very briefly at the end of the talk, and
if you're interested we can talk about that offline. Okay?
>>: Is there a slight bit more [inaudible] might have a path from I to E or I to X,
but you might not have it back [inaudible] goes through one of the red routers.
>> Hitesh Ballani: Excellent question. Which is my next slide. So this basic
design leads to a couple of design concerns, the first of which being failover.
What happens when router A fails because there's a part that exists from
external router to external router? If you remember, the blue router receives two
routes to the red virtual prefix, one from A, another from A 2. Hence when router
A fails, it installs the alternate route into its forwarding table. And the packets are
rerouted automatically.
So the point here is that failover in ViAggre happens automatically using existing
mechanisms and the ISP doesn't need to do anything fancy other than to ensure
that there is some aggregation point to failover on to. And so you have this
management overhead wherein you want to make sure that in order to get the
same amount of robustness, you pick and choose your aggregation points
properly, for instance one way to do it would be to ensure that every PoP has a
couple of aggregation points. And so the robustness doesn't suffer much.
And the [inaudible] don't need to do anything fancy, other than to be smart about
how you place the aggregation points.
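As a tiny illustration of that placement rule, here is a hypothetical Python sketch that checks whether every color hosted in a PoP has at least two aggregation points there, so a single failure can be absorbed locally; the assignment data is invented.

    # Hypothetical sanity check on aggregation-point placement; not from the talk.
    from collections import defaultdict

    # (PoP, router) -> set of colors that router aggregates
    assignment = {
        ("NYC", "r1"): {"red"}, ("NYC", "r2"): {"red", "green"},
        ("NYC", "r3"): {"green"},
        ("CHI", "r1"): {"blue"},          # only one blue router in CHI: flagged
    }

    def underprovisioned(assignment):
        per_pop = defaultdict(lambda: defaultdict(int))
        for (pop, _router), colors in assignment.items():
            for c in colors:
                per_pop[pop][c] += 1
        return [(pop, c) for pop, counts in per_pop.items()
                for c, n in counts.items() if n < 2]

    print(underprovisioned(assignment))   # -> [('CHI', 'blue')]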
>>: [inaudible].
>> Hitesh Ballani: So what happens here is that I'm assuming a mesh of internal
peerings. This guy already had two routes to the red virtual prefix in the RIB, and
so essentially what he needs to do is send stuff from the RIB into the FIB, which
in our measurements takes sub-millisecond times. As --
>>: [inaudible] that says A is [inaudible].
>> Hitesh Ballani: So what happens is assuming we have a mesh of peerings,
your TCP -- your BGP peering session goes down. So if you are using aggressive
time-outs there, you get that instantly and you fall back onto the other option
[inaudible] FIB. And our measurements on actual hardware routers have shown
that if you configure that properly or engineer that properly,
that comes out to be in the sub-millisecond range. Essentially detecting that your
peering has gone down --
>>: [inaudible].
>> Hitesh Ballani: Yes. So one [inaudible] while [inaudible]. So it turns out
[inaudible]. Question?
>>: One way to look at this is you're taking what was one router and distributing
it into four other routers.
>> Hitesh Ballani: Yes. It's essentially, yes.
>>: Does the cost go up? Do you not have to buy four
times as many routers as you had in the past? I mean, what extra load does
this put on the internal routing structure?
>> Hitesh Ballani: Excellent question which is my next slide. So as you
mentioned, as ViAggre requires traffic to be routed through an aggregation
point, which as people mentioned can impose stretch on traffic, it can increase
load across the ISP's routers and across the ISP's links. And I guess I don't -- I'm
not sure if I say this at some point in the slides, but the stretch answer is the fact
that most ISPs are designed in the form of PoPs, where you have a few routers
in each PoP and you want to make sure you divide the routing table within each
PoP, so you're not going from New York to Chicago. The load answer is slightly
tricky. As it turns out traffic on the Internet follows a power-law distribution. That
is, 95 percent of the traffic is destined to 5 percent of the prefixes. Hence, we
propose that these popular prefixes should be loaded into the forwarding table of
every router. This will ensure that a majority of traffic follows direct paths, a
small fraction of traffic takes this detour and so it substantially reduces the
impact of ViAggre on the ISP's network. And this is what makes ViAggre a
good trade-off, the substantial reduction in FIB size while having a minimal
increase in load and stretch. And that's what the evaluation results are going to
show.
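To illustrate the popular-prefix idea, here is a minimal, hypothetical Python sketch that picks the smallest set of prefixes covering a target fraction of bytes; the byte counts are made up, and in practice they would come from traffic records such as NetFlow.

    # Hypothetical sketch of popular-prefix selection; data is invented.

    def popular_prefixes(bytes_by_prefix: dict, coverage: float = 0.95) -> list:
        total = sum(bytes_by_prefix.values())
        chosen, covered = [], 0
        for prefix, b in sorted(bytes_by_prefix.items(),
                                key=lambda kv: kv[1], reverse=True):
            if covered >= coverage * total:
                break
            chosen.append(prefix)
            covered += b
        return chosen

    traffic = {"13.64.0.0/11": 700, "8.8.8.0/24": 200, "198.51.100.0/24": 60,
               "203.0.113.0/24": 30, "192.0.2.0/24": 10}
    # With this skewed distribution, a couple of prefixes already cover 90% of bytes.
    print(popular_prefixes(traffic))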
So unless there are questions at this point of time, I'll move on to evaluation.
Yes?
>>: So that traffic [inaudible] -- that doesn't mean that that traffic is not
important. For example, some traffic might be [inaudible] but not very
much latency-sensitive, but some traffic, if you're going into this [inaudible].
>> Hitesh Ballani: That is very true. My argument is that, well, for most of the
traffic things work well and for a small fraction of traffic things could go bad. If
things go really bad in some application-specific scenarios, this would not be
acceptable to the ISPs, it might break SLAs, and you'll notice that I'll come to that
in four slides from now. I'll address that issue in terms of how you want to assign
the aggregation points. Because you want to be smart about that. Yes?
>>: [inaudible] when you said 95 percent, you mean 95 percent of the bytes, 95
percent of the what?
>> Hitesh Ballani: 95 percent of bytes.
>>: Isn't -- isn't bytes [inaudible] by BitTorrent?
>> Hitesh Ballani: No, it used to be the case. It's not now. If you look at
[inaudible] BitTorrent doesn't --
>>: [inaudible] for four years. Could you go back one slide?
>> Hitesh Ballani: Yes. This is actually 10 years. Jennifer Rexford's
paper in the late '90s even now what it [inaudible].
>>: [inaudible].
>> Hitesh Ballani: So what happens is that for instance I'm AT&T. I have two
prefixes that belong to me that are my customers and so a lot of traffic is going to
them. There are three, four prefixes that belong to AOL and Comcast that are
getting services from me. And their traffic is again AT&T local. So you have
these four prefixes that are carrying about 40 percent of the traffic. What turns
out is the rest of the traffic is essentially going to Google and Microsoft data
centers that are again connected to AT&T. So it turns out any traffic coming from
AOL, Comcast, and AT&T customers going to these data centers or
peer-to-peer applications is AT&T local, which again belongs to these 10, 15 or
50 or 100 prefixes which turn out to be popular.
Actually I was very confused about that point, too. When I came up and did this
evaluation based on NetFlow records, I was like this doesn't make sense. And I
have some insight into why that works. And I'll be happy to explain that to you
offline. Okay?
So evaluation results. And I'll address that question of what happens for really
bad cases. So first up, evaluation metrics. When an ISP is choosing to deploy
ViAggre, it is looking to shrink the FIB on its routers. On the other hand, the use
of ViAggre imposes stretch on traffic and it increases load across the ISP's
routers. So there are positives and there are negatives. Further, ViAggre
provides the ISP with a number of deployment options and I've listed three of these
here. So the theme of the evaluation is going to be how can the ISP use these
deployment options to tune the positives and the negatives? And the main result
that I'm going to try and show is that the positives far outweigh the negatives,
which makes this a good trade-off.
In the interest of time in this talk I'm going to focus on the latter two deployment
options. Now, if you think about it the choice of which routers aggregate a given
virtual prefix is an important one because the more aggregation points you have
the less stretch you'll impose on traffic. On the other hand, the more aggregation
points you have, the more cumulative FIB space you'll end up using. So there's
a trade-off between FIB size and stretch. And the ISP can use its choice of
aggregation points to tune this trade-off.
In our work, we consider a simple constraint-based optimization problem
wherein the ISP is trying to minimize the worst-case FIB size across all its routers
while constraining the worst-case stretch. Now, this is a simple constraint. But
I'm sensing that it's [inaudible] and it answers your question, because I want to
make sure that my existing [inaudible] are not breached and my latency-sensitive
traffic is not completely hosed. So I could say, well, even in the worst case
scenario my stretch should not be more than four milliseconds or five
milliseconds, and that's the kind of thing an ISP can do, and you can get
more sophisticated.
On the other hand, the worst case FIB size is important because that's what I
need to provision for. As it turns out, the simple constraint problem can be
mapped to the multi-commodity facility location problem, which turns out to be
NP-hard and has been studied quite a bit in the theory community. As a matter
of fact there was a SODA paper in 2004 that proposed an approximation
algorithm with logarithmic bounds for the problem.
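Before describing how we solved it, here is a toy, hypothetical Python sketch of a greedy placement heuristic in the spirit of what is described next: for each color, keep choosing the PoP that covers the most still-uncovered PoPs within the stretch bound, preferring lightly loaded PoPs to keep the worst-case FIB small. This is not the paper's actual algorithm, and the PoPs and distances are invented.

    # Hypothetical greedy aggregation-point placement sketch; not the real algorithm.

    def place(colors, pops, dist, stretch_bound):
        load = {p: 0 for p in pops}                 # how many colors each PoP hosts
        placement = {}
        for color in colors:
            uncovered, chosen = set(pops), []
            while uncovered:
                # only consider PoPs that can cover at least one uncovered PoP
                candidates = [p for p in pops
                              if any(dist[p][q] <= stretch_bound for q in uncovered)]
                best = min(candidates,
                           key=lambda p: (load[p],
                                          -sum(dist[p][q] <= stretch_bound
                                               for q in uncovered)))
                uncovered -= {q for q in uncovered if dist[best][q] <= stretch_bound}
                chosen.append(best)
                load[best] += 1
            placement[color] = chosen
        return placement, load

    pops = ["NYC", "CHI", "SEA"]
    dist = {"NYC": {"NYC": 0, "CHI": 3, "SEA": 9},
            "CHI": {"NYC": 3, "CHI": 0, "SEA": 7},
            "SEA": {"NYC": 9, "CHI": 7, "SEA": 0}}
    print(place(["red", "green", "blue", "orange"], pops, dist, stretch_bound=4))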
For our purpose we implemented a simple greedy approximation algorithm and
we applied this to data from an actual tier-1 ISP on the Internet. So we took the ISP's
topology, their routing tables, their traffic matrix, and we assumed that the ISP
wanted to deploy ViAggre and was using our tool to determine an allocation of
aggregation points, the results for which are plotted on the graph here.
So the X axis is the constraint on worst-case stretch, the Y axis is FIB size. As you relax
the constraint, the worst-case FIB size drops. And with a constraint of about
four milliseconds you get a worst-case FIB size across all the routers of around
10,000 prefixes, which is four percent of the global routing table. On the same
graph, on the right hand side Y axis, I have plotted the actual stretch. And there
are two points to note here. The worst-case stretch, which is the dark blue line,
is always less than the constraint, which is a sanity check for the algorithm.
While the average case stretch is pretty much negligible throughout, 0.2
milliseconds. So FIB size reduces, stretch increases, stretch is pretty much
negligible throughout.
Another way to look at this reduction in FIB size is to look at the extension in
lifetime of routers due to the use of ViAggre.
>>: I have a question.
>> Hitesh Ballani: Sure.
>>: So what you're saying is that that blue line there's this imaginary X equals Y
line that it lies under?
>> Hitesh Ballani: Yes.
>>: So that's the measured worst-case stretch as opposed to the constraint that
you saw before?
>> Hitesh Ballani: Yes.
>>: Okay.
>> Hitesh Ballani: So that's the measure of the stretch and obviously you would
want to make sure if your algorithm is working right it's below that diagonal line.
>>: Right.
>> Hitesh Ballani: Yes. Question?
>>: So the FIB size bottoms out around four percent, what's the explanation for
why -- what was the --
>> Hitesh Ballani: The [inaudible] there is that we are limited by the smallest
PoP that this ISP had. So if you -- if this guy had a PoP with five routers, I'm
limited to dividing the routing table amongst those routers and then relying on
some close-by PoPs, but my stretch constraint ensures that I can't rely on some
random PoP far away.
>>: [inaudible] about five reflects the -- your choice of number of colors in your
[inaudible].
>> Hitesh Ballani: Yes.
>>: So that was fixed?
>> Hitesh Ballani: Yes. It was fixed.
>>: Wait. Now I didn't understand. I didn't think that the number of colors was
fixed, I thought you were saying that the stretch constraint forced the number
of colors to be fixed because it was of the -- because you didn't have more colors
than that in one PoP?
>> Hitesh Ballani: No, what I said was that in this exercise I started out with a
certain number of colors, let's say 200 colors, and then I said --
>>: [inaudible].
>> Hitesh Ballani: Yes. A decent number of colors. So that I get an even
distribution of prefixes across these various colors.
>>: Okay.
>> Hitesh Ballani: That's why you need a decent -- that's why I needed a decent size,
not four or eight or something like that. All right?
And another way to look at this reduction in FIB size is to look at the extension
in lifetime of routers due to the use of ViAggre. And we conducted a study to
this effect, and the highlight of that study was the fact that ViAggre can be used to
extend the lifetime of already outdated routers by up to 7 years, while imposing no
stretch on the ISP's traffic, which I'm sure many ISPs would be very excited
about.
Now, [inaudible] from ViAggre requires traffic to be routed through an
aggregation point, which as I mentioned, can impose load on the ISPs --
>>: I don't [inaudible].
>>: So this is a zero stretch.
>> Hitesh Ballani: I should have clarified zero stretch means that if you're taking
a hop within the PoP that is zero.
>>: Okay. Within a PoP.
>> Hitesh Ballani: Within a PoP. And as it turns --
>>: So the number of routers --
>> Hitesh Ballani: Number of routers. So I should have clarified that.
>>: So [inaudible].
>> Hitesh Ballani: It's no magic.
>>: [inaudible] exactly the same routers you had before?
>> Hitesh Ballani: Yes. Sure.
>>: All right.
>> Hitesh Ballani: The only problem is that well, you can still take a hop within
the PoP, but the load is a problem. Which is --
>>: [inaudible].
>> Hitesh Ballani: Yeah. But in terms of stretch they're okay; in terms of load
they're not okay.
>>: Right.
>> Hitesh Ballani: And this is where the popular prefixes come in. Question?
>>: So in 10 years assuming that the growth rate continues [inaudible].
>> Hitesh Ballani: So there I used two models of FIB growth rate. One was an
exponential model proposed by Geoff Huston, one was a quadratic
model proposed in some IETF document. So, yes,
people based on past growth have extrapolated [inaudible] future growth, and I
took two models and based on that, I got this range. And of course if IPv6
deployment takes off or IPv4 deaggregation takes off, the growth might be more, in
which case this number would reduce a little bit, yes. Okay?
So the load problem, and this is where popular prefixes come in: we performed a
pretty long-term and comprehensive study to determine the fraction of traffic
carried by prefixes of different popularity. And the highlight of that result was the
fact that a small fraction of prefixes carry a vast majority of traffic. This is what
past results have shown, and this is what we found. Hence, we proposed that these
popular prefixes should be loaded into the forwarding table of every router. Given
this use of popular prefixes, we conducted a load analysis to determine the
increase in load across the ISP's routers. So the X axis is the [inaudible] popular prefixes,
the Y axis is the increase in load across the ISP's routers. As we
increase the popular prefixes, load drops sharply. And with around 5 percent
popular prefixes, we get a maximum load increase of 1.38 percent. Which
should be pretty acceptable.
So hopefully this quick set of results convinces you that ViAggre can be used by
ISPs to extend the lifetime of their routers while imposing negligible traffic stretch
and almost no increase in load across the routers. Beyond this, ViAggre has a
number of other advantages; for instance, ISPs don't need to buy into the ViAggre
model completely. They can play around with it on a limited scale so
that they get comfortable. And there are several other advantages. Question?
>>: So [inaudible] selection of aggregation points only considers the stretch, has it
considered the load on the [inaudible].
>> Hitesh Ballani: So in the simple optimization that I presented in this case, the
idea there was to put a constraint on the stretch and measure the load; it turns out that
load is pretty small. Beyond that, obviously for an ISP to deploy this, you
would want to constrain both the stretch and load. I have a mathematical
formulation of these constraints that I fed into an ILP [inaudible], ILOG. So
essentially I have a tool that takes these constraints and generates a deployment
model that would satisfy those constraints. But the results in this talk are based
on the simple constraint model.
>>: If -- so the model you've done is with no failures of the routers. If you start
thinking -- well, maybe -- I say this without knowing.
>> Hitesh Ballani: So the [inaudible] failure model I wanted to
ensure was that if you are placing a color in a given PoP, you want to make sure
that there are two routers for that color.
>>: In that PoP?
>> Hitesh Ballani: In that PoP.
>>: Okay.
>> Hitesh Ballani: Because, well, if that fails you can land on something in the same
PoP, otherwise you go to some --
>>: [inaudible] property then you won't affect stretch with any single failure?
>> Hitesh Ballani: Yes.
>>: Right?
>> Hitesh Ballani: Yes.
>>: All right.
>> Hitesh Ballani: That being said, you could come up with more complex
constraints where you want to constrain the stretch even in the face of failure.
>>: [inaudible] going to be if you took a failure would you have much worse long
haul.
>> Hitesh Ballani: Question.
>>: So I'll be impressed if you have a slide for this one. [laughter]. You don't do
this work in [inaudible] service attacks, and it looks like if I were a [inaudible]
service attacker, I could go after the layout of the routing and make sure that load
ended up just where I wanted and make sure that things get dropped.
>> Hitesh Ballani: That's an excellent question. I don't have a slide for this.
Sorry. Essentially as an attacker, what you want to do -- and I will answer this
very briefly and we can have a discussion later, if you -- you could send traffic to
popular prefixes. That being said, in ViAggre, when you have a packet destined to
a prefix which is not popular, you're not relying on the control plane. So it's not
as if you're taking a cache miss and getting something from the control plane,
you're essentially forwarding it to some other router.
Packets always stay on the data path. So that would ensure that no matter -- well,
not no matter, but for decent size attack traffic, you should be -- you should
be fine. Obviously that can be accounted in how you deploy ViAggre. And I
have some very preliminary results on that that I'll be happy to share with you.
Okay. Not in this talk.
So evaluation was fine, but there's a question that -- question?
>>: [inaudible] followed by a constraint but [inaudible]. So here's one question.
[laughter].
>>: Research [inaudible] [laughter].
>>: So here is a question. There's one thing that sort of -- maybe it's not
[inaudible] but it's the [inaudible] and there have already been so many papers
talking about routing [inaudible] how hard it is to keep them [inaudible].
>>: Yes, how do you deploy this?
>> Hitesh Ballani: Excellent question. [laughter]. Which is my next slide. So we
wanted a deployable system. And we went out and spoke to ISPs and operators,
and there were two main concerns. First was, well, you're using all these
control plane hacks, and what happens with installation time and convergence
time and all those metrics.
The second and perhaps more important concern was the management
overhead. I went out and spoke to operators and I was happy because I found a
solution that was going to save the world, but they were concerned about the
operational costs of this extra configuration. And having done some network
management I really appreciate that concern. Now, to answer these
questions, we went out and deployed our system on a test bed of actual
hardware routers at the WAIL lab in Wisconsin. And in the figure I show a very
simple topology, although we experimented with all kinds of different topologies.
In the figure we have an ISP that has deployed ViAggre and it's exchanging routes
with two neighboring ISPs, AS2 and AS3.
We configured these routers to propagate routes using three different
mechanisms and I'll briefly recap them for you. First is status quo, which is what
happens today. External router advertises routes to the edge router, edge router
forwards these using a mesh of internal peerings.
Second, we have design one, wherein the internal routing setup remains exactly
the same, except that routers use FIB suppression to only load the relevant
routes into their FIB. We achieved this FIB suppression using something called
access lists, which is a standard router mechanism available on all routers, and
the only thing that you need to know about access lists from this talk's point of
view is that they can be massive due to the use of popular prefixes. Because if
you have a thousand popular prefixes, the access lists need to enumerate them,
to tell the router that these need to go into the forwarding table.
And finally we have design two, wherein external routers advertise the routes to a
route reflector, and these are selectively forwarded.
We achieve this selective forwarding using something called prefix lists, which is
again a standard [inaudible] mechanism available on all routers and these can be
massive due to the use of popular prefixes.
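As a rough illustration of that selective forwarding, here is a hypothetical Python sketch of the filtering effect: a router receives only the routes for the virtual prefixes it aggregates plus all the popular prefixes, which is why a prefix list enumerating the popular prefixes can grow large. Prefixes and names are invented, and this is not actual router configuration syntax.

    # Hypothetical sketch of the route reflector's selective forwarding in design two.
    import ipaddress

    def routes_for(router_colors, popular, full_table):
        """Subset of the routing table a given router should be sent."""
        keep = []
        for prefix in full_table:
            p = ipaddress.ip_network(prefix)
            if prefix in popular or any(p.subnet_of(c) for c in router_colors):
                keep.append(prefix)
        return keep

    full_table = ["12.0.0.0/8", "65.10.0.0/16", "131.107.0.0/16", "203.0.113.0/24"]
    popular = {"131.107.0.0/16"}                   # heavy-hitter prefixes
    blue = [ipaddress.ip_network("0.0.0.0/2")]     # this router's color(s)
    print(routes_for(blue, popular, full_table))   # -> ['12.0.0.0/8', '131.107.0.0/16']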
Now, we conducted a whole slew of experiments, and the one experiment that
I'm going to focus on in this talk, what happened there was you bring down and
you reestablish the peering between the external router and the edge router, and
you measure the amount of time it takes for the route to be advertised, installed
into the edge router's FIB and then forwarded on, something that we call
installation time.
The experiment had two key parameters. First is the number of routes being
advertised by the external router, which in some sense represents the routing
overhead, and the number of popular prefixes, which represents the size of the
access list or the prefix list that you'll be using in the ViAggre deployment. And --
>>: [inaudible].
>> Hitesh Ballani: Yes. So these results are based on Cisco 7300s. I think we
have experimented with Juniper routers, too, M20s, I think, but the results that
I'm going to show in this talk are based on Cisco router results. Okay? Which is
shown here.
So design 1 imposes substantial overhead in terms of installation time. And this
increases dramatically as we increase the fraction of popular prefixes. This is
because as it turns out, routers today just aren't designed to deal with massive
access lists. The overhead is too high, which is what I was alluding to at the
beginning of the talk. So this is not a practical implementation, and we are still
working on it.
However, design two actually reduces installation time. This is because in
design two the route reflector only needs to forward a subset of the routing table
to the ISP's routers. Contrast this with the status quo, where the edge router needs to
advertise the entire routing table to the ISP's routers, and that's where the
advantage is coming from.
Further, the installation time doesn't increase much as we increase the fraction of
popular prefixes. So this is great news because this shows that not only are
routers designed to [inaudible] the massive prefix lists, but reducing the routing
table can also help with internal routing convergence. So we started by
focussing on FIB size, and it seems we can help with both RIB size on data plane
routers and internal routing convergence, which is one of the advantages of
reducing the RIB size. Question?
>>: I don't have a good feeling for how many prefixes are often advertised within
networks on the X axis.
>> Hitesh Ballani: So on the X axis we are here. So today the Internet routing
table is around 280,000, 300,000 prefixes. So we are here and we've done stuff
here. And this was done fairly recently.
Next is the management -- question?
>>: [inaudible] the router [inaudible] fails.
>> Hitesh Ballani: What happens if the route reflector fails? You need to be very
careful about how you do the route reflector deployment because all the external
routers peer with them. But that is not qualitatively different from what
happens today, because today ISPs use redundancy to ensure that route reflector
failure does not lead to route propagation failure, which is the same for ViAggre.
Question?
>>: [inaudible].
>> Hitesh Ballani: Why the [inaudible].
>>: [inaudible].
>>: [brief talking over].
>>: [inaudible] the opposite. Why is it so [inaudible]?
>>: [brief talking over].
>>: [inaudible] linear scale, it would look like that.
>> Hitesh Ballani: This curve?
>>: Yes. [inaudible].
>> Hitesh Ballani: This curve would increase -- I don't have an answer to that
question. I'll have to think about that. I guess it could be a function of as you're
increasing the number of routes, you're sending more data in, and things are
getting congested. I don't have an answer to that question. I'll think about that
and I'll get back to you in a second, okay, after the talk.
The other question was about the management overhead. Now, to address this
concern, along with our deployment we built a management tool that
can help with the ViAggre configuration. It's a simple tool that takes an ISP's
existing configuration files and the NetFlow records for the traffic statistics, and it
generates configuration files that are ViAggre-compliant. So effectively you have
an automated means to go from a status-quo network to a ViAggre-compliant
network without any manual intervention. Of course, this tool was specific
to the Cisco 7300s we were using and to the empirical knowledge we had,
but its simplicity suggests that the configuration problem might not be
[inaudible].
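To give a flavour of what such a tool does, here is a minimal Python sketch of just the popular-prefix selection step: aggregate NetFlow-style (prefix, bytes) records and render the chosen prefixes as an IOS-style prefix list. The record format, the coverage threshold, and the output syntax are assumptions for illustration; the real tool operated on the ISP's actual configuration files and NetFlow exports.

```python
from collections import Counter

def popular_prefixes(flow_records, fraction=0.95):
    """Pick the smallest set of prefixes that carries `fraction` of the
    traffic, given (prefix, byte_count) records aggregated from NetFlow."""
    volume = Counter()
    for prefix, nbytes in flow_records:
        volume[prefix] += nbytes
    total = sum(volume.values())
    chosen, covered = [], 0
    for prefix, nbytes in volume.most_common():
        chosen.append(prefix)
        covered += nbytes
        if covered >= fraction * total:
            break
    return chosen

def prefix_list_config(name, prefixes):
    """Render the chosen prefixes as 'ip prefix-list' configuration lines."""
    return "\n".join(
        f"ip prefix-list {name} seq {10 * (i + 1)} permit {p}"
        for i, p in enumerate(prefixes)
    )

# Example usage (records would come from parsed NetFlow exports):
# print(prefix_list_config("POPULAR", popular_prefixes(records)))
```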
That being said, it's an excellent question, because we chose a
configuration-only path for ViAggre on the assumption that it would lead to easier
deployment, which is not necessarily true. Another way to approach the
problem is to assume router vendor support and to build all these
primitives directly into the router, which is what we are working on with engineers
at Huawei. This would reduce the configuration burden on ISPs, and it would
make them more comfortable because they would have router vendor support.
We have a couple of IETF drafts on this, but I would say this is more of a work
in progress.
So with that, I'm going to conclude the ViAggre component of the talk. ViAggre,
or virtual aggregation, is a configuration-only approach to shrinking the
FIB on ISP routers.
ISPs today can use ViAggre to extend the lifetime of their installed router base.
Of course, ISPs may need to upgrade their routers for other reasons, but at least
their hand is not going to be forced by factors beyond their control, namely the
growth of the Internet routing table. Further, I don't think of ViAggre as a be-all
and end-all solution to the routing scalability problem. Rather, I think of it as a
simple yet effective technique to hold the fort until a more clean-slate
solution can come along and save the day. So that was ViAggre.
And since I have a few minutes left, I would like to talk briefly about future
directions. Question?
>>: [inaudible] this argument -- you just said that, you know, this is a first-step
solution and there's a clean-slate solution. Isn't the fact that clean-slate
solutions aren't coming in because we [inaudible] -- isn't it kind of [inaudible].
>> Hitesh Ballani: That's an excellent question. So you -- I [inaudible].
>> Hitesh Ballani: Yes. Yes. If you only argue for dirty-slate solutions, where you
come up with these quick fixes and place band-aids on the network
architecture, you'll never get to the clean-slate solution. And so, as researchers,
we have to be very smart about how we design dirty-slate solutions so that
there is a clear progression path.
In the context of ViAggre, the idea is this: you convince ISPs to use it. They'll
run into trouble -- management overhead, all those things -- and that pressure gets
these changes implemented in routers. Once you have things implemented in
routers, which is the second step and what we are currently
pursuing, you can actually move to inter-domain ViAggre. Because if I'm an ISP
and I have a routing table size problem, I can flip the switch in my routers
without any management overhead and get these advantages. So now you have
various ISPs that have deployed ViAggre.
And beyond that point, there would be incentives for them to cooperate to reduce
the FIB size even more and to ensure that end-to-end latency does not suffer.
So there is a clear progression path, at least for ViAggre, whereby this is not a
band-aid solution but something that would actually encourage movement towards a
clean-slate solution, which is something I'm quite proud of. Okay? Question? Yes.
>>: How [inaudible] are FIB caching [inaudible].
>> Hitesh Ballani: FIB caches went out of play ten years ago, and there
were reasons for that. ISPs weren't happy because when there was a
cache miss you'd get a reduction in throughput and an increase in loss. Vendors
weren't happy because they couldn't benchmark their product. And that's why
FIB caches went out of business.
That being said, it's a very insightful observation, because what I've designed is
essentially a distributed caching system. Centralized caching, where on a cache
miss you fetch the route from the control plane, didn't work ten years ago and it
won't work today. That's why you need a distributed caching system, and that's
what ViAggre is.
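A minimal sketch of what makes this a distributed cache rather than a classic FIB cache: on a miss for a specific prefix, a router does not punt to its own control plane; its FIB still contains a coarse virtual prefix that covers the destination and points into a tunnel towards the aggregation-point router holding the specific routes. The longest-prefix-match helper below is simplified, and the table contents are made up for illustration.

```python
import ipaddress

def longest_prefix_match(fib, dst):
    """Return the FIB entry whose prefix most specifically covers dst."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in fib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best

# A ViAggre router holds its popular prefixes plus coarse virtual prefixes
# that tunnel to the aggregation points holding the specific routes.
fib = {
    "203.0.113.0/24": "direct:if0",          # popular prefix, forwarded natively
    "0.0.0.0/1":      "tunnel:agg-router-A", # virtual prefix -> aggregation point
    "128.0.0.0/1":    "tunnel:agg-router-B",
}

print(longest_prefix_match(fib, "203.0.113.7"))   # hits the popular prefix
print(longest_prefix_match(fib, "198.51.100.9"))  # falls back to a virtual prefix
```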
>>: So if you had routers today that didn't have FIB caching -- didn't use the FIB
cache -- it seems that would do some of what you're proposing, in that, you know,
for the [inaudible] of the most popular traffic you know is [inaudible] in the FIB, and
anything else you go out to the router and [inaudible] going to take a hit, but
maybe that hit would be smaller than a routing-stretch hit.
>> Hitesh Ballani: That is actually what we are doing with Huawei. You do the
popular-prefix calculation online, and you populate the FIB without any
management overhead. If you do have to take a hit, you go to some other
line card or some other router. So those are the kinds of things that
we are exploring with --
>>: So is that pretty much exactly the same as the old FIB cache stuff, or is there
some new [inaudible].
>> Hitesh Ballani: There's some new stuff there, because the old FIB cache
approaches had to rely completely on the control plane, fetching routes from the route
processor. You don't want to do that, because of the reduction in throughput. You
instead want to rely on some other line card on the same router, or some other
router in the same PoP. So there are some differences there.
But the dynamic calculation of popular prefixes, which is what you just mentioned, is
something that we are implementing with Huawei and others. Albert?
>>: Yes, I was just thinking about the economics of the [inaudible]. ISPs
[inaudible] upgrading line cards, but they wouldn't care about upgrading
routers -- they do that anyway, and they don't even have to update all of them, and you
[inaudible]. So again this brings me back to my question: why don't you buy a
small number of core routers that know everything? Why do you need to partition
by --
>> Hitesh Ballani: That's a good observation. So, yes, there's one deployment
model wherein you buy a small number of core routers that have a lot of
memory and you rely on them, and that leads to a simpler deployment model.
We have explored that a little bit. That being said, you brought up an
interesting point when you said that, well, you need to upgrade routers every
three or four years anyway because of data rates.
>>: Core routers.
>> Hitesh Ballani: Core routers, because of data rates -- so why do you need to bother?
Because memory upgrades come along with the upgrade
of the actual router. Right.
>>: Compared to the cost of the links for most ISPs.
>> Hitesh Ballani: Yes.
>>: It's [inaudible] a big deal.
>> Hitesh Ballani: Yes. As it turns out, there's a separation
between what happens for medium ISPs and large ISPs, medium ISPs being tier 2s
and tier 3s. For medium ISPs, the concern is that they need medium links and
medium routers, but they need big routing tables. By contrast, the bigger ISPs
need big links, big routers, and big routing tables. So there's a mismatch for the
medium ISPs, and that's where ViAggre helps.
That being said, on the side of tier-1 or big ISPs, as far as my
understanding goes -- and please help me here -- five or ten years ago, every few
years you changed routers because you needed new data rates. Data rates went
up and you needed to carry more traffic. Now router line cards have reached
lambda speeds, which in turn implies that most of the upgrades will come
because of memory.
And so it is going to hurt the bottom line of ISPs. Is it going to be an influential
factor for [inaudible] deployment? I don't know. But we can take that
conversation offline. Question?
>>: I have a specific question. You were talking about [inaudible] that's used to
set up your [inaudible].
>> Hitesh Ballani: Yes.
>>: Do you have an [inaudible] of how many [inaudible] you need to set up?
Because it almost feels like for each --
>> Hitesh Ballani: That's an excellent question. I never figured somebody
would get to that level of detail.
>>: Actually, I'll tell you why I care about it. You were talking about
routers 7 to 10 years old, and [inaudible] they have limits on
how many [inaudible] can sit on routers -- especially the older routers don't
support very many of them.
>> Hitesh Ballani: That is very true. In the design which I explained in this talk,
you'd have tunnels equal to the number of directly connected external routers
in the deployment model that I showed. That being said, you can
do an optimization wherein you only advertise tunnels for each edge router,
which reduces the number of MPLS tunnels that you need to hold by an order of
magnitude. So we're talking about 1,000 to 2,000 tunnels -- on the order of
the number of routers that you have in the ISP's network, which can vary from
200 to 1,000 or 1,500. And as far as I know, at least the three-, four-year-old Cisco
7300s we are working with can't support that many MPLS tunnels without this
optimization. So the optimization goes from a number of tunnels around
10,000, which is the number of routers directly connected to the ISP, down to the
number of edge routers, which is a direct function of what your topology looks
like. Okay? That is a good question.
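The back-of-the-envelope arithmetic behind that answer can be written down directly; the numbers below are the illustrative ones from the discussion, not measurements.

```python
# Tunnels each router must hold under the two deployment options discussed:
# one tunnel per directly connected external router versus one per edge router.
external_routers = 10_000  # routers directly connected to the ISP (illustrative)
edge_routers = 1_000       # edge routers inside the ISP (200 to 1,500 in practice)

print("naive deployment:    ", external_routers, "tunnels per router")
print("optimized deployment:", edge_routers, "tunnels per router")
print("reduction factor: about", round(external_routers / edge_routers), "x")
```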
Well, I wanted to speak about future directions. I don't know how things work --
do I have five minutes? Okay, I'll try to keep this very brief. Throughout my
graduate career I have focused on lower-level networking problems. This has allowed
me to delve deeply into one area so as to have the most impact, and I think I've been
relatively successful at it. It has led to a number of new ideas that I plan
to pursue.
Now, while there is a decent amount of work to be done in core Internet
research, I do believe that the field's best days are behind it, and I say that with
some amount of sadness. In this context, I think my past work is useful for two
reasons. First, it taught me that good research can comprise both clean-slate and
dirty-slate components. Second, it equipped me with the requisite tools to
diversify my research beyond these core interests. To give you a quick
example, in one of my current projects I am building a system that facilitates the
transfer of delay-tolerant bulk data -- think of users downloading movies
and software that they don't necessarily need immediately. My insight here
is to take advantage of this delay tolerance to reduce user cost and to improve
performance.
So this is an example of a project at the intersection of traditional
networking, peer-to-peer systems, and distributed systems, and so I'm already
moving up the stack. Beyond this, my current pipeline of projects is a mixed bag,
and I'd be happy to speak with you about them if people are interested.
To put this all together, I want my research to be geared towards building
network systems, both traditional and emerging, that are better along several
dimensions, including their scalability, their manageability, their reliability, and
their security. Further, I want my research to retain this dual theme of instant and
delayed gratification, and I'll explain what I mean by that. Given a problem space, I
want my projects to follow a specific roadmap, wherein the first step
is to find the largest subset of the problem space that can be tackled without
invasive [inaudible]. This is the [inaudible] solution that can help things
today.
Next, I would like to use insights from this solution to build a more complete solution,
most likely invasive in nature, which is the long-term solution that can help
things eventually.
In the context of past work, I have done both clean-slate and dirty-slate
projects, and the idea now is to combine them and have a unifying thread
going through them. I'll give you a quick example. In a couple of my past
projects, I've used tunnels in several ways. For instance, in ViAggre we used
tunnels to shrink the FIB on ISP routers. I think tunnelling is a very powerful
primitive, and I'm actually surprised at how little attention tunnels have received in
the research community, although that's changing a little bit. For
instance, I noticed that Dave and Albert have done [inaudible] stuff where they
use tunnels for load balancing and scalability.
I think tunnelling is a very powerful primitive, and I think that a properly designed
and cleverly incentivized inter-domain tunnel architecture represents a very
good opportunity to have genuine impact on many longstanding networking
problems, like routing scalability, traffic engineering, load balancing, and
perhaps even network security.
So this is an example where the same notion can be used in the near term to
help things today, and in the long term to get to an eventual solution.
Note that I'm not claiming that these solutions will spur architectural change. All
I'm saying is that when the severity of the problem justifies the cost of change, my
solutions will have a better alignment of costs and benefits and, hence, I think, a
better chance of deployment. And on that optimistic note, I'm going to
conclude my talk. Thank you all for your patience, thank you for having me
here, and I'll be happy to answer questions.
>> Dave Maltz: Well, let's thank the speaker.
[applause].
>> Dave Maltz: Any more questions?
>>: So I think you have done a great job of actually showing how this solution
[inaudible], but if you look at [inaudible], do you think any of it has had impact in
[inaudible].
>> Hitesh Ballani: In terms of the older work. So the IP --
>>: [inaudible] examples where your previous work has been [inaudible].
>> Hitesh Ballani: Sure, sure. That's a fair question. So in terms of the anycast
stuff, I think it has had some impact because I have an actual deployment of servers.
These are seven boxes that are actually advertising these prefixes, and people are
interested in growing that deployment. Because for IP Anycast -- and this is
technical, so we can take it offline -- for IP Anycast to actually compete
with DNS-based anycast, I need a deployment of substantial size. I have seven
servers right now, and if I can get to 20 or 25 servers, that would be a substantial
enough deployment that other services and other research projects might be able
to use my IP Anycast-based servers instead of DNS-based anycast, for some
advantages.
>>: So when you say I have seven servers, you mean you have a start? I mean,
what would that mean you have seven servers?
>> Hitesh Ballani: I have seven servers in the sense that I took 1U server
[inaudible] boxes and shipped them to different ISPs or labs that were interested
in hosting my boxes. I went there, installed my server, spoke to their upstream
ISP, set up a BGP peering, advertised my prefix, and installed my software on
those machines, which is running right now.
So as a graduate student -- it's not quite the kind of impact I would like to have,
but given the persistence it took, I think I'm pretty happy with the kind of things I've been
able to achieve. Albert?
>>: The ViAggre work is [laughter].
>> Hitesh Ballani: Sorry about that.
>>: It's [inaudible] being pitched in a very impactful [inaudible] -- there's both an
IETF effort and an IRTF effort, there must be a dozen [inaudible]. So how are you
doing? I mean, how does it stack up, and what's the difference between [inaudible]
proposal and [inaudible].
>> Hitesh Ballani: Sure. So there is a whole set of proposals, as I mentioned,
that are clean-slate in the sense that they require changes to software and hardware.
And yes, they are amazing proposals, because they tackle the entire problem
space, but they don't get enough traction.
There's a second set of proposals that do not require substantial changes to
protocols but do require cooperation from router vendors -- for instance, FIB
compression. The problem there is that Cisco doesn't want to cooperate,
because it wants to sell more routers, which I can understand, because they
have their economic incentives. And the reason ViAggre, and this is me being
optimistic or speculative, might just take off, and why it's getting some traction, is
that there is a form of ViAggre that does not require any cooperation. If
you're an ISP and you have a problem, you go out and deploy this.
Obviously there are some management concerns, which we have a management
tool to address. And if you wanted to, there could be a startup offering a
management solution, and people might deploy ViAggre. I think that is the
only reason there is some hope of this actually getting somewhere -- of it
actually getting deployed in some networks -- which is why I'm still pursuing it,
even though it can be a lot of pain in terms of my time.
The only reason I'm pursuing it is that it's something I came up with, and I think
there is some hope that it can help real network problems. Question?
>>: So I like the approach -- the way one of the slides was worded, that you
would be able to extend the lifetime by 7 to 10 years from now. There's this thing
I'm trying to understand: whether you're trying to engineer software beyond the
projected hardware reliability guarantees that would be provided for these routers
anyway. Is there a --
>> Hitesh Ballani: No, there are no projected reliability guarantees. The RAM
works just fine. It's just that somebody else -- say I'm an ISP, and you, as a network
operator, might be lazy and not doing your aggregation job, and you are
advertising 10,000 prefixes even though you're supposed to advertise 1,000
prefixes, and now I'm supposed to maintain all of that in fast memory. And my
router vendor won't help me, because he wants me to buy new routers. So
as an ISP, I need some mechanism for solving this problem, which is
where ViAggre comes in.
So I don't think it runs into problems of reliability. Yes -- for instance,
most ISPs, when they upgrade their routers, it's because data rates went up
and they have more customers, which is fine, because at least they're getting more
revenue. When you need to upgrade a router because of RAM constraints and
[inaudible] constraints, it's not because you are getting more revenue. Although
I'd like to avoid the purely economic discussion, because I'm not very good at
that. Question.
>>: This is a question [inaudible], and in a sense we're trying to get [inaudible], if
you will. So what you've done essentially is add an extra layer of hierarchy that
uses a certain amount of state at the lower level [inaudible].
>> Hitesh Ballani: Yes.
>>: [inaudible].
>> Hitesh Ballani: Sure, sure, sure.
>>: [inaudible].
>> Hitesh Ballani: There's no unique insight -- that's essentially the insight. I
came up with a distributed caching system, or a level of indirection, which tackles
a pressing problem, and I figured out a way to achieve that without any changes
to hardware and software. That's the key point. Yes, I'm not going to win
any big prize or something like that. It's more that I came up
with a cool idea for a pressing problem, and I came up with a
way to achieve it without requiring any cooperation. Because as [inaudible]
researchers you have this constraint -- and I'm sure you must have experienced
this when you have written your papers -- that you want to do cool, interesting work,
but it's difficult to do cool and interesting work because there's all this legacy stuff
out there. And the fact that I was able to do this cool solution in the context of the
legacy stuff is something that I really like.
>>: [inaudible] essentially making it work with the right infrastructure.
>> Hitesh Ballani: Making it work with the right infrastructure, and ensuring that it
provides good tradeoffs. Because at a simple level, you get a reduction in FIB
size, but load goes up and stretch goes up. So you want to make sure that
you balance the right elements -- that the negatives
are small enough that ISPs might be willing to take those costs, while the
positives are high enough that ISPs might actually be willing to take the trouble of
deploying these things.
>>: So are you [inaudible] but let me ask.
>> Hitesh Ballani: No, please, please.
>>: Are you only constraining yourself by the constraints that exist [inaudible].
>> Hitesh Ballani: No, I'm not only constraining --
>>: So, so.
>> Hitesh Ballani: Yes. Please continue.
>>: [inaudible] you talk about that, which is: I want to work with the parameters that
are given to me, and I want to make sure that my solutions actually work within
those parameters -- and sort of [inaudible] clean slate and dirty slate and all this stuff.
Can you go back and tell me where you've actually done things [inaudible].
>> Hitesh Ballani: Yes. This is the stuff that I did on network
management -- I guess this slide is close enough.
>>: [inaudible].
>> Hitesh Ballani: Yeah, okay. So the idea there was that network management is
really hard -- people here have done a lot of work on network management.
And the explanation we came up with for why network management is so
difficult is that today's protocols and devices tend to expose too much information.
If you look at the management interface of these devices, which is the
management information base, it comprises thousands of variables, and you
need to build management applications that understand all these variables.
>>: [inaudible].
>> Hitesh Ballani: Yes.
>>: [inaudible]. So the question is: do you feel you've moved on from that, or do you
feel that that's just -- do you feel like you got enough out of it?
>> Hitesh Ballani: Yes. So there are pros and cons, I guess -- a positive side
and a negative side. From a research point of view, I had a lot of fun with
CONMan. From a deployment point of view, I had a lot of fun because I was
able to show that you take these protocols, you model them according to the
CONMan abstraction, and you get all these benefits. It was a lot of fun publishing
this -- [inaudible] fun on paper.
What was not fun was going to Cisco, talking to actual engineers, and having them
say it doesn't seem like we'll be interested in this in the next five years. So, yes,
you have to balance the tradeoffs -- that's what I was saying. Yes, there are
cases where I want a certain functionality to be
provided in the Internet's systems 10 years from now, or a certain problem to be
solved; I want to distill the architectural reasons why that might be the case, I
want to pursue where we'd like to be, and come up with solutions like CONMan.
On the other hand, I would like to keep my engineering hat on and come up with
solutions that can help with things today, too. So I'm very happy with how
CONMan ended up, and I didn't want to give the impression that it's some
work in my past that I'm not very proud of. I really enjoyed it a lot.
>> Dave Maltz: That's great.
>> Hitesh Ballani: Thank you. Thank you.
[applause]