>> He Xiaodong: Hello. Welcome, everybody, to our monthly What's Hot, What's Wonderful lectures. If you like, take some pizza and come back to enjoy the next hour.
It's really a pleasure and a great honor to introduce Radia Perlman, Dr. Radia Perlman. She is very distinguished. She's an IEEE Fellow and also an Academy of Sciences member. And she's most famous for her invention of the spanning tree protocol, which is fundamental to the operation of network bridges.
She has also made large contributions to many other areas of network design and standardization, such as link-state protocols. Her work transformed the Ethernet protocol from something that connected a few nodes over a limited distance into something able to create large networks.
She is the author of a textbook on networking and coauthor of a book on network security. She holds more than 100 issued patents. Her awards include the Internet Hall of Fame, 2014, which is very prestigious; the SIGCOMM Award; the USENIX Lifetime Achievement Award, 2006; the first Anita Borg Institute Women of Vision Award for Innovation, 2005; Silicon Valley Intellectual Property Law Association Inventor of the Year, 2003; and an Honorary Doctorate from the Royal Institute of Technology, 2000. She was also twice named one of the 20 most influential people in the industry by Data Communications magazine. So let's welcome Dr. Perlman.
[applause]
>> Radia Perlman: So thank you, and feel free to ask questions during the talk or whatever. We'll be kind of informal. The real point of this lecture is not for me to tell you stuff but to get you to question stuff. Not everything that you hear or read, including from me, is necessarily true, and I want to really wake you up to that point.
So, yeah, especially in network protocols, a lot of what everybody knows is actually false. This field is just so confusing. There are things that everybody knows. If you ask a networking expert why we have both IP and Ethernet, they confidently tell you it's because IP is layer 3 and Ethernet is layer 2. And I will be explaining why that's total nonsense: Ethernet is not layer 2, it's layer 3. And the more subtle question is why we have these two layer 3 protocols.
Another one is that Ethernet is CSMA/CD. I'll explain what that is. It was at one point, but it isn't anymore. So all these papers about what Ethernet is are just irrelevant to what Ethernet is today.
This one I'll also be talking about. In '92, people suggested replacing IPv4 with a different envelope called CLNP, and others said, oh, that would be ripping the heart out of the Internet and putting in a foreign substance; whereas IPv6 is just a gentle upgrade to a new version of the protocol. And I'll explain why that's kind of nonsense.
And then there's "security is built into IPv6, but it's just an add-on to IPv4." I have a few slides about that. And "SDN is revolutionary." I'll talk about those two first.
So, security is built into IPv6 but it's just an add-on to IPv4. Where did this come from? Those of you who aren't really in the field may not have heard that, but I hear it all the time: oh, we have to upgrade to IPv6 so that we'll get security.
So where did people start saying this? Well, it turns out there's a protocol called IPsec, and it's similar to SSL. You all know what SSL does. The spec for IPsec says that it's mandatory to implement for IPv6 and optional for IPv4.
But it doesn't work better with IPv6 than with IPv4. It works just as well. And "mandatory" is just words in a spec. There are probably more implementations of IPsec for IPv4 than for IPv6; more people have implemented IPsec using IPv4.
And there are plenty of IPv6 implementations without IPsec. Plus, IPsec is not equivalent to security. Pretty much anything you could do with IPsec you can do with SSL. And there are lots of other security things that neither IPsec nor SSL has anything to do with. So it's just these kinds of little phrases that get repeated over and over.
SDN. All of a sudden, every conference you went to in the industry would have a keynote saying what a privilege it is to be born at this point in history, when SDN is transforming the planet. So what exactly is SDN? It stands for software-defined networking. Does that help?
They're three perfectly wonderful words. I have no idea what they're doing in the same phrase. But at any rate, oh, and then there'd be panel sessions at these things where I would hope they would describe it. But, no, they'd have like seven panelists, and the title of the panel would be "SDN: Panacea or Miracle?" And then each panelist wouldn't say what it was; they'd say how important it was to their company and that they were going to be implementing it.
But, yeah, it's a buzzword. And it's a moving target. What people are calling SDN today will, three years from now, have totally different things underneath that buzzword umbrella.
And when you use a term like that, about 80 percent of the engineers just get traumatized. They think, everyone else understands it and I don't, and if they discover I don't, they'll think I'm stupid. And the remainder of the people, if you ask them what it is, will confidently give you a definition. But people's definitions will vary.
So I actually separated it into about six completely orthogonal concepts, and none of them are new. One of them is implementing a switch or a router, whatever you want to call these things, on a general-purpose machine rather than a specialized box. That's how they used to do it until bandwidth got too fast. So it's a fine thing to do if that's possible. Sometimes you still need hardware assist for link compression or something like that. So it's not that it must all be done in software; just do whatever is cheapest.
Another one is calculating the forwarding table, which is what tells a switch how to forward the packet, from a central place rather than with a distributed algorithm. You could do it either way. The older way is actually the centralized one: ATM did it that way, X.25 did, and InfiniBand does it that way.
And then another one is managing a network from one place. That was always the vision of networking. There was this room called the network operations center, and I've been to one, you know, like 30 years ago. There was a comfortable chair in front of a fancy display that had a picture of the network, with links that blinked if there was some problem with them. The network operator would sit in the chair and type commands, who knows what he was commanding exactly, and information would go out. So none of this stuff is particularly new.
Now, the way networking tends to be taught, not in my books but in most books, gives you the impression that TCP/IP arrived on tablets from the sky in its awesome perfection and nothing else ever existed. And the assumption is that you just cram the students' heads full of the actual packet formats of what's actually deployed so that when they graduate they can write an application to the sockets interface.
My books are very different. I say, well, here's a conceptual problem. You
plug into a network and you need an address. And here's like six different
ways I can think of doing it, here are the pros and cons between the various
approaches.
And, oh, by the way, AppleTalk did this, IPX did this, IPv4 does this. And some professors supposedly say, why is there stuff in here that my students don't need to know? That's with the model of a student's brain as being very, very small, so you have to make sure that you don't put any information in there that won't be relevant to a recruiter's checklist. They're not going to ask, do you know AppleTalk, so it's a waste for students to know anything about it.
But I claim that even if all you care about is IPv4, you will have a way deeper understanding of it if you can see alternatives. Plus, you might actually be going to design something else someday. And I run into people all the time who claim to be networking researchers, and I say, well, you know, how does this thing compare to AppleTalk? They've never heard of AppleTalk. They've never heard of anything else. And how can you do research without having -- you know, and the researchers tend to reinvent things, often not as well as things that existed before.
So, yeah, where does this confusion come from? Hype, you know, people get
all excited about their thing. People repeating stuff that they don't
understand, and I hope you will never do that again after this lecture.
Buzzwords with no clear definition. Or else the world might be changing, so
something that used to be true is no longer true.
Things are so confusing. I always want to get to the heart of the truth of things. So when there are two similar things, technology A and B, I want to understand what's really different about them.
Nobody else seems to do it. So I ask people. Nobody knows them both. So, okay, I take a deep breath and I get these two huge specs, each with its own jargon for no good reason, badly written, and I'm trying to plow through them. I might try to save myself some time by asking someone who's an expert in A how it compares with B, and they say A is awesome and B sucks. And then I ask the B person, and I get the opposite answer. But then if I discover things about B that are better, and I tell the A people, actually, it has these features and it works better in this case because of the way it handled that, no problem: they steal the ideas. So both A and B are moving targets. Nobody cares about the technology in their spec; they just want credit.
So I tell people it's natural to think of standards bodies as well-educated technologists carefully weighing engineering tradeoffs. But a much more accurate way to think of them is as drunken sports fans. So you might say, what if you actually measure A versus B? That's actually science.
I was actually at a presentation once where somebody was trying to convince the executives of the company to bet the company on technology B, where the existing thing was technology A, and everything that I understood about his thing seemed worse than technology A. One little thing was that he made everything little teeny packets, so every packet would have the overhead of an extra envelope and the switches would have more switching decisions to make. That was only one of N reasons why I thought it would be worse.
So he was comparing them according to various criteria. He was saying A doesn't scale beyond whatever, and I wondered where he picked that number from. But in particular, A was Ethernet, and he said that with Ethernet the throughput was 1 gig and with his technology it was 4 gig. And throughput is really important, so getting four times as much throughput is a very important thing.
But my mind rebelled. I couldn't imagine any reason why it would be. All of my intuition said it should be the opposite. So in front of all the executives I completely innocently raised my hand and said, were you by any chance using a 1 gig physical link with Ethernet? And indeed he said, yes, that's all I could find in the lab. So he was measuring his thing on a 10 gig link and getting 4 gig, and Ethernet on a 1 gig link and getting 1 gig, but now it was on a PowerPoint slide and was likely to get repeated over and over and over because it was science.
So you have to be very careful when you're measuring things. You're only
measuring one implementation of A versus one implementation of B. It's not
necessarily an intrinsic part of the technology.
So please practice critical thinking. And there's something else I'm passionate about, which is corporate culture. One time I worked in a group whose culture was dominated by really obnoxious people who were very aggressive and condescending. If you asked a question, they would snap back, if you don't know that, you don't belong in this group.
So I believe that it has to be safe to ask questions. If somebody asks me a question whose answer everybody knows, like what's a public key, I won't say, how can you not know that? I'll say, oh, my goodness, it's the coolest thing ever, and I can't believe my good luck that I get to be the first person to explain it to you.
But also sometimes naive questions make you rethink your assumptions, and
sometimes someone looking at it with fresh eyes, you know, it's like, well,
why was I thinking that.
And the other thing is that once you get to be fairly senior, you sort of
think everyone expects me to know everything. And you're afraid to admit
that you don't know everything. Now, nobody knows everything. If you're
truly a leader, you should be the first person to ask naive questions to show
that you're perfectly comfortable in your own skin admitting you don't know
everything and that it's safe for people to ask that sort of thing. So in
your own companies please do that.
So now an example of something confusing. What exactly is Ethernet? How does it compare with or work with IP? And people talk about layer 2 solutions and layer 3 solutions, and I'll explain all of this.
So first we need to review network layers. So ISO, one of the sports teams, was credited with naming the layers. And it's really just a way of thinking about networks. Nobody does it this way; layers look at things from other layers. But it's a great way to kind of start understanding networking. So I will quickly review them.
So this is really Perlman's layers rather than ISO's layers. It's slightly different, and you'll understand why in a minute. Layer 1 is the physical layer. It says how you signal a bit to your neighbor, what the cable looks like, and all that. Layer 2 is how you get a whole packet to your neighbor. Layer 1 lets you signal bits, and then in this bitstream of 0s and 1s you somehow signal this is the beginning of a packet, this is the end of a packet, here's a checksum. Layer 3, which was always the layer I loved, and I still do, is where the network figures out how to create a whole path and forwards the packet across the network. IP is an example of a layer 3 protocol. Layer 4 is end-to-end stuff between the source and the destination. You might number the messages and acknowledge them so that messages that get lost can be retransmitted and ones that arrive out of order can be put back in order. And layers 5 and above are boring. So that's why it's Perlman's layers.
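The layer 2 job described here, marking where a packet starts and ends inside a bitstream and adding a checksum, can be sketched in a few lines. This is a hypothetical byte-oriented illustration, not any particular link's format: it assumes an HDLC/PPP-style flag byte (0x7E) and escape byte (0x7D), and uses `zlib.crc32` from Python's standard library as the checksum. The names `frame` and `unframe` are made up for this sketch.

```python
import struct
import zlib

FLAG = 0x7E    # marks the start and end of a frame
ESCAPE = 0x7D  # precedes any byte that would otherwise look like FLAG or ESCAPE


def frame(payload: bytes) -> bytes:
    """Append a CRC-32 checksum, byte-stuff the result, and wrap it in flags."""
    body = payload + struct.pack(">I", zlib.crc32(payload))
    stuffed = bytearray()
    for b in body:
        if b in (FLAG, ESCAPE):
            stuffed += bytes([ESCAPE, b ^ 0x20])  # hide the special byte
        else:
            stuffed.append(b)
    return bytes([FLAG]) + bytes(stuffed) + bytes([FLAG])


def unframe(data: bytes) -> bytes:
    """Reverse frame(); raise ValueError on bad delimiters or a bad checksum."""
    if len(data) < 2 or data[0] != FLAG or data[-1] != FLAG:
        raise ValueError("missing frame delimiters")
    body = bytearray()
    escaped = False
    for b in data[1:-1]:
        if escaped:
            body.append(b ^ 0x20)
            escaped = False
        elif b == ESCAPE:
            escaped = True
        else:
            body.append(b)
    if escaped or len(body) < 4:
        raise ValueError("truncated frame")
    payload = bytes(body[:-4])
    (crc,) = struct.unpack(">I", bytes(body[-4:]))
    if zlib.crc32(payload) != crc:
        raise ValueError("checksum mismatch")
    return payload


wire = frame(b"hi\x7eneighbor")
print(unframe(wire))  # b'hi~neighbor'
```

Byte stuffing is what lets the receiver trust that any 0x7E it sees is a frame boundary; the checksum lets it discard a frame that was damaged on the way to the neighbor.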
So why are we forwarding Ethernet packets? We are, as I'll explain, which means that Ethernet is no longer layer 2. It's a layer 3 protocol. Ethernet was not invented to be forwarded. It was invented to be a single link on which everybody could hear everybody else. So what exactly is it? The only way to understand it is to see the history, because otherwise it makes no sense. No one would have invented what it is without it having evolved. Sometimes I say intelligent design is probably better than evolution, but in this case it evolved one little step at a time.
So back then, which is like the early 1980s, I was the one in charge of designing layer 3 of DECnet. Now, you may think, well, DECnet has died out. But actually the basic algorithms have made their way into IP.
The routing protocol I designed was adopted by ISO and unfortunately renamed IS-IS, so --
[laughter]
>> Radia Perlman: I constantly have old friends e-mailing me newspaper articles about how ISIS --
>>: What does it stand for?
>> Radia Perlman: Oh. IS is intermediate system. So it was the protocol between intermediate systems, which is what they called routers or switches.
So, anyway, layer 3 calculates paths and forwards packets, and layer 2 was just supposed to get a chunk of information, a packet, from one guy to its neighbor. So this thing here receives a packet and looks up something in its forwarding table, like, for instance, the destination address in the packet, and the table tells it which link to forward it out on. So how do you compute the forwarding table? It could be done with a central node, like ATM or InfiniBand, or you could do it with a distributed algorithm.
And, by the way, anyone who wants the slides, just e-mail me and I'll send them to you. I don't make very good slides, but, yeah.
A distributed algorithm is where you just plug the network together like Tinkertoys, and the individual green circles there gossip amongst themselves and figure out how to compute their forwarding tables.
The one that is my favorite I call link-state routing, and this is what IS-IS is. Here's a picture of a network. Here you see that C has three neighbors: B at a cost of 2, G at a cost of 5, because there's a 5 there, and F at a cost of 2.
Each one of these nodes is responsible for generating what I call a Link State Packet that says who you are, in this case I am A, and who your neighbors are and the cost of each link. So here A says he has two neighbors, B at a cost of 6, and he does, and D at a cost of 2, and he does. This gets sent to everybody. So everyone has this information, which means they know complete information about what the graph looks like and they can compute paths.
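Once every node's Link State Packet has been flooded, each node holds the same database and can compute its own forwarding table. A minimal sketch, assuming the database is a dict from node name to that node's reported neighbors and costs, and using Dijkstra's shortest path algorithm, the usual choice for link-state routing (the talk does not name the algorithm). A's and C's entries follow the example above; every other link is hypothetical, and the flooding step itself is skipped.

```python
import heapq

# Hypothetical flooded LSP database: node -> {neighbor: link cost}.
# A and C match the example; the remaining links are invented.
LSPS = {
    "A": {"B": 6, "D": 2},
    "B": {"A": 6, "C": 2, "E": 1},
    "C": {"B": 2, "G": 5, "F": 2},
    "D": {"A": 2, "E": 2},
    "E": {"B": 1, "D": 2, "F": 4},
    "F": {"C": 2, "E": 4, "G": 1},
    "G": {"C": 5, "F": 1},
}


def forwarding_table(me, lsp_db):
    """Dijkstra over the LSP database; map each destination to (cost, first hop)."""
    dist = {me: 0}
    first_hop = {me: None}
    heap = [(0, me)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        for nbr, link_cost in lsp_db[node].items():
            new_cost = cost + link_cost
            if nbr not in dist or new_cost < dist[nbr]:
                dist[nbr] = new_cost
                # Leaving `me`, the first hop is the neighbor itself; otherwise inherit it.
                first_hop[nbr] = nbr if node == me else first_hop[node]
                heapq.heappush(heap, (new_cost, nbr))
    return {d: (dist[d], first_hop[d]) for d in dist if d != me}


print(forwarding_table("A", LSPS)["B"])  # (5, 'D')
```

In this made-up topology A's direct link to B (cost 6) loses to A, D, E, B (cost 2 + 2 + 1 = 5). Running the same function once per switch at a single place would be the centralized alternative mentioned earlier for ATM or InfiniBand.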
So back to history. I was doing layer 3 innocently, and along with great fanfare came the Ethernet. Everyone was all excited about it. I'll talk about how it evolved from CSMA/CD to spanning tree, and I'll talk a little about TRILL as well.
CSMA/CD was the original invention of the Ethernet. It's a way for a bunch of nodes to share a wire. I was apparently born with CSMA/CD. You know, you sit in a conference room: CS, carrier sense, means don't interrupt if someone else is talking; MA, multiple access, means be aware you're sharing the bandwidth, so don't ramble on forever; CD, collision detect, means that while you're talking, if somebody else talks, you both stop and then start again at a random time.
I'm always amused looking at a conference room, because there are the people who do CSMA/CD, just like me; there are the people who raise their hands, and I don't know who they think is going to call on them; and there are the people who don't even do the CS part. When they feel like talking they just start talking, and they don't do CD: if someone else talks, they keep talking, or they start speaking more loudly. But at any rate, this was a fine protocol for getting a bunch of nodes on a single link. If you have too much traffic, you waste so much time on collisions that you get less throughput. And it didn't scale beyond a few hundred nodes and a limited distance, like within a single building.
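The "start again at a random time" part of CSMA/CD can be made concrete. The specific rule below is an assumption taken from classic IEEE 802.3, not from the talk: truncated binary exponential backoff, where after the n-th collision in a row a station waits a random whole number of slot times between 0 and 2^min(n, 10) - 1, and drops the frame after 16 collisions. The two-station retry loop and the function names are hypothetical simplifications that ignore timing and propagation delay.

```python
import random

MAX_COLLISIONS = 16  # classic 802.3 gives up on the frame at this point
BACKOFF_LIMIT = 10   # the exponent stops growing after 10 collisions


def backoff_slots(collisions: int, rng: random.Random) -> int:
    """Slot times to wait after the given number of consecutive collisions."""
    if collisions >= MAX_COLLISIONS:
        raise RuntimeError("excessive collisions; frame dropped")
    k = min(collisions, BACKOFF_LIMIT)
    return rng.randint(0, 2**k - 1)  # randint is inclusive on both ends


def rounds_until_resolved(rng: random.Random) -> int:
    """Two stations collide, back off, and retry until they pick different slots."""
    collisions = 1
    while backoff_slots(collisions, rng) == backoff_slots(collisions, rng):
        collisions += 1  # same slot chosen: they collide again
    return collisions


rng = random.Random(42)
print(rounds_until_resolved(rng))
```

Doubling the range after each collision is what keeps a busy wire from degenerating into endless collisions: with two stations, the chance of picking the same slot again halves each round until the exponent stops growing.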
So I saw Ethernet and I said, whoops, this is a new type of link. My layer 3 would not perform well with this kind of link unless I made modifications. For instance, if you remember link state from three slides ago, if you had 500 nodes and each one of them reported 500 neighbors, then the link state database would get really big. So there were just kind of little things that I did.
So I said, okay, instead of doing this fully connected thing like over here, I'll pretend that the Ethernet itself is a node. I called it a pseudonode, and everybody just reports connectivity to that. But no big deal.
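The pseudonode trick is easiest to see as arithmetic on the number of neighbor entries the link state database holds for one shared link. A back-of-the-envelope sketch, assuming each of the n nodes on the Ethernet would otherwise report all n - 1 others (the talk rounds this to 500 neighbors each); the function names are made up for illustration.

```python
def lsp_entries_full_mesh(n: int) -> int:
    """Each of n nodes on the shared link lists the other n - 1 as neighbors."""
    return n * (n - 1)


def lsp_entries_with_pseudonode(n: int) -> int:
    """Each node lists only the pseudonode, and the pseudonode lists all n nodes."""
    return n + n


print(lsp_entries_full_mesh(500))        # 249500
print(lsp_entries_with_pseudonode(500))  # 1000
```

Growth drops from quadratic to linear in the number of nodes on the link.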
But I wish they had called Ethernet Etherlink, because the name confused the industry. It's easy to get confused. This is what an Ethernet packet looks like: you put an envelope on your data with the destination and the source. And a layer 3 packet looks the same, except there's an extra field called a hop count. The reason for that is that when the topology changes and routers are modifying their forwarding tables, there will be a time when the tables don't match. So you'll have packets wandering around, and the hop count gets rid of them before they go around too many times.
So it's easy to confuse Ethernet with layer 3. It kind of looks the same, but there's no hop count field. And it isn't because the Ethernet designers didn't know about hop count fields; it just never occurred to them that anyone would be forwarding their header. It wasn't intended to be forwarded.
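The hop count's job can be shown with a tiny simulation of a transient forwarding loop. A minimal sketch: the router names, tables, and starting hop count of 8 are hypothetical, and `forward` is a made-up helper. In IP this field is called TTL in IPv4 and Hop Limit in IPv6.

```python
def forward(tables, src, dst, hop_count):
    """Follow per-router next hops; return (delivered, hops_taken).

    tables[router][dst] is that router's next hop toward dst. The hop count
    is decremented on every forward, and a packet that runs out is discarded.
    """
    node, hops = src, 0
    while node != dst:
        if hop_count == 0:
            return False, hops  # discarded instead of circulating forever
        node = tables[node][dst]
        hop_count -= 1
        hops += 1
    return True, hops


# Mid-update, R1, R2 and R3 each point at the next one for "D": a loop.
looping = {"R1": {"D": "R2"}, "R2": {"D": "R3"}, "R3": {"D": "R1"}}
print(forward(looping, "R1", "D", hop_count=8))  # (False, 8)

# After the tables converge, R3 forwards straight to D.
fixed = {"R1": {"D": "R2"}, "R2": {"D": "R3"}, "R3": {"D": "D"}}
print(forward(fixed, "R1", "D", hop_count=8))    # (True, 3)
```

An Ethernet header has no such field, so nothing in the frame itself stops it from circling a loop.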
Also, one of the genius qualities of Ethernet is the flat address, which is that every device is born with a unique ID. So you can just plug them together and you know that the addresses won't conflict with each other. But if you did the Internet that way, it would be hard for the routers, because they would have to keep track of where every individual node was. If instead you get an address that conforms to where you are in the topology, then the routers can draw circles around portions of the Internet and just say everything in here has an address that looks like this.
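Drawing "circles around portions of the Internet" corresponds to prefixes in a routing table. A minimal sketch using Python's standard `ipaddress` module; the prefixes, port names, and the `lookup` helper are hypothetical. When circles nest, the router picks the most specific one that contains the destination (longest-prefix match).

```python
import ipaddress

# Hypothetical routing table: one prefix stands for every host inside a circle.
ROUTES = {
    ipaddress.ip_network("10.1.0.0/16"): "port-east",
    ipaddress.ip_network("10.2.0.0/16"): "port-west",
    ipaddress.ip_network("10.2.5.0/24"): "port-lab",  # a smaller circle inside west
    ipaddress.ip_network("0.0.0.0/0"): "port-default",
}


def lookup(dst: str) -> str:
    """Longest-prefix match: the most specific circle containing dst wins."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in ROUTES if addr in net]
    return ROUTES[max(matches, key=lambda net: net.prefixlen)]


print(lookup("10.1.200.7"))  # port-east
print(lookup("10.2.5.9"))    # port-lab
print(lookup("192.0.2.1"))   # port-default
```

Four entries cover every host in this invented network. A device forwarding on flat addresses has no such shortcut: it needs one entry per station.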
So why are we forwarding Ethernet packets? How did that happen? People got confused and thought that Ethernet was the new way of doing networking. They were building their applications directly on Ethernet without layer 3. So I tried to complain to them, and I said no, no, no, you still need layer 3. And they said, oh, go away, Radia, you're just upset because no one needs your layer anymore. And I said, but you may want to talk from one Ethernet to another. And they said, our customers would never want to do that.
So they built their stuff directly on Ethernet. And they made a lot of money for the company because their application was really good, but it wouldn't scale beyond a single Ethernet. They would have made just as much money had they done it properly. But explaining this to management is hard when these guys are such heroes.
So I was kind of in a bad mood about all this when one day my manager said,
Radia, you do this kind of distributed algorithm thing. We have to invent a
magic box that will sit between two Ethernets and let someone on one talk to
somebody on the other. And that's what my stuff did. But my stuff only works if the endnodes cooperate. They have to put on the header, they have to act in certain ways.
So the constraint was that we had to invent this box to work without modifying the endnodes in any way. The endnode thought it was speaking on a single CSMA/CD link, there was not a single spare bit in the Ethernet packet, and there was a hard size limit on it.
So the basic concept is fairly simple. You just move Ethernet packets around. This thing listens promiscuously to every single packet and stores it, and then when the Ether is free on the other side, or, if it were a token ring, when it gets the token, it forwards it.
So that's all great. But in addition it can be even smarter than that. It can look at the source address and learn that A is on that port, so if J were to transmit a packet with destination A, it doesn't need to forward it at all. Or if J sends to X, it knows it only needs to forward it there; whereas if A sent a packet to J, since the bridge doesn't yet know where J is, it has to forward it anyway.
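The learning behavior just described fits in a small class. Assumptions, all hypothetical: a three-port bridge, stations named A, J and X placed as in the example, and no handling of broadcast addresses, table aging, or store-and-forward timing. With no spanning tree, this sketch is only correct on a loop-free topology.

```python
class LearningBridge:
    """Transparent learning bridge; assumes a loop-free topology."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.station_port = {}  # learned: source address -> port it was seen on

    def receive(self, src, dst, in_port):
        """Return the list of ports this frame should be forwarded on."""
        self.station_port[src] = in_port    # learn from the source address
        out = self.station_port.get(dst)
        if out is None:                     # unknown destination: flood
            return [p for p in self.ports if p != in_port]
        if out == in_port:                  # destination on the arrival side: filter
            return []
        return [out]


bridge = LearningBridge(ports=[1, 2, 3])
print(bridge.receive("A", "J", in_port=1))  # [2, 3]  J not learned yet, so flood
print(bridge.receive("J", "A", in_port=1))  # []      A is on port 1 too
print(bridge.receive("X", "J", in_port=2))  # [1]
print(bridge.receive("J", "X", in_port=1))  # [2]
```

Nothing here touches the endnodes, which is the point: the bridge learns everything from source addresses the stations already send.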
So this is a very simple concept, but it won't work if there are multiple paths, basically if there are loops, because if you can receive something from the same source in two different directions, where is it, really? And also packets will never die. There's no hop count in this thing.
So why not just tell customers not to put in any loops? But then what about backup paths, or miscabling? That was why it was good to have the spanning tree algorithm: you plug things together however you want, with as much redundancy as you want, and the bridges talk amongst themselves and figure out a loop-free subset of the topology to actually forward packets on.
So you have a physical topology like this, and then the bridges turn some of these links into dotted lines. The fact that this is a dotted line means that bridge 3 never receives or forwards a data packet on that port, but it's still running the spanning tree algorithm in case the topology changes.
And you'll notice it's not an optimal path, because if A wants to talk to X, it goes this really long way: 11, 7, 6, 2, 14, 4, and then 3. You might think, well, that's kind of a silly spanning tree; a smarter spanning tree would give you better paths. But, no, if you're using one shared loop-free subset, someone's going to be unhappy. Intuitively, if you imagine your topology to be a big circle, spanning tree has to chop it at some place, and the people on either side of the chop have to go around the long way.
So, yes. The story of this is really cool. My manager challenged me to come up with this thing that would break all the symmetries and require no configuration. He challenged me to this on a Friday. And furthermore, he thought he was being clever. He thought it was going to be really hard, so he said, well, while you're at it, make it scale as a constant, so no matter how many links and bridges there are, the amount of memory necessary to run this should be a constant, which is crazy. Linear is the best you can do. Nothing is a constant.
And then he was going to be gone the whole next week. And that was before the days when people read e-mail on vacation or had cell phones or electricity or whatever. So that night I realized, oh, my goodness, it's trivial, and I could prove it. I knew exactly how to make it work. And it scaled as a constant. The reason it scales as a constant is that to run the spanning tree algorithm you only have to remember the best spanning tree message you've heard on each one of your ports.
So let's say you have four ports. When you receive a packet on a port, you ask, is this a spanning tree message, and if so, you compare it with the one you have stored. It's a trivial comparison: whichever one is better, you save, and you throw the other one away. A spanning tree message is about 50 bytes, so if you have four ports, it takes 200 bytes to run it, no matter how big the actual network is.
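The constant-memory property comes from storing exactly one message per port and keeping whichever of two messages is better. A minimal sketch, loosely modeled on IEEE 802.1D: the field and class names are made up, and the comparison order (root ID, then cost to root, then sender ID, then sender port, lower wins) is the conventional tie-breaking order rather than something stated in the talk. Timers, message aging, and updates from the same sender are omitted.

```python
from typing import NamedTuple, Optional

MSG_BYTES = 50  # the talk's rough size of one spanning tree message


class ConfigMsg(NamedTuple):
    """Simplified spanning tree message; field order is the comparison order."""
    root_id: int       # bridge the sender believes is the root
    cost_to_root: int  # sender's path cost to that root
    sender_id: int     # ID of the transmitting bridge
    sender_port: int   # port it was sent on (final tie-breaker)


class BridgePortState:
    def __init__(self, num_ports: int):
        # One stored message per port: memory depends on the port count,
        # not on how many bridges or links the network has.
        self.best: list[Optional[ConfigMsg]] = [None] * num_ports

    def on_message(self, port: int, msg: ConfigMsg) -> bool:
        """Keep msg only if it beats the stored one; return True if it did."""
        current = self.best[port]
        if current is None or msg < current:  # tuple comparison: lower is better
            self.best[port] = msg
            return True
        return False


state = BridgePortState(num_ports=4)
state.on_message(0, ConfigMsg(root_id=7, cost_to_root=3, sender_id=12, sender_port=1))
state.on_message(0, ConfigMsg(root_id=5, cost_to_root=9, sender_id=20, sender_port=2))
print(state.best[0].root_id)         # 5  (a lower root ID beats a lower cost)
print(len(state.best) * MSG_BYTES)   # 200
```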
So I was all excited. Then on Monday and Tuesday I wrote the spec, and because it's really a trivial algorithm, it was in enough detail that the implementers got it working in just a couple of months without asking me a single question. But then I had the whole rest of the week, when I couldn't concentrate on anything else, because I had to show off and my manager wasn't around. So I spent the remainder of the week working on the poem that goes along with the algorithm. And the poem is the abstract of the paper in which I published it.
So the poem is called Algorhyme, because every algorithm should have an algorhyme. And the poem is:
I think that I shall never see
a graph more lovely than a tree.
A tree whose crucial property
is loop-free connectivity.
A tree which must be sure to span
so packets can reach every LAN.
First the root must be selected.
By ID it is elected.
Least cost paths from root are traced.
In the tree these paths are placed.
A mesh is made by folks like me,
then bridges find a spanning tree.
[applause]
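The poem doubles as a specification: elect the lowest ID as root, then keep each bridge's least-cost path to it. The sketch below computes that tree centrally, which is an assumption for illustration; real bridges reach the result by exchanging messages, and they attach to LANs rather than directly to each other. The ring topology, unit link costs, the tie-breaking rule (lower parent ID wins), and the helper names are hypothetical.

```python
import heapq


def spanning_tree(links):
    """Elect the lowest ID as root, then keep each bridge's least-cost path to it.

    links maps (bridge, bridge) pairs to positive link costs. Returns
    (root, parent), where parent[b] is b's next bridge toward the root.
    """
    adj = {}
    for (a, b), cost in links.items():
        adj.setdefault(a, []).append((b, cost))
        adj.setdefault(b, []).append((a, cost))
    root = min(adj)                    # "by ID it is elected"
    best = {root: (0, root)}           # bridge -> (cost to root, parent)
    heap = [(0, root, root)]           # (cost, parent ID, bridge)
    parent = {}
    while heap:
        cost, parent_id, node = heapq.heappop(heap)
        if node in parent:
            continue                   # already placed in the tree
        parent[node] = parent_id       # "in the tree these paths are placed"
        for nbr, c in adj[node]:
            cand = (cost + c, node)    # lower cost first, then lower parent ID
            if nbr not in parent and (nbr not in best or cand < best[nbr]):
                best[nbr] = cand
                heapq.heappush(heap, (cand[0], node, nbr))
    del parent[root]
    return root, parent


def tree_hops(parent, a, b):
    """Number of links between a and b when traffic must stay on the tree."""
    def ancestors(x):
        chain = [x]
        while x in parent:
            x = parent[x]
            chain.append(x)
        return chain
    up_a, up_b = ancestors(a), ancestors(b)
    meet = next(x for x in up_a if x in up_b)
    return up_a.index(meet) + up_b.index(meet)


ring = {(1, 2): 1, (2, 3): 1, (3, 4): 1, (4, 5): 1, (5, 6): 1, (6, 1): 1}
root, parent = spanning_tree(ring)
print(root)                      # 1
print(tree_hops(parent, 4, 5))   # 5
```

On this six-bridge ring the tree drops the link between 4 and 5, so traffic between those direct neighbors goes five hops around the circle, the chop in the big circle described earlier.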
>> Radia Perlman: So then there was this really cool story where the -- I really felt this was a bad idea. I really thought they should have to put layer 3 back in the endnodes, because spanning tree doesn't give you optimal paths and stuff. If you want to make a network, make a network, have a civilized header --
>>: So in the spirit of not being afraid to ask a question, may I ask you to clarify?
>> Radia Perlman: Sure.
>>: It looks like the previous slide that you have, it doesn't look like a
regular tree [inaudible] tree.
>> Radia Perlman: Well, it is, because there's only one way to get from any place to any other place. Now, the reason it doesn't look like a tree to you is that it's not obvious who the root is. But in this particular topology, any one of these bridges could be the root, and it would still be a tree.
>>: Huh. Okay, can you select one to show --
>> Radia Perlman: Okay. Let's pick one. Okay, 4 would have as children 9, 3 and 14. And then 14 would have as a child 2; 2 would have as a child 6 --
>>: [inaudible] it's not the green bar.
>> Radia Perlman: These are Ethernets. And these are ports. These are bridges.
>>: Okay.
>> Radia Perlman: I'm sorry.
>>: Okay. Oh, no. I got it. Yeah. Thank you.
>> Radia Perlman: Yes?
>>: Would you end up with the same tree regardless of which root you
selected?
>> Radia Perlman: No. The actual tree you compute is sort of greedy with respect to the root: everyone wants to be as close to the root as possible. There's this thing called a minimum weight spanning tree, which is where you take a tree and add up the cost of all its links. This is not a minimum weight tree either. You can calculate that as well; it's a lot more complicated. But also, you probably want a tree which is as compact as possible. Yeah.
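The difference between the root-greedy tree and a minimum weight spanning tree already shows up on a triangle. A minimal, hypothetical sketch: one function grows a tree from the root and differs only in how it ranks candidates, either by total distance to the root (Dijkstra, giving the shortest path tree) or by the single link that would attach the node (Prim's algorithm, giving a minimum weight spanning tree). Computed centrally on a tiny graph, both are easy; the point is only that they produce different trees.

```python
import heapq

# Hypothetical triangle: root R is 2 away from both A and B, which are 1 apart.
GRAPH = {
    "R": {"A": 2, "B": 2},
    "A": {"R": 2, "B": 1},
    "B": {"R": 2, "A": 1},
}


def grow_tree(graph, root, shortest_path):
    """Grow a tree from root, always attaching the best-ranked frontier node.

    shortest_path=True ranks a node by its total distance to root (Dijkstra);
    False ranks it by the single attaching link (Prim). Returns the tree
    links as (parent, child, cost) in the order they were added.
    """
    dist = {root: 0}
    in_tree, links = set(), []
    heap = [(0, root, None, 0)]  # (rank, node, parent, link cost)
    while heap:
        _, node, parent, link_cost = heapq.heappop(heap)
        if node in in_tree:
            continue
        in_tree.add(node)
        if parent is not None:
            links.append((parent, node, link_cost))
        for nbr, cost in graph[node].items():
            if nbr not in in_tree:
                rank = dist[node] + cost if shortest_path else cost
                if shortest_path:
                    dist[nbr] = min(dist.get(nbr, rank), rank)
                heapq.heappush(heap, (rank, nbr, node, cost))
    return links


spt = grow_tree(GRAPH, "R", shortest_path=True)
mst = grow_tree(GRAPH, "R", shortest_path=False)
print(spt, sum(c for _, _, c in spt))  # [('R', 'A', 2), ('R', 'B', 2)] 4
print(mst, sum(c for _, _, c in mst))  # [('R', 'A', 2), ('A', 'B', 1)] 3
```

The minimum weight tree is cheaper in total (3 versus 4), but it makes B travel a distance of 3 to reach the root instead of 2.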
Okay. So, yeah, I felt kind of sorry for the implementers, because they felt this whole thing was stupid and they thought everyone should put layer 3 in, as I did. They just wanted to build the simplest possible device, just to let our customers survive for a year or so until they could redo the endnodes to have layer 3 in them.
And I kind of sympathized. Of course, once you've done something like this you'd love to see it deployed. But I didn't want to argue, because I figured they'd think I was biased anyway. So I let management argue it out, and they told the implementers, yes, you have to put in the spanning tree. And as trivial as it is, it made their device more complicated than if they'd done the simplest possible thing.
But when they sold the first one, I realized that, yes, it was the right thing to make them do that. The very first bridge was sold to the world's most sophisticated networking customer at that point, and they had the world's simplest topology, which was two Ethernets and one bridge.
And the story as I heard it later was that the salespeople were telling them
about this wonderful thing and they were saying, oh, but look at all the
sophisticated networking things we're doing. And the sales guy said it
really doesn't matter what you're doing, it's just going to work. And they
were saying, no, we need to talk to the engineers to tell them all the stuff
we're doing. And the salesman was saying, no, you don't, it's just going to
work.
And so they plugged it together and it didn't work, and they were really angry. And when field service went to figure out what the world's most sophisticated customer had done with --
>>: [inaudible].
>> Radia Perlman: That I'm not going to say. What the world's most
sophisticated customer had done with the world's simplest topology, they
discovered this.
[laughter].
>> Radia Perlman: Which is that they plugged both ends into the same Ethernet. Because, you know, in the ceiling, orange cable looks like orange cable. And I was relieved that I'd thought of that case, actually. Everything was working perfectly. The spanning tree was saying, well, I don't need to forward packets; if I ever do, I will, but in the meantime I won't. So, yeah, again, that shows it was right to make them do that.
So very soon after this technology, like a year or so, CSMA/CD died out. These days Ethernet is just wires between two switches, and there's no contention at all, other than in wireless. Okay. So the next stage in Ethernet evolution: why not just get rid of Ethernet and use IP? The problem at the time was that people didn't even have layer 3 in their endnodes.
Furthermore, it was complicated by the fact that there wasn't just one layer 3
protocol: it was as if there wasn't just English, there was Italian and German
and Dutch and whatever. Now everyone's agreed on IPv4, and everybody has it in
their networking stacks. So why not just get rid of bridges and just hook
everything together with IP?
And this is a very deep question with a deep answer that most networking
people never think about. The reason is that IP has an annoying
idiosyncrasy that would make it unpleasant if you tried to hook up the entire
world with IP: it's configuration intensive. Every link, meaning every region
surrounded by IP routers, must have a unique block of addresses. So if you
have a block of IP addresses and you want to number your corporate network,
you have to carve up the address space to have a unique block on each link,
you have to configure the routers to know which block is on which port, and
if you move from one side of a router to another, you have to change your
address.
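To make that configuration burden concrete, here is a small sketch using Python's standard `ipaddress` module. The corporate block `10.1.0.0/16`, the link names, and the helpers `assign_link_prefixes` and `needs_renumbering` are hypothetical; the point is only that every link needs its own carved-out prefix, and a host that moves to another link falls outside the prefix it was numbered from.

```python
import ipaddress


def assign_link_prefixes(block, links, new_prefix):
    """Carve `block` into one unique subnet per link (hypothetical helper)."""
    subnets = ipaddress.ip_network(block).subnets(new_prefix=new_prefix)
    plan = {}
    for link in links:
        prefix = next(subnets, None)
        if prefix is None:
            raise ValueError(f"{block} has too few /{new_prefix} subnets for {len(links)} links")
        plan[link] = prefix
    return plan


def needs_renumbering(host, new_link_prefix):
    """True if a host's address is not valid on the link it moved to."""
    return ipaddress.ip_address(host) not in new_link_prefix


# Every link gets its own block, and routers must be told which block is on which port.
plan = assign_link_prefixes("10.1.0.0/16", ["eng", "sales", "lab"], new_prefix=24)
for link, prefix in plan.items():
    print(link, prefix)  # eng 10.1.0.0/24, sales 10.1.1.0/24, lab 10.1.2.0/24

# A host numbered on "eng" that walks over to "sales" must change its address.
moved = needs_renumbering("10.1.0.42", plan["sales"])
print(moved)
```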
Now, that's just how IP works. It's not how layer 3 has to work. So let me
give you an example. A different sports team did this other protocol that's
like IP, and they called it CLNP, for Connectionless Network Protocol.
It was actually the same standards body that took my routing protocol, and
I took their packet format. So I saw CLNP and I used it for DECnet; it
seemed like a perfectly good thing.
It had 20-byte addresses. Now, keep in mind IPv4 has 4-byte addresses and
IPv6 has 16-byte addresses. But, you know, if all you care about is big
addresses, why not 735 bytes? But not only were the addresses bigger, they
were used in a very interesting way. With IP, and IPv6 works exactly the
same way, every link has to have its own block of addresses.
With CLNP, a 14-byte prefix was shared by an entire large cloud. So
the 14-byte prefix gets you to the cloud. Once you get to the cloud, the
bottom six bytes are how the routers route to you, and the endnodes let the
routers know where they are in the cloud. So you have a real layer 3
protocol inside the cloud that keeps track of where all the endnodes are, and
you can do shortest paths, you can do multipathing. You can do anything you
would do with a layer 3 protocol.
So if you're using IP plus Ethernet, which is how things work today, IP gets
you to what IP thinks of as a link, and the only reason it's more than a
single link is that Ethernet, with its spanning tree, lets you build something
much bigger than a single link. But once you get to the link, you also have
to do ARP to find out the destination's Ethernet address. Whereas if you do it
the CLNP way, the top 14 bytes get you to the cloud, you don't have to do ARP
because the address is right there in the bottom six bytes, and you can do
true layer 3 routing inside the cloud.
And, again, another way to look at it: if you have one prefix per link, like
IP, you have to carve up the address space, and if you move around, you have
to change your address. With one prefix per entire cloud, you need no
configuration of the nodes inside the cloud. All you have to do is tell
somebody what the 14-byte prefix is, and nodes inside the cloud can jump
around and keep their addresses.
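As a rough illustration of the CLNP-style scheme just described, here is a sketch that follows the talk's simplified split (a 14-byte cloud prefix plus a 6-byte node ID), not the actual ISO NSAP encoding, which has more structure. `CloudAddress` and `Cloud` are hypothetical: the low 6 bytes double as the node's Ethernet address, so no ARP is needed, and the routers inside the cloud simply learn where each node currently is, so a node that moves keeps its address.

```python
from dataclasses import dataclass

PREFIX_LEN = 14  # bytes shared by every node in the cloud (per the talk)
ID_LEN = 6       # bytes identifying a node inside the cloud; here, its MAC address


@dataclass(frozen=True)
class CloudAddress:
    prefix: bytes
    node_id: bytes

    @classmethod
    def parse(cls, raw: bytes) -> "CloudAddress":
        if len(raw) != PREFIX_LEN + ID_LEN:
            raise ValueError("expected a 20-byte address")
        return cls(raw[:PREFIX_LEN], raw[PREFIX_LEN:])

    def link_layer_address(self) -> str:
        # No ARP needed: the Ethernet address is just the low 6 bytes.
        return ":".join(f"{b:02x}" for b in self.node_id)


class Cloud:
    """Hypothetical stand-in for the routers inside one cloud."""

    def __init__(self, prefix: bytes):
        self.prefix = prefix
        self.location = {}  # node_id -> router the node is currently attached to

    def announce(self, addr: CloudAddress, router: str) -> None:
        # An endnode tells the cloud where it is; its address never changes.
        if addr.prefix != self.prefix:
            raise ValueError("address belongs to a different cloud")
        self.location[addr.node_id] = router

    def next_router(self, addr: CloudAddress) -> str:
        return self.location[addr.node_id]


prefix = bytes(range(PREFIX_LEN))  # the only thing anyone configures
addr = CloudAddress.parse(prefix + bytes.fromhex("0200000000aa"))
cloud = Cloud(prefix)
cloud.announce(addr, "router-3")
cloud.announce(addr, "router-7")   # node moved; same 20-byte address
print(cloud.next_router(addr), addr.link_layer_address())
```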
So here's the story behind the single worst decision in the history of
mankind. In 1992 people said, why don't we replace IP with CLNP? And people
said, good idea. And they showed how they could make TCP work on top of CLNP.
It took just a couple of months, and all the Internet applications just
automatically worked.
And imagine doing that back then. The Internet was just this small researchy
thing; it wasn't the lifeblood of all these merchants. And IP had also not yet,
out of necessity, invented things like NAT and DHCP. So you could give people
understandable advantages. If you said, why don't you convert to CLNP, they'd
say, why? And you'd say, autoconfiguration. They'd go, oh, yes. Because back
then you had to configure each endnode.
And you might think, well, with IPv6 we've had like 25 years at this point to
make it really awesome, so it must be so much better. But, no, it's
exactly the same as IPv4: every link has its own prefix.
So why didn't they do it? Well, again, nobody bothers learning anything else,
and they were saying this would be ripping the heart out of the Internet and
putting in a foreign substance, so instead we're going to design something
that will be just an upgrade to IP. And there is no sane sense in which IPv6
is just an upgrade whereas CLNP would have been a foreign substance.
>>: So, a question. Was there a single committee deciding on the change or
upgrade? Who actually designed it?
>> Radia Perlman: Well, there was a committee called IETF, which is very
political, as all of these things are, and they're very proud of the fact
that they don't do voting. So instead of voting, it's sort of that the
loudest voices win. And the loudest voices were saying this, so what can
you do?
At that point, all of the vendors were actually behind CLNP because they all
had it implemented and they wanted to do it. But, you know, some people were
thinking, wow, maybe I can get a Nobel Prize by inventing a new packet
format. And so they took this opportunity to invent this IPv6 thing. But
there's really nothing to it. So, yeah, as I said, think of the amount of
money wasted by refusing to do that, when it was exactly the right thing at
that point in time.
Okay. So now I'll quickly tell you about TRILL. I was sort of horrified that
people were still using this stuff. I wasn't paying too much attention; I
assumed it was a quick hack until they put layer 3 back, and I wasn't really
thinking deeply. And then I realized, oh, my gosh, this stuff is everywhere.
And so, to kind of atone for this, and also realizing that the world was kind
of stuck with Ethernet because IP requires something else to create a
flat-address-space cloud, I was thinking, can I make Ethernet better? The
basic concept got standardized in IETF. It's called TRILL, which stands for
Transparent Interconnection of Lots of Links, and the idea is that you want
the best of both worlds. You want the autoconfiguration and the flat address
space from Ethernet, but from layer 3 you want optimal paths and stability
and traffic engineering, all of that.
So, my general philosophy about protocol design: I actually kind of hate
technology. My company finally gave me a smartphone. I've never had one
before. I don't know how to use it really. It sits on my desk and makes
funny noises. I don't know how to stop it from making that. So when I
design things, I design things for people like me, people who kind of hate
technology, where it just works, you don't have to think about it.
But then people said to me, Radia, we have customers that really like to
configure things. And I said really? Well, fine. Okay. They want knobs,
I'll put in knobs. But you don't have to touch the knobs. And if you do,
you can't hurt yourself. Any setting of the knobs will still work. So
that's kind of the -- you can play but you can't hurt yourself.
And also be evolutionary if possible. You know, it's a fun exercise to say,
let's throw away the Internet, how would I design it? But it's hard to
actually do that, so it's better to be able to say, let's have a network
where you don't have to snap your fingers and replace everything, but the
more you upgrade, the better qualities you'll get.
So you have a spanning tree Ethernet with bridges, and you can replace any
subset of those with TRILL switches. And the more you replace, the more
layer 3-like it gets. So here, if you have a mixture of TRILL switches,
which are the red things, and bridges, which are the little Bs, the TRILL
switches sort of don't even notice the little Bs. So this is what the
network looks like to the TRILL switches.
So the TRILL switches run a link-state protocol, so they know how to
reach all other TRILL switches, but they have no idea where the endnodes are.
So they make a little network of just the TRILL switches. And then when A
transmits an Ethernet packet, R1 puts it in a TRILL envelope addressed
to R2, and it gets across because this is the network of TRILL switches.
When it gets to R2, R2 removes the header and out pops an Ethernet packet.
So the interesting questions are what this header looks like, which I'll show
in a minute, and how R1 knows that R2 is the right destination.
So given that this is a picture rather than lots of words, the header is
actually only 6 bytes, because the TRILL switches get a 2-byte nickname that
they autoconfigure. With 2 bytes, your forwarding table can have at most 64K
entries, so you can do a direct table lookup, and it's nice and small. So
it's 2 bytes for the first switch, 2 bytes for the last switch, and a hop
count and some flags. So the packet goes across as if it's any layer 3
protocol, because it is. And then the header gets removed at the end.
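The 6-byte header and the direct table lookup can be sketched with Python's `struct` module. This assumes the TRILL header layout of RFC 6325 (2-bit version, 2 reserved bits, a multi-destination flag, a 5-bit options length, a 6-bit hop count, then the egress nickname followed by the ingress nickname), so the last switch's nickname actually comes before the first switch's; check the exact bit positions against the spec. The `forward` function is a hypothetical, simplified transit step, not a full TRILL implementation.

```python
import struct

TABLE_SIZE = 1 << 16  # one slot per possible 2-byte nickname


def pack_trill_header(ingress, egress, hop_count, multi_dest=False, version=0):
    if not (0 <= ingress < TABLE_SIZE and 0 <= egress < TABLE_SIZE):
        raise ValueError("nicknames are 16 bits")
    if not 0 <= hop_count < 64:
        raise ValueError("hop count is 6 bits")
    # V(2) | R(2)=0 | M(1) | Op-Length(5)=0 | Hop Count(6)
    first16 = (version << 14) | (int(multi_dest) << 11) | hop_count
    return struct.pack("!HHH", first16, egress, ingress)


def unpack_trill_header(hdr):
    first16, egress, ingress = struct.unpack("!HHH", hdr[:6])
    return {
        "version": first16 >> 14,
        "multi_dest": bool((first16 >> 11) & 1),
        "op_length": (first16 >> 6) & 0x1F,
        "hop_count": first16 & 0x3F,
        "egress": egress,
        "ingress": ingress,
    }


def forward(hdr, table):
    """Simplified transit switch: direct table lookup on the egress nickname."""
    f = unpack_trill_header(hdr)
    if f["hop_count"] == 0:
        return None  # drop rather than let a looping packet live forever
    port = table[f["egress"]]
    new_hdr = pack_trill_header(f["ingress"], f["egress"], f["hop_count"] - 1,
                                f["multi_dest"], f["version"])
    return port, new_hdr


table = [None] * TABLE_SIZE
table[0x0202] = "port-2"  # route toward R2
hdr = pack_trill_header(ingress=0x0101, egress=0x0202, hop_count=10)
port, out = forward(hdr, table)
print(len(hdr), hdr.hex(), port, unpack_trill_header(out)["hop_count"])
```

Because the nickname is only 16 bits, the forwarding table is a plain list indexed by nickname rather than a hash or longest-prefix-match structure.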
And then the other question is how R1 knows that R2 is the right
destination. There's a bunch of ways you could do it. Usually in a cloud
there's some sort of fabric manager that knows where everybody is, so you
could ask it. Or, what the original TRILL switches, the ones deployed today,
do is act like bridges: if you don't know where the destination is, you send
it on a tree, and then as every switch removes the header and sends it to its
attached endnodes, it makes a note that source Ethernet address A is attached
to R1. And you remember that for a while. If nobody on your link cares about
A, you'll time it out. But in general you will have in your table everyone
that's corresponding with the endnodes you're attached to, so you'll know
which switch to send it to.
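A minimal sketch of that bridge-like learning, in Python. The class and function names, the 300-second default timeout, and the string `"FLOOD_ON_TREE"` standing in for "send it on a tree" are assumptions for illustration; a real TRILL switch does this in its forwarding plane and also has to pick the distribution tree, which is not modeled here.

```python
import time


class EndnodeTable:
    """Sketch of bridge-style learning at a TRILL edge switch (hypothetical class)."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # endnode MAC -> (switch it sits behind, time learned)

    def learn(self, src_mac, ingress_switch):
        # Called when decapsulating: the inner source MAC lives behind the ingress switch.
        self.entries[src_mac] = (ingress_switch, self.clock())

    def lookup(self, dst_mac):
        entry = self.entries.get(dst_mac)
        if entry is None:
            return None
        switch, learned_at = entry
        if self.clock() - learned_at > self.ttl:
            del self.entries[dst_mac]  # nobody refreshed it: time it out
            return None
        return switch


def choose_egress(table, dst_mac):
    # Unknown destination: fall back to sending on the distribution tree.
    switch = table.lookup(dst_mac)
    return switch if switch is not None else "FLOOD_ON_TREE"


now = [0.0]  # fake clock for the example
edge = EndnodeTable(ttl_seconds=300, clock=lambda: now[0])
before = choose_egress(edge, "A")  # not learned yet
edge.learn("A", "R1")              # a frame from A arrived via R1
during = choose_egress(edge, "A")
now[0] = 301.0                     # A has been silent too long
after = choose_egress(edge, "A")
print(before, during, after)
```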
So, the advantages of this extra header: switches inside the cloud don't need
to know about all the endnodes, because their forwarding table is just the
size of the number of switches. And it's evolutionary: you can replace any
subset of your bridges with TRILL switches.
And an orthogonal question is who puts on the header. It could be the first
switch, or it could be the first hypervisor, or the VM, or the application.
And a note that I have to make: in the original paper I called them RBridges,
for routing bridges, and I've come to dislike that term, because whenever I
try to explain it, people hear O-U-R bridges. Our bridges.
So I tried to get the working group to switch it to TRILL switches, and they
said, oh, we have like a whole bunch of documentation already, we don't want
to rewrite it for something silly like that. And then they finally said ah,
the poem won't work. And so that shut me up.
Now, the poem: the first time I was trying to explain this, I was going to
give a talk the next day, and I called my son, who was grown up. He was very
familiar with Algorhyme because my daughter is a musician. She plays violin,
and I've always been her piano accompanist. She also started singing opera,
and I was her piano accompanist for that too. When she gave a recital of
Italian and German arias, my son set Algorhyme to music.
So I called him up at ten o'clock at night, and I said, look, and I explained
this new technology. I said can you come up with a version of Algorhyme that
explains this new thing. And you have one hour because I want to go to
sleep. So I'll call you back in an hour, and if you've done a good enough
job, I'll use your poem in my talk; and if not, all you've done is wasted one
hour of your life and you owe me that much.
[laughter]
>> Radia Perlman: So I called him. And he did such a good job that the poem
is in the spec. So Algorhyme v2 is:
I hope that we shall one day see
A graph more lovely than a tree.
A graph to boost efficiency
While still configuration-free.
A network where RBridges can
Route packets to their target LAN.
The paths they find, to our elation,
Are least cost paths to destination.
With packet hop counts we now see,
The network need not be loop-free.
RBridges work transparently,
Without a common spanning tree.
[applause]
>> Radia Perlman: So recently there were a bunch of similar things invented,
also under the umbrella of SDN for some reason. And it's really just a
different kind of encapsulation. VXLAN, for instance, assumes that the
inner thing is IP, and you just treat the IP header as if it's a 32-bit flat
address, and the header on the outside is IP plus UDP plus other things.
So the way to think of it is that inside there's a flat address space, which
in TRILL is Ethernet, and in some of the more recent stuff they use IP as the
inner thing, just ignoring all the fields except for the address. The outer
header in TRILL was 6 bytes and autoconfigured, you know, with the nicknames,
where everyone picks their own nickname. In the other ones you have a
different header, but it's the same concept.
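For comparison with TRILL's 6-byte header, here is a sketch of the 8-byte VXLAN header as laid out in RFC 7348 (a flags byte with the "I" bit set, 3 reserved bytes, a 24-bit VXLAN Network Identifier, and a reserved byte), which travels over UDP destination port 4789. The function names are hypothetical, the outer IP and UDP headers are left to the sending host's network stack rather than built here, and the inner packet is treated as opaque bytes.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned VXLAN destination port
FLAG_I = 0x08          # "I" flag: the VNI field is valid


def vxlan_header(vni):
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI is 24 bits")
    # flags (1 byte), reserved (3 bytes), then VNI in the top 24 bits of a 32-bit word
    return struct.pack("!B3xI", FLAG_I, vni << 8)


def vni_of(header):
    flags, word = struct.unpack("!B3xI", header[:8])
    if not flags & FLAG_I:
        raise ValueError("VNI not marked valid")
    return word >> 8


def encapsulate(vni, inner):
    # Only the VXLAN shim; sending it via a UDP socket to VXLAN_UDP_PORT adds IP + UDP.
    return vxlan_header(vni) + inner


packet = encapsulate(5000, b"inner frame bytes")
print(packet[:8].hex(), vni_of(packet))
```

Counting the outer IP and UDP headers on top of these 8 bytes gives a sense of how much heavier this style of encapsulation is than TRILL's autoconfigured 6-byte header.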
Oh. And this I find kind of interesting. Compare having gone with CLNP
versus using IP to get you to the cloud and then TRILL or VXLAN or whatever
inside. The advantage of CLNP is there's no need to do ARP to get the address
inside the cloud. But on the other hand, the advantage of encapsulation is
that only the edge switches need to keep track of where the endnodes are. The
switches in the middle can just forward.
Okay. So now I'm almost done. So now I'm going to insult you with things
that are just so obvious but people always get them wrong. So version
number. Most protocols have a field called version number. So if you look
at the IP packet format, it's right there. And in the spec it says put a 4
here. That's why it's called IPv4.
So the question is what is the purpose of that field. Is it decorative? Is
there some reason for it? So now a deep question is what is the difference
between a different version of a protocol and a totally different protocol.
So the only thing that makes sense to me is that most protocols have a field
in it that says what's inside.
So in IP there's a field called the protocol type. In Ethernet, there's also a
field, called the Ether type, that says what's inside. In UDP and TCP it's a
port. So I claim that if you share the same, let's say, Ether type with
something else, then you are a different version of the same protocol.
As many things as you want can share the same protocol type, provided that
you differentiate based on the version number.
If you have a different Ether type, then you are a different protocol. So
even if the specs for the two things are identical except for which Ether
type you use, they are different protocols; whereas if you share the same
Ether type and differentiate based on version number, it doesn't matter how
different the thing is, it's a different version of the same protocol.
That's the only definition that makes sense to me.
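That definition can be made concrete with a hypothetical receive path: the Ether type picks the protocol, and only then does the version field pick among versions of that protocol. The `Demux` class and the handler strings are illustrative, and the sketch assumes, as in IP, that the version lives in the top 4 bits of the first payload byte.

```python
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD


class Demux:
    """Hypothetical receive path: Ether type first, version number second."""

    def __init__(self):
        self.handlers = {}  # ethertype -> {version -> handler}

    def register(self, ethertype, version, handler):
        self.handlers.setdefault(ethertype, {})[version] = handler

    def receive(self, ethertype, payload):
        versions = self.handlers.get(ethertype)
        if versions is None:
            return "drop: unknown protocol"
        if not payload:
            return "drop: empty payload"
        version = payload[0] >> 4  # the check careless receivers skip
        handler = versions.get(version)
        if handler is None:
            return f"drop: unsupported version {version}"
        return handler(payload)


demux = Demux()
demux.register(ETHERTYPE_IPV4, 4, lambda p: "IPv4 stack")
demux.register(ETHERTYPE_IPV6, 6, lambda p: "IPv6 stack")

v4 = bytes([0x45]) + bytes(19)  # version 4, header length 5 words
v6 = bytes([0x60]) + bytes(39)  # version 6
print(demux.receive(ETHERTYPE_IPV4, v4))
print(demux.receive(ETHERTYPE_IPV4, v6))  # a v6 packet under the v4 Ether type
print(demux.receive(ETHERTYPE_IPV6, v6))
```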
So, now, let's look at IPv4. The spec says put a 4 there. It doesn't say to
ever check it. So they discovered that, although they were hoping to share
the same Ether type and just call it IP, IPv4 nodes just do who knows what if
you give them an IPv6 packet. They'll just assume they should parse it as
IPv4, because they don't look at the version.
And so therefore IPv6 is not a new version of IP. It's a new protocol: they
have to use a different Ether type. And there's no reason why they had to
call it 6. They could have called it version 1. And you'd think that they
would have learned their lesson. But the IPv6 spec actually says, here's a
version number field; put a 6 there.
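Here is a sketch of what goes wrong when a receiver never checks the version: a hypothetical careless IPv4 parser reads fixed IPv4 byte offsets (TTL at byte 8, protocol at byte 9, source at bytes 12-15, destination at bytes 16-19) from what is actually an IPv6 header. The documentation-range addresses are made up for the example, and `careful_ipv4_view` shows the one-line check that would have prevented it.

```python
import ipaddress
import struct


def lazy_ipv4_view(pkt):
    """Careless parser: never looks at pkt[0] >> 4, just reads IPv4 offsets."""
    return {
        "ttl": pkt[8],
        "protocol": pkt[9],
        "src": str(ipaddress.IPv4Address(pkt[12:16])),
        "dst": str(ipaddress.IPv4Address(pkt[16:20])),
    }


def careful_ipv4_view(pkt):
    version = pkt[0] >> 4
    if version != 4:
        raise ValueError(f"not IPv4 (version {version})")
    return lazy_ipv4_view(pkt)


# Fixed IPv6 header: version/class/flow (4), payload length (2), next header (1),
# hop limit (1), then the 16-byte source and 16-byte destination addresses.
src6 = ipaddress.IPv6Address("2001:db8:a:b:c:d:e:f")
dst6 = ipaddress.IPv6Address("2001:db8::2")
v6_packet = struct.pack("!IHBB", 6 << 28, 0, 59, 64) + src6.packed + dst6.packed

# The "destination" is really the middle of the IPv6 source address.
print(lazy_ipv4_view(v6_packet))

rejected = None
try:
    careful_ipv4_view(v6_packet)
except ValueError as err:
    rejected = str(err)
print(rejected)
```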
And then there are other examples that are quite hilarious, like SSL. When
they went from version 2 to version 3, they redid all the packet formats,
who knows why. But they were hoping to keep the same port number, and they
moved the version number field too. And that's the one thing you can't
change. So as it turns out, SSL version 2 and version 3 still managed to use
the same port number, but the first packet that you send, the first hello,
has to be in version 2 format, and you say in there that you can do version
3. Now, given that you could say everything you needed for version 3 in
version 2 format, I don't know why they had to redo it, but at any rate.
So, okay. Parameters. Let's see. Can I go for an extra five, possibly ten
minutes without -- okay. So, parameters. It would be nice not to have these
at all. For instance, for link cost, you could just measure the bandwidth,
put it in an equation, and come up with a number. Every one of these things,
you know, the customer has to read about and understand.
So if you have to have settable parameters, make sure they can't be set
incorrectly. And that's easy enough: you just have a range that you're
allowed to set them in. But sometimes you have a legal value on one node and
a legal value on its neighbor, and they don't interoperate.
And my favorite example of this is a protocol that I was never able to
explain to my otherwise brave college-aged son, which is that there's no such
thing as a reliable I am dead now message, so you have to periodically call
your mother.
[laughter].
>> Radia Perlman: Here is an example of a possible parameter mismatch,
because it's like: how often should he call his mother, and how long should
I wait before I call the police?
So when I was doing IS-IS, I kind of realized this problem. So in the hello
message I say, hi, I'm Radia, I send hellos every 25 seconds. And you, my
neighbor, multiply that by 3, maybe add a couple of seconds, and that's how
you know how long to wait before declaring our link down.
Well, I never thought it was profound enough to write a paper about or get a
patent on, but the OSPF people basically copied IS-IS. You know, they wanted
to invent their own thing, but they mostly copied it. And they saw those
fields in there, but they kind of didn't quite understand the point of them.
So in OSPF, the hello also says, hi, I'm Radia, I send hellos every 25
seconds. What you do in OSPF is compare your neighbor's hello timer with your
configured hello timer, and if they're not identical, you refuse to talk to
each other. Which makes the network so brittle. Why shouldn't you be able to
have different values?
>>: It would solve the protocol issue with your son.
>> Radia Perlman: If --
>>: [inaudible].
>> Radia Perlman: Oh, how did I solve it. No, he sort of got it. I would
eventually call him and say, hey, you know, are you okay. And he's like,
well, of course. I would have told you if I wasn't. And I'd go no, that
doesn't work. He's actually super smart, by the way.
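The contrast between the two hello-timer designs above can be sketched as follows. This is a simplified model following the talk's description (multiply the neighbor's advertised interval by 3 and add a couple of seconds), not actual IS-IS or OSPF packet processing; the `Router` class, the legal range, and the 2-second slack are hypothetical.

```python
from dataclasses import dataclass

HELLO_MIN, HELLO_MAX = 1, 65535  # hypothetical legal range for the setting


@dataclass
class Router:
    name: str
    hello_interval: int  # seconds between the hellos this router sends

    def __post_init__(self):
        # Parameters can't be set to nonsense: reject anything outside the range.
        if not HELLO_MIN <= self.hello_interval <= HELLO_MAX:
            raise ValueError(f"hello interval {self.hello_interval} out of range")

    def hello(self):
        return {"from": self.name, "hello_interval": self.hello_interval}


def isis_style_hold_time(neighbor_hello, multiplier=3, slack=2):
    # Adapt to whatever the neighbor advertises, so any two legal settings interoperate.
    return neighbor_hello["hello_interval"] * multiplier + slack


def ospf_style_accepts(me, neighbor_hello):
    # Refuse to talk unless the intervals match exactly: brittle.
    return neighbor_hello["hello_interval"] == me.hello_interval


radia = Router("Radia", 25)
son = Router("Son", 10)
print(isis_style_hold_time(radia.hello()))  # son waits 3 * 25 + 2 seconds
print(ospf_style_accepts(son, radia.hello()))
```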
Okay. And the last technical example I'll give is latency. So suppose
you really care how long it takes to deliver a packet across a network.
There are two ways you can deliver the packet. One is store and forward,
which is that each switch receives the entire packet and then forwards it to
the next node.
Well, if you care about latency, what you should do is as soon as you can
make a forwarding decision about which port to send it on, you should start
forwarding it there while you're still receiving it from the other one. And
that's called cut through.
So the question is, what field in the header do you need to see in order to
make this forwarding decision? Well, the destination address, right? Let's
look at the IPv4 header. It's absolutely the last thing. And let's look at
the IPv6 header. It's absolutely the last thing.
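A back-of-the-envelope sketch of why the position of the destination address matters. It assumes equal-speed links, ignores propagation and lookup time, and assumes each router sees a 14-byte untagged Ethernet header before the IP header; the function names and the 1500-byte, 10 Gb/s, three-link example are hypothetical. Store and forward pays the full serialization delay on every link, while cut-through only waits, at each intermediate switch, until the end of the destination address has arrived.

```python
ETH_HEADER = 14    # bytes of (untagged) Ethernet header before the IP header
IPV4_DST_END = 20  # IPv4 destination occupies bytes 16-19 of the fixed header
IPV6_DST_END = 40  # IPv6 destination occupies bytes 24-39 of the fixed header


def store_and_forward(packet_bytes, links, bits_per_sec):
    # Each of the `links` hops receives the whole packet before sending it on.
    return links * packet_bytes * 8 / bits_per_sec


def cut_through(packet_bytes, links, bits_per_sec, decision_bytes):
    # Each of the (links - 1) switches waits only for `decision_bytes`,
    # then the packet streams through; the full packet is serialized once.
    return ((links - 1) * decision_bytes + packet_bytes) * 8 / bits_per_sec


size, links, rate = 1500, 3, 10e9
for label, t in [
    ("store and forward", store_and_forward(size, links, rate)),
    ("cut-through, IPv4", cut_through(size, links, rate, ETH_HEADER + IPV4_DST_END)),
    ("cut-through, IPv6", cut_through(size, links, rate, ETH_HEADER + IPV6_DST_END)),
]:
    print(f"{label}: {t * 1e6:.4f} microseconds")
```

Under these assumptions, putting the destination address first would shrink `decision_bytes` for every hop, which is the point being made about both IP header formats.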
So, okay. So parting thoughts. What wins out in the marketplace isn't
necessarily the best thing technically. And don't repeat or believe things
that you don't understand because they're often false.
And then in my book, Interconnections, which is about layers 2 and 3, I have
these little boxes that I call real-world examples, to illustrate a point
I'm making. So when I talk about scalability, I talk about the wineglass
clinking protocol, which works okay with five people but not with 20, where
everyone has to clink with everybody else.
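The wineglass example is the usual n(n-1)/2 count of pairwise interactions; a tiny sketch (the function name is hypothetical) shows how fast it grows.

```python
def clinks(people):
    # Everyone clinks with everyone else exactly once: one clink per pair.
    return people * (people - 1) // 2


for n in (5, 20, 100):
    print(n, "people:", clinks(n), "clinks")
```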
But the one that is absolutely everybody's favorite, and it's a hundred
percent true, illustrates the point that you should know what problem you're
solving before you try to solve it. Which is a real problem in this industry:
people hear about two special cases, they get all excited, and start writing
code. It doesn't work in all the cases. They add more code.
But yeah. So the hundred percent true story of why you should know what
problem you're trying to solve before you try to solve it is that when my son
was three, he ran up to me crying, holding up his hand saying my hand, my
hand. So I took it and kissed it a few times. What's the matter, honey?
Did you hurt it? And he said, no, I got pee on it.
[laughter]
>> Radia Perlman: So thank you.
[applause]
>> He Xiaodong: Couple minutes for asking questions.
>> Radia Perlman: Yes?
>>: I want to ask about software-defined networks. So actually I think
I can classify what I know about them into two kinds. The second kind, which
I learned about just a couple days ago, is what you said: it's kind of like
dynamic tunneling, right, similar to TRILL.
>> Radia Perlman: That's one of the definitions of SDN.
>>: Yes. And the other definition I've heard before is that if you look at
a router, there are kind of two parts to it. One of them gets the packets off
the wire, looks in the table, says into which [inaudible], and sends it. The
other part executes the protocols, right? And the idea is that you take the
second part and move it out of the router, right, and all the other
properties you described kind of fall out of this.
>> Radia Perlman: Yeah. So let me say what you're saying in different
words. So there's this forwarding table that tells you what to do with the
packets. And the question is where does that forwarding table come from. So
it could come from a link state -- you know, from a distributed algorithm
like a link state thing or it can come from a central thing. And the concept
of doing it with a central thing is not new.
Now, one argument for doing it with the central thing, from the original
paper, which I was, you know, not terribly impressed with, is that it would
make switches really cheap. And, no, the reason switches are expensive has
nothing to do with the distributed algorithm; it has to do with engineering
them to be able to move packets really quickly. There's also confusion about
price versus cost. Just because, say, Cisco could get away with charging a
lot for something doesn't mean it would cost that much if somebody else
built it.
So I happen to like distributed algorithms better because if a link changes,
a link goes down, information goes out as quickly as possible, everyone
updates their own table; whereas with a central thing, you have to let the
central thing know and it has to recompute forwarding tables for everybody
and put that in and then tell everybody. Now, that's not a big deal.
Topologies don't change very often. Either way works.
But I have like an entire talk just about the seven orthogonal things that
people think come under the SDN umbrella. And some of them are perfectly
good. One thing is like virtualization, which I think is fantastic. It
predated SDN. Right. But I can't give that whole talk right now.
>>: Oh, no, I just wanted to say that it actually doesn't preclude you from
using these distributed algorithms, right? It just opens the way to use
nondistributed algorithms. It also allows you to do things like just upgrade
the software: if it's in the switch, you've got to get an update from the
switch manufacturer; if it's on a general purpose machine, you just, you
know, get the source compiled and you get [inaudible].
>> Radia Perlman: Absolutely. I think doing it on general purpose machines
is very attractive for a lot of reasons. In particular this other buzzword,
network function virtualization, which I'm incredibly excited about. Instead
of having -- like if you want to have a load splitter, instead of buying an
extra box and putting it in there, you just put it in a VM on one of your
switches.
So, yes, if you can keep up with wire speeds on a general purpose machine,
I'm all for it. It's not that everything about it is bad. It's just that I
don't like the buzzword, because it confuses people.
>>: Actually, I didn't -- I meant a different thing. Your hardware switch
still manages to keep up with the line speeds, but the general purpose
machine runs the algorithms and builds the hierarchy tables for this
[inaudible].
>> Radia Perlman: Right. Yeah. And that's how it often is. But, yeah,
anyway.
>>: I'm just trying to say that that's kind of the point of software-defined
switching, as I would classify it, and that allows all the other things, the
five different things you said. They kind of fall out of it. Yes?
>> Radia Perlman: Okay.
>>: Well -- [inaudible] confusing?
>> Radia Perlman: Yeah, we can -- right. We can sit down at some point and
see. Yes?
>>: So even though SDN has a number of different definitions, one aspect
seems to be the central control component. I go way back to the SNA days
and APPN, high-performance routing and all the stuff back then, [inaudible].
And I tend to be, even with the SNA stuff, more of a distributed systems guy.
Given how you've seen this unfold, what are your thoughts about why we have
this kind of control point networking going back and forth? Why does it keep
coming back up? Why do we keep trying to solve it in different ways? Going
back to it, I should say.
>> Radia Perlman: Right. Yeah. A lot of times things get reinvented. As
far as I can tell, containers are the same as time-sharing systems. You
know, I don't know. So, yeah, sometimes it's just sort of bright and shiny
hype about some old idea. Now, indeed, you really do want a central place to
manage your network from, like the NOC. And networks were always done that
way.
But the interesting thing is what kind of wishes the network operator would
want to be telling the network. For instance, in a public cloud, you want to
allow a customer to pay you to say, I want you to carve out a pretend network
from your public cloud with three servers and two disks and a pretend 4-gig
link here and a 20-gig link there.
So the interesting questions are how you can make it easy for the human to
express this, and how you can engineer the switches to be able to grant
these wishes. And the least interesting thing is the actual syntax by which
the network management station conveys these wishes. Yes?
>>: So what is the future in this area? Layer 3, layer 5, and then you
didn't talk about layer 6, and everybody in this company is working on layer
6 applications. I mean, you use these stories to show certain principles,
but this is an interesting area [inaudible]. What do you see in the future
there? Is there going to be a big breakthrough within a few years?
>> Radia Perlman: Yeah. So people often ask me what I see for the future,
and I've never been really good at answering that. As a matter of fact, if
somebody had suggested to me, I don't know, 15 or 20 years ago, that they had
this great idea for a company where they were going to map the entire
Internet so you could search for anything, and it would be free and all paid
for with ads, I would have said, first of all, you're nuts. There's no way
that this would work. Second of all, economically, it wouldn't be feasible.
You know, so I'm not that great at predicting the future, I guess.
A lot of these things, you know, this revolution with SDN, is not a
revolution; it's just calling things by different names. So I don't see it.
The security is astonishingly -- you know, none of this -- it's all held
together with chewing gum and thumbtacks, and yet it sort of works.
That's another whole lecture that I have that I might do some time. My title
is How to Build an Insecure System Out of Perfectly Good Cryptography.
[laughter]
>> Radia Perlman: And sort of talk about kind of the security issues people
haven't thought of. Yes?
>>: When we [inaudible] HTTPS, right, during this whole section [inaudible]
are those interconnection fixed or they are all over the places [inaudible]
change the path connections?
>> Radia Perlman: Yeah. When you speak HTTPS, it's just magic how the
packets get there. HTTPS is only between the source and the destination.
And the paths can change on every single packet and things like that. And
let me rant about one thing.
To me, layer 3 should be allowed to lose things, should be allowed to get
things out of order. And it's up to TCP to keep them in order. That's the
way the world should work. But you can't possibly sell a router that gets
packets for a particular flow out of order.
Now, originally this was because of lazy TCP implementations. Because the
network tended not to get things out of order, people made their
implementations cope if a packet got lost. But if packets got out of order,
the implementation would just assume, if it got N plus 1 before N, that,
well, N must have gotten lost, and I'll throw N plus 1 away because it will
get retransmitted. So if things got out of order, you'd get really miserable
performance.
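A toy model of the lazy receiver described above, next to one that buffers out-of-order segments. Segments are numbered 0, 1, 2, ... rather than by TCP byte sequence numbers, and both functions are hypothetical simplifications, not real TCP; the point is how much a single reordering costs the lazy receiver.

```python
def lazy_receiver(arrivals):
    """Accept only the next expected segment; anything else is thrown away."""
    expected, delivered = 0, []
    for seg in arrivals:
        if seg == expected:
            delivered.append(seg)
            expected += 1
        # otherwise: assume the gap was a loss and wait for a retransmission
    return delivered


def buffering_receiver(arrivals):
    """Hold early segments and deliver them once the gap fills in."""
    expected, delivered, held = 0, [], set()
    for seg in arrivals:
        if seg >= expected:
            held.add(seg)
        while expected in held:
            held.remove(expected)
            delivered.append(expected)
            expected += 1
    return delivered


swapped = [0, 2, 1, 3]  # segments 1 and 2 arrive out of order
print("lazy:     ", lazy_receiver(swapped))  # 2 and 3 were discarded
print("buffering:", buffering_receiver(swapped))
```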
So I sort of had absolutely no sympathy with that, because layer 3 can be
much simpler and higher performance if it's allowed to send things on a
per-packet basis, and not have to cache which way it's sending things for
each flow.
But then when you have these middleboxes that want to do virus scanning,
they have to see every single packet of your conversation. So you can't
really send things different ways. So I'm not quite sure what this has to do
with what you asked, but, yeah, I mean -- well, whatever. I'm sorry. Okay?
Any other questions? Okay. Well, thank you so much.
[applause]