>> He Xiaodong: Hello. Welcome, everybody, to our monthly what's hot, what's wonderful lectures. And if you like, take some pizza and come back to enjoy the next hour. It's really a pleasure and a great honor to introduce Dr. Radia Perlman. She is very distinguished. She's an IEEE fellow and also an Academy of Sciences member. And she's most famous for her invention of the spanning-tree protocol, which is fundamental to the operation of network bridges. She also made large contributions to many other areas of network design and standardization, such as link-state protocols. Her work transformed the Ethernet protocol from using a few nodes over a limited distance into something able to create large networks. She is the author of one textbook on networking and coauthor of one book on network security. She holds more than 100 issued patents. Her awards include the Internet Hall of Fame, 2014. That's very prestigious. SIGCOMM Award; USENIX Lifetime Achievement Award, 2006; recipient of the first Anita Borg Institute Women of Vision Award for Innovation in 2005; Silicon Valley Intellectual Property Law Association Inventor of the Year, 2003; Honorary Doctorate, Royal Institute of Technology, 2000; and twice named as one of the 20 most influential people in the industry by Data. So let's welcome Dr. Perlman. [applause]
>> Radia Perlman: So thank you and feel free to ask questions during or whatever. We'll be kind of informal. The real point of this lecture is not for me to tell you stuff but to get you to question stuff. So not everything that you hear or read, including from me, is necessarily true, and I want to sort of really wake you up to that point. So, yeah, especially in network protocols, a lot of what everybody knows is actually false. So this field is just so confusing. So there's things that everybody knows.
If you ask a networking expert why do we have both IP and Ethernet, they confidently tell you because IP is layer 3 and Ethernet is layer 2. And I will be explaining why that's total nonsense because Ethernet is not layer 2, it's layer 3. And the more subtle question is why do we have these two layer 3 protocols. Another one is Ethernet is CSMA/CD. I'll explain what that thing is. And it was at one point. It isn't anymore. So all these papers about what Ethernet is is just irrelevant to what Ethernet is. This one I'll also be talking about. In '92 people suggested replacing IPv4 with a different envelope called CLNP, and they said, oh, that would be ripping the heart out of the Internet and putting in a foreign substance; whereas instead IPv6 is just sort of a gentle upgrade to a new version of the protocol. And I'll explain why that's kind of nonsense. And then security is built into IPv6 and it's just an add-on to IPv4. I have a few slides about that. And SDN is revolutionary stuff, and I'll talk about those two first. So security is built into IPv6 but it is just an add-on to IPv4. Where did this come from? Maybe those of you who aren't really in the field haven't heard that, but I hear it all the time. Oh, we have to upgrade to IPv6 so that we'll get security. So where did this come from that people started saying this? Well, turns out there's a protocol called IPsec and it's a protocol similar to SSL. You all know what SSL does. The spec for IPsec says that it's mandatory to implement for IPv6 and it's optional for IPv4. But it doesn't work better with IPv6 than IPv4. It works just as well. And mandatory is just words in a spec. So there's probably more implementations for IPv4 of IPsec than IPv6. So more people have implemented IPsec using IPv4 than IPv6. And there's plenty of IPv6 implementations without IPsec. Plus IPsec is not equivalent to security. Pretty much anything you could do with IPsec you can do with SSL. 
And plus there's lots of other security things that neither IPsec nor SSL has anything to do with. So it's just these kind of little phrases that get repeated over and over. SDN. All of a sudden in the industry every conference you went to would have a keynote saying what a privilege it is to be born at this point in history when SDN is transforming the planet. So what exactly is SDN? It stands for software-defined networking. Does that help? It's three perfectly wonderful words. I have no idea what they're doing in the same phrase. But at any rate -- oh, and then there's like panel sessions at these things where I would hope that they would describe it. But, no, they'd have like seven panelists and the title of the panel would be SDN: Panacea Or Miracle? And then each person on the thing wouldn't say what it was but they'd say how important it was to their company and that they were going to be implementing it. But, yeah, it's a buzzword. And it's a moving target. So what people are talking about being SDN today, you know, three years from now there will be totally different things underneath that buzzword umbrella. And so when you use a term like that, about 80 percent of the engineers just get traumatized. They think everyone else understands it and I don't, and if they discover I don't, then they'll think I'm stupid. And then the remainder of the people, if you ask them what it is, they will confidently give you a definition. But people's definitions will vary. So I actually separated it into about six completely orthogonal concepts, and none of them are new. So like one of them is implement a switch or a router, whatever you want to call these things, on a general purpose machine rather than a specialized box. That's how they used to do it until bandwidth got too fast. So it's a fine thing to do if that's possible. Sometimes you still need hardware assist for link compression or something like that.
So you can't be all -- it must all be done in software, it's just do whatever is cheapest and whatever. Another thing is calculating the forwarding table which is what tells a switch how to forward the packet from a central place rather than a distributed algorithm. And you could do it either way. The older way actually is with this centralized thing. So ATM did it that way, X.25, InfiniBand does it that way. And then another one is managing a network from one place. And that was always the vision of networking. There was this room called the network operations center. And I've been to one, you know, like 30 years ago. And there was this, you know, comfortable chair in front of a fancy display that had a picture of the network with links that were blinking if there was some problem with them. And the network operator would sit at the chair and type commands. Who knows what he was commanding exactly. And information would go out. So none of this stuff is particularly new. Now, the way networking tends to be taught, not in my books, but in most books is they give you the impression that TCP/IP arrived on tablets from the sky in its awesome perfection and nothing else ever existed. And they assume that the students, you just cram their head full of the actual packet formats of what's actually deployed so that when they graduate they can write an application to the sockets interface. My books are very different. I say, well, here's a conceptual problem. You plug into a network and you need an address. And here's like six different ways I can think of doing it, here are the pros and cons between the various approaches. But, oh, and by the way, Appletalk did this, IPX did this, IPv4 does this. And some professors supposedly say why is there stuff in here that my students don't need to know? 
With the model of a student's brain as being sort of very, very small and you have to make sure that you don't put any information in there that won't be relevant to a recruiter's checklist. They're not going to ask do you know Appletalk, so it's a waste that they know anything about it. But I claim that even if all you care about is IPv4, you will have a way deeper understanding of it if you can see alternatives. Plus if you ever actually are going to design something else. And I run into people all the time who claim to be networking researchers, and I say, well, you know, how does this thing compare to Appletalk. They've never heard of Appletalk. They've never heard of anything else. And how you can do research without having -- you know, and the researchers tend to reinvent things, often not as well as things that existed before. So, yeah, where does this confusion come from? Hype, you know, people get all excited about their thing. People repeating stuff that they don't understand, and I hope you will never do that again after this lecture. Buzzwords with no clear definition. Or else the world might be changing, so something that used to be true is no longer true. So things are so confusing. I always want to get to the heart of the truth of things. So when there's two things that are similar, technology A and B, I want to understand what's really different about them. Nobody else seems to do it. So I ask people. Nobody knows them both. So, okay, I take a deep breath and I get the two specs, these two huge specs all with their own jargon for no good reason, badly written, and I'm trying to plow through them and I might try to save myself some time by asking someone who's an expert in A how it compares with B, and they say A is awesome and B sucks. And then I ask the B person, I get the opposite answer.
But then if I discover things about B that are better and I tell the A people actually it has these features and it works better in this case because of the way it handled that, no problem. They steal the ideas. So both A and B are moving targets. Nobody cares about the technology in their spec, they just want credit. So I tell people it's natural to think of standards bodies as well-educated technologists that are carefully weighing engineering tradeoffs. But a much more accurate way to think of them is as drunken sports fans. So this, you know, what if you actually measure A versus B. That's actually science. So I was actually at a presentation once where somebody was trying to convince the executives of the company to bet the company on technology B where the existing thing was technology A, and everything that I understood about his thing seemed worse than technology A. In particular, he -- one little thing was he made everything little teeny packets so every packet would have the overhead of an extra envelope and the switches would have more switching decisions to have to make. So that seemed only one of N reasons why I thought it would be worse. So he was describing them according to various things. He was saying A doesn't scale beyond whatever, and it was like where he picked that number out. But in particular he said that, well, okay A was Ethernet. So he said that with Ethernet throughput it was 1 gig and with his technology it was 4 gig. And so throughput is like really important. And so getting four times as much throughput is like really a very important thing. But my mind was rebelled. It was I can't imagine any reason why it would be. All of my intuition says it should be the opposite. So in front of all the executives I completely innocently raised my hand and said were you by any chance using a 1 gig physical link with Ethernet. And indeed, and he said yes, that's all I could find in the lab. 
So he was measuring his thing on a 10 gig link getting 4 gig and Ethernet on a 1 gig link and getting 1 gig, but now it was on a PowerPoint slide and it was likely to get repeated over and over and over because it was science. So you have to be very careful when you're measuring things. You're only measuring one implementation of A versus one implementation of B. It's not necessarily an intrinsic part of the technology. So please practice critical thinking. And there's something else I'm passionate about, which is corporate culture. So one time I worked in this group where the culture was dominated by these really obnoxious people who were very aggressive and condescending. And if you would ask a question, they would snap back if you don't know that, you don't belong in this group. So I believe that it has to be safe to ask questions. So if somebody asks me a question that everybody knows, like they say what's a public key, I wouldn't say how can you not know that. I'll say, oh, my goodness, it's the coolest thing ever and I can't believe my good luck that I get to be the first person to explain that to you. But also sometimes naive questions make you rethink your assumptions, and sometimes with someone looking at it with fresh eyes, you know, it's like, well, why was I thinking that. And the other thing is that once you get to be fairly senior, you sort of think everyone expects me to know everything. And you're afraid to admit that you don't know everything. Now, nobody knows everything. If you're truly a leader, you should be the first person to ask naive questions to show that you're perfectly comfortable in your own skin admitting you don't know everything and that it's safe for people to ask that sort of thing. So in your own companies please do that. So now an example of something confusing. What exactly is Ethernet? How does it compare with or work with IP? And people talk about layer 2 solutions and layer 3 solutions, and I'll explain all of this.
So first we need to review network layers. So ISO, one of the sports teams, was credited with naming the layers. And it's really just a way of thinking about networks. Nobody does it this way. But it's a great way to kind of start understanding networking. So I will quickly review them. So this is really Perlman's layers rather than ISO's layers. It's slightly different, and you'll understand in a minute. So layer 1 is the physical layer. It sort of says how you signal a bit to your neighbor, what the cable looks like and all that. Layer 2 is how you get a whole packet to your neighbor. And so layer 1 lets you signal bits and somehow, in this bitstream of 0s and 1s, you signal this is the beginning of a packet, this is the end of a packet, here's a checksum. Layer 3, which was always sort of the layer I loved, I still do, is the thing where the network figures out how to create a whole path and forwards the packet across the network. And like IP is an example of a layer 3 protocol. Layer 4 is end-to-end stuff between the source and the destination. So you might number the messages and acknowledge things so that things that get lost or out of order can be put back in order, can be retrieved. And layers 5 and above are boring. So that's why it's Perlman's layers. So why are we forwarding Ethernet packets? We are, as I'll explain. Which means that Ethernet is no longer layer 2. It's a layer 3 protocol. So Ethernet was not invented to be forwarded. It was invented to just be a single link where everybody could hear everybody else on the link. So what exactly is it? And the only way to understand it is to see the history. Because it makes no sense. No one would have invented what it is without it having evolved. Yeah, sometimes I say intelligent design is probably better than evolution, but anyway, in this case, because it was kind of in each little step.
So back then, which is like in the early 1980s, I was the one who was in charge of designing layer 3 of DECnet. Now, you may think, well, DECnet has died out. But actually the basic algorithms, you know, have made their way into IP. The routing protocol I designed was adopted by ISO and unfortunately renamed IS-IS, so -- [laughter]
>> Radia Perlman: I constantly have old friends e-mailing me newspaper articles about how ISIS --
>>: What does it stand for?
>> Radia Perlman: Oh. IS is intermediate system. So it was the protocol between intermediate systems, and the intermediate systems were what they called routers or switches. So, anyway, layer 3 calculates paths and forwards packets. And layer 2 was just supposed to get a chunk of information, a packet, from one guy to its neighbor. So, yeah, this thing here receives a packet, looks at its forwarding table and decides -- it looks up something in the forwarding table like, for instance, the destination address based on what's in the packet, and it will tell it which link to forward it out. So how do you compute the forwarding table? It could be done with a central node like ATM or InfiniBand, or you could do it with a distributed algorithm. And, by the way, anyone that wants the slides, you know, just e-mail me and I'll send you the slides. I don't make very good slides, but, yeah. So a distributed algorithm is where you just plug the network together like Tinkertoys and the individual green circles there gossip amongst themselves and figure out how to compute their forwarding tables. So the one that is my favorite I call link-state routing, and this is what IS-IS is. So you're responsible -- here's a picture of a network. And here you see that C has three neighbors. B at a cost of 2, G at a cost of 5, because there's a 5 there, and F at a cost of 2.
Each one of these nodes is responsible for generating what I call a Link State Packet that says who you are, in this case I am A, and who your neighbors are and the cost of the link. So here A says he has two neighbors, B at a cost of 6 and he does, and D at a cost of 2, and he does. And this gets sent to everybody. So everyone has this information which means they know complete information about what the graph looks like and they can compute paths. So back to history. I was doing layer 3 innocently and along with great fanfare came the Ethernet. And so everyone was all excited about it, and I'll talk about how it evolved from CSMA/CD to spanning tree, and I'll talk a little about TRILL as well. So CSMA/CD was the original invention of the Ethernet. It's a way for a bunch of nodes to share a wire. It's actually -- I was born with CSMA/CD, apparently. You know, you sit in a conference room, and CS means don't interrupt if someone else is talking; MA is multiple access, be aware you're sharing the bandwidth, don't ramble on forever; CD is collision detect, meaning that while you're talking, if somebody else talks, you both stop and then you'll start again at a random time. I'm always amused at looking at a conference room because there's the people that do CSMA/CD, just like me, there's the people that raise their hands, and I don't know who they think is going to call on them, but -- and there's the people who don't even do the CS part, like when they feel like talking they just start talking and they don't do CD, if someone else talks, they'll keep talking, or they'll start speaking more loudly if somebody else -- but at any rate, this was a fine protocol for getting a bunch of nodes on a single link. If you have too much traffic, then you waste so much time on collisions that you don't -- you get less good throughput. And it didn't scale beyond a few hundred nodes and a limited distance like within a single building. 
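As a rough illustration of the link-state idea described above (every node floods who its neighbors are and at what cost, and then every node can compute paths on its own), here is a minimal Python sketch. The names `LSP_DB` and `forwarding_table` are made up for this example, and only the entries for A (B at 6, D at 2) and C (B at 2, G at 5, F at 2) come from the talk; the rest of the graph is a hypothetical fill-in, since the slide is not reproduced in the transcript. The path computation uses Dijkstra's algorithm, which is the usual choice for link-state routing but is an assumption here, not something the speaker names.

```python
import heapq

# Link State Packet database: node -> {neighbor: link cost}.
# A's and C's entries follow the talk; the other entries are hypothetical.
LSP_DB = {
    "A": {"B": 6, "D": 2},
    "B": {"A": 6, "C": 2, "E": 1},
    "C": {"B": 2, "F": 2, "G": 5},
    "D": {"A": 2, "E": 2},
    "E": {"B": 1, "D": 2, "F": 4},
    "F": {"C": 2, "E": 4, "G": 1},
    "G": {"C": 5, "F": 1},
}

def forwarding_table(lsp_db, me):
    """Run Dijkstra from `me`; return (first hop per destination, cost per destination)."""
    dist = {me: 0}
    first_hop = {}
    done = set()
    heap = [(0, me, None)]  # (cost so far, node, first hop used to get there)
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if node in done:
            continue  # stale entry: a cheaper path was already finalized
        done.add(node)
        if hop is not None:
            first_hop[node] = hop
        for nbr, link_cost in lsp_db.get(node, {}).items():
            new_cost = cost + link_cost
            if nbr not in done and new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                # Leaving `me`, the first hop is the neighbor itself;
                # further out, it is inherited from the path so far.
                heapq.heappush(heap, (new_cost, nbr, nbr if node == me else hop))
    return first_hop, dist

first_hop, dist = forwarding_table(LSP_DB, "A")
print(first_hop)
print(dist)
```

In this made-up graph A ends up forwarding everything through D, including traffic for its direct neighbor B, because A-D-E-B costs 5 while the direct link costs 6.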
So I saw Ethernet and I said whoops, this is a new type of link. And in my layer 3 kind of thing, it would not perform well with this kind of link unless I made modifications. So, for instance, if you remember link state from like three slides ago, if you had 500 nodes and each one of them reported 500 neighbors, then the link state database would get really big. So, you know, there was just kind of little things that I did. So I said, well, okay instead of doing this fully connected thing like over here, I'll pretend that the Ethernet itself is a node, I called it a pseudonode, and everybody just reports connectivity to that and it makes it -- you know, but no big deal. But I wish they had called Ethernet Etherlink because they confuse the industry. So people -- you know, it's easy to get confused. This is what an Ethernet packet looks like. You put an envelope on your data with the destination and the source. And a layer 3 packet looks the same. There's this extra field called a hop count. And the reason for that is that when the topology changes and people are modifying their forwarding tables, there will be a time when things are not matching. And so you'll have packets wandering around, and the hop count will get rid of them, you know, before they go around too many times. So it's easy to confuse Ethernet with layer 3. It kind of looks the same. There's no hop count fields. And it isn't because the Ethernet designers didn't know about hop count fields, it just never occurred to them anyone would be forwarding their header. It wasn't intended to be forwarded. Also, one of the geniuses of Ethernet, the genius qualities of Ethernet, is the flat address, which is that every device is born with a unique ID. So you can just plug them together and you know that the addresses won't conflict with each other. But if you did the Internet that way, then it's hard for the routers because they have to be careful of where every individual node was. 
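To put rough numbers on the pseudonode trick mentioned above, the following Python sketch counts neighbor entries in the link-state database for one shared Ethernet. It is a back-of-the-envelope model with hypothetical function names, assuming every station reports every other station in the fully connected view, and assuming that in the pseudonode view each station reports one neighbor (the pseudonode) while the pseudonode's own report lists all the stations.

```python
def entries_fully_connected(n):
    """Each of n stations lists the other n - 1 stations as neighbors."""
    return n * (n - 1)

def entries_with_pseudonode(n):
    """Each station lists only the pseudonode; the pseudonode lists all n stations."""
    return n + n

for n in (10, 500):
    print(n, entries_fully_connected(n), entries_with_pseudonode(n))
```

For 500 stations this is 249,500 entries against 1,000, which is the "link state database would get really big" problem and its fix in concrete terms.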
If instead you get an address that conforms to where you are in the topology, then the routers can draw circles around portions of the Internet and just say everything in here has an address that looks like this. So why are we forwarding Ethernet packets? How did that happen? So people got confused and thought that Ethernet was the new way of doing networking. And so I -- you know, and they were building their applications directly on Ethernet without layer 3. So I tried to complain to them, and I said no, no, no, you still need layer 3. And they said, oh, go away, Radia, you're just upset because no one needs your layer anymore. And I said but you may want to talk from one Ethernet to another. And they said our customers would never want to do that. So they built their stuff directly on Ethernet. And they made a lot of money for the company because their application was really good but it wouldn't scale beyond a single Ethernet. So they would have made just as much money had they done it properly. But explaining this to management is hard when these guys are such heroes. So I was kind of in a bad mood about all this when one day my manager said, Radia, you do this kind of distributed algorithm thing. We have to invent a magic box that will sit between two Ethernets and let someone on one talk to somebody on the other. And that's what my stuff did. But my stuff only works if the endnodes cooperate. They have to put on the header, they have to act in certain ways. So the constraint was that we had to invent this box that would work without modifying how the endnode worked in any way, and the endnode thought it was speaking on a single CSMA/CD link, there was not a single spare bit in the Ethernet packet and there was a hard size limit on it. So the basic concept is fairly simple. You just move Ethernet packets around.
So this thing listens promiscuously to every single packet and stores it up and then when the Ether is free on the other side, or if it were a token ring when it gets the token, it forwards it. So that's all great. But in addition it can be even smarter than that. It can look at the source address and learn that A is on that port so if J were to transmit a packet with destination A, it doesn't need to forward it at all. Or if J sends to X, it knows it only needs to forward there; whereas if A sent a packet to J, since this doesn't know where it is, it has to send it on both. So this is a very simple concept, but it won't work if there's multiple ways of getting -- if there's loops basically because if you can receive something from the source in two different directions, where is it, really, and also packets will never die. There's no hop count in this thing. So why not just tell customers don't put in any loops. But then what about backup paths or miscabling. So that was why it was good to have the spanning tree algorithm, which is where you plug it together however you want, you can have as much redundancy as you want, but the bridges talk amongst themselves and figure out a loop-free subset of the topology for actually forwarding the packets on. So you have a physical topology like this, and then the bridges turn some of these things into dotted lines. Now, the fact that this is a dotted line means that bridge 3 never receives or forwards a packet on that port, but it's still running the spanning tree algorithm in case the topology changes. And you'll notice it's not an optimal path because if A wants to talk to X, it goes this really long way, 11, 7, 6, 2, 14, 4, and then 3. And you might think, well, that's kind of a silly spanning tree. If it were a smarter spanning tree, you'd get better paths. But, no, if you're having one shared loop-free subset, someone's going to be unhappy.
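The learning behavior described above (note which port each source address was heard on, then filter, forward, or flood based on the destination) can be sketched in a few lines of Python. This is a minimal illustration with a hypothetical `LearningBridge` class, not the original implementation: it ignores table aging, the store-and-forward queueing, and the spanning tree, and it assumes a loop-free topology.

```python
class LearningBridge:
    """Toy transparent bridge: learns station locations from source addresses."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.station_port = {}  # station address -> port it was last heard on

    def receive(self, src, dst, in_port):
        """Return the list of ports the packet should be forwarded out of."""
        self.station_port[src] = in_port  # learn from the source address
        out_port = self.station_port.get(dst)
        if out_port is None:
            # Unknown destination: send on every port except the arrival port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination is on the side the packet came from: no forwarding.
            return []
        return [out_port]

bridge = LearningBridge(ports=[1, 2, 3])
print(bridge.receive("A", "J", in_port=1))  # J unknown, so flood to ports 2 and 3
print(bridge.receive("X", "A", in_port=2))  # A was learned on port 1
print(bridge.receive("J", "A", in_port=1))  # A and J share port 1, so drop
print(bridge.receive("J", "X", in_port=1))  # X was learned on port 2
```

With a loop in the topology, the same source would show up on two ports and flooded packets would circulate forever, which is the problem the spanning tree algorithm is there to solve.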
So intuitively if you imagine your topology to be a big circle, spanning tree has to chop at some place, and people on either side of the chop have to go around the long way. So yes. So the story of this is like really cool. My manager challenged me to come up with this thing that would break all the symmetries, require no configuration. He challenged me to this on a Friday. And then furthermore, he thought he was being clever. He thought it was going to be really hard. And so he said, well, while you're at it, make it scale as a constant so no matter how many links and bridges there are the amount of memory necessary to run this should be a constant, which is crazy. Linear is the best you can do. Nothing is a constant. And then he was going to be gone the whole next week. And that was before the days when people read e-mail on vacation or had cell phones or electricity or whatever. So that night I realized, oh, my goodness, it's trivial, and I could prove it. I knew exactly how to make it work. And it scaled as a constant. The reason it scales as a constant is that to run the spanning tree algorithm you have to remember the best spanning tree message you've heard on each one of your ports. So let's say you have four ports. When you're receiving packets on this port, you say is this a spanning tree message, and if so, you compare it with the one you have stored. And there's a trivial comparison, whichever one is better, you save that and throw the other one away. So a spanning tree message is about 50 bytes. So if you have four ports, it takes 200 bytes to run it, no matter how big the actual network is. So I was all excited. And then Monday and Tuesday I wrote the spec in enough detail because it's really a trivial algorithm that the implementers got it working in just a couple months without asking me a single question. But then I had the whole rest of the week where I couldn't concentrate on anything else because I had to show off and my manager wasn't around. 
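The constant-memory argument above can be made concrete with a small Python sketch: a bridge keeps exactly one stored message per port and replaces it only when a better one arrives. The message layout used here (root ID, cost to root, sender ID, with lower values winning) follows the way the algorithm is commonly described and is an assumption, as are the names `ConfigMsg` and `Bridge`; the talk only says that the comparison is trivial and that the root is elected by ID.

```python
from typing import NamedTuple

class ConfigMsg(NamedTuple):
    """Assumed spanning tree message fields; tuples compare field by field."""
    root_id: int
    cost_to_root: int
    sender_id: int

class Bridge:
    def __init__(self, bridge_id, num_ports):
        self.bridge_id = bridge_id
        # One slot per port, regardless of how large the network is.
        self.best = [None] * num_ports

    def receive(self, port, msg):
        """Keep whichever of (stored, new) is better; return True if replaced."""
        stored = self.best[port]
        if stored is None or msg < stored:
            self.best[port] = msg
            return True
        return False

    def believed_root(self):
        """Lowest root ID heard on any port, or our own ID if nothing beats it."""
        heard = [m.root_id for m in self.best if m is not None]
        return min(heard + [self.bridge_id])

b = Bridge(bridge_id=42, num_ports=4)
b.receive(0, ConfigMsg(root_id=41, cost_to_root=0, sender_id=41))
b.receive(0, ConfigMsg(root_id=12, cost_to_root=3, sender_id=30))  # better root
b.receive(0, ConfigMsg(root_id=12, cost_to_root=5, sender_id=17))  # worse cost
print(b.best[0], b.believed_root())
```

Storage is one message per port, so at roughly 50 bytes per message a four-port bridge needs about 200 bytes however many bridges and links exist, which matches the numbers given in the talk.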
So I spent the remainder of the week working on the poem that goes along with the algorithm. And the poem is the abstract of the paper in which I published it. So the poem is called Algorhyme because every algorithm should have an algorhyme. And the poem is I think that I shall never see a graph more lovely than a tree. A tree whose crucial property is loop-free connectivity. A tree which must be sure to span so packets can reach every LAN. First the root must be selected, by ID it is elected. Least cost paths from root are traced, in the tree these paths are placed. A mesh is made by folks like me, then bridges find a spanning tree. [applause]
>> Radia Perlman: So then there was this really cool story where the -- this really I felt was a bad idea. I really thought they should have to put layer 3 back in the endnodes, I mean, because spanning tree doesn't give you optimal paths and stuff. If you want to make a network, make a network, have a civilized header --
>>: So for the spirit of not being afraid to ask a question, may I ask you to clarify.
>> Radia Perlman: Sure.
>>: It looks like the previous slide that you have, it doesn't look like a regular tree [inaudible] tree.
>> Radia Perlman: Well, it is. Because there's only one way to get from any place to any other place. Now, the reason it doesn't look like a tree to you is that it's not obvious who the root is. But in this particular topology, any one of these things could be the root of the tree, it would still be a tree.
>>: Huh. Okay, can you select one to show --
>> Radia Perlman: Okay. Let's pick one. Okay, 4 would have as children 9, 3 and 14. And then 14 would have as a child 2; 2 would have as a child 6 --
>>: [inaudible] it's not the green bar.
>> Radia Perlman: These are Ethernets. And these are ports. These are bridges.
>>: Okay.
>> Radia Perlman: I'm sorry.
>>: Oh, no. I got it. Okay. Yeah. Thank you.
>> Radia Perlman: Yes?
>>: Would you end up with the same tree regardless of which root you selected?
>> Radia Perlman: No. The actual tree that you compute is sort of greedy with respect to the root. Everyone wants to be as close to the root as possible. And there's this thing called a minimum weight spanning tree which is you take a tree and you add up the cost of all the links. This is not a minimum weight tree either. You can calculate that as well. It's a lot more complicated. But also probably you want a tree which is as compact as possible. Yeah. Okay. So, yeah, I felt kind of sorry for the implementers because they felt like this whole thing was stupid and they thought everyone should put layer 3 in, as I did. And they just wanted to build the simplest possible device just to let our customers survive for a year or so until they could redo the endnodes to have a layer 3 in it. And I kind of sympathized. Of course once you do something like this you'd love to see it deployed. And I didn't want to argue because I figured they'd think I was biased anyway. So I let management argue it out. And so they told the implementers, yes, you have to put in the spanning tree. And as trivial as it is, it made their device more complicated than if they'd done the simplest possible thing. But then when they sold the first one, I realized, yes, it was the right thing to make them do that. The very first bridge was sold to the world's most sophisticated networking customer, at that point, and they had the world's simplest topology, which was two Ethernets and one bridge. And the story as I heard it later was that the salespeople were telling them about this wonderful thing and they were saying, oh, but look at all the sophisticated networking things we're doing. And the sales guy said it really doesn't matter what you're doing, it's just going to work. And they were saying, no, we need to talk to the engineers to tell them all the stuff we're doing.
And the salesman was saying, no, you don't, it's just going to work. And so they plugged it together and it didn't work and they were really angry. And when field service went to figure out what the world's most sophisticated customer had done with --
>>: [inaudible].
>> Radia Perlman: That I'm not going to say. What the world's most sophisticated customer had done with the world's simplest topology, they discovered this. [laughter].
>> Radia Perlman: Which is that they plugged both ends into the same Ethernet. Because, you know, in the ceiling orange cable looks like orange cable. And I was relieved that I thought of that case actually. And everything was working perfectly actually. The spanning tree was saying, well, I don't need to forward packets. If I ever do, I will, but in the meantime. So, yeah, again, that shows it was right to make them do that. So very soon, like a year or so after this, the technology CSMA/CD died out. These days Ethernet is just wires between two switches and there's no contention at all, other than wireless. Okay. So the next stage in Ethernet evolution, why not just get rid of Ethernet and just use IP? Because at the time the problem was people didn't even have layer 3 in there. Furthermore, it was complicated by the fact that there wasn't just English, there was Italian and German and Dutch and whatever. Now everyone's agreed on IPv4, everybody has it in their networking stacks. Why not just get rid of bridges and just hook everything together with IP? And this is a very deep question with a deep answer that most networking people never think about. And the reason is that IP has an annoying idiosyncrasy that would make it unpleasant if you tried to hook up the entire world with IP. And what's wrong with it is that it's configuration intensive. Every link, which is sort of surrounded by IP routers, must have a unique block of addresses.
So if you have a block of IP addresses and you want to number your corporate network, you have to carve up the address space to have a unique block on each link and you have to configure the routers to know which block is on which port and if you move from one side of a router to another, you have to change your address. Now, that's just how IP works. It's not how layer 3 has to work. So let me give you an example. A different sports team did this other protocol that's like IP and they called it CLNP for connectionless network layer protocol. And so it was actually the same standards body that took my routing protocol. I took their packet format. So I saw CLNP and I used it for DECnet; it seemed like a perfectly good thing. It had 20-byte addresses. Now, keeping in mind IPv4 has 4-byte addresses. IPv6 has 16-byte addresses. That had 20-byte addresses. But, you know, if all you care about is big addresses, why not 735 bytes? But not only were the addresses bigger, but they used it in a very interesting kind of way, which is that with IP, and IPv6 works exactly the same way, every link has to have its own block of addresses. With this thing, this 14-byte prefix was shared by an entire large cloud. So the 14-byte prefix gets you to the cloud. Once you get to the cloud, it's the bottom six bytes is how the routers route to you, and the endnodes kind of let you know where they are in the cloud. So you have a real layer 3 protocol inside the cloud that keeps track of where all the endnodes are, and you can do shortest paths, you can do multi-pathing. You can do anything you would with a layer 3 protocol. So if you're using IP plus Ethernet, which is how things work today, IP gets you to what IP thinks of as a link and the only reason that it's not just the link is because Ethernet is kind of spanning tree and allows you to have much bigger things than a single link. 
But you also have to do ARP in order to find out once you get to the link what your Ethernet address is; whereas if you do the CLNP way, the top 14 bytes get you to the cloud. You don't have to do ARP because your address is right there, the bottom six bytes, and you can do true layer 3 routing there. And, again, another way to look at it is if you have one prefix per link like IP, you have to carve up the address space. And if you move around, you have to change your address. With one prefix per entire cloud, you need no configuration of the guys inside here. All you have to do is tell somebody what the 14-byte prefix is, and nodes inside of here can jump around and keep their address. So the single worst decision in the history of mankind was that in 1992 people said why don't we replace IP with CLNP. And people said good idea. And they showed how they could make TCP work on top of CLNP. It took just a couple months. And all the Internet applications just automatically worked. So with just a couple months of work, everything just worked. And imagine doing that back then. The Internet was just this small researchy thing. It wasn't the lifeblood of all these merchants. And IP also had not at that point out of necessity invented things like NAT and DHCP. So you could give people understandable advantages. If you said why don't you convert to CLNP, they'd say why, and you'd say auto configuration. They'd go, oh, yes. Because back then you had to configure each endnode. And you might think, well, IPv6 we've had like 25 years at this point to make it like really awesome. So it must be like so much better. But, no, it's exactly the same as IP where every link has its own prefix. So but, you know, why didn't they do that? Well, just, again, nobody kind of bothers learning anything else and they were saying this would be ripping the heart out of the Internet and putting in a foreign substance. So instead we're going to design something that will be just an upgrade to IP. 
And there is no sane sense in which IPv6 is just an upgrade whereas CLNP would have been a foreign substance. >>: So question. So, yes, there are single committee inside of change or upgrade, or who is actually people design it? >> Radia Perlman: Well, there was a committee called IETF which is sort of very political, as all of these things are, and they're very proud of the fact that they don't do voting. So it's basically instead of voting it's sort of the loudest voices win. So the loudest voices were saying this and what can you do. At that point, all of the vendors were actually behind CLNP because they all had it implemented and they wanted to do it. But, you know, some people were thinking, wow, maybe I can get a Nobel Prize by inventing a new packet format. And so this opportunity to invent this IPv6 thing. But there's really like nothing to it. So, yeah, as I said, the amount of money wasted, you know, by refusing to do that, which was exactly the right thing at that point in time. Okay. So now I'll quickly tell you about TRILL, which was I was sort of horrified that they were still using this stuff. I wasn't paying too much attention. I assumed it was a quick hack until they put layer 3 back, and I wasn't really thinking deeply. And then I realized, oh, my gosh, this stuff is everywhere. And so to kind of atone for this, I figured when also realizing what the world was kind of stuck with Ethernet because of the fact that IP requires something else to create a flat address space cloud, so I was thinking can I make Ethernet better. So the basic concept, this got standardized in IETF. It's called TRILL, which stands for transparent interconnection of lots of links, where you want the best of both worlds. You want the auto configuration and the flat address space from Ethernet, but from layer 3 you want optimal paths and stability and traffic engineering, all of that. So my general philosophy about protocol designs. I actually kind of hate technology. 
My company finally gave me a smartphone. I've never had one before. I don't know how to use it really. It sits on my desk and makes funny noises. I don't know how to stop it from making that. So when I design things, I design things for people like me, people who kind of hate technology, where it just works, you don't have to think about it. But then people said to me, Radia, we have customers that really like to configure things. And I said really? Well, fine. Okay. They want knobs, I'll put in knobs. But you don't have to touch the knobs. And if you do, you can't hurt yourself. Any setting of the knobs will still work. So that's kind of the -- you can play but you can't hurt yourself. And also be evolutionary if possible. You know, it's a fun exercise to say let's throw away the Internet and how would I design it. And you can't -- it's hard to do that, so it's better to be able to say let's have a network and you don't have to snap your fingers and replace everything, but the more you upgrade sort of the better qualities you'll get. So you have a spanning tree Internet with bridges, and you can replace any subset of those with TRILL switches. And the more you replace, the more layer 3-like you'll get. So here if you have a mixture of TRILL switches, which are the red things, and bridges, which are the little Bs, the TRILL switches sort of don't even notice the little Bs. So this is what the network, the TRILL switches, see. So the TRILL switches create with a link-state protocol, they know how to reach all other TRILL switches, but they have no idea where the endnodes are. So they make a little network just with the TRILL switches. And then when A transmits an Ethernet packet, R1 puts it in a TRILL envelope addressing it to R2, and it gets across here because this is the network of TRILL switches, it gets to R2, R2 removes the header and out pops an Ethernet packet. 
So the interesting questions are what is this header, which I'll show in a minute, and how does R1 know that R2 is the right destination. So given that this is a picture rather than lots of words, the header is actually only 6 bytes because the TRILL switches get a 2-byte nickname that they auto configure. So with 2 bytes your forwarding table can be at most 64,000. So you can do a direct table lookup. And it's nice and small. So it's 2 bytes for the first switch, 2 bytes for the last switch and a hop count and some flags. So the packet goes across here as if it's any layer 3 protocol, because it is. And then it gets removed there. And then the other question is how does R1 know that R2 is the right destination. There's a bunch of ways you could do it. You could have -- usually in a cloud there's some sort of fabric manager that knows where everybody is. You could ask it. Or the original TRILL, the deployed ones today, act like bridges do, which is that if you don't know where it is, you send it on a tree, and then as every switch removes the header and sends it to their attached endnodes, you make a note that source Ethernet address A is attached to R1. And so you remember that for a while. If nobody on your link cares about A, you'll time it out. But if, in general, you will have in your table, you'll know who -- everyone that's corresponding with endnodes that you're attached to, you'll know which switches to send it to. So the advantage of this extra header. Switches inside the cloud don't need to know about all the endnodes. Their forwarding table is just the size of the number of switches. And it's evolutionary. You can replace any subset of your bridges with TRILL switches. And an orthogonal concept is who puts on the header. It could be the first switch or it could be the first hypervisor or the VM or the application. And a note that I have to say. 
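The encapsulation and learning steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the 6-byte layout follows the talk's description (first-switch nickname, last-switch nickname, hop count, flags), not the exact bit layout of the standardized TRILL header, and `EdgeSwitch`, `encapsulate`, and `decapsulate` are hypothetical names.

```python
import struct
from dataclasses import dataclass, field

# Layout as described in the talk: first-switch nickname, last-switch nickname,
# then one byte each for hop count and flags (6 bytes total). The field order
# and widths are assumptions, not the standardized TRILL header.
HEADER = struct.Struct("!HHBB")


@dataclass
class EdgeSwitch:
    nickname: int                                # auto-configured 2-byte nickname
    learned: dict = field(default_factory=dict)  # endnode address -> switch nickname

    def encapsulate(self, frame: bytes, hops: int = 16):
        """Wrap an Ethernet frame, or return None if the destination is unknown."""
        dst_mac = frame[0:6]
        last = self.learned.get(dst_mac)
        if last is None:
            return None  # a real switch would send the frame on a tree instead
        return HEADER.pack(self.nickname, last, hops, 0) + frame

    def decapsulate(self, packet: bytes) -> bytes:
        """Strip the header and remember which switch the source sits behind."""
        first, _last, _hops, _flags = HEADER.unpack_from(packet)
        frame = packet[HEADER.size:]
        self.learned[frame[6:12]] = first  # source address -> first switch
        return frame


mac_a = bytes.fromhex("0200000000aa")
mac_b = bytes.fromhex("0200000000bb")
frame = mac_b + mac_a + b"\x08\x00" + b"payload"  # dst, src, EtherType, data

r1, r2 = EdgeSwitch(nickname=1), EdgeSwitch(nickname=2)
r1.learned[mac_b] = 2            # pretend R1 has already learned that B is behind R2
packet = r1.encapsulate(frame)   # 6-byte header + original frame
inner = r2.decapsulate(packet)   # R2 now knows that A is behind R1
```

Only the edge switches keep the `learned` table of endnodes; a switch in the middle would forward on the last-switch nickname alone, which is why its table only needs one entry per switch.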
In the original paper I called them RBridges for routing bridges, and I've come to dislike that term because whenever I try to explain it people hear O-U-R bridges. Our bridges. So I tried to get the working group to switch it to TRILL switches, and they said, oh, we have like a whole bunch of documentation already, we don't want to rewrite it for something silly like that. And then they finally said ah, the poem won't work. And so that shut me up. Now, the poem is actually -- the first time I was trying to explain this, I was going to give a talk the next day, and I called my son who was grown up and he was very familiar with Algorhyme because my daughter is a musician. She plays violin. I've always been her piano accompanist. And she also has started singing opera, and I was also her piano accompanist. And she was giving a recital of Italian and German arias, and my son set Algorhyme to music. So I called him up at ten o'clock at night, and I said, look, and I explained this new technology. I said can you come up with a version of Algorhyme that explains this new thing. And you have one hour because I want to go to sleep. So I'll call you back in an hour, and if you've done a good enough job, I'll use your poem in my talk; and if not, all you've done is wasted one hour of your life and you owe me that much. [laughter] >> Radia Perlman: So I called him. And he did such a good job that the poem is in the spec. So Algorhyme v2 is: I hope that we shall one day see a graph more lovely than a tree. A graph to boost efficiency while still configuration-free. A network where RBridges can route packets to their target LAN. The paths they find, to our elation, are least cost paths to destination. With packet hop counts we now see, the network need not be loop-free. RBridges work transparently, without a common spanning tree. [applause] >> Radia Perlman: So recently there were a bunch of similar things invented also under the umbrella of SDN for some reason. 
And it's really just a different kind of encapsulation. So VXLAN, for instance, assumes that the inner thing is IP and you just treat the IP header as if it's a 32-bit flat address and has a header on the outside which is IP plus UDP plus other things. So the way to think of it is that inside there's a flat address space which in TRILL is Ethernet and in some of the more recent stuff they use IP as the inner thing, just ignoring all the fields except for the address. And the outer thing, which in TRILL was 6 bytes and auto configured, you know, the nicknames you just -- everyone picks their own nickname. And the other ones you have a different header, but it's the same concept. Oh. And this I find kind of interesting. Suppose you had -- we had gone with CLNP versus doing IP to get you to the cloud and then TRILL or VXLAN or whatever inside. The advantage of CLNP is there's no need to do this ARP to get the address on the other thing. But on the other hand, the advantage of this thing with encapsulation is that only the edge guys need to keep track of where the endnodes are. The guys in the middle can just forward. Okay. So now I'm almost done. So now I'm going to insult you with things that are just so obvious but people always get them wrong. So version number. Most protocols have a field called version number. So if you look at the IP packet format, it's right there. And in the spec it says put a 4 here. That's why it's called IPv4. So the question is what is the purpose of that field. Is it decorative? Is there some reason for it? So now a deep question is what is the difference between a different version of a protocol and a totally different protocol. So the only thing that makes sense to me is that most protocols have a field in it that says what's inside. So in IP there's a field called the protocol type. In Ethernet, there's also a field called the Ether type that says what's inside. In UDP and TCP it's a port. 
So I claim that if you want to share the same let's say Ether type with something else, then you are a different version of the same protocol. As many things as you want can share the same protocol type, provided that you differentiate based on the version number. If you have a different Ether type, then you are a different protocol. So even if the specs for the two things are identical except for which Ether type you use, there are different protocols; whereas if you share the same Ether type and you do it based on version number, it doesn't matter how different the thing is, it's a different -- it's a different version of the same protocol. That's the only definition that makes sense to me. So, now, let's look at IPv4. The spec says put a 4 there. It doesn't say to ever look at it. So they discovered that although they were hoping to share the same Ether type and just call it IP, IPv4 nodes just do who knows what if you give them an IPv6 packet. They'll just assume they should parse it this way. Because they don't look at it. And so therefore IPv6 is not a new version of IP. It's a new protocol. They have to use a different protocol. And there's no reason why they had to call it 6. They could have called it version 1. And you'd think that they would have learned their lesson. But the spec actually says here's a version number field; put a 6 there. And then there's other examples that are quite hilarious. Like SSL. When they went from version 2 to version 3, they redid all the packet formats. Who knows why. But they were hoping to keep the same port number. They moved the version number field too. And that's the one thing you can't change. So as it turns out, with SSL version 2 and version 3, they still managed to use the same port number but by the first packet that you send, the first hello, you have to send in version 2 format and you say in there that you can do version 3. 
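The version-versus-protocol distinction drawn above for IPv4 and IPv6 can be illustrated with a small sketch. The two EtherType values are the assigned ones for IPv4 and IPv6; the receiver functions are hypothetical and only mimic the behaviour the talk describes, they are not taken from any real stack.

```python
ETHERTYPE_IPV4 = 0x0800   # assigned EtherType values
ETHERTYPE_IPV6 = 0x86DD


def demux_by_ethertype(ethertype: int) -> str:
    """What ended up deployed: the EtherType alone picks the protocol."""
    return {ETHERTYPE_IPV4: "IPv4", ETHERTYPE_IPV6: "IPv6"}.get(ethertype, "other")


def receiver_that_ignores_version(payload: bytes) -> str:
    """Parses whatever arrives as IPv4 without ever looking at the version field."""
    header_len = (payload[0] & 0x0F) * 4   # IPv4 header-length field, in bytes
    return f"IPv4, {header_len}-byte header"


def receiver_that_checks_version(payload: bytes) -> str:
    """Demultiplexes on the version field, so versions could share one EtherType."""
    version = payload[0] >> 4              # top four bits in both IPv4 and IPv6
    return {4: "IPv4", 6: "IPv6"}.get(version, "unknown version, dropped")


ipv4_first_byte = bytes([0x45])   # version 4, header length 5 words
ipv6_first_byte = bytes([0x60])   # version 6, traffic-class bits zero

misparsed = receiver_that_ignores_version(ipv6_first_byte)
recognized = receiver_that_checks_version(ipv6_first_byte)
```

Here `misparsed` reports an IPv4 packet with a nonsensical header length, while `recognized` identifies IPv6. Because deployed IPv4 receivers behaved like the first function, the two could not share an EtherType.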
Now, the fact that you could say everything that you needed to do in version 3 in version 2, I don't know why they had to redo it, but at any rate. So okay. Parameters. Let's see. Can I go for an extra five, possibly ten minutes without -- okay. So parameters. It would be nice not to have these at all. Like, for instance, for link cost, you could just measure the bandwidth and put in an equation, come up with a number. Every one of these things, you know, the customer has to read and understand it. So if you have to have settable parameters, make sure they can't be set incorrectly. And that's easy enough. You just have a range that you're allowed to set it in. But sometimes you have a legal value here and a legal value here and they don't interoperate. And my favorite example of this is a protocol that I was never able to explain to my otherwise brave college-aged son, which is that there's no such thing as a reliable I am dead now message, so you have to periodically call your mother. [laughter]. >> Radia Perlman: Here is an example of a protocol mismatch possibility because it's like how often should he call his mother and how long should I wait before I call the police. So when I was doing IS-IS, I kind of realized this problem. So in the hello message I say hi, I'm Radia. I send hellos every 25 seconds. And you, my neighbor, multiply that by 3, maybe add a couple of seconds, so that you know how long to wait before declaring our link down. Well, I never thought it was profound enough to write a paper about or get a patent about it, but when the OSPF people basically copied IS-IS because, you know, they wanted to invent their own thing, they mostly copied it. And they saw those fields in there, but they kind of didn't quite understand the point in it. So in OSPF, it also says hi, I'm Radia, I send hellos every 25 seconds. What you do in OSPF is compare your neighbor's hello timer with your configured hello timer. 
And if they're not identical, you refuse to talk to each other. Which is like makes the network like so brittle, why shouldn't you have different values and whatever. >>: It would solve the protocol issue with your son. >> Radia Perlman: >>: If -- [inaudible]. >> Radia Perlman: Oh, how did I solve it. No, he sort of got it. I would eventually call him and say, hey, you know, are you okay. And he's like, well, of course. I would have told you if I wasn't. And I'd go no, that doesn't work. He's actually super smart, by the way. Okay. And last kind of technical example I'll give is latency. So suppose you really care how long it takes to deliver a packet across a network. So there's two ways you can deliver the packet. One is store and forward which is that each switch receives the entire packet and then forwards it to the next node. Well, if you care about latency, what you should do is as soon as you can make a forwarding decision about which port to send it on, you should start forwarding it there while you're still receiving it from the other one. And that's called cut through. So the question is what field in the header do you need to see in order to make this forwarding decision. Well, the destination address, right? Let's look at the IPv4 header. It's absolutely the last thing. And let's look at the IPv6 header. It's absolutely the last thing. So, okay. So parting thoughts. What wins out in the marketplace isn't necessarily the best thing technically. And don't repeat or believe things that you don't understand because they're often false. And then in my book, Interconnections, which is about layers 2 and 3, I have these little boxes that I call real-world examples to kind of illustrate a point I'm making. So when I talk about scalability, I talk about the wineglass clicking protocol which works okay with like five people but not with 20 where everyone has to click with everybody else. 
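Returning to the cut-through point made above, a rough sketch of how much of the IP header has to arrive before a switch can see the whole destination address. The offsets are those of the fixed IPv4 and IPv6 headers; the link speed and packet size are hypothetical, and link-layer headers and lookup time are ignored.

```python
# (offset, length) in bytes of the destination address within each fixed header.
IPV4_DST = (16, 4)    # fixed IPv4 header is 20 bytes
IPV6_DST = (24, 16)   # fixed IPv6 header is 40 bytes


def bytes_before_decision(dst_offset: int, dst_len: int) -> int:
    """Bytes of the IP header that must arrive before the destination is fully known."""
    return dst_offset + dst_len


def arrival_time_ns(nbytes: int, link_bps: float) -> float:
    """Time for nbytes to arrive on a link of the given speed, in nanoseconds."""
    return nbytes * 8 / link_bps * 1e9


LINK_BPS = 10e9        # hypothetical 10 Gb/s link
PACKET_BYTES = 1500    # hypothetical packet size

store_and_forward = arrival_time_ns(PACKET_BYTES, LINK_BPS)
cut_through_v4 = arrival_time_ns(bytes_before_decision(*IPV4_DST), LINK_BPS)
cut_through_v6 = arrival_time_ns(bytes_before_decision(*IPV6_DST), LINK_BPS)
```

With these made-up numbers, store and forward waits about 1200 ns per hop for the full packet, while cut through can start after about 16 ns (IPv4) or 32 ns (IPv6). Cut through still wins by a wide margin, but the destination sits at the very end of both fixed headers, so the switch has to wait for the entire header before it can decide.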
But the one that is absolutely everybody's favorite, and it's a hundred percent true, and the point I was trying to make is you should know what problem you're solving before you try to solve it. Which is a real problem in this industry. People hear about two special cases, they get all excited and start writing code. Doesn't work in all the cases. They add more code. But yeah. So the hundred percent true story of why you should know what problem you're trying to solve before you try to solve it is that when my son was three, he ran up to me crying, holding up his hand saying my hand, my hand. So I took it and kissed it a few times. What's the matter, honey? Did you hurt it? And he said, no, I got pee on it. [laughter] >> Radia Perlman: So thank you. [applause] >> He Xiaodong: >> Radia Perlman: Couple minutes for asking questions. Yes? >>: I want to ask about the software-defined networks. So actually I think I can classify what I know about them into kind of two kinds. Second kind I learned about is -- and actually I learned about a couple days ago -- is what you said, it's kind of like dynamic tunnelling, right, similar to TRILL. >> Radia Perlman: That's one of the definitions of SDN. >>: Yes. And the other definition I've heard before is that if you look at a router, there is kind of two parts to it. One of them is what gets the packets out of the wire, looks in the table, it says into which [inaudible] and sends it. And the other part which executes the protocols. Right? And that is that you take the second part and you move it out of the router, right, and all the other properties you specified, they kind of fall out of this. >> Radia Perlman: Yeah. So let me say what you're saying in different words. So there's this forwarding table that tells you what to do with the packets. And the question is where does that forwarding table come from. 
So it could come from a link state -- you know, from a distributed algorithm like a link state thing or it can come from a central thing. And the concept of doing it with a central thing is not new. Now, the arguments for doing it with the central thing from the original paper that I was, you know, not terribly impressed with is that one is that it would make switches really cheap. And, no, the reason switches are expensive has nothing to do with the distributed algorithm; it has to do with engineering it to be able to move packets really quickly. And also confusion about price versus cost. So just because, say, Cisco could get away with charging a lot for something doesn't mean that if somebody else built it. So I happen to like distributed algorithms better because if a link changes, a link goes down, information goes out as quickly as possible, everyone updates their own table; whereas with a central thing, you have to let the central thing know and it has to recompute forwarding tables for everybody and put that in and then tell everybody. Now, that's not a big deal. Topologies don't change very often. Either way works. But I have like an entire talk just about the seven orthogonal things that people think come under the SDN umbrella. And some of them are perfectly good. One thing is like virtualization, which I think is fantastic. It predated SDN. Right. But I can't give that whole talk right now. >>: Oh, no, just I wanted to say that it actually doesn't preclude you from using these distributed algorithms. Right? It just opens the way to use nondistributed algorithms. But also allows to do things like just upgrade the software, right, if it's in the switch or they got to get an update from the switch manufacturer if it's in a general purpose machine, or you just, you know, get compiled source and you get [inaudible]. >> Radia Perlman: Absolutely. I think doing it on general purpose machines is very attractive for a lot of reasons. 
In particular this other buzzword, network function virtualization, which I'm incredibly excited about. Instead of having -- like if you want to have a load splitter, instead of buying an extra box and putting it in there, you just put it in a VM on one of your switches. So, yes, if you can keep up with wire speeds with a general purpose machine, yes, I'm all for it. So it's not like everything about it isn't good. It's just I don't like the buzzword because it confuses people. >>: Actually, I didn't -- I meant a different thing. So your hardware switch still manages keeping up with the line speeds, but the general purpose machine runs their algorithms, it builds the hierarchy tables for this [inaudible]. >> Radia Perlman: Right. Yeah. And that's how often it is. But, yeah, anyway -- >>: Just I'm trying to say that that's kind of the point of the software-defined switching I would classify, and that allows all the other things that you saved five different things. So they kind of fall out of there. >> Radia Perlman: Yes? >>: Okay. Well -- [inaudible] confusing? >> Radia Perlman: Yeah, we can -- right. We can sit down at some point and see. Yes? >>: So even though SDN has a number of different definitions, one aspect seems to be kind of the central control component. I go way back to SNA days and APPN, high-performance routing and all the stuff back then, [inaudible]. What are your thoughts given how you've seen this unfold about kind of why -- and I tend to be, even with the SNA stuff, more of a distributed systems guy. But why do we have this kind of control point networking back and forth? Why does it keep coming back up? Why do we keep trying to solve it in different ways? Going back to it, I should say. >> Radia Perlman: Right. Yeah. A lot of times things get reinvented. As far as I can tell, containers are the same as time sharing systems. You know, I don't know. So, yeah, it's sometimes just sort of bright and shiny hype about some old idea. 
Now, indeed, you really do want a central place to manage your network from, like the NOC. And networks were always done that way. But the interesting thing is what kind of wishes would this network operator want to be telling the network, what kind of -- for instance, in a public cloud, you want to allow a customer to pay you to say I want you to carve out a pretend network out of your public cloud with three servers and two disks and a pretend 4-gig link here and a 20-gig link there and somehow carve that out. So interesting questions is how can you make it easy for the human to express this and how can you engineer the switches to be able to grant these wishes. And the least interesting thing is the actual syntax by which the network management station conveys these wishes. Yes? >>: So what is the future in this area, layer 3, layer 5, and then you forget about talking about layer 6 that everybody in this company is working on layer 6 application. So I just want to know. I mean, you use these stories to show certain principles. But this is an interesting area [inaudible]. What do you see the future there? Or is it going to have a big breakthrough within a few years? >> Radia Perlman: Yeah. So people often ask me about what do I see for the future, and I've never been really good about answering that. As a matter of fact, if somebody had suggested to me, I don't know, 15 or 20 years ago that they had this great idea for a company that they were going to map the entire Internet so you could search for anything and it would be free and all paid with ads, I would have said, first of all, you're nuts. There's no way that this will work. Second of all, economically, it wouldn't be feasible. You know, so I'm not that great about predicting the future I guess. A lot of these things, you know, this revolution with SDN is not a revolution, it's just sort of calling things different things. So that I don't see it. 
The security is astonishingly -- you know, none of this -- it's all held together with a chewing gum and thumbtacks and yet it sort of works. That's another whole lecture that I have that I might do some time. My title is How to Build an Insecure System Out of Perfectly Good Cryptography. [laughter] >> Radia Perlman: And sort of talk about kind of the security issues people haven't thought of. Yes? >>: When we [inaudible] HTTPS, right, during this whole section [inaudible] are those interconnection fixed or they are all over the places [inaudible] change the path connections? >> Radia Perlman: Yeah. When you speak HTTPS, it's just magic how the packets get there. HTTPS is only between the source and the destination. And the paths can change on every single packet and things like that. And let me rant about one thing. To me, layer 3 should be allowed to lose things, should be allowed to get things out of order. And it's up to TCP to keep them in order. That's the way the world should work. But you can't possibly sell a router that gets packets for a particular flow out of order. Now, originally this was because of lazy TCP implementations. Just because the network tended not to get things out of order, people made their implementation be okay if a packet got lost. But if packets got out of order, it would just assume if it got N plus 1 that, well, N must have gotten lost, too, and I'll throw N plus 1 away because it will get retransmitted. So if you got things out of order, you'd get really miserable performance. So I sort of had absolutely no sympathy with that because layer 3 can be much simpler and higher performance if it's allowed to kind of send things on a per-packet basis, not have to sort of cache which way it's sending things for these flow. But then when you have these middle boxes that want to do virus scanning, they have to see every single packet of your conversation. So you can't really send things different ways. 
So I'm not quite sure why this -- what you asked; that this has anything to do with, but, yeah, I mean -- well, whatever. I'm sorry. Okay? Any other questions? Okay. Well, thank you so much. [applause]
