How the Net Works
Section 1 of 5
http://freesoft.org/CIE/Course/index.htm
Connected: An Internet Encyclopedia

Internet History

Recently, everyone seems to have heard about the Internet, but did you know that the net has been around since 1969? I always like to start Internet classes with a review of Internet history. Don't worry if you don't understand all the terms; the idea is to get a general picture of Internet history.

1969 - Birth of a Network

The Internet as we know it today, in the mid-1990s, traces its origins back to a Defense Department project in 1969. The subject of the project was wartime digital communications. At that time the telephone system was about the only theater-scale communications system in use. A major problem had been identified in its design - its dependence on switching stations that could be targeted during an attack. Would it be possible to design a network that could quickly reroute digital traffic around failed nodes? A possible solution had been identified in theory: build a "web" of datagram networks, called a "catenet", and use dynamic routing protocols to constantly adjust the flow of traffic through the catenet. The Defense Advanced Research Projects Agency (DARPA) launched the DARPA Internet Program.

1970s - Infancy

DARPA Internet, largely the plaything of academic and military researchers, spent more than a decade in relative obscurity. As Vietnam, Watergate, the Oil Crisis, and the Iranian Hostage Crisis rolled over the nation, several Internet research teams proceeded through a gradual evolution of protocols. In 1975, DARPA declared the project a success and handed its management over to the Defense Communications Agency. Several of today's key protocols (including IP and TCP) were stable by 1980, and adopted throughout ARPANET by 1983.

Mid 1980s - The Research Net

Let's outline key features, circa 1983, of what was then called ARPANET. A small computer was a PDP-11/45, and a PDP-11/45 does not fit on your desk. Some sites had a hundred computers attached to the Internet. Most had a dozen or so, probably with something like a VAX doing most of the work - mail, news, EGP routing. Users did their work using DEC VT-100 terminals. FORTRAN was the word of the day. Few companies had Internet access, relying instead on SNA and IBM mainframes. Rather, the Internet community was dominated by universities and military research sites. Its most popular service was the rapid email it made possible with distant colleagues. In August 1983, there were 562 registered ARPANET hosts (RFC 1296).

UNIX deserves at least an honorable mention, since almost all the initial Internet protocols were developed first for UNIX, largely due to the availability of kernel source (for a price) and the relative ease of implementation (relative to things like VMS or MVS). The University of California at Berkeley (UCB) deserves special mention, because their Computer Science Research Group (CSRG) developed the BSD variants of AT&T's UNIX operating system. BSD UNIX and its derivatives would become the most common Internet programming platform.

Many key features of the Internet were already in place, including the IP and TCP protocols. ARPANET was fundamentally unreliable in nature, as the Internet is still today. This principle of unreliable delivery means that the Internet only makes a best-effort attempt to deliver packets. The network can drop a packet without any notification to sender or receiver. Remember, the Internet was designed for military survivability. The software running on either end must be prepared to recognize data loss, retransmitting data as often as necessary to achieve its ultimate delivery.
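This retransmission discipline is easy to see in code. Below is a minimal stop-and-wait sketch in Python; the host name, port, timeout, and retry count are all illustrative assumptions (it presumes a hypothetical datagram echo service on port 9999), not part of any standard.

    import socket

    HOST, PORT = "example.net", 9999   # hypothetical datagram echo service
    MESSAGE = b"important data"

    # UDP, like IP beneath it, is best-effort: a datagram may vanish
    # without notice, so the sender must detect loss and retransmit.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(2.0)               # assume loss if no reply in 2 seconds

    for attempt in range(5):           # retransmit up to 5 times
        sock.sendto(MESSAGE, (HOST, PORT))
        try:
            reply, _ = sock.recvfrom(4096)
            print("delivered, reply:", reply)
            break
        except socket.timeout:
            print("no reply, retransmitting (attempt %d)" % (attempt + 1))
    else:
        print("gave up: the network dropped every attempt")

Real protocols such as TCP refine this pattern with sequence numbers, adaptive timeouts, and sliding windows, but the core obligation is the same: the endpoints, not the network, guarantee delivery.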
Late 1980s - The PC Revolution

Driven largely by the development of the PC and LAN technology, subnetting was standardized in 1985 when RFC 950 was released. LAN technology made the idea of a "catenet" - an internetwork of networks - feasible. Subnetting opened the possibility of interconnecting LANs with WANs.

The National Science Foundation (NSF) started the Supercomputer Centers program in 1986. Until then, supercomputers such as Crays were largely the playthings of large, well-funded universities and military research centers. NSF's idea was to make supercomputer resources available to those of more modest means by constructing five supercomputer centers around the country and building a network linking them with potential users. NSF decided to base their network on the Internet protocols, and NSFNET was born. For the next decade, NSFNET would be the core of the U.S. Internet, until its privatization and ultimate retirement in 1995.

Domain naming was stable by 1987 when RFC 1034 was released. Until then, hostnames were mapped to IP addresses using static tables, but the Internet's exponential growth had made this practice infeasible.

In the late 1980s, important work traced poor network performance to poor TCP behavior, and a string of papers by the likes of Nagle and Van Jacobson (RFC 896, RFC 1072, RFC 1144, RFC 1323) presented key insights into TCP performance.

The 1988 Internet Worm was the largest security failure in the history of the Internet. All things considered, it could happen again.

Early 1990s - Address Exhaustion and the Web

In the early 1990s, the first address exhaustion crisis hit the Internet technical community. The present solution, CIDR, will sustain the Internet for a few more years by making more efficient use of IP's existing 32-bit address space. For a more lasting solution, IETF is looking at IPv6 and its 128-bit address space, but CIDR is here to stay.
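Both subnetting and CIDR are just structured ways of slicing that 32-bit address space, and the arithmetic is easy to demonstrate with Python's standard ipaddress module. A small sketch, using documentation-only example addresses:

    import ipaddress

    # A classic class-C-sized network, written in CIDR prefix notation.
    net = ipaddress.ip_network("192.0.2.0/24")   # 256 addresses

    # Subnetting: carve the network into four /26 subnets for four LANs.
    for subnet in net.subnets(new_prefix=26):
        print(subnet, "-", subnet.num_addresses, "addresses")

    # CIDR's efficiency: a site needing ~1000 hosts can be given a /22
    # (1024 addresses) instead of a wasteful class B (/16, 65536 addresses).
    print(ipaddress.ip_network("198.51.100.0/22").num_addresses)

The prefix length after the slash is the whole trick: it says how many leading bits name the network, freeing allocations from the rigid class A/B/C boundaries.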
Crisis aside, the World Wide Web (WWW) has been one of the Internet's most exciting recent developments. The idea of hypertext has been around for more than a decade, but in 1989 a team at the European Center for Particle Research (CERN) in Switzerland developed a set of protocols for transferring hypertext via the Internet. In the early 1990s it was enhanced by a team at the National Center for Supercomputing Applications (NCSA) at the University of Illinois - one of NSF's supercomputer centers. The result was NCSA Mosaic, a graphical, point-and-click hypertext browser that made the Internet easy. The resulting explosion in "Web sites" drove the Internet into the public eye.

Mid 1990s - The New Internet

Of at least as much interest as the Internet's technical progress in the 1990s has been its sociological progress. It has already become part of the national vocabulary, and seems headed for even greater prominence. It has been accepted by the business community, with a resulting explosion of service providers, consultants, books, and TV coverage. It has given birth to the Free Software Movement.

The Free Software Movement owes much to bulletin board systems, but really came into its own on the Internet, due to a combination of forces. The public nature of the Internet's early funding ensured that much of its networking software was non-proprietary. The emergence of anonymous FTP sites provided a distribution mechanism that almost anyone could use. Network newsgroups and mailing lists offered an open communication medium. Last but not least were individualists like Richard Stallman, who wrote EMACS, launched the GNU Project, and founded the Free Software Foundation. In the 1990s, Linus Torvalds wrote Linux, the popular (and free) UNIX clone operating system.

\begin{soapbox}
The explosion of capitalist conservatism, combined with a growing awareness of the Internet's business value, has led to major changes in the Internet community. Many of them have not been for the good.

First, there seems to be a growing departure from the Internet's history of open protocols, published as RFCs. Many new protocols are being developed in an increasingly proprietary manner. IGRP, a trademark of Cisco Systems, has the dubious distinction of being the most successful proprietary Internet routing protocol, capable only of operation between Cisco routers. Other protocols, such as BGP, are published as RFCs, but with important operational details omitted. The notoriously mis-named Open Software Foundation has introduced a whole suite of "open" protocols whose specifications are available - for a price - and not on the net. I am forced to wonder: 1) why do we need a new RPC? and 2) why won't OSF tell us how it works?

People forget that businesses have tried to run digital communications networks in the past. IBM and DEC both developed proprietary networking schemes that only ran on their hardware. Several information providers did very well for themselves in the 80s, including LEXIS/NEXIS, Dialog, and Dow Jones. Public data networks were constructed by companies like Tymnet and run into every major US city. CompuServe and others built large bulletin board-like systems. Many of these services still offer a quality and depth of coverage unparalleled on the Internet (examine Dialog if you are skeptical of this claim). But none of them offered nudie GIFs that anyone could download. None of them let you read through the RFCs and then write a Perl script to tweak the one little thing you needed to adjust. None of them gave birth to a Free Software Movement. None of them caught people's imagination.

The very existence of the Free Software Movement is part of the Internet saga, because free software would not exist without the net. "Movements" tend to arise when progress offers us new freedoms and we find new ways to explore and, sometimes, to exploit them. The Free Software Movement has offered what would have been unimaginable when the Internet was formed - games, editors, windowing systems, compilers, networking software, and even entire operating systems available for anyone who wants them, without licensing fees, with complete source code, and all you need is Internet access. It also offers challenges, forcing us to ask what changes are needed in our society to support these new freedoms that have touched so many people. And it offers chances at exploitation, from businesses using free software development platforms for commercial code, to the Internet Worm and the security risks of open systems.

People wonder whether progress is better served through government funding or private industry. The Internet defies the popular wisdom of "business is better". Both business and government tried to build large data communication networks in the 1980s.
Business depended on good market decisions; the government researchers based their system on openness, imagination, and freedom. Business failed; the Internet succeeded. Our reward has been its commercialization.
\end{soapbox}

For the next few years, the Internet will almost certainly be content-driven. Although new protocols are always under development, we have barely begun to explore the potential of just the existing ones. Chief among these is the World Wide Web, with its potential for simple on-line access to almost any information imaginable. Yet even as the Internet intrudes into society, remember that over the last two decades "The Net" has developed a culture of its own, one that may collide with society's. Already business is making its pitch to dominate the Internet. Already Congress has deemed it necessary to regulate the Web. The big questions loom unanswered: How will society change the Internet... and how will the Internet change society?

Protocols

One of the more important networking concepts is the protocol. Douglas Comer defines a protocol as "a formal description of message formats and the rules two or more machines must follow to exchange those messages."

Protocols usually exist in two forms. First, they exist in a textual form for humans to understand. Second, they exist as programming code for computers to understand. Both forms should ultimately specify the precise interpretation of every bit of every message exchanged across a network.

Protocols exist at every point where logical program flow crosses between hosts. In other words, we need protocols every time we want to do something on another computer. Every time we want to print something on a network printer we need protocols. Every time we want to download a file we need protocols. Every time we want to save our work on disk, we don't need protocols - unless the disk is on a network file server.

Usually multiple protocols will be in use simultaneously. For one thing, computers usually do several things at once, and often for several people at once. Therefore, most protocols support multitasking. Also, one operation can involve several protocols. For example, consider the NFS (Network File System) protocol. A write to a file is done with an NFS operation, which uses another protocol (RPC) to perform a function call on a remote host, which uses another protocol (UDP) to deliver a datagram to a port on a remote host, which uses another protocol to deliver a datagram on an Ethernet, and so on. Along the way we may need to look up host names (using the DNS protocol), convert data to a network standard form (using the XDR protocol), and find a routing path to the host (using one or many of numerous protocols) - I think you get the idea.

\begin{soapbox}
One of the challenges facing network designers is to construct protocols that are as specific as possible to one function. For example, I consider NFS a good protocol design because one protocol does file transport (NFS), one protocol does procedure calls (RPC), etc. If you need to make a remote procedure call to print a file, you already have the RPC protocol that already does almost everything you need. Add one piece to the puzzle - a printing protocol, defined in terms of the RPC protocol - and your job is done. On the other hand, I do not consider TCP a very good protocol, because it mixes two functions: reliable data delivery and connection-oriented streams. Consequently, the Internet lacks a good, reliable datagram delivery mechanism, because TCP's reliable delivery techniques, while effective, are specific to stream connections.
\end{soapbox}
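The "two forms" of a protocol - textual spec and code - are worth seeing side by side. Suppose a purely hypothetical spec reads: "a message is a 1-byte version, a 1-byte type, a 2-byte big-endian length, followed by length bytes of payload." A minimal Python rendering of that spec:

    import struct

    # Hypothetical spec: version (1 byte), type (1 byte),
    # payload length (2 bytes, big-endian), then the payload itself.
    HEADER = struct.Struct("!BBH")

    def pack_message(version, msg_type, payload):
        return HEADER.pack(version, msg_type, len(payload)) + payload

    def unpack_message(data):
        version, msg_type, length = HEADER.unpack_from(data)
        payload = data[HEADER.size:HEADER.size + length]
        return version, msg_type, payload

    wire = pack_message(1, 2, b"hello")
    print(wire)                     # b'\x01\x02\x00\x05hello'
    print(unpack_message(wire))     # (1, 2, b'hello')

Every real protocol header - TCP's, IP's, NFS's - is this same idea at larger scale: an exact, bit-level agreement that both ends implement.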
Protocol Layering

Protocols define the format of the messages exchanged over the Internet. They are normally structured in layers, to simplify design and programming.

Protocol layering is a common technique to simplify networking designs by dividing them into functional layers, and assigning protocols to perform each layer's task. For example, it is common to separate the functions of data delivery and connection management into separate layers, and therefore separate protocols. Thus, one protocol is designed to perform data delivery, and another protocol, layered above the first, performs connection management. The data delivery protocol is fairly simple and knows nothing of connection management. The connection management protocol is also fairly simple, since it doesn't need to concern itself with data delivery.

Protocol layering produces simple protocols, each with a few well-defined tasks. These protocols can then be assembled into a useful whole. Individual protocols can also be removed or replaced as needed for particular applications. The most important layered protocol designs are the Internet's original DoD model and the OSI Seven Layer Model. The modern Internet represents a fusion of both models.

DoD Networking Model

The first layered protocol model we will study is the 4-layer DoD Model. This is the model originally designed for the Internet, and it is important because all of the Internet's core protocols adhere to it. The Department of Defense Four-Layer Model was developed in the 1970s for the DARPA Internetwork Project that eventually grew into the Internet. The core Internet protocols adhere to this model, although the OSI Seven Layer Model is justly preferred for new designs. The four layers in the DoD model, from bottom to top, are:

1. The Network Access Layer is responsible for delivering data over the particular hardware media in use. Different protocols are selected from this layer, depending on the type of physical network.

2. The Internet Layer is responsible for delivering data across a series of different physical networks that interconnect a source and destination machine. Routing protocols are most closely associated with this layer, as is the IP Protocol, the Internet's fundamental protocol.

3. The Host-to-Host Layer handles connection rendezvous, flow control, retransmission of lost data, and other generic data flow management. TCP and UDP - alternatives to one another - are this layer's most important members.

4. The Process Layer contains protocols that implement user-level functions, such as mail delivery, file transfer, and remote login.

Encapsulation

Layered protocol models rely on encapsulation, which allows one protocol to be used for relaying another's messages. Encapsulation, closely related to the concept of protocol layering, refers to the practice of enclosing data using one protocol within messages of another protocol. To make use of encapsulation, the encapsulating protocol must be open-ended, allowing for arbitrary data to be placed in its messages. Another protocol can then be used to define the format of that data. The short sketch below shows this nesting in miniature.
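Here is a toy illustration of encapsulation in Python, in the same spirit as the struct sketch above. Both "protocols" are invented for the example; the point is only that each layer treats the layer above it as opaque payload bytes.

    import struct

    def wrap_transport(port, payload):
        # Toy "transport" header: destination port (2 bytes) + payload.
        return struct.pack("!H", port) + payload

    def wrap_network(src, dst, payload):
        # Toy "network" header: 4-byte source and destination addresses
        # + payload. The payload - here a whole transport message - is
        # just opaque bytes as far as this layer is concerned.
        return struct.pack("!4s4s", src, dst) + payload

    app_msg = b"GET /index.html"                # "application" data
    seg = wrap_transport(80, app_msg)           # transport wraps application
    pkt = wrap_network(b"\xc0\x00\x02\x01",     # network wraps transport
                       b"\xc0\x00\x02\x02", seg)
    print(pkt)

Each wrapper neither knows nor cares what the bytes it carries mean; that ignorance is exactly what lets layers be designed, replaced, and standardized independently.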
Encapsulation Example

For example, consider an Internet host that requests a hypertext page over a dialup serial connection. The following scenario is likely:

First, the HyperText Transfer Protocol (HTTP) is used to construct a message requesting the page. The message, the exact format of which is unimportant at this time, is represented as follows:

[Figure: the HTTP message]

Next, the Transmission Control Protocol (TCP) is used to provide the connection management and reliable delivery that HTTP requires, but does not provide itself. TCP defines a message header format, which can be followed by arbitrary data. So, a TCP message is constructed by attaching a TCP header to the HTTP message, as follows:

[Figure: TCP header | HTTP message]

Now TCP does not provide any facilities for actually relaying a message from one machine to another in order to reach its destination. This feature is provided by the Internet Protocol (IP), which defines its own message header format. An IP message is constructed by attaching an IP header to the combined TCP/HTTP message:

[Figure: IP header | TCP header | HTTP message]

Finally, although IP can direct messages between machines, it cannot actually transmit the message from one machine to the next. This function is dependent on the actual communications hardware. In this example, we're using a dialup modem connection, so it's likely that the first step in transmitting the message will involve the Point-to-Point Protocol (PPP):

[Figure: PPP frame enclosing the entire IP/TCP/HTTP message]

Note that I've drawn the PPP encapsulation a little differently, by enclosing the entire message, not just attaching a header. This is because PPP may modify the message if it includes bytes that can't be transmitted across the link. The receiving PPP reverses these changes, and the message emerges intact. The point to remember is that the encapsulating protocol can do anything it wants to the message - expand it, encrypt it, compress it - so long as the original message is extracted at the other end.
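PPP's "modify and restore" trick is byte stuffing. Here is a simplified sketch of the mechanism (RFC 1662 defines the flag byte 0x7E and escape byte 0x7D; this version ignores the other control characters PPP can be configured to escape):

    FLAG, ESC = 0x7E, 0x7D   # PPP frame delimiter and escape byte

    def stuff(payload: bytes) -> bytes:
        # Escape any byte that would be mistaken for a delimiter,
        # then wrap the result in FLAG bytes marking frame boundaries.
        out = bytearray([FLAG])
        for b in payload:
            if b in (FLAG, ESC):
                out += bytes([ESC, b ^ 0x20])
            else:
                out.append(b)
        out.append(FLAG)
        return bytes(out)

    def unstuff(frame: bytes) -> bytes:
        body, out, i = frame[1:-1], bytearray(), 0
        while i < len(body):
            if body[i] == ESC:            # undo the escape
                out.append(body[i + 1] ^ 0x20)
                i += 2
            else:
                out.append(body[i])
                i += 1
        return bytes(out)

    msg = bytes([0x01, 0x7E, 0x02, 0x7D])
    assert unstuff(stuff(msg)) == msg     # message emerges intact

The stuffed frame is longer than the original message, which is exactly why this encapsulation is drawn as enclosing the whole message rather than merely prefixing a header.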
Standards

Protocols must be consistent to be effective. Therefore, standards are agreed upon and published. Standards are the things that make the Internet work. Almost always they take the form of protocols that everyone has agreed on.

Role of Standards

Standardized protocols provide a common meeting ground for software designers. Without standards, it is unlikely that an IBM computer could transfer files from a Macintosh, or print to a NetWare server, or login to a Sun. The technical literature of the Internet consists primarily of standard protocols that define how software and hardware from wildly divergent sources can interact on the net.

Sources of Standards

Standards come in two flavors - de facto and de jure. De facto standards are common practices; de jure standards have been "blessed" by some official standards body. In the Internet, many different organizations try to play the standards game. IETF, the Internet Engineering Task Force, is chief among them. IETF issues the RFCs that define Internet Standards, and it is IETF's working groups that do the real work of developing new and enhanced Internet standards. ISO, the International Organization for Standardization, issues the OSI standards. IEEE, the Institute of Electrical and Electronics Engineers, issues key LAN standards such as Ethernet and Token-Ring. ANSI, the American National Standards Institute, issues FDDI. As the old joke goes, "The nice thing about standards is that there are so many to choose from."

Requests For Comments (RFCs)

IETF's standards deserve special mention, since it is these standards, more than any others, that make the Internet work. IETF issues its standards as Requests For Comments (RFCs), but not all RFCs are standards. To understand IETF's standardization process, start with Internet Standard 1, "Official Internet Protocol Standards", which discusses the process and lists the current status of various Internet standards. Since RFCs, once issued, do not change, Standard 1 is periodically updated and reissued as a new RFC. At the time of this writing (October 1998), the most recent Standard 1 is RFC 2400.

The Internet Society (ISOC), IETF's parent organization, has a long-standing commitment to open standards. RFC 1602, "Internet Standards Process", includes the following statement:

Except as otherwise provided under this section, ISOC will not accept, in connection with standards work, any idea, technology, information, document, specification, work, or other contribution, whether written or oral, that is a trade secret or otherwise subject to any commitment, understanding, or agreement to keep it confidential or otherwise restrict its use or dissemination; and, specifically, ISOC does not assume any confidentiality obligation with respect to any such contribution.

Example: Hypertext Page Transfer

The encapsulation essay presented an example of transferring a hypertext page over a serial link. Let's take another look at the example, from the standpoint of layered, standard protocols. A Web browser requests this URL:

http://www.FreeSoft.org/Connected/index.html

A URL (Uniform Resource Locator) is a name that identifies a hypertext page. This URL identifies the home page of Connected: An Internet Encyclopedia. I'll explain URLs in more detail later, but for now let's just say that there are three main parts to it. http identifies that the HyperText Transfer Protocol (HTTP) is to be used to obtain the page. www.FreeSoft.org is the name of the Internet host that should be contacted to obtain the Web page. Finally, /Connected/index.html identifies the page itself. (A code sketch tracing these steps follows the list below.)

The DNS protocol converts www.FreeSoft.org into the 32-bit IP address 205.177.42.129. The Domain Name System (DNS) doesn't fit neatly into our layered protocol model, but it is a very important protocol. The lower levels of the protocol stack all use 32-bit numeric addresses. Therefore, one of the first steps is to translate the textual host name into a numeric IP address, written as four decimal numbers separated by periods.

• The HTTP protocol constructs a GET /Connected/index.html message that will be sent to host 205.177.42.129 to request the Web page. The HTTP protocol also specifies that TCP will be used to send the message, and that TCP port 80 is used for HTTP operations. In the DoD model, this is a Process Layer operation.

• The TCP protocol opens a connection to 205.177.42.129, port 80, and transmits the HTTP GET /Connected/index.html message. The TCP protocol specifies that IP will be used for message transport. In the DoD model, this is a Host-to-Host Layer operation.

• The IP protocol transmits the TCP packets to 205.177.42.129. The IP protocol also selects a communication link to perform the first step of the transfer, in this case a modem. In the DoD model, this is an Internet Layer operation.

• The PPP protocol encodes the IP/TCP/HTTP packets and transmits them across the modem line. In the DoD model, this is a Network Access Layer operation.
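For the curious, the same sequence is easy to trace with Python's standard library. Everything below the GET line - TCP segmentation, IP routing, and the link layer - is handled invisibly by the operating system, which is the layering principle at work. (The address returned for www.FreeSoft.org today will not necessarily be 205.177.42.129.)

    import socket
    from urllib.parse import urlsplit

    url = urlsplit("http://www.FreeSoft.org/Connected/index.html")

    # Process Layer: DNS translates the host name to a numeric address.
    addr = socket.gethostbyname(url.hostname)
    print("resolved", url.hostname, "to", addr)

    # Host-to-Host Layer: open a TCP connection to port 80.
    sock = socket.create_connection((addr, 80))

    # Process Layer again: send the HTTP GET message. IP and the
    # Network Access Layer do their work beneath the socket.
    request = "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (url.path, url.hostname)
    sock.sendall(request.encode("ascii"))
    print(sock.recv(200).decode("ascii", "replace"))  # first bytes of reply
    sock.close()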
Section 1 Review

Congratulations! You've completed the first section of the Programmed Instruction Course. Let's summarize the topics covered in this section:

Protocol

A protocol is "a formal description of message formats and the rules two or more machines must follow to exchange those messages." Protocols let us perform operations on other computers over a network. Many protocols can be in use at once. Protocols should be as specific to one task as possible.

Standard

Standards are protocols that everyone has agreed upon. Standards organizations exist to develop, discuss, and enhance protocols. The most important Internet standards organization is the Internet Engineering Task Force (IETF). The most important Internet standards documents are the Requests For Comments (RFCs).

Protocol Layering

Protocols are usually organized by layering them atop one another. Protocol layers should have specific, well-defined functions. The most important protocol layering designs are the 4-layer Department of Defense (DoD) Model and the 7-layer Open Systems Interconnection (OSI) Model.

4-Layer DoD Model

The 4-layer DoD layered protocol model consists of the Process, Host-to-Host, Internet, and Network Access Layers. The DoD Model was developed for the Internet. The core Internet protocols adhere to the DoD Model.

Encapsulation

Encapsulation happens when one protocol's message is embedded into another protocol's message. Protocol layering is implemented through encapsulation.