How the Net Works
Section 1 of 5
http://freesoft.org/CIE/Course/index.htm
Connected: An Internet Encyclopedia
Internet History
Recently, everyone seems to have heard about the Internet, but did you know that
the net has been around since 1969? I always like to start Internet classes with a review of
Internet history. Don't worry if you don't understand all the terms; the idea is to get a
general picture of Internet history.
1969 - Birth of a Network
The Internet as we know it today, in the mid-1990s, traces its origins back to a
Defense Department project in 1969. The subject of the project was wartime digital
communications. At that time the telephone system was about the only theater-scale
communications system in use. A major problem had been identified in its design - its
dependence on switching stations that could be targeted during an attack. Would it be
possible to design a network that could quickly reroute digital traffic around failed
nodes? A possible solution had been identified in theory: build a "web" of
datagram networks, called a "catenet", and use dynamic routing protocols to constantly
adjust the flow of traffic through the catenet. The Defense Advanced Research Projects
Agency (DARPA) launched the DARPA Internet Program.
1970s - Infancy
DARPA Internet, largely the plaything of academic and military researchers,
spent more than a decade in relative obscurity. As Vietnam, Watergate, the Oil Crisis,
and the Iranian Hostage Crisis rolled over the nation, several Internet research teams
proceeded through a gradual evolution of protocols. In 1975, DARPA declared the
project a success and handed its management over to the Defense Communications
Agency. Several of today's key protocols (including IP and TCP) were stable by 1980,
and adopted throughout ARPANET by 1983.
Mid 1980s - The Research Net
Let's outline key features, circa 1983, of what was then called ARPANET. A
small computer was a PDP-11/45, and a PDP-11/45 does not fit on your desk. Some sites
had a hundred computers attached to the Internet. Most had a dozen or so, probably with
something like a VAX doing most of the work - mail, news, EGP routing. Users did their
work using DEC VT-100 terminals. FORTRAN was the word of the day. Few companies
had Internet access, relying instead on SNA and IBM mainframes; the Internet
community was dominated by universities and military research sites. Its most popular
service was the rapid email it made possible with distant colleagues. In August 1983,
there were 562 registered ARPANET hosts (RFC 1296).
UNIX deserves at least an honorable mention, since almost all the initial Internet
protocols were developed first for UNIX, largely due to the availability of kernel source
(for a price) and the relative ease of implementation (compared to systems like VMS or
MVS). The University of California at Berkeley (UCB) deserves special mention,
because their Computer Science Research Group (CSRG) developed the BSD variants of
AT&T's UNIX operating system. BSD UNIX and its derivatives would become the most
common Internet programming platform.
Many key features of the Internet were already in place, including the IP and TCP
protocols. ARPANET was fundamentally unreliable in nature, as the Internet is still
today. This principle of unreliable delivery means that the Internet only makes a best-effort attempt to deliver packets. The network can drop a packet without any notification
to sender or receiver. Remember, the Internet was designed for military survivability. The
software running on either end must be prepared to recognize data loss, retransmitting
data as often as necessary to achieve its ultimate delivery.
Late 1980s - The PC Revolution
Driven largely by the development of the PC and LAN technology, subnetting
was standardized in 1985 when RFC 950 was released. LAN technology made the idea of
a "catenet" feasible - an internetwork of networks. Subnetting opened the possibilities of
interconnecting LANs with WANs. The National Science Foundation (NSF) started the
Supercomputer Centers program in 1986. Until then, supercomputers such as Crays were
largely the playthings of large, well-funded universities and military research centers.
NSF's idea was to make supercomputer resources available to those of more modest
means by constructing five supercomputer centers around the country and building a
network linking them with potential users. NSF decided to base their network on the
Internet protocols, and NSFNET was born. For the next decade, NSFNET would be the
core of the U.S. Internet, until its privatization and ultimate retirement in 1995.
Domain naming was stable by 1987 when RFC 1034 was released. Until then,
hostnames were mapped to IP addresses using static tables, but the Internet's exponential
growth had made this practice infeasible. In the late 1980s, important advances linked
poor network performance to poor TCP performance, and a string of papers by the
likes of Nagle and Van Jacobson (RFC 896, RFC 1072, RFC 1144, RFC 1323) presented
key insights into TCP performance.
The 1988 Internet Worm was the largest security failure in the history of the
Internet. All things considered, it could happen again.
Early 1990s - Address Exhaustion and the Web
In the early 90s, the first address exhaustion crisis hit the Internet technical
community. The present solution, CIDR, will sustain the Internet for a few more years by
making more efficient use of IP's existing 32-bit address space. For a more lasting
solution, IETF is looking at IPv6 and its 128-bit address space, but CIDR is here to stay.
Crisis aside, the World Wide Web (WWW) has been one of Internet's most
exciting recent developments. The idea of hypertext has been around for more than a
decade, but in 1989 a team at the European Center for Particle Research (CERN) in
Switzerland developed a set of protocols for transferring hypertext via the Internet. In the
early 1990s it was enhanced by a team at the National Center for Supercomputing
Applications (NCSA) at the University of Illinois - one of NSF's supercomputer centers.
The result was NCSA Mosaic, a graphical, point-and-click hypertext browser that made
the Internet easy to use. The resulting explosion in "Web sites" drove the Internet into the
public eye.
Mid 1990s - The New Internet
Of at least as much interest as Internet's technical progress in the 1990s has been
its sociological progress. It has already become part of the national vocabulary, and
seems headed for even greater prominence. It has been accepted by the business
community, with a resulting explosion of service providers, consultants, books, and TV
coverage. It has given birth to the Free Software Movement. The Free Software
Movement owes much to bulletin board systems, but really came into its own on the
Internet, due to a combination of forces. The public nature of the Internet's early funding
ensured that much of its networking software was non-proprietary. The emergence of
anonymous FTP sites provided a distribution mechanism that almost anyone could use.
Network newsgroups and mailing lists offered an open communication medium. Last but
not least were individualists like Richard Stallman, who wrote EMACS, launched the
GNU Project and founded the Free Software Foundation. In the 1990s, Linus Torvalds
wrote Linux, the popular (and free) UNIX clone operating system.
\begin{soapbox}
The explosion of capitalist conservatism, combined with a growing awareness of
Internet's business value, has led to major changes in the Internet community. Many of
them have not been for the good.
First, there seems to be a growing departure from Internet's history of open
protocols, published as RFCs. Many new protocols are being developed in an
increasingly proprietary manner. IGRP, a trademark of Cisco Systems, has the dubious
distinction of being the most successful proprietary Internet routing protocol, capable only
of operating between Cisco routers. Other protocols, such as BGP, are published as RFCs,
but with important operational details omitted. The notoriously mis-named Open
Software Foundation has introduced a whole suite of "open" protocols whose
specifications are available - for a price - and not on the net. I am forced to wonder: 1)
why do we need a new RPC? and 2) why won't OSF tell us how it works?
People forget that businesses have tried to run digital communications networks
in the past. IBM and DEC both developed proprietary networking schemes that only ran
on their hardware. Several information providers did very well for themselves in the 80s,
including LEXIS/NEXIS, Dialog, and Dow Jones. Public data networks were
constructed by companies like Tymnet and run into every major US city. CompuServe
and others built large bulletin board-like systems. Many of these services still offer a
quality and depth of coverage unparalleled on the Internet (examine Dialog if you are
skeptical of this claim). But none of them offered nudie GIFs that anyone could
download. None of them let you read through the RFCs and then write a Perl script to
tweak the one little thing you needed to adjust. None of them gave birth to a Free
Software Movement. None of them caught people's imagination.
The very existence of the Free Software Movement is part of the Internet saga,
because free software would not exist without the net. "Movements" tend to arise when
progress offers us new freedoms and we find new ways to explore and, sometimes, to
exploit them. The Free Software Movement has offered what would have been unimaginable
when the Internet was formed - games, editors, windowing systems, compilers,
networking software, and even entire operating systems available to anyone who wants
them, without licensing fees and with complete source code; all you need is Internet
access. It also offers challenges, forcing us to ask what changes are needed in our society
to support these new freedoms that have touched so many people. And it offers chances
at exploitation, from the businesses using free software development platforms for
commercial code, to the Internet Worm and the security risks of open systems.
People wonder whether progress is better served through government funding or
private industry. The Internet defies the popular wisdom of "business is better". Both
business and government tried to build large data communication networks in the 1980s.
Business depended on good market decisions; the government researchers based their
system on openness, imagination and freedom. Business failed; Internet succeeded. Our
reward has been its commercialization.
\end{soapbox}
For the next few years, the Internet will almost certainly be content-driven.
Although new protocols are always under development, we have barely begun to explore
the potential of just the existing ones. Chief among these is the World Wide Web, with its
potential for simple on-line access to almost any information imaginable. Yet even as the
Internet intrudes into society, remember that over the last two decades "The Net" has
developed a culture of its own, one that may collide with society's. Already business is
making its pitch to dominate the Internet. Already Congress has deemed it necessary to
regulate the Web. The big questions loom unanswered: How will society change the
Internet... and how will the Internet change society?
Protocols
One of the more important networking concepts is the protocol.
Douglas Comer defines a protocol as "a formal description of message formats
and the rules two or more machines must follow to exchange those messages."
Protocols usually exist in two forms. First, they exist in a textual form for humans
to understand. Second, they exist as programming code for computers to understand.
Both forms should ultimately specify the precise interpretation of every bit of every
message exchanged across a network.
Protocols exist at every point where logical program flow crosses between hosts.
In other words, we need protocols every time we want to do something on another
computer. Every time we want to print something on a network printer we need
protocols. Every time we want to download a file we need protocols.
Every time we want to save our work on disk, we don't need protocols - unless the
disk is on a network file server.
Usually multiple protocols will be in use simultaneously. For one thing,
computers usually do several things at once, and often for several people at once.
Therefore, most protocols support multitasking. Also, one operation can involve several
protocols. For example, consider the NFS (Network File System) protocol. A write to a
file is done with an NFS operation, which uses another protocol (RPC) to perform a
function call on a remote host, which uses another protocol (UDP) to deliver a datagram to
a port on a remote host, which uses another protocol to deliver a datagram on an Ethernet,
and so on. Along the way we may need to look up host names (using the DNS protocol),
convert data to a network standard form (using the XDR protocol), and find a routing path to
the host (using one of many routing protocols) - I think you get the idea.
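To make the chain concrete, here is a small Python sketch of the same idea. It is purely illustrative - the helper names, port, and "procedure number" are hypothetical stand-ins, not a real NFS client - but it shows how each layer only knows about the layer directly beneath it:

    import socket
    import struct

    def xdr_encode(procedure: int, payload: bytes) -> bytes:
        # Stand-in for XDR: a procedure number plus a length-prefixed payload.
        return struct.pack("!II", procedure, len(payload)) + payload

    def udp_send(host: str, port: int, datagram: bytes) -> None:
        # UDP delivers the datagram; DNS (via getaddrinfo) maps the host name
        # to an address, and the link layer below puts the frame on the wire.
        addr = socket.getaddrinfo(host, port, family=socket.AF_INET,
                                  type=socket.SOCK_DGRAM)[0][4]
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.sendto(datagram, addr)

    def rpc_call(host: str, procedure: int, payload: bytes) -> None:
        # "RPC" layer: wrap the arguments and hand them to UDP.
        udp_send(host, 2049, xdr_encode(procedure, payload))

    def nfs_write(host: str, data: bytes) -> None:
        # "NFS" layer: express the file operation as a remote procedure call.
        WRITE = 7  # illustrative procedure number only
        rpc_call(host, WRITE, data)

    # nfs_write("fileserver.example.com", b"hello")  # each layer does one job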
\begin{soapbox}
One of the challenges facing network designers is to construct protocols that are
as specific as possible to one function. For example, I consider NFS a good protocol
design because one protocol does file transport (NFS), one protocol does procedure calls
(RPC), etc. If you need to make a remote procedure call to print a file, you already have
an RPC protocol that does almost everything you need. Add one piece to the
puzzle - a printing protocol, defined in terms of the RPC protocol - and your job is
done.
On the other hand, I do not consider TCP a very good protocol, because it mixes
two functions: reliable data delivery and connection-oriented streams. Consequently, the
Internet lacks a good, reliable datagram delivery mechanism, because TCP's reliable
delivery techniques, while effective, are specific to stream connections.
\end{soapbox}
Protocol Layering
Protocols define the format of the messages exchanged over the Internet. They are
normally structured in layers, to simplify design and programming.
Protocol layering is a common technique to simplify networking designs by
dividing them into functional layers, and assigning protocols to perform each layer's task.
For example, it is common to separate the functions of data delivery and
connection management into separate layers, and therefore separate protocols. Thus, one
protocol is designed to perform data delivery, and another protocol, layered above the
first, performs connection management. The data delivery protocol is fairly simple and
knows nothing of connection management. The connection management protocol is also
fairly simple, since it doesn't need to concern itself with data delivery.
Protocol layering produces simple protocols, each with a few well-defined tasks.
These protocols can then be assembled into a useful whole. Individual protocols can also
be removed or replaced as needed for particular applications.
The most important layered protocol designs are the Internet's original DoD
model, and the OSI Seven Layer Model. The modern Internet represents a fusion of both
models.
DoD Networking Model
The first layered protocol model we will study is the 4-layer DoD Model. This is
the model originally designed for the Internet, and is important because all of the
Internet's core protocols adhere to it.
The Department of Defense Four-Layer Model was developed in the 1970s for the
DARPA Internetwork Project that eventually grew into the Internet. The core Internet
protocols adhere to this model, although the OSI Seven Layer Model is justly preferred
for new designs. The four layers in the DoD model, from bottom to top, are:
1. The Network Access Layer is responsible for delivering data over the particular
hardware media in use. Different protocols are selected from this layer, depending on
the type of physical network.
2. The Internet Layer is responsible for delivering data across a series of different
physical networks that interconnect a source and destination machine. Routing
protocols are most closely associated with this layer, as is the IP Protocol, the
Internet's fundamental protocol.
3. The Host-to-Host Layer handles connection rendezvous, flow control, retransmission
of lost data, and other generic data flow management. The mutually exclusive TCP
and UDP protocols are this layer's most important members.
4. The Process Layer contains protocols that implement user-level functions, such as
mail delivery, file transfer and remote login.
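As a quick reference, here is the model laid out in Python; the protocols listed against each layer are common examples, and the grouping is illustrative rather than exhaustive:

    # The four DoD layers, top to bottom, with example protocols for each.
    DOD_MODEL = {
        "Process":        ["HTTP", "SMTP", "FTP", "Telnet"],
        "Host-to-Host":   ["TCP", "UDP"],
        "Internet":       ["IP", "ICMP", "routing protocols"],
        "Network Access": ["Ethernet", "PPP", "Token-Ring"],
    }

    for layer, protocols in DOD_MODEL.items():
        print(f"{layer:>15}: {', '.join(protocols)}")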
Encapsulation
Layered protocol models rely on encapsulation, which allows one protocol to be
used for relaying another's messages.
Encapsulation, closely related to the concept of Protocol Layering, refers to the
practice of enclosing data using one protocol within messages of another protocol.
To make use of encapsulation, the encapsulating protocol must be open-ended,
allowing for arbitrary data to be placed in its messages. Another protocol can then be used to
define the format of that data.
Encapsulation Example
For example, consider an Internet host that requests a hypertext page over a dialup
serial connection. The following scenario is likely:
First, the HyperText Transfer Protocol (HTTP) is used to construct a message
requesting the page; the exact format of the message is unimportant at this time.
Next, the Transmission Control Protocol (TCP) is used to provide the connection
management and reliable delivery that HTTP requires, but does not provide itself. TCP
defines a message header format, which can be followed by arbitrary data. So, a TCP
message is constructed by attaching a TCP header to the HTTP message.
Now TCP does not provide any facilities for actually relaying a message from one
machine to another in order to reach its destination. This feature is provided by the
Internet Protocol (IP), which defines its own message header format. An IP message is
constructed by attaching an IP header to the combined TCP/HTTP message.
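A toy Python sketch of this nesting follows; the bracketed header text is only a placeholder, not the real TCP or IP wire format:

    # Each layer prepends its own (fake) header to the message it is given.
    http_message = b"GET /Connected/index.html HTTP/1.0\r\n\r\n"

    tcp_segment = b"[TCP header: src port, dst port 80, seq, ack]" + http_message
    ip_packet   = b"[IP header: src addr, dst addr 205.177.42.129]" + tcp_segment

    # Each wrapper adds its own bytes around the layer above it.
    print(len(http_message), len(tcp_segment), len(ip_packet))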
Finally, although IP can direct messages between machines, it cannot actually
transmit the message from one machine to the next. This function is dependent on the
actual communications hardware. In this example, we're using a dialup modem
connection, so it's likely that the first step in transmitting the message will involve the
Point-to-Point Protocol (PPP).
Note that the PPP encapsulation works a little differently: it encloses the
entire message, rather than just attaching a header. This is because PPP may modify the message
if it includes bytes that can't be transmitted across the link. The receiving PPP reverses
these changes, and the message emerges intact. The point to remember is that the
encapsulating protocol can do anything it wants to the message - expand it, encrypt it,
compress it - so long as the original message is extracted at the other end.
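As a rough illustration of the kind of byte-stuffing PPP performs (its HDLC-like framing escapes bytes that would be confused with the frame delimiter), here is a simplified Python sketch; real PPP also adds address, control, and checksum fields, which are omitted here:

    FLAG, ESCAPE = 0x7E, 0x7D  # frame delimiter and escape byte

    def ppp_escape(payload: bytes) -> bytes:
        framed = bytearray([FLAG])
        for byte in payload:
            if byte in (FLAG, ESCAPE):
                framed += bytes([ESCAPE, byte ^ 0x20])  # escape and transform
            else:
                framed.append(byte)
        framed.append(FLAG)
        return bytes(framed)

    def ppp_unescape(frame: bytes) -> bytes:
        payload = bytearray()
        escaping = False
        for byte in frame[1:-1]:                        # strip the flag bytes
            if escaping:
                payload.append(byte ^ 0x20)
                escaping = False
            elif byte == ESCAPE:
                escaping = True
            else:
                payload.append(byte)
        return bytes(payload)

    packet = bytes([0x45, 0x7E, 0x00, 0x7D, 0x01])
    assert ppp_unescape(ppp_escape(packet)) == packet   # message emerges intact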
Standards
Protocols must be consistent to be effective. Therefore, standards are agreed upon
and published.
Standards are the things that make the Internet work. Almost always they take the
form of protocols that everyone has agreed on.
Role of Standards
Standardized protocols provide a common meeting ground for software designers.
Without standards, it is unlikely that an IBM computer could transfer files
from a Macintosh, or print to a NetWare server, or log in to a Sun. The technical
literature of the Internet consists primarily of standard protocols that define
how software and hardware from wildly divergent sources can interact on the net.
Sources of Standards
Standards come in two flavors - de facto and de jure. De facto standards are
common practices; de jure standards have been "blessed" by some official standards
body. In the Internet, many different organizations try to play the standards game. IETF,
the Internet Engineering Task Force, is chief among them. IETF issues the RFCs that
define Internet Standards, and it is IETF's working groups that do the real work of
developing new and enhanced Internet standards. ISO, the International Organization for
Standardization, issues the OSI standards. IEEE, the Institute of Electrical and Electronics
Engineers, issues key LAN standards such as Ethernet and Token-Ring. ANSI, the
American National Standards Institute, issues FDDI. As the common quip goes, "The nice
thing about standards is that there are so many to choose from."
Requests For Comments (RFCs)
IETF's standards deserve special mention, since it is these standards, more than
any other, that make the Internet work. IETF issues its standards as Requests For
Comments (RFCs), but not all RFCs are standards. To understand IETF's standardization
process, start with Internet Standard 1, "Internet Official Protocol Standards", which
discusses the process and lists the current status of various Internet standards. Since
RFCs, once issued, do not change, Standard 1 is periodically updated and reissued as a
new RFC. At the time of this writing (October 1998), the most recent Standard 1 is RFC
2400.
The Internet Society (ISOC), IETF's parent organization, has a long-standing
commitment to open standards. RFC 1602, "Internet Standards Process", includes the
following statement:
Except as otherwise provided under this section, ISOC will not accept, in
connection with standards work, any idea, technology, information, document,
specification, work, or other contribution, whether written or oral, that is a trade secret or
otherwise subject to any commitment, understanding, or agreement to keep it
confidential or otherwise restrict its use or dissemination; and, specifically, ISOC does
not assume any confidentiality obligation with respect to any such contribution.
Example: Hypertext Page Transfer
The encapsulation essay presented an example of transferring a hypertext page
over a serial link. Let's take another look at the example, from the standpoint
of layered, standard protocols.
A Web browser requests this URL: http://www.FreeSoft.org/Connected/index.html
A URL (Uniform Resource Locator) is a name that identifies a hypertext page.
This URL identifies the home page of Connected: An Internet Encyclopedia. I'll explain
URLs in more detail later, but for now let's just say that there are three main parts to it.
http identifies that the HyperText Transfer Protocol (HTTP) is to be used to obtain the
page. www.FreeSoft.org is the name of the Internet host that should be contacted to
obtain the Web page. Finally, /Connected/index.html identifies the page itself.
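As a small aside, Python's standard library can split the URL into those three parts (note that it normalizes the host name to lower case):

    from urllib.parse import urlsplit

    url = "http://www.FreeSoft.org/Connected/index.html"
    parts = urlsplit(url)

    print(parts.scheme)    # 'http' -> use the HyperText Transfer Protocol
    print(parts.hostname)  # 'www.freesoft.org' -> the host to contact
    print(parts.path)      # '/Connected/index.html' -> the page itself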
The DNS protocol converts www.FreeSoft.org into the 32-bit IP address
205.177.42.129. The Domain Name System (DNS) doesn't fit neatly into our layered
protocol model, but it is a very important protocol. The lower levels of the protocol stack
all use 32-bit numeric addresses. Therefore, one of the first steps is to translate the textual
host name into a numeric IP address, written as four decimal numbers, separated by
periods.
• The HTTP protocol constructs a GET /Connected/index.html message that will
be sent to host 205.177.42.129 to request the Web page.
The HTTP protocol also specifies that TCP will be used to send the message, and
that TCP port 80 is used for HTTP operations.
In the DoD model, this is a Process Layer operation.
• The TCP protocol opens a connection to 205.177.42.129, port 80, and transmits
the HTTP GET /Connected/index.html message.
The TCP protocol specifies that IP will be used for message transport.
In the DoD model, this is a Host-to-Host Layer operation.
• The IP protocol transmits the TCP packets to 205.177.42.129.
The IP protocol also selects a communication link to perform the first step of the
transfer, in this case a modem.
In the DoD model, this is an Internet Layer operation.
• The PPP protocol encodes the IP/TCP/HTTP packets and transmits them across
the modem line. In the DoD model, this is a Network Access Layer operation.
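The whole walkthrough can be strung together with Python's socket library. This is only a sketch: the host and address in the text date from the 1990s and may no longer answer, so substitute any reachable web server if you want to experiment.

    import socket

    host, port, path = "www.FreeSoft.org", 80, "/Connected/index.html"

    # DNS: translate the textual host name into a numeric IP address.
    ip_address = socket.gethostbyname(host)

    # TCP (Host-to-Host Layer): open a connection to port 80 at that address.
    with socket.create_connection((ip_address, port), timeout=10) as conn:
        # HTTP (Process Layer): send the GET request over the connection.
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        conn.sendall(request.encode("ascii"))
        reply = conn.recv(4096)   # first chunk of the server's reply

    # IP and the link layer (Internet and Network Access Layers) did their work
    # invisibly, routing the packets and framing them for the line in use.
    print(reply[:200])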
Section 1 Review
Congratulations! You've completed the first section of the Programmed
Instruction Course. Let's summarize the topics covered in this section:
Protocol
A protocol is "a formal description of message formats and the rules two or more
machines must follow to exchange those messages."
Protocols let us perform operations on other computers over a network.
Many protocols can be in use at once.
Protocols should be as specific to one task as possible.
Standard
Standards are protocols that everyone has agreed upon.
Standard organizations exist to develop, discuss and enhance protocols.
The most important Internet standard organization is the Internet Engineering
Task Force (IETF).
The most important Internet standard documents are the Requests For Comments
(RFCs).
Protocol Layering
Protocols are usually organized by layering them atop one another.
Protocol layers should have specific, well-defined functions.
The most important protocol layering designs are the 4-layer Department of
Defense (DoD) Model, and the 7-layer Open Systems Interconnection (OSI) Model.
4-Layer DoD Model
The 4-layer DoD layered protocol model consists of the Process, Host-to-Host,
Internet, and Network Access Layers.
The DoD Model was developed for the Internet.
The core Internet protocols adhere to the DoD Model.
Encapsulation
Encapsulation happens when one protocol's message is embedded into another
protocol's message. Protocol layering is implemented through encapsulation.