THE WEB, Chapter 4, 7 and 8 - CS


THE WEB

Monica Stoica

Background Information

HTTP stands for Hypertext Transfer Protocol

FTP stands for File Transfer Protocol

HTML stands for Hypertext Markup Language. The extension used for web page files is .html. Some systems do not allow more than 3 characters for extensions, so the extension becomes .htm.

Mail addresses are generally not case sensitive, but URLs can be: the part of a URL after the hostname is case sensitive on many servers, especially if the web page resides on a Unix server.

Introduction to URLs

URL stands for Uniform Resource Locator.

Examples of URLs: http://www.uark.edu, mailto:smonica@cs.bu.edu, ftp://ftp.microsoft.com/Products/msmq/demos.zip

URLs are composed of the addressing scheme (http://, mailto:, ftp://) and the hostname, optionally followed by the path to a file on that host.
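As a minimal sketch of how a URL breaks into its parts, Python's standard urllib.parse module can split the example URLs from this section. The function name describe is only an illustration, not part of any library.

```python
from urllib.parse import urlsplit

# Example URLs taken from this section.
examples = [
    "http://www.uark.edu",
    "mailto:smonica@cs.bu.edu",
    "ftp://ftp.microsoft.com/Products/msmq/demos.zip",
]

def describe(url):
    """Return (scheme, hostname, path) for a URL.

    hostname is None for schemes such as mailto: that have no '//' part.
    """
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path

for url in examples:
    print(describe(url))
```

Note that the mailto: URL has no hostname of its own: the whole mail address ends up in the path component.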

If the leftmost part of a URL is http://, the server is a World Wide Web server.

If the leftmost part of a URL is ftp://, the server is an FTP server, typically an anonymous one that allows people to download programs and other files for free.

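There are also mail servers, which provide email services and often have hostnames beginning with mail (for example mail.pacbell.net). Mail addresses in URLs use the mailto: scheme; there is no mail:// scheme.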

Other types of addressing schemes:

 news://

 file://

A scheme identifies the type of resource.

Most URLs use the http scheme.

Since HTTP is the protocol used to transfer Web page data, URLs that use the http scheme point to Web pages, not to anonymous FTP files or mail addresses.

Examples of Hostnames

 www.bu.edu

 ftp.xoom.com

 mail.pacbell.net

 www.royal.gov.uk

The Rightmost Part of the Hostname

The rightmost part of the hostname is called the top-level domain. There are 2 types of top-level domains:

 ORGANIZATIONAL ( com for commercial organizations, edu for education, gov for federal government, net for network, mil for military) and

 GEOGRAPHICAL DOMAINS (uk, us, ro).

Domains

A DOMAIN is a set of hostnames that have the rightmost part of their names in common. For example all the hostnames that end in edu belong to the edu domain , and all the hostnames ending in uk belong to the uk domain .
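As a rough sketch, domain membership can be checked by comparing the rightmost label of each hostname. The helper names below are hypothetical; the hostnames are the examples used earlier in these notes.

```python
from collections import defaultdict

def top_level_domain(hostname):
    """Return the rightmost dot-separated label, lowercased."""
    return hostname.lower().rstrip(".").rsplit(".", 1)[-1]

def group_by_domain(hostnames):
    """Group hostnames by their top-level domain."""
    groups = defaultdict(list)
    for name in hostnames:
        groups[top_level_domain(name)].append(name)
    return dict(groups)

hosts = ["www.bu.edu", "ftp.xoom.com", "mail.pacbell.net", "www.royal.gov.uk"]
print(group_by_domain(hosts))
```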

When hostnames have the two rightmost parts of their names in common, they belong to the same Second-Level Domain. For example, the following hostnames belong to the same second-level domain, pacbell.net:

 pacbell.net

 mail.pacbell.net

 www.pacbell.net

 news.pacbell.net
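The same idea extends to second-level domains by keeping the two rightmost labels. This is a simplified sketch that follows the definition given here; the function name is an assumption made for illustration.

```python
def second_level_domain(hostname):
    """Return the two rightmost labels of a hostname, e.g. 'pacbell.net'."""
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])

hosts = ["pacbell.net", "mail.pacbell.net", "www.pacbell.net", "news.pacbell.net"]

# Every hostname in the list maps to the same second-level domain.
print({second_level_domain(h) for h in hosts})  # {'pacbell.net'}
```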

DNS

All the hostnames (like www.bu.edu) on the Internet are part of a system called DNS, the Domain Name System.

DNS allows you to give a unique name to each computer on the Internet.

Those names are the hostnames.

IP Numbers

Each computer on the Internet is given a unique number, and the Net uses these numbers, not the hostnames. These numbers are called IP ADDRESSES or IP NUMBERS.

Example: If I want to visit a site (www.royal.gov.uk), my browser has to find the corresponding IP number for this hostname. My browser calls upon DNS, which translates the hostname into the IP address (193.32.28.6 in this case).

All IP addresses of this kind (IPv4) have the same structure: four numbers, each from 0 to 255, separated by dots.
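A minimal sketch of checking that structure, assuming the common IPv4 form in which each of the four numbers lies between 0 and 255. The function name parse_ipv4 is hypothetical.

```python
def parse_ipv4(address):
    """Split a dotted address into four integers, or raise ValueError."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError("expected four numbers separated by dots")
    numbers = [int(p) for p in parts]  # int() raises ValueError on non-numeric text
    if any(n < 0 or n > 255 for n in numbers):
        raise ValueError("each number must be between 0 and 255")
    return numbers

print(parse_ipv4("193.32.28.6"))  # [193, 32, 28, 6]
```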

How DNS Works

Each organization manages its own hostnames, IP numbers and subdomains.

Each organization should have 2 computers (one as a backup), called Name Servers, to provide addressing information for all the hostnames in its domain.

At various places on the Net there are a number of special computers called ROOT NAME SERVERS. Each root name server maintains a list of the name servers that handle top-level domains (edu, com, uk). Each of these name servers maintains a list of other name servers that handle the second-level domains, and so on.

Example of How DNS Works

If I am looking for a web site (www.cnn.com), my DNS server contacts a root name server and gets the address of the com name server. It then contacts the com name server and gets the IP address of the cnn.com name server. Finally, it contacts the cnn.com name server and gets the IP address of www.cnn.com. My browser then connects to that IP address and requests the page (index.html) to be displayed.
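The chain of referrals from the root down to the organization's name server can be sketched with a toy model. Every table, server name and IP address below is invented for illustration (the address comes from a range reserved for documentation); none of it is real DNS data.

```python
# Toy data: all names and addresses here are made up for illustration.
ROOT = {"com": "com-name-server"}  # the root knows the top-level servers
TLD_SERVERS = {"com-name-server": {"cnn.com": "cnn-name-server"}}
DOMAIN_SERVERS = {"cnn-name-server": {"www.cnn.com": "192.0.2.10"}}

def resolve(hostname):
    """Follow root -> top-level -> second-level name servers to an IP."""
    labels = hostname.split(".")
    tld = labels[-1]
    second_level = ".".join(labels[-2:])
    steps = []

    tld_server = ROOT[tld]  # step 1: ask a root name server
    steps.append(f"root -> {tld_server}")

    domain_server = TLD_SERVERS[tld_server][second_level]  # step 2: ask the com server
    steps.append(f"{tld_server} -> {domain_server}")

    ip = DOMAIN_SERVERS[domain_server][hostname]  # step 3: ask the cnn.com server
    steps.append(f"{domain_server} -> {ip}")
    return ip, steps

ip, steps = resolve("www.cnn.com")
for step in steps:
    print(step)
```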

DNS works well, but each lookup takes some time. This is why your browser keeps recently used addresses in memory (a cache) in case you need to revisit them.
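A minimal sketch of the caching idea: remember each answer so that a repeated request skips the slow step. The slow_lookup function is a stand-in with an invented delay and address, not a real DNS query, and the class name is hypothetical.

```python
import time

def slow_lookup(hostname):
    """Stand-in for a real DNS query; the delay and address are invented."""
    time.sleep(0.05)
    return "192.0.2.10"

class CachingResolver:
    """Remember answers so repeated requests skip the slow lookup."""

    def __init__(self, lookup):
        self.lookup = lookup
        self.cache = {}
        self.misses = 0  # number of times the slow lookup was actually used

    def resolve(self, hostname):
        if hostname not in self.cache:
            self.misses += 1
            self.cache[hostname] = self.lookup(hostname)
        return self.cache[hostname]

resolver = CachingResolver(slow_lookup)
resolver.resolve("www.cnn.com")  # slow: goes to slow_lookup
resolver.resolve("www.cnn.com")  # fast: answered from the cache
print(resolver.misses)  # 1
```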

URL Abbreviations

Most web servers follow a rule which says that if a URL specifies a directory but no file name, the server will automatically look for a file with a specific name. Some servers look for a file named index.html, some for default.html.
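The default-file rule might be sketched as follows. This is a simplification: it treats any path ending in a slash as a directory, whereas a real server checks its own file system. The function and parameter names are hypothetical.

```python
def file_to_serve(url_path, default_name="index.html"):
    """Map a URL path to a file path, adding the default file for directories.

    Assumption: a path that is empty or ends in '/' names a directory.
    """
    if url_path == "" or url_path.endswith("/"):
        return (url_path or "/") + default_name
    return url_path

print(file_to_serve("/"))                        # /index.html
print(file_to_serve("/docs/", "default.html"))   # /docs/default.html
print(file_to_serve("/docs/page.html"))          # /docs/page.html
```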

If you type a URL that does not have a scheme at the beginning, your browser assumes that the URL points to a Web page and inserts http:// for you.
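A small sketch of that abbreviation rule, assuming a scheme is recognized by the presence of "://" or a leading mailto:. Real browsers apply more elaborate checks; the function name is hypothetical.

```python
def add_default_scheme(text, default="http://"):
    """Prefix http:// when the typed text has no scheme of its own."""
    if "://" in text or text.startswith("mailto:"):
        return text
    return default + text

print(add_default_scheme("www.uark.edu"))        # http://www.uark.edu
print(add_default_scheme("ftp://ftp.xoom.com"))  # ftp://ftp.xoom.com
```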
