

AF_IC01_U1. HTTP protocol and web servers

Unit 1

Table of contents

Introduction

Prerequisites

HTTP (HyperText Transfer Protocol)

URL / URI

Web servers

Web applications

HTTP security

HTTP transactions

Bibliography

AF_IC01_U1. HTTP protocol and web servers / Unit 1 1

Introduction

In this unit you will learn about the HTTP protocol, which makes browsing the Web possible.

You will see its most relevant features and learn what a URL is. The other main topic is web servers: what a web server is, what it is for, and how to install and configure one. HTTP security and transactions are explained at the end of the unit.


Prerequisites

Prerequisites for this unit:

OSI and TCP/IP stack

Knowledge of HTML

Database concepts


HTTP (HyperText Transfer Protocol)

HTTP is a protocol designed to transfer hypertext (text that can be linked to other texts). HTML pages are the most common case, but other formats, such as text files and images, can also be transferred.

TO KNOW MORE

Some more information about HTML at Wikipedia

Wikipedia. HTML

The most important features are:

Application layer protocol.

Use of the Uniform Resource Identifier (URI), specifically the Uniform Resource Locator (URL), defined later in this unit, which lets you identify every resource on the Internet.

Client-Server architecture (request/response paradigm).

Default port is 80.

Communication normally runs over TCP (transport layer), although HTTP can also be carried over UDP.

Connectionless and stateless protocol

The server responds only to the current request, and remains unrelated to other connections.

A connection is set up for every requested file (since version 1.1, a keep-alive mechanism allows connections to be reused). For instance, if a web page has 2 images, 3 connections are needed: one for the HTML page and one for each image.

Open to new data types.

Use of MIME (Multipurpose Internet Mail Extensions) to determine the type of data (designed for the SMTP protocol, but also used with HTTP).
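As a rough illustration of how a MIME type can be chosen for a resource, the following Python sketch guesses the type from the file name using the standard mimetypes module. The file names and the helper content_type_for are made up for this example, and a real web server may use its own mapping table.

```python
import mimetypes

# A private MimeTypes instance starts from Python's built-in table
# of file extensions and MIME types.
mime_db = mimetypes.MimeTypes()


def content_type_for(filename):
    """Return a MIME type that a server might send for this file name."""
    mime_type, _encoding = mime_db.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    return mime_type or "application/octet-stream"


for name in ("index.html", "photo.png", "notes.txt"):
    print(name, "->", content_type_for(name))
```

The browser uses this type, received in the Content-Type header, to decide how to handle the data.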

Although HTTP is a connectionless and stateless protocol, there are some ways to provide memory, that is, to remember which requests are related (for example, to identify a user on a website):

Cookies (web cookie, browser cookie). RFC 6265

A small piece of data stored on your own computer that a website can read when a connection is established. With these cookies, information can be retrieved and users’ activity can be recognized.

Cookies cannot install viruses or malware, but they can collect a lot of information (passwords, for example).

HTTP authentication. RFC 2617

Uses a username and password to log in to a web server.

Store data on the server (IP address…).

Embed a query in the URL

Example: … moodle2/course/view.php?id=16, where 16 indicates the number of the course
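Two of the mechanisms above can be sketched in Python with the standard library. This is only an illustration: the cookie names, the user name and the password are invented, and only the Basic scheme of HTTP authentication is shown.

```python
import base64
from http.cookies import SimpleCookie


def parse_cookie_header(header_value):
    """Turn the value of a Cookie header into a {name: value} dictionary."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


def basic_auth_header(username, password):
    """Build the Authorization header value for the Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    # Base64 is an encoding, not encryption: anyone can decode it.
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Hypothetical values, as a browser might send them back to a website.
cookies = parse_cookie_header("session_id=abc123; lang=ca")
auth_value = basic_auth_header("student", "secret")
print(cookies)
print(auth_value)
```

Because the Basic scheme only encodes the credentials, it is normally combined with an encrypted connection.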


URL / URI

URL stands for Uniform Resource Locator, and URI stands for Uniform Resource Identifier. In the HTTP protocol, the term URL is normally used. Both are strings that assign a unique address to each resource available on the Internet.

URL

Every resource on the Internet is identified by a unique address, the URL.

The resource URL is its Internet address, and allows the browser to find and display it correctly.

It is a combination of:

Protocol

Host

Path

Filename

In this case, the format is: protocol://host/folder/file. Example: http://ca.wikipedia.org/wiki/HTTP

But there are more parameters. Therefore, the complete format is: protocol://user:password@host:port/path/file?query#fragment

Let’s look at each parameter in more detail.

protocol

Examples of protocols that can be used to retrieve data:

http: Hypertext Transfer Protocol

https: HTTP over SSL/TLS

gopher: The Gopher protocol

ftp: File Transfer Protocol

mailto: Electronic mail address

ldap: LDAP (Lightweight Directory Access Protocol)

file: Host-specific file names

news: USENET news

nntp: USENET news using NNTP access

telnet: Reference to interactive sessions

wais: Wide Area Information Servers

prospero: Prospero Directory Service

user:password

user:password specifies the user and the password on the server.

Careful! The password is transferred in clear text.

host:port


host:port specifies the transport address, that is, the host machine and the service requested.

The host machine can be defined by its IP address or by a DNS name.

By default, the port is 80.

path

Indicates the path of the file, as seen from the browser.

To know where the file is located on the server, you must add the root directory at the beginning.

Example: http://www.domain.cat/path/file.html

Root directory: /var/www/htdocs

Location on the server: /var/www/htdocs/path/file.html
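The mapping from the URL path to the location on the server can be sketched as follows. The function name server_location is hypothetical, and the sketch assumes a Unix-style file system with the root directory from the example above. It does not protect against paths that try to leave the root directory, which a real server must do.

```python
import posixpath
from urllib.parse import urlsplit


def server_location(url, root_directory="/var/www/htdocs"):
    """Prefix the path seen by the browser with the server's root directory."""
    url_path = urlsplit(url).path
    # Remove the leading slash so that join() keeps the root directory.
    return posixpath.join(root_directory, url_path.lstrip("/"))


location = server_location("http://www.domain.cat/path/file.html")
print(location)
```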

file

The file itself can be either an HTML file or a file written in a web programming language (explained in the next section).

query

The query is used to pass parameters to the server.

It is a list of parameter-value pairs separated by ampersands.

?param1=value1&param2=value2&...

fragment

#fragment specifies a position within the document (defined by an anchor).
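To see the complete format at work, the following Python sketch splits a URL into the parts described above using the standard urllib.parse module. The URL itself is invented for the example (port 8080 and the fragment name section2 are assumptions).

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL that uses every part of the complete format.
url = ("http://user:password@www.domain.cat:8080"
       "/path/file.html?param1=value1&param2=value2#section2")

parts = urlsplit(url)
components = {
    "protocol": parts.scheme,
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
    "path and file": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}
for name, value in components.items():
    print(f"{name}: {value}")

# The query is a list of parameter-value pairs separated by ampersands.
parameters = parse_qs(parts.query)
print(parameters)
```

Note that parse_qs returns a list for each parameter, because the same parameter name may appear more than once in a query.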

TO KNOW MORE

If you want to take a look at the specifications of the complete format, see the following sites:

RFC 1738. Uniform Resource Locator

RFC 3986. Uniform Resource Identifier (pay attention to section 1.1.3)


Web servers

HTTP is used to transfer resources. Besides files, these resources can be the result of a program execution, a query to a database, the automatic translation of a document, etc.

Therefore, for a web server, resources can be:

files, or

the result of a program execution

A web server is a computer running software that accepts HTTP requests from clients (usually web browsers) and delivers web content in response.

The pages delivered by the server can be:

Static: there is an existing document (HTML file) in the file system.

Dynamic: the document is dynamically generated by a script or program executed by the web server.

Example: PHP, ASP, JSP pages.
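The difference between static and dynamic pages can be tried out with a very small web server. The sketch below uses Python's built-in http.server module rather than PHP, ASP or JSP. The paths /static.html and /dynamic, the class DemoHandler and the helper fetch are invented for the example. The server listens on a free local port, answers two requests and then stops.

```python
import http.client
import threading
from datetime import datetime
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stands in for an existing HTML file in the file system.
STATIC_PAGE = b"<html><body><h1>Static page</h1></body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/static.html":
            body = STATIC_PAGE  # always the same document
        else:
            # Dynamic: the document is generated when it is requested.
            now = datetime.now().isoformat(timespec="seconds")
            page = f"<html><body><p>Generated at {now}</p></body></html>"
            body = page.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def fetch(server, path):
    """Act as a minimal web client: request a path and return the body."""
    host, port = server.server_address
    connection = http.client.HTTPConnection(host, port, timeout=5)
    try:
        connection.request("GET", path)
        return connection.getresponse().read()
    finally:
        connection.close()


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
static_body = fetch(server, "/static.html")
dynamic_body = fetch(server, "/dynamic")
server.shutdown()
server.server_close()
print(static_body.decode("utf-8"))
print(dynamic_body.decode("utf-8"))
```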

Activity: My first web application


Web applications

Web applications are applications called by the web server or the browser in order to generate dynamic web pages.

Two types must be distinguished:

Applications on the client side:

The web client (browser) executes the code provided by the web server.

The browser must be able to run applications (also called scripts). Modern browsers allow you to do that.

The programming languages are usually JavaScript or Flash (also Java applets).

Applications on the server side:

The web server executes the web application and generates the dynamic web page.

The generated web page is sent to the client using the HTTP protocol

Advantages of each type of application:

Client side: if the application is loaded into the client, traffic between the server and the client can be reduced using modern technologies (AJAX).

Server side: the client machine does not need any additional capacity, so light clients can be used.

Three levels (3-tier) can be distinguished in web applications, where each one provides a specific functionality. These 3 tiers are:

First tier: presentation layer, which includes the browser and the web server.

Second tier: a program or script capable of generating some web content.

Third tier: provides access to databases.

Figure: 3-tier architecture. 1st tier: web browser (client), web server and file system (server). 2nd tier: script or application. 3rd tier: database.

This architecture is only used for dynamic pages. For static pages only the first tier is used, to access the file system and retrieve an HTML file. For dynamic pages, the following scheme is followed:

1. User data is retrieved (1st tier).

2. The server uses the user data to execute a program or script (2nd tier), which accesses a database (3rd tier).

3. This process generates a new web page, and the result is sent to the browser (1st tier again).
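The three steps can be sketched in a few lines of Python. This is a simplified illustration, not a real web application: the table courses, its content and the host example.org are assumptions, and an in-memory SQLite database stands in for the third tier.

```python
import sqlite3
from html import escape
from urllib.parse import parse_qs, urlsplit


def open_database():
    """3rd tier: a tiny in-memory database with one hypothetical course."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE courses (id INTEGER PRIMARY KEY, name TEXT)")
    connection.execute(
        "INSERT INTO courses VALUES (16, 'HTTP protocol and web servers')")
    return connection


def render_course_page(url, connection):
    # 1. Retrieve the user data sent by the browser (1st tier).
    query = parse_qs(urlsplit(url).query)
    course_id = int(query["id"][0])
    # 2. The script (2nd tier) queries the database (3rd tier).
    row = connection.execute(
        "SELECT name FROM courses WHERE id = ?", (course_id,)).fetchone()
    # 3. Generate the new web page that is sent back to the browser.
    title = row[0] if row else "Course not found"
    return f"<html><body><h1>{escape(title)}</h1></body></html>"


database = open_database()
page = render_course_page("http://example.org/course/view.php?id=16", database)
print(page)
```

The query uses a placeholder (?) instead of inserting the user data directly into the SQL text, which avoids SQL injection.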

General scheme of web technologies:

Client (browser): HTML, XML, JavaScript, Applet, Flash

Server (web server): Apache, IIS, Tomcat

Server (programming language): JSP, ASP, PHP, Servlets, CGI → application

Data (database): MySQL, MSSQL, Oracle, PostgreSQL

TO KNOW MORE

A survey of the market share of the most significant web servers can be found at

Netcraft January 2013 Web Server Survey

Activity: Practice. LAMP server configuration


HTTP security

One of the main weaknesses of the HTTP protocol is security: all information sent over plain HTTP travels across the Internet unencrypted.

To secure the protocol, HTTPS was developed. It runs HTTP over the SSL/TLS protocols, which provide encryption and authentication, and it uses port 443.

Encrypting the communication is not enough: the identity of whoever is sending the data must also be certified. A trusted third party (a certification authority, or CA) issues these certificates.

Information about these CAs can be viewed through the browser.
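As a small illustration of these ideas on the client side, Python's standard library exposes the default ports and a TLS configuration that checks server certificates against the certification authorities trusted by the system. The sketch below only inspects that configuration and opens no connection.

```python
import http.client
import ssl

# Default client-side TLS settings, as an HTTPS client would use them.
context = ssl.create_default_context()

print("HTTP default port:", http.client.HTTP_PORT)
print("HTTPS default port:", http.client.HTTPS_PORT)
# The server must present a certificate signed by a trusted CA...
print("Certificate required:", context.verify_mode == ssl.CERT_REQUIRED)
# ...and the certificate must match the host name in the URL.
print("Host name checked:", context.check_hostname)
```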


HTTP transactions

A simple HTTP transaction could be:

1. A client requests a web page

2. The server responds sending the requested resource

Basically, a transaction consists of two messages: the request and the response. Both consist of a header and a body.

There are several request methods (GET, POST, HEAD…) but they have a common format.

The format of the initial line is 3 fields separated by blank spaces: method resource version_of_protocol

Example: GET http://www.xtec.cat/web/guest/home HTTP/1.0
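The format of the initial line can be checked with a short Python sketch. The names RequestLine, parse_request_line and build_request are hypothetical helpers written for this example, and the Host header in the usage lines is an assumption.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str
    resource: str
    version: str


def parse_request_line(line):
    """Split the initial line into its three space-separated fields."""
    method, resource, version = line.strip().split(" ")
    return RequestLine(method, resource, version)


def build_request(method, resource, version="HTTP/1.0", headers=None):
    """Build the text of a request without a body."""
    lines = [f"{method} {resource} {version}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Lines end with CRLF; an empty line closes the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


request_line = parse_request_line(
    "GET http://www.xtec.cat/web/guest/home HTTP/1.0")
print(request_line)
request_text = build_request("GET", "/", headers={"Host": "www.xtec.cat"})
print(repr(request_text))
```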

TO KNOW MORE

To become familiar with the GET and POST methods and find out the differences between them, take a look at

HTTP methods

The response is quite similar. The format of the initial line is as follows: version_of_protocol response_code message

Example: HTTP/1.0 403 Forbidden

The response code is very useful when a problem appears. Codes are classified in ranges, and each range corresponds to a type of response.

Range Meaning

100 - 199 Informational

200 - 299 OK

300 - 399 Redirection

400 - 499 Client Error

500 - 599 Server Error

Example: when a file is not found because its address was mistyped or copied incorrectly, the server sends a 404 error (Not Found).
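The initial line of a response and the ranges of response codes can be explored with the following Python sketch. The helper names parse_status_line and classify are hypothetical, and the meanings of the ranges are the ones used in this unit.

```python
from http import HTTPStatus

# First digit of the response code -> meaning of its range.
RANGES = {
    1: "Informational",
    2: "OK",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line):
    """Split 'version code message'; the message may contain spaces."""
    version, code, message = line.strip().split(" ", 2)
    return version, int(code), message


def classify(code):
    """Return the meaning of the range a response code belongs to."""
    return RANGES.get(code // 100, "Unknown")


version, code, message = parse_status_line("HTTP/1.0 403 Forbidden")
print(code, message, "->", classify(code))
# The standard library also knows the usual message for each code.
print(404, HTTPStatus(404).phrase, "->", classify(404))
```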


Activity: Listen and Watch. HTTP 500 Internal Server Error.


Bibliography

Instal·lació i manteniment de serveis d’Internet. Institut Obert de Catalunya (IOC). 2006 edition.

Instal·lació i manteniment de serveis d’Internet. Editorial McGraw-Hill. 2006 edition.

RFC 1945. Hypertext Transfer Protocol - HTTP/1.0

RFC 2616. Hypertext Transfer Protocol - HTTP/1.1

