Table of contents
Introduction
Prerequisites
HTTP (HyperText Transfer Protocol)
URL / URI
Web servers
Web applications
HTTP security
HTTP transactions
Bibliography
AF_IC01_U1. HTTP protocol and web servers / Unit 1 1
In this unit you will learn about the HTTP protocol, which enables you to browse the Web.
You will see its most relevant features and learn what a URL is. Web servers are the other main topic: you will learn what a web server is, what it is for, and how to install and configure one. Security and transactions are explained at the end of the unit.
Prerequisites for this unit:
OSI and TCP/IP stack
Knowledge of HTML
Database concepts
HTTP is a protocol designed to transfer hypertext (text that can be linked to other texts). Its most common use is transferring HTML pages, but other formats such as text files and images can also be transferred.
TO KNOW MORE
Some more information about HTML at Wikipedia
Wikipedia. HTML
The most important features are:
Application layer protocol.
Uses Uniform Resource Identifiers (URI), specifically Uniform Resource Locators (URL), described later, which let you identify every resource on the Internet.
Client-Server architecture (request/response paradigm).
Default port is 80.
Communication normally runs over TCP (transport layer), although it can also be used over UDP.
Connectionless and stateless protocol
The server responds only to the current request, and remains unrelated to other connections.
A connection is set up for every requested file (since version 1.1, a keep-alive mechanism allows connections to be reused). For instance, if a web page has 2 images, 3 connections are needed: one for the HTML page and one for each image.
Open to new data types.
Use of MIME (Multipurpose Internet Mail Extensions) to determine the type of data (designed for the SMTP protocol, but also used with HTTP).
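The role of MIME types can be illustrated with a short sketch. It assumes Python and its standard mimetypes module, which are not part of this unit; the file names and the helper content_type_for are hypothetical. It shows the kind of type label a server could declare for a resource.

```python
import mimetypes


def content_type_for(filename):
    """Guess the MIME type that could be declared for a file name."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    return mime_type or "application/octet-stream"


# Hypothetical file names, used only for illustration.
for name in ("index.html", "photo.png", "notes.txt"):
    print(f"{name} -> Content-Type: {content_type_for(name)}")
```

The type is guessed from the file extension only; the content of the file is not inspected.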
Although HTTP is a connectionless and stateless protocol, there are some ways to provide memory, that is, to remember which requests are related (for example, a user identified on a website):
Cookies (web cookie, browser cookie). RFC 6265
A small piece of data stored on your own computer that a website can read when a connection is established. With these cookies, information can be retrieved and users' activity can be recognized.
Cookies cannot install viruses or malware, but they can collect a lot of information (passwords, for example).
HTTP authentication. RFC 2617
Use username and password to log into a web server.
Store data on the server (IP address…).
Embed a query in the URL
Example: … moodle2/course/view.php?id=16, where 16 indicates the number of the course
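Two of the mechanisms above, cookies and a query embedded in the URL, can be sketched as follows. This is a minimal illustration that assumes Python's standard http.cookies and urllib.parse modules; the cookie names and values (session_id, lang) and both helper functions are hypothetical.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit


def read_cookie(cookie_header, name):
    """Return the value of one cookie from a Cookie request header, or None."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    return jar[name].value if name in jar else None


def read_query_parameter(url, name):
    """Return the first value of a query parameter in a URL, or None."""
    values = parse_qs(urlsplit(url).query).get(name)
    return values[0] if values else None


# Hypothetical header sent by a browser on a later request.
print(read_cookie("session_id=abc123; lang=ca", "session_id"))

# The relative URL from the example above: the course number travels in the query.
print(read_query_parameter("moodle2/course/view.php?id=16", "id"))
```

In both cases the server receives the identifying data again with each request, which is how related requests can be recognized without the protocol itself keeping state.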
URL stands for Uniform Resource Locator, and URI stands for Uniform Resource Identifier. In the HTTP protocol, the term URL is normally used. Both are strings that assign a unique address to each resource available on the Internet.
URL
Every resource on the Internet is identified by a unique address, the URL.
The resource URL is its Internet address, and allows the browser to find and display it correctly.
It is a combination of:
Protocol
Host
Path
Filename
In this case, the format is: protocol://host/folder/file. Example: http://ca.wikipedia.org/wiki/HTTP.
But there are more parameters. Therefore, the complete format is: protocol://user:password@host:port/path/file?query#fragment
Let's see a more detailed explanation of every parameter.
protocol part of protocol://user:password@host:port/path/file?query#fragment
Examples of protocols that can be used to retrieve data:
http: Hypertext Transfer Protocol
https: HTTP over SSL/TLS
gopher: The Gopher protocol
ftp: File Transfer Protocol
mailto: Electronic mail address
ldap: LDAP (Lightweight Directory Access Protocol)
file: Host-specific file names
news: USENET news
nntp: USENET news using NNTP access
telnet: Reference to interactive sessions
wais: Wide Area Information Servers
prospero: Prospero Directory Service
user:password part of protocol://user:password@host:port/path/file?query#fragment
user:password specifies the user and the password on the server.
Careful! The password is transmitted in plain text.
host:port part of protocol://user:password@host:port/path/file?query#fragment
host:port specifies the transport address, that is, the host machine and the service requested.
The host machine can be defined by its IP address or by a DNS name.
The default port is 80.
path part of protocol://user:password@host:port/path/file?query#fragment
The path indicates where the file is located, as seen from the browser.
To know where the file is located on the server, you must add the root directory at the beginning.
Example: http://www.domain.cat/path/file.html
Root directory: /var/www/htdocs
Location on the server: /var/www/htdocs/path/file.html
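The mapping from the URL path to the location on the server can be sketched as below. This is only an illustration in Python, assuming POSIX-style paths; the function name server_location is hypothetical, and real web servers apply their own configuration rules.

```python
import posixpath


def server_location(document_root, url_path):
    """Map the path seen by the browser to a location under the root directory."""
    # Normalise first so that "." and ".." segments are resolved and the
    # result cannot climb above the root directory.
    relative = posixpath.normpath("/" + url_path).lstrip("/")
    return posixpath.join(document_root, relative)


# Values taken from the example above.
print(server_location("/var/www/htdocs", "/path/file.html"))
```

Normalising the path before joining it is a design choice of this sketch: a request such as /../../etc/passwd then stays inside the root directory.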
file part of protocol://user:password@host:port/path/file?query#fragment
The file itself can be either an HTML file or a web programming language file (explained in the next section).
query part of protocol://user:password@host:port/path/file?query#fragment
The query is used to pass parameters to the server.
It is a list of parameter-value pairs separated by ampersands.
?param1=value1&param2=value2&...
fragment part of protocol://user:password@host:port/path/file?query#fragment
#fragment specifies a position within the document (defined by an anchor).
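The complete format described above can be explored with a short sketch. It assumes Python's standard urllib.parse module; the host comes from the earlier example, while the user, password, port 8080, parameter values and the fragment name are hypothetical.

```python
import posixpath
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL that uses every part of the complete format.
url = ("http://user:secret@www.domain.cat:8080/path/file.html"
       "?param1=value1&param2=value2#section")

parts = urlsplit(url)
folder, filename = posixpath.split(parts.path)

components = {
    "protocol": parts.scheme,
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
    "path": folder,
    "file": filename,
    "query": parse_qs(parts.query),  # parameter-value pairs
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name}: {value}")
```

Note that the parser returns the path and the file together; splitting them into folder and file name is done here only to match the parts listed in this unit.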
TO KNOW MORE
If you want to take a look at the specifications of the complete format, see the following sites:
RFC 1738. Uniform Resource Locator
RFC 3986. Uniform Resource Identifier (pay attention to section 1.1.3)
HTTP is used to transfer resources. In addition to files, these resources can be the result of a program execution, a database query, the automatic translation of a document, etc.
Therefore, for a web server, resources can be:
files or
the result of a program execution
A web server is a server running software that accepts HTTP requests from clients (typically web browsers) and delivers web content.
The pages delivered by the server can be:
Static: there is an existing document (HTML file) in the file system.
Dynamic: the document is dynamically generated by a script or program executed by the web server.
Example: PHP, ASP, JSP pages.
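The difference between static and dynamic pages can be sketched with a tiny application. The unit names PHP, ASP and JSP; this sketch instead assumes Python and the WSGI calling convention from its standard library, purely for illustration. The paths, the name parameter and the constant STATIC_PAGE (which stands in for an HTML file on disk) are hypothetical.

```python
from html import escape
from urllib.parse import parse_qs

# Stands in for an existing HTML file in the file system.
STATIC_PAGE = b"<html><body><h1>Static page</h1></body></html>"


def application(environ, start_response):
    """Serve one fixed document, or generate a page from the request data."""
    if environ.get("PATH_INFO", "/") == "/static.html":
        body = STATIC_PAGE  # static: the same bytes for every request
    else:
        # Dynamic: the document is built when the request arrives.
        params = parse_qs(environ.get("QUERY_STRING", ""))
        name = params.get("name", ["visitor"])[0]
        page = f"<html><body><h1>Hello, {escape(name)}!</h1></body></html>"
        body = page.encode("utf-8")
    headers = [("Content-Type", "text/html; charset=utf-8"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]
```

To try it in a browser, the application could be served with wsgiref.simple_server.make_server("localhost", 8000, application).serve_forever(), where the host and port are arbitrary choices.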
Activity: My first web application
Web applications are applications called by the web server or the browser in order to generate dynamic web pages.
Two types must be distinguished:
Applications on the client side:
The web client (browser) executes the code provided by the web server.
The browser must be able to run applications (also called scripts). Modern browsers allow you to do that.
The programming languages are usually JavaScript or Flash (also Java applets).
Applications on the server side:
The web server executes the web application and generates the dynamic web page.
The generated web page is sent to the client using the HTTP protocol
Advantages of each type of application:
Client side: if the application is loaded into the client, traffic between the server and the client can be reduced using modern technologies (AJAX).
Server side: the client machine does not need any additional capacity, so thin clients can be used.
Three levels (3-tier) can be distinguished in web applications, where each one provides a specific functionality. These 3 tiers are:
First tier: presentation layer, which includes the browser and the web server.
Second tier: a program or script capable of generating some web content.
Third tier: provides access to databases.
Figure: 3-tier architecture. Client: web browser. Server: web server and file system (1st tier), script or application (2nd tier), database (3rd tier).
This architecture is only used for dynamic pages. For static pages only the first tier is used, in order to access the file system and retrieve an HTML file. For dynamic pages, the following scheme is followed:
1. User data is retrieved (1st tier).
2. The server uses this data to execute a program or script (2nd tier), which accesses a database (3rd tier).
3. This process generates a new web page, and the result is sent to the browser (1st tier again).
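The three steps above can be sketched in a few lines. This is a minimal illustration that assumes Python and an in-memory SQLite database in place of the database servers named in this unit; the courses table, its contents and the function names are hypothetical.

```python
import sqlite3
from html import escape


def create_database():
    """3rd tier: a small in-memory database with one hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE courses (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("INSERT INTO courses (id, title) VALUES (?, ?)",
                 (16, "HTTP protocol and web servers"))
    conn.commit()
    return conn


def course_page(conn, course_id):
    """2nd tier: use the user data to query the database and build a page."""
    row = conn.execute("SELECT title FROM courses WHERE id = ?",
                       (course_id,)).fetchone()
    title = row[0] if row else "Unknown course"
    # The generated page is what goes back to the browser (1st tier).
    return f"<html><body><h1>{escape(title)}</h1></body></html>"


conn = create_database()
# Step 1: the user data (here, the course number) arrives with the request.
print(course_page(conn, 16))
```

The course number is passed to the query as a parameter rather than pasted into the SQL text, so user data cannot change the query itself.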
General scheme of web technologies:
Client
Browser: HTML, XML, JavaScript, Applet, Flash, …
↔
Server
Web server: Apache, IIS, Tomcat, …
Programming language: JSP, ASP, PHP, Servlets, CGI → application, …
↔
Data
Database: MySQL, MSSQL, Oracle, PostgreSQL, …
TO KNOW MORE
A survey of the market share of the most significant web servers can be found at
Netcraft January 2013 Web Server Survey
Activity: Practice. LAMP server configuration
One of the main weaknesses of the HTTP protocol is security: all the information it sends over the Internet travels unencrypted.
To secure the protocol, HTTPS was developed. It runs HTTP over the SSL/TLS protocols, which provide encryption and authentication. It uses port 443.
Not only must the communication be encrypted, but the identity of the sender must also be certified. A trusted third party (certification authority, CA) issues these certificates.
Information about these CAs can be viewed through the browser.
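The two requirements above, encryption and a certificate issued by a trusted CA, can be seen in how a client prepares an HTTPS connection. The sketch below assumes Python's standard ssl module as the client; the function name describe_client_context is hypothetical, and no network connection is made.

```python
import ssl

HTTPS_PORT = 443  # port used by HTTPS, as stated above


def describe_client_context():
    """Report how a default TLS client context treats server certificates."""
    context = ssl.create_default_context()
    return {
        # The server must present a certificate signed by a trusted CA.
        "certificate_required": context.verify_mode == ssl.CERT_REQUIRED,
        # The certificate must match the host name being contacted.
        "hostname_checked": context.check_hostname,
    }


print(HTTPS_PORT, describe_client_context())
```

The default context relies on the CA certificates installed on the system, which plays the same role as the CA list that can be viewed in a browser.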
A simple HTTP transaction could be:
1. A client requests a web page
2. The server responds sending the requested resource
Basically, these transactions consist of two messages: the request and the response. Both consist of a header and a body.
There are several request methods (GET, POST, HEAD…) but they have a common format.
The format of the initial line is 3 fields separated by blank spaces: method resource version_of_protocol
Example: GET http://www.xtec.cat/web/guest/home HTTP/1.0
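The three-field format of the initial line can be sketched as follows. This is a minimal illustration in Python; the names RequestLine, parse_request_line and format_request_line are hypothetical and only cover the initial line, not the headers or the body.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str
    resource: str
    version: str


def parse_request_line(line):
    """Split an initial request line into its three fields."""
    fields = line.strip().split(" ")
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}")
    return RequestLine(*fields)


def format_request_line(method, resource, version="HTTP/1.0"):
    """Build an initial request line from its three fields."""
    return f"{method} {resource} {version}"


# The example line from the text above.
print(parse_request_line("GET http://www.xtec.cat/web/guest/home HTTP/1.0"))
```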
TO KNOW MORE
To become familiar with the GET and POST methods and find out the differences between them, take a look at
HTTP methods
The response is quite similar. The format of its initial line is as follows: version_of_protocol response_code message
Example: HTTP/1.0 403 Forbidden
The response code can be very useful when a problem appears. Codes are classified in ranges, and each range corresponds to a type of response.
Range Meaning
100 - 199 Informational
200 - 299 OK
300 - 399 Redirection
400 - 499 Client Error
500 - 599 Server Error
Example: when a file is not found because you have mistyped its address or copied it wrong, the server sends a 404 error (Not Found).
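The initial line of a response and the ranges in the table can be combined in a short sketch. It assumes Python and its standard http.HTTPStatus enumeration; the range labels are the ones used in the table above, and the function names are hypothetical.

```python
from http import HTTPStatus

# First digit of the code -> meaning, following the table above.
RANGES = {
    1: "Informational",
    2: "OK",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line):
    """Split 'version code message' into its three fields."""
    version, code, message = line.strip().split(" ", 2)
    return version, int(code), message


def classify(code):
    """Return the meaning of the range a response code belongs to."""
    return RANGES.get(code // 100, "Unknown")


version, code, message = parse_status_line("HTTP/1.0 403 Forbidden")
print(version, code, message, "->", classify(code))

# Standard message for the 404 code mentioned in the example above.
print(HTTPStatus(404).phrase, "->", classify(404))
```

The message is split off with a limit of two separators because it may itself contain spaces, as in "Not Found".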
Activity: Listen and Watch. HTTP 500 Internal Server Error.
Instal·lació i manteniment de serveis d'Internet. Institut Obert de Catalunya (IOC). 2006 edition.
Instal·lació i manteniment de serveis d'Internet. Editorial McGraw-Hill. 2006 edition.
RFC 1945. Hypertext Transfer Protocol - HTTP/1.0
RFC 2616. Hypertext Transfer Protocol - HTTP/1.1