Application Layer

IT For Engineers
Application and Transport Layers
INFO 203
Dr. Jennifer Booker
INFO 203
Week #6
1
Application Layer


The Application Layer is the reason the rest of
the network exists – to serve applications
Most of the software familiar to end users consists of
applications


Email, FTP, newsgroups, chat, the Web, streaming
video, video conferencing, IPTV, etc.
We focus first on key concepts related to the
Application Layer, then discuss some specific
applications briefly
Application Layer

New applications designed for network
implementation need to decide whether
the application is based on



Client-server architecture
Peer to peer (P2P)
Or some hybrid combination of the two
Client-server Architecture

In client-server architecture, the server




Handles requests from many clients, and
Is generally always available
Often has a fixed IP address
Clients generally don’t communicate with each
other, and may be on or off independently of
each other and the server

Client-server applications include email, FTP,
the Web, remote login
P2P Architecture

P2P architecture assumes the clients are on
or off at will, and all are treated equally as
potential servers and/or clients


Apps include BitTorrent, Skype, and IPTV
Client-server and P2P combinations exist, called
a Hybrid Architecture
Process Communication

Any network application (no matter which
architecture) needs to communicate between
hosts using processes



In this sense, a process is a program running on a
client, server, or peer host
Processes may communicate with other
processes on the same host; this is controlled by
the host’s operating system (OS)
We are interested in processes that communicate
between hosts
Process Communication

Processes exchange messages



The sending or client process creates a message
and sends it into the network
The receiving or server process gets the message
from the network and might reply
Notice that the terms client process and server process
refer only to the roles played in exchanging a
message, not to the client-server or other
architectures mentioned earlier
Addressing Processes


For the server process to get the message, it
has to be addressed correctly
The host address and receiving process are
the key parts of the address


The host address is its IP address (the 32- or 128-bit address of the host’s network interface)
The receiving process is identified by its
port number, since many processes can be
running at once
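As a rough illustration of addressing a process by IP address plus port number, here is a minimal Python sketch that runs a tiny server and client on one machine. The loopback address 127.0.0.1, the OS-chosen port, and the message text are assumptions made for the demo.

```python
import socket
import threading

def serve_once(listener):
    """Accept one connection and echo back whatever arrives, uppercased."""
    conn, _client_addr = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

# Server process: bind a TCP socket to an (IP address, port) pair.
# Port 0 asks the OS for any free port (an assumption for this demo).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
server_ip, server_port = listener.getsockname()

worker = threading.Thread(target=serve_once, args=(listener,))
worker.start()

# Client process: address the server by the same (IP address, port) pair.
with socket.create_connection((server_ip, server_port), timeout=5) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)

worker.join()
listener.close()
print(server_ip, server_port, reply)
```

The client never names the server program itself; the IP address picks the host and the port number picks the process on it.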
Addressing Processes
[Figure: a client process and a server process, each reached through an IP address, socket, and port, sit above TCP or UDP and the lower layers on either side of the Internet. Sockets send packets; ports listen for them.]
Port Number

Port numbers follow default values, set by
the IANA, unless specified otherwise







21 = FTP
23 = Telnet
25 = SMTP
53 = DNS
80 = HTTP, http://mine.com implies http://mine.com:80
110 = POP3
194 = IRC, and hundreds more
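The idea that a URL without a port falls back to the protocol's default can be sketched as follows. The `DEFAULT_PORTS` table and the `effective_port` helper are hypothetical names for this illustration; only `urlsplit` comes from the Python standard library.

```python
from urllib.parse import urlsplit

# A few default ports copied from the list above (illustrative subset).
DEFAULT_PORTS = {"ftp": 21, "telnet": 23, "smtp": 25, "dns": 53,
                 "http": 80, "pop3": 110, "irc": 194}

def effective_port(url):
    """Return the explicit port in the URL, or the scheme's default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]

print(effective_port("http://mine.com"))        # 80
print(effective_port("http://mine.com:8080"))   # 8080
```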
More Protocols

Application-layer protocols define how a
particular application’s processes are
structured




What types of messages are allowed
The syntax of those messages
The meaning of the fields in the syntax
Rules for processing messages – when and
how to send messages, how to reply, etc.
Application vs its protocols

A single application often needs to use
several application-layer protocols



A web browser might use HTTP, but also FTP,
telnet, gopher, etc.
An email application might use POP3, SMTP,
IMAP, etc.
Many app protocols are defined in RFCs

But many application-layer protocols are
proprietary
RFC Summary

The “Internet Official Protocol Standards”
RFC used to identify the current standards
(STD) for every protocol


As a result of RFC 7100, that information is on a
website http://www.rfc-editor.org/search/standards.php
For example, STD 9 is the standard for FTP
Application Services



The transport layer connects the application
layer to everything else
Have a choice of two protocols, TCP and
UDP, unless you want to write your own!
Key services include

Reliable data transfer – how important is it?
Or is your app loss-tolerant?
Application Services

How much bandwidth or throughput does your
app need?



How sensitive is your app to timing?


Does sending rate have to equal receiving rate?
Some apps are elastic – can tolerate wide
ranges of available bandwidth
Games and telephony tend to be sensitive to
slow or erratic transmission delays
How important is security?
TCP Services

TCP provides a connection-oriented service,
where the sockets of the client and server
recognize a connection for the duration of the
session


Connection is duplex – messages can go both
ways at once
TCP is highly reliable – the bits leaving one side
all get to the other side, and get put back in the
original order
TCP Services

TCP also provides congestion control, for benefit
of the Internet



This throttles the sending processes when the
connection is congested, and can limit bandwidth
TCP does not guarantee any level of
transmission rate, or provide delay guarantees
So you’ll get your data across, but we
don’t know when
UDP Services

UDP is a lightweight protocol – meaning it
doesn’t do much!




UDP is connectionless
UDP is unreliable – data may never get there
UDP packets may arrive out of order and not
realize it
There are no transmission rate guarantees
Services NOT Provided


TCP and UDP do not provide guarantees of
throughput or timing
TCP does nothing for security per se, but
SSL (now TLS) can be added on

See Chapter 7 in INFO 331
Application Protocols

We’ll examine protocols for Internet-based
applications






HTTP
FTP
SMTP
POP3
IMAP
DNS
HTTP

The HyperText Transfer Protocol (HTTP)
is the heart of the Web




Defined by RFCs 1945 (v1.0) and 2616 (v1.1)
Has client and server programs which
communicate via HTTP messages
Web pages contain objects – files of various
sorts, such as a base HTML file, which cites
JPG and/or GIF images, etc.
The app that typically uses HTTP is a browser
HTTP

A Web server houses the objects


Apache and Microsoft Internet Information
Services (IIS) are common Web server apps
HTTP defines the messages that pass
between client and server


Uses TCP for transport protocol
HTTP has no memory of previous actions (a
stateless protocol) – so if you ask for a file 126
times, it will send the file 126 times
HTTP vs HTML

Don’t confuse HTTP with HTML



HTTP is the protocol used to define how files
are requested and transferred between server
and clients
HTML is the format of web pages
So an HTML file might be the structure of
an entity body transferred using HTTP
HTTP Messages

HTTP messages come in two types: request
messages (from client) and response
messages (from server)

The start-line and headers of every HTTP/1.x message are plain ASCII text


‘Both types of message consist of a start-line, zero or
more header fields (also known as "headers"), an
empty line (i.e., a line with nothing preceding the
CRLF) indicating the end of the header fields, and
possibly a message-body.’ [RFC 2616, para 4.1]
CRLF is a “carriage return and line feed”
HTTP Messages

There are many headers which could appear
in requests or responses

Cache-Control, Connection, Date, Pragma,
Trailer, Transfer-Encoding, Upgrade, Via, and/or
Warning [RFC 2616, para 4.5]
Disclaimer: RFC 2616 is 176 pages long – so
we’re just providing a summary!
HTTP Requests


Request messages have variable number
of lines, depending on the method called
General request syntax is


Method Request-URI HTTP-Version
Methods are OPTIONS, GET, HEAD, POST, PUT,
DELETE, TRACE, or CONNECT
[RFC 2616, para 5.1.1]


Most commonly used is GET
Request-URI is the desired Uniform Resource
Identifier (URI, commonly called a URL)
HTTP Requests


HTTP-Version is what it sounds like, e.g.
HTTP/1.1
There are many possible request headers

Accept, Accept-Charset, Accept-Encoding,
Accept-Language, Authorization, Expect, From,
Host, If-Match, If-Modified-Since, If-None-Match,
If-Range, If-Unmodified-Since, Max-Forwards,
Proxy-Authorization, Range, Referer, TE
(extension transfer-codings), and/or User-Agent
[RFC 2616, para 5.3]
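To make the request layout concrete, the sketch below assembles a request line, two headers, and the terminating empty line. The `build_request` helper, the URI `/index.html`, and the host `mine.com` are illustrative assumptions; a real browser would send many more headers.

```python
CRLF = "\r\n"

def build_request(method, request_uri, host, version="HTTP/1.1"):
    """Assemble the request line, two headers, and the blank line."""
    request_line = f"{method} {request_uri} {version}"
    headers = [f"Host: {host}", "Connection: close"]
    return CRLF.join([request_line, *headers]) + CRLF + CRLF

message = build_request("GET", "/index.html", "mine.com")
print(message.encode("ascii"))
```

A GET request like this one has no message-body, so the empty line is the last thing sent.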
HTTP Responses


HTTP responses go from server to client
General syntax starts with


HTTP-Version Status-Code Reason-Phrase
[RFC 2616, para 6.1]
The Status-Code could be dozens of values




"200" OK
"403" Forbidden
"404" Not Found
The Reason-Phrase is any text phrase assigned
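The response layout can be picked apart in the same spirit. This is a minimal sketch, assuming a complete HTTP/1.x response is already in memory; `parse_response` is a hypothetical helper and the sample response is made up.

```python
def parse_response(raw):
    """Split a raw HTTP/1.x response into status-line parts, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    version, status_code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return version, int(status_code), reason, headers, body

raw = (b"HTTP/1.1 404 Not Found\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 9\r\n"
       b"\r\n"
       b"Not here.")
version, status_code, reason, headers, body = parse_response(raw)
print(version, status_code, reason, headers, body)
```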
HTTP Responses

Response headers can include


Accept-Ranges, Age, ETag, Location,
Proxy-Authenticate, Retry-After, Server, Vary,
and/or WWW-Authenticate [RFC 2616,
para 6.2]
Responses usually include entities, unless
the HEAD method was used
HTTP Entities


An entity is the object sent or returned with an
HTTP message
Entities can be with requests or responses

Entity headers include Allow, Content-Encoding,
Content-Language, Content-Length (bytes), Content-Location,
Content-MD5, Content-Range, Content-Type, Expires, Last-Modified,
and/or extension-header [RFC 2616, para 7.1]

Where extension-header is any allowable
message-header for that kind of message
HTTP

So HTTP describes request and response
message formats



Both types typically have a first line which
tells its purpose (the request or status line)
There can be many header lines
There might be an entity attached
FTP


The File Transfer Protocol is one of the oldest
Internet applications (now RFC 959, but started
as RFC 114 in 1971)
While HTTP and FTP both send files

FTP uses two connections – one for control, one for
data (control information is out-of-band)


User login and commands are on the control connection, files
move on the data connection
HTTP uses one connection for both purposes (control
information is in-band)
FTP


FTP uses TCP, and usually connects to the
server on port 21 for control, with port 20 for data
The client sends user ID and password


FTP may be done to some sites with generic ID,
known as anonymous FTP
Once logged in, the user may navigate and
view directories, and upload (STOR or PUT)
or download (RETR or GET) files
Electronic Mail


E-mail is another ancient Internet application,
with origins in RFC 772 in 1980
It provides asynchronous text communication
and allows files to be attached to messages


Even voice and video messages
Main elements are users (sender and recipient),
mail servers, and the Simple Mail Transfer
Protocol (SMTP, RFC 5321)

Careful, there’s also an SNTP for network time
Electronic Mail


Email is composed in a client, which sends it to
a mail queue in the sender’s mail server
The sending mail server uses SMTP to send the
message to the recipient’s mail server


If mail can’t be sent successfully, the sender’s mail
server will put the message in a queue, and keep
trying (typically for 3 days)
The recipient is notified that the message is
present, which they read with their client
Electronic Mail

Each user has a mailbox on the mail server


Access to the mailbox is controlled with user name
and password
SMTP is the main protocol to get email from one
mail server to another



It uses TCP, not surprisingly
Defined in draft standard RFC 5321
Only uses 7-bit ASCII for message AND body

Forces binary files to be converted to ASCII & back
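One common way to do that conversion is Base64. The slide does not name an encoding, so treating Base64 as the example is an assumption, and the sample bytes are arbitrary.

```python
import base64

binary_data = bytes([0x00, 0xFF, 0x89, 0x50, 0x4E, 0x47])  # arbitrary sample bytes

encoded = base64.b64encode(binary_data)   # bytes drawn only from the ASCII range
decoded = base64.b64decode(encoded)       # the receiver reverses the step

print(encoded)
print(all(byte < 128 for byte in encoded))   # True: safe for a 7-bit channel
print(decoded == binary_data)                # True: nothing was lost
```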
Mail Message Formats

Email contains header information defined
by RFC 822, now RFC 5322 “Internet
Message Format”



The sender headers can include: FROM,
SENDER, REPLY-TO, RESENT-FROM,
RESENT-SENDER, and RESENT-REPLY-TO
Receiver headers can be: TO, CC, and BCC
Reference headers can be: MESSAGE-ID, IN-REPLY-TO, REFERENCES and KEYWORDS
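A short sketch of how these headers look on a real message, using Python's standard `email` package. All addresses, message IDs, and text are made-up placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
# Sender headers
msg["From"] = "alice@example.com"
msg["Reply-To"] = "alice-replies@example.com"
# Receiver headers
msg["To"] = "bob@example.com"
msg["Cc"] = "carol@example.com"
# Reference headers
msg["Message-ID"] = "<1234@example.com>"
msg["In-Reply-To"] = "<1233@example.com>"
msg["Subject"] = "Week 6 notes"
msg.set_content("See the attached slides.")

print(msg.as_string())
```

The printed form shows one header per line, an empty line, and then the body.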
MIME


Multipurpose Internet Mail Extensions (MIME)
are used for handling non-ASCII contents in
email, e.g. non-Latin character sets, binary files,
images, audio, video, etc.
MIME (RFC 2045) adds the ability to handle

(1) textual message bodies in character sets other
than US-ASCII, (2) an extensible set of different
formats for non-textual message bodies, (3) multi-part
message bodies, and (4) textual header information in
character sets other than US-ASCII.
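The sketch below builds a multi-part message with a non-ASCII text body and a binary attachment, again with Python's standard `email` package. The subject, body text, file name, and payload are invented for the example.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Non-ASCII and binary content"
msg.set_content("Grüße aus Köln")                  # text body outside US-ASCII
msg.add_attachment(bytes(range(256)),              # arbitrary binary payload
                   maintype="application", subtype="octet-stream",
                   filename="sample.bin")

# Walk the multi-part structure and show how each part is labelled.
for part in msg.walk():
    print(part.get_content_type(), part.get("Content-Transfer-Encoding"))

attachment = next(msg.iter_attachments())
```

Adding the attachment turns the message into a multi-part container, and the binary part is transfer-encoded so it can travel as text.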
MIME


The received message also includes a
Received: header added to the top of
the message
This is familiar in email if you look at the
full headers
Mail Access Protocols



If you log directly into your email server,
SMTP is all you need to handle email
But if you wish to access email from a local
host, you need to use a mail access protocol
The biggies at present are


Post Office Protocol version 3 (POP3) and
Internet Message Access Protocol (IMAP)
POP3

POP3 is defined in RFC 1939




It’s a pretty simple protocol compared to many
SMTP sends mail between mail servers,
and from the user agent (email app) to their mail
server
POP3 transfers mail from your mail server
to your user agent
From a user’s view, SMTP handles outgoing
email, and POP3 handles incoming email
IMAP

IMAP, defined in RFC 3501, allows folders to
be defined on the mail server to organize
email there



Messages are associated with a folder – first the
generic INBOX, then moved by the user
Hence state information about the folder for each
message must be saved across sessions
IMAP also provides search capability
within the mailbox
DNS


A key need, once the Internet grew beyond a
few thousand hosts, was to automate converting
human* readable addresses or hostnames
(www.microsoft.com) to IP addresses
(207.46.198.60)
That is the purpose of the Domain Name System
(DNS)

Before DNS, really big lookup tables were used!
* Humans who read English, at least!
Host vs Domain Names

A hostname is the name of a particular host
computer, such as banner.drexel.edu



May really represent multiple computers, but logically
they are all the same host
A domain name is the top level domain and the
specific domain name, like drexel.edu
Top level domains are com, edu, gov, mil, org,
net, etc. and the country codes uk, de, fr, etc.
IP Addresses

IPv4 addresses have four bytes, each with a value
from 0 to 255, separated by periods


Why called bytes? Each value from 0 to 255
corresponds to a value from 0 to (2^8 - 1), and
a byte is eight bits
IP addresses are typically static (fixed) for
servers and other semi-permanent Internet
connections, and dynamic for temporary
connections (e.g. dial-up, wireless)
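The byte structure of an IPv4 address can be checked with Python's standard `ipaddress` module. The address is the example quoted on the earlier DNS slide; the variable names are just for illustration.

```python
import ipaddress

addr = ipaddress.IPv4Address("207.46.198.60")

octets = list(addr.packed)   # the four bytes, each in the range 0..255
as_int = int(addr)           # the same address viewed as one 32-bit number

print(octets)                                                  # [207, 46, 198, 60]
print(as_int == (207 << 24) + (46 << 16) + (198 << 8) + 60)    # True
print(2**8 - 1)                                                # 255
```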
DNS



DNS normally runs over UDP, port 53 (something uses UDP!)
DNS is managed by DNS servers, typically
running Berkeley Internet Name Domain
(BIND) software
DNS is used by other applications (HTTP, SMTP,
FTP) to translate host names to IP addresses

You can also do a reverse DNS lookup (convert
205.188.97.2 to www-vd03.evip.aol.com)
DNS

DNS also provides other key services

Host aliasing allows the true or canonical
hostname to have aliases



When blah.com works to get to www.blah.com, it’s
because blah.com is a host alias of www.blah.com
Mail server aliasing – same concept, but for
mail server names
Load distribution across many servers for the
same hostname – so everyone in the world
doesn’t use one IP address for microsoft.com
DNS Lookup

This would be terribly tedious without caching


Common queries are stored on each level of DNS
server, so they don’t have to be looked up
constantly
Cached values are cleared typically every two
days or less, in case the data changes
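A toy version of such a cache, assuming a simple hostname-to-address table whose entries expire after a fixed time. The `DnsCache` class, the two-day lifetime, the fake clock, and the example names are all hypothetical; real DNS servers use a per-record time-to-live.

```python
import time

class DnsCache:
    """Tiny hostname -> IP cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.entries = {}

    def put(self, hostname, ip_address):
        self.entries[hostname] = (ip_address, self.clock() + self.ttl_seconds)

    def get(self, hostname):
        entry = self.entries.get(hostname)
        if entry is None:
            return None
        ip_address, expires_at = entry
        if self.clock() >= expires_at:
            del self.entries[hostname]   # stale: force a fresh lookup
            return None
        return ip_address

now = [0.0]                                   # fake clock so the demo runs instantly
cache = DnsCache(ttl_seconds=2 * 24 * 3600, clock=lambda: now[0])
cache.put("www.example.com", "192.0.2.10")    # documentation-only address
print(cache.get("www.example.com"))           # 192.0.2.10
now[0] += 3 * 24 * 3600                       # three days later
print(cache.get("www.example.com"))           # None
```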
nslookup


The command nslookup provides basic IP
data for a hostname or domain
nslookup snip.net

Server: ns2.snip.net
Address: 209.204.64.3

Name: snip.net
Address: 216.83.103.123
A registrar makes changes to the DNS
database

The list of registrars is at http://www.internic.net/
Transport Layer

The Transport Layer handles logical
communication between processes


It’s the lowest layer that plays no part in routing
between hosts, so it’s the last thing a client process and
the first thing a server process sees of a packet
By logical communication, we recognize that the
means used to get between processes, and the
distance covered, are irrelevant
Transport vs Network

Notice we didn’t say ‘hosts’ in the previous
slide…that’s because

The network layer provides logical communication
between hosts
Two Choices

Here we choose between TCP and UDP



In the transport layer, a packet is a segment
In the network layer, a packet is a datagram
The network layer is home to the Internet
Protocol (IP)


IP provides logical communication between hosts
IP makes a “best effort” to get segments where they
belong – no guarantees of delivery, or delivery
sequence, or delivery integrity
IP


Each host has an IP address
Common purpose of UDP and TCP is to extend
delivery of IP data to the host’s processes



This is called transport-layer multiplexing and
demultiplexing
Both UDP and TCP also provide error checking
That’s it for UDP – data delivery and error
checking!
TCP

TCP also provides reliable data transfer (not
just data delivery)


Uses flow control, sequence numbers,
acknowledgements, and timers to ensure
data is delivered correctly and in order
TCP also provides congestion control

TCP applications share the available bandwidth
(they watched Sesame Street!)

UDP takes whatever it can get (greedy little protocol)
Segment Header


Hence the segment header starts with the
source and destination port numbers
Each port number is a 16-bit (2 byte) value
(0 to 65,535)


Well known port numbers are from 0 to 1023 (2^10 - 1)
After the port numbers are other headers,
specific to TCP or UDP, then the message
UDP


The most minimal transport layer has to
do multiplexing and demultiplexing
UDP does this and a little error checking
and, well, um, that’s about it!




UDP was defined in RFC 768
An app that uses UDP almost talks directly to IP
Adds only two small fields (length and checksum) to the header, after the
requisite source/destination port numbers
There’s no handshaking; UDP is connectionless
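The whole UDP header fits in eight bytes, which the following sketch packs and unpacks with Python's `struct` module. The helper name `build_udp_header`, the client port 50000, and the payload are assumptions for the demo, and the checksum is left at zero rather than computed.

```python
import struct

def build_udp_header(src_port, dst_port, payload, checksum=0):
    """Pack the four 16-bit UDP header fields in network byte order."""
    length = 8 + len(payload)   # header (8 bytes) plus data
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)

payload = b"hello"
header = build_udp_header(50000, 53, payload)   # 50000: arbitrary client port
src, dst, length, checksum = struct.unpack("!HHHH", header)
print(len(header), src, dst, length, checksum)  # 8 50000 53 13 0
```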
UDP for DNS


DNS uses UDP
A DNS query is packaged into a segment,
and is passed to the network layer


The DNS app waits for a response; if it doesn’t
get one soon enough (times out), it tries another
server or reports no reply
Hence the app must allow for the unreliability
of UDP, by planning what
to do if no response comes back
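The time-out-and-retry pattern can be sketched with a plain UDP socket. This is not a DNS client: `query_with_timeout` is a hypothetical helper, the payload is a placeholder, and a local socket that never answers stands in for an unresponsive server.

```python
import socket

def query_with_timeout(server_addr, payload, timeout_s=0.2, attempts=2):
    """Send a UDP datagram and wait for a reply; return None if none arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        for _ in range(attempts):
            sock.sendto(payload, server_addr)
            try:
                reply, _sender = sock.recvfrom(512)
                return reply
            except socket.timeout:
                continue   # no answer in time: send the query again
    return None

# A bound socket that never replies plays the part of a silent server.
silent = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
silent.bind(("127.0.0.1", 0))
result = query_with_timeout(silent.getsockname(), b"example query")
silent.close()
print(result)   # None
```

UDP itself reports nothing about the lost exchange; the application notices only because its own timer expired.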
UDP Advantages

Still UDP is good when:




You want the app to have detailed control over what is
sent across the network; UDP changes it little
No connection establishment delay
No connection state data in the end hosts; hence a
server can support more UDP clients than TCP
Small packet header overhead per segment

TCP uses 20 bytes of header data, UDP only 8 bytes
UDP Apps

Other than DNS, UDP is also used for





Network management (SNMP)
Routing (RIP)
Multimedia & telephony (proprietary protocols)
Remote file server (NFS)
The lack of congestion control in UDP can be
a problem when lots of large UDP messages
are being sent – can crowd out
TCP apps
Checksum



Noise in the transmission lines can lose
bits of data or rearrange them in transit
Checksums are a common method to
detect errors (RFC 1071)
To create a checksum:



Find the sum of the message’s 16-bit words (any carry out of the top bit is added back in)
The checksum is the 1s (ones) complement of
the sum
If message is uncorrupted, sum of message plus
checksum is all ones 1111111111111…
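A minimal sketch of those steps, assuming the message is summed as 16-bit words with the carry wrapped around, as RFC 1071 describes. The function names and the sample message are made up for the illustration.

```python
def ones_complement_sum(data):
    """Add 16-bit words, wrapping any carry back into the low bits."""
    if len(data) % 2:
        data += b"\x00"   # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)
    return total

def internet_checksum(data):
    """The checksum is the ones complement of the sum."""
    return ~ones_complement_sum(data) & 0xFFFF

message = b"INFO 203"   # arbitrary even-length sample
checksum = internet_checksum(message)
received = message + checksum.to_bytes(2, "big")
print(hex(ones_complement_sum(received)))   # 0xffff: all ones, so no error detected
```

Flipping any single bit of `received` changes the sum, so the receiver no longer sees all ones.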
1s Complement?

The 1s complement is a mirror image of a
binary number – change all the zeros to
ones, and ones to zeros


So the 1s complement of 00101110101 is
11010001010
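The bit-flipping rule is easy to check in a couple of lines of Python; the variable names are illustrative only.

```python
bits = "00101110101"
complement = "".join("1" if b == "0" else "0" for b in bits)
print(complement)   # 11010001010
```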
UDP does error checking because not all
lower layer protocols do error checking

This provides end-to-end error checking, since it’s
more efficient than every step along the way
Reliable Data Transfer Mechanisms






Checksum, to detect bit errors in a packet
Timer, to know when a packet or its ACK was lost
Sequence number, to detect lost or duplicate packets
Acknowledgement, to know packet got to receiver
correctly
Negative acknowledgement, to tell packet was
corrupted but received
Window, to pipeline many packets at once before an
ACK was received for any of them
TCP Intro

Now see how all this applies to TCP



Defined in RFC 793 (now RFC 9293); congestion control is covered separately in RFC 2581 (now RFC 5681)
Invented circa 1974 by Vint Cerf and Robert Kahn
TCP starts with a handshake protocol, which
defines many connection variables
The connection exists only at the hosts, not in between
Routers are oblivious to whether TCP is used!
TCP is a full duplex service – data can flow both
directions at once, and is connection-oriented
TCP Segment Structure

A TCP segment consists of header fields
and a data field


The data field size is limited by the MSS
Typical header size is 20 bytes

The header is 32 bits wide (4 bytes), so it has
five lines at a minimum
TCP Header Structure

The header lines are






Source and destination port numbers (16 bit ea.)
Sequence number (32 bit)
ACK number (32 bit)
A bunch of little stuff (header length, URG, ACK, PSH,
RST, SYN, and FIN bits), then the receive window (16
bit)
Internet checksum, urgent data pointer (16 bit ea.)
And possibly several options
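The five fixed header lines can be unpacked with Python's `struct` module, as in this sketch. `parse_tcp_header` is a hypothetical helper, options are ignored, and the sample segment (a SYN from port 50000 to port 80) is hand-built with made-up values.

```python
import struct

def parse_tcp_header(header):
    """Unpack the fixed 20-byte part of a TCP header (options ignored)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", header[:20])
    header_len = (offset_flags >> 12) * 4   # header length is counted in 32-bit lines
    flags = {name: bool(offset_flags & bit) for name, bit in
             [("FIN", 0x01), ("SYN", 0x02), ("RST", 0x04),
              ("PSH", 0x08), ("ACK", 0x10), ("URG", 0x20)]}
    return src_port, dst_port, seq, ack, header_len, flags, window, checksum, urgent

# Hand-built sample: header length 5 lines, only the SYN bit set.
sample = struct.pack("!HHIIHHHH", 50000, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
fields = parse_tcp_header(sample)
print(fields)
```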
TCP Segment Structure



We’ve seen the port numbers (16 bits each)
Sequence and ACK numbers (32 bits each)
keep track of pieces of a file
The ‘bunch of little stuff’ includes


Header length (4 bits)
A flag field includes six one-bit fields: ACK, RST, SYN,
FIN, PSH, and URG


The URG bit marks that the segment carries urgent data (see the urgent data pointer)
The receive window is used for flow control
TCP Segment Structure

The checksum is used for bit error detection,
as with UDP


The urgent data pointer tells where the urgent
data is located
The options include negotiating the MSS,
scaling the window size, or time stamping
Telnet Example


Telnet (RFC 854) is an old app for remote
login via TCP
Telnet interactively echoes whatever was
typed to show it got to the other side
Timeout Calculation

We want the timeout interval larger than
EstimatedRTT, but not huge; use




TimeoutInterval = EstimatedRTT + 4*DevRTT
EstimatedRTT is a running average RTT
DevRTT is a running standard deviation for RTT
Timeout interval is constantly being
calculated, with frequent measurement of
SampleRTT to find current values for:

Estimated RTT, DevRTT, & TimeoutInterval
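A worked sketch of the running calculation follows. The slide gives only the TimeoutInterval formula; the weights 0.125 and 0.25 are the values recommended in RFC 6298 and are an assumption here, as are the starting estimates and the sample measurements.

```python
ALPHA = 0.125   # weight of the newest sample in EstimatedRTT (assumed, from RFC 6298)
BETA = 0.25     # weight of the newest deviation in DevRTT (assumed, from RFC 6298)

def update_timeout(estimated_rtt, dev_rtt, sample_rtt):
    """Fold one SampleRTT into the running estimates and return the new values."""
    # The deviation is measured against the estimate from before this sample.
    dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
    timeout_interval = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout_interval

estimated, dev = 100.0, 10.0          # starting values in ms (made up)
for sample in [110.0, 90.0, 130.0]:   # SampleRTT measurements in ms (made up)
    estimated, dev, timeout = update_timeout(estimated, dev, sample)
    print(round(estimated, 2), round(dev, 2), round(timeout, 2))
```

Because DevRTT is multiplied by four, a burst of erratic samples widens the timeout much faster than it moves the average.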
Flow Control

TCP connection hosts maintain a receive
buffer, for bytes received correctly and
in order


Apps might not read from the buffer for a while, so
it can overflow
Flow control focuses on preventing overflow
of the receive buffer

So it also depends on how fast the receiving
app is reading the data!
Flow Control

Hence the sender in TCP maintains a receive
window (RcvWindow) variable – how much
room is left in the receive buffer

The amount of room in RcvWindow is returned
to the sender in the receive window field of
every segment

If the RcvWindow goes to zero, the sender can’t
send more data, and would never learn when room opens up again!
To prevent this, TCP makes the sender transmit
one-byte messages when RcvWindow is zero,
so the acknowledgements report the updated RcvWindow
UDP Flow Control


There ain’t none (sic!)
UDP adds newly arrived segments to a buffer
in front of the receiving socket


If the buffer gets full, segments are dropped
Bye-bye data!
Congestion Control

Now address congestion control issues



Congestion is a traffic jam in the middle of the network
somewhere
Most common cause is too many sources
sending data too fast into the network
Key lessons are:

A congested network forces retransmissions for
packets lost due to buffer overflow, which adds to
the congestion
Congestion Control

And:



A congested network can waste its bandwidth
by sending duplicate packets which weren’t
lost in the first place
Dropping a packet wastes the transmission
capacity of every upstream link that packet
saw
If loss and transmission delay are small,
CongWin bytes of data can be sent every
RTT, for a send rate of CongWin/RTT
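As a quick arithmetic check of CongWin/RTT, the sketch below converts a window and a round-trip time into a send rate. The helper name, the window of ten 1460-byte segments, and the 100 ms RTT are made-up numbers.

```python
def send_rate_bps(cong_win_bytes, rtt_seconds):
    """Approximate send rate in bits per second: CongWin / RTT."""
    return cong_win_bytes * 8 / rtt_seconds

cong_win = 10 * 1460   # bytes in the congestion window (made up)
rtt = 0.1              # round-trip time in seconds (made up)
rate = send_rate_bps(cong_win, rtt)
print(round(rate / 1e6, 3), "Mbit/s")
```

Halving the RTT doubles the rate for the same window, which is one reason connections with a lower RTT tend to get more bandwidth.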
Fairness

Unequal connections are less fair



Lower RTT gets more bandwidth (CongWin
increases faster)
UDP traffic can force out the more polite
TCP traffic
Multiple TCP connections from a single host
(e.g. from downloading many parts of a Web
page at once) get more bandwidth
Are We Done Yet?

So we’ve covered transport layer protocols
from the terribly simple UDP to a seemingly
exhaustive study of TCP


Key features along the way include
multiplexing/demultiplexing, error detection,
acknowledgements, timers, retransmissions,
sequence numbers, connection management,
flow control, end-to-end congestion control
So much for the “edge” of the Internet; next is
the network layer, to start looking at the core