Department of Engineering Science
ES465/CES 440, Intro. to Networking & Network Management
Traditional Internet Applications
http://www.sonoma.edu/users/k/kujoory

References
• “Computer Networks & Internet,” Douglas Comer, 6th ed., Pearson, 2014, Ch. 4 (textbook); 5th ed. slides by Lami Kaya (LKaya@ieee.org), with some changes.
• “Computer Networks,” A. Tanenbaum, 5th ed., Prentice Hall, 2011, ISBN-13: 9780132126953.
• “Computer & Communication Networks,” Nader F. Mir, 2nd ed., Prentice Hall, 2015, ISBN-13: 9780133814743.
• “Data Communications & Networking,” Behrouz A. Forouzan, 4th ed., McGraw-Hill, 2007.
• “Data & Computer Communications,” W. Stallings, 7th ed., Prentice Hall, 2004.
• “Computer Networks: A Systems Approach,” L. Peterson, B. Davie, 4th ed., Morgan Kaufmann, 2007.

Ali Kujoory 6/30/2016 Not to be reproduced without permission

Topics Covered
• 4.1 Introduction
• 4.2 Application-Layer Protocols
• 4.3 Representation & Transfer
• 4.4 Web Protocols
• 4.5 Document Representation with HTML
• 4.6 Uniform Resource Locators & Hyperlinks
• 4.7 Web Document Transfer with HTTP
• 4.8 Caching in Browsers
• 4.9 Browser Architecture
• 4.10 File Transfer Protocol (FTP)
• 4.11 FTP Communication Paradigm
• 4.12 Electronic Mail
• 4.13 The Simple Mail Transfer Protocol (SMTP)
• 4.14 ISPs, Mail Servers, & Mail Access
• 4.15 Mail Access Protocols (POP, IMAP)
• 4.16 Email Representation Standards (RFC 2822, MIME)
• 4.17 Domain Name System (DNS)
• 4.18 Domain Names That Begin with www
• 4.19 The DNS Hierarchy & Server Model
• 4.20 Name Resolution
• 4.21 Caching in DNS Servers
• 4.22 Types of DNS Entries
• 4.23 Aliases & CNAME Resource Records
• 4.24 Abbreviations & the DNS
• 4.25 Internationalized Domain Names
• 4.26 Extensible Representations (XML)

4.1 Introduction
The chapter
– Explains that Internet services are defined by application programs
– Characterizes the client-server model that such programs use to
interact
– Covers the socket API
– Examines Internet applications
– Defines the concept of a transfer protocol
– Explains how applications implement transfer protocols
– Considers standard Internet applications
– Describes the transfer protocol each uses

4.2 Application-Layer Protocols
• Whenever a programmer creates two network applications, the programmer specifies some details, such as:
  – The syntax & semantics of messages that can be exchanged
  – Whether the client or server initiates interaction
  – Actions to be taken if an error arises
  – How the two sides know when to terminate communication
• Note: “network applications” here means two applications that can communicate
• There are two broad types of application-layer protocols that depend on the intended use:
  – Private communication
  – Standardized service
• Private communication
  – A programmer creates a pair of applications that communicate over the Internet, with the intention that the pair is for private use
  – Interaction between the two applications is straightforward
    • code can be written without writing a formal protocol specification
• Standardized service
  – The expectation is that many programmers will create server software to offer the service or client software to access the service; in this case:
    • The application protocol must be documented independently of any implementation
    • The specification must be precise & unambiguous
    • The size of a protocol specification depends on the complexity of the service

4.3 Representation & Transfer
• Application-layer protocols specify two aspects of interaction
  – Representation
  – Transfer
• Fig.
4.1 explains the distinction

Figure 4.1 Two key aspects of an application-layer protocol

4.4 Web Protocols
• The World Wide Web (WWW) is one of the most widely used services in the Internet
• The Web is complex
  – many protocol standards have been devised to specify various aspects & details
• Fig. 4.2 illustrates the major WWW standards

[Diagram: on the client, a browser program displays a page with a hyperlink to sonoma.edu; it uses HTTP over a TCP connection across the Internet to reach the HTTP server at sonoma.edu, which reads the file from its disk]

Figure 4.2 Three key standards that the World Wide Web service uses.

4.5 Document Representation with HTML
• HyperText Markup Language (HTML) is a representation standard that specifies the syntax of a web page
• HTML has the following general characteristics:
  – Uses a textual representation
  – Describes pages that contain multimedia
  – Follows a declarative rather than procedural paradigm
  – Provides markup specifications instead of formatting
  – Permits a hyperlink to be embedded in an arbitrary object
  – Allows a document to include metadata
• It allows a programmer to specify a complex web page that contains graphics, audio, & video as well as text
  – “Hypermedia” would have been a more accurate name than “hypertext”
• HTML is classified as declarative
  – It allows one to specify what is to be done, not how to do it
• HTML is classified as a markup language
  – It gives only general guidelines for display & does not include detailed formatting instructions
  – HTML allows a page to specify the level of importance of a heading
  – HTML does not require the author to specify the exact font, typeface, point size, or spacing for the heading
• HTML extensions have been created that do allow the specification of an exact font, typeface, point size, & formatting
• A browser chooses all display
details
  – The use of a markup language is important
    • because it allows a browser to adapt the page to the underlying display hardware
    • a page can be formatted for a high-resolution or low-resolution display, a large screen, or a small hand-held device such as an iPhone or PDA
• To specify markup
  – HTML uses tags embedded in the document (see Fig. 4.3)
• Tags provide structure as well as formatting
• Tags control all display
  – white space (i.e., extra lines & blank characters) can be inserted at any point in the HTML document
    • without any effect on the formatted version that a browser displays
• HTML tags are case insensitive
  – HTML does not distinguish between uppercase & lowercase letters in tag names
• Examples:
  – The IMG tag encodes a reference to an external image
    • Additional parameters can be given in an IMG tag to specify the alignment of the figure with the surrounding text
    • An example is given in Fig. 4.4

HTML is based on a declaration for each markup element.

  <HTML>
    <HEAD>
      <TITLE> text that forms the document title </TITLE>
    </HEAD>
    <BODY>
      body of the document appears here
    </BODY>
  </HTML>

Figure 4.3 The general form of an HTML document

An example of a web display looks like this:

  Sonoma State University (SSU)
  Engineering Science Department:
  The Engineering Science Department offers an Electrical Engineering program.

The simplified page source looks like this:

  <html>
    <h1> Sonoma State University (SSU) </h1>
    <b><h2> Engineering Science Department: </h2></b>
    The Engineering Science Department offers an Electrical Engineering program.
  </html>

HTML uses the IMG tag to encode a reference to an external image:

  Here is an icon of a house.
  <IMG SRC="house_icon.jpg" ALIGN=middle>

4.6 Uniform Resource Locators & Hyperlinks
• The Web uses a syntactic form known as a Uniform Resource Locator (URL) to specify a web page
• The general form of a URL is:
    protocol://computer_name:port/document_name%parameters
• where
  – protocol is the name of the protocol used to access the document
  – computer_name is the domain name of the computer on which the document resides
  – port (optional) is the port number at which the server is listening
  – document_name (optional) is the name of the document
  – %parameters (optional) gives parameters for the page
  – Example: http://www.sonoma.edu/users/k/kujoory (port & parameters omitted)
• In a typical URL, a user can omit many of the parts; a URL that gives only a computer name omits the
  – protocol (http is assumed)
  – port (80 is assumed)
  – document name (index.html is assumed), &
  – parameters (none are assumed)
• A URL contains the information a browser needs to retrieve a page
• The browser uses the separator characters (colon, slash, & percent) to divide the URL into four components:
  – a protocol, a computer name, a document name, & parameters
• The browser uses the computer name & protocol port to form a connection to the server on which the page resides
• The browser uses the document name & parameters to request a page

4.7 Web Document Transfer with HTTP
• HyperText Transfer Protocol (HTTP) is the primary transfer protocol that a browser uses to interact with a web server
• A browser is a client that extracts a server name from a URL & contacts the server
• Most URLs contain an explicit protocol reference of http:// or omit the protocol altogether (HTTP is assumed)
• HTTP can be characterized as follows:
  – Uses textual control messages
  – Transfers binary data files
  – Can download or upload data
  – Incorporates caching
• Once it
establishes a connection, a browser sends an HTTP request to the server
• Fig. 4.5 lists the four major request types; GET is the most common

Figure 4.5 The four major HTTP request types.

• The most common form of interaction begins with the browser requesting a page from the server
• The browser (client) sends a GET request over the connection
• The server responds by sending a header, a blank line, & the requested document
• A GET request has the following form:
    GET /item version CRLF
  – item gives the URL for the item being requested
  – version specifies a version of the protocol (HTTP/1.0 or HTTP/1.1)
  – CRLF denotes two ASCII characters
    • carriage return & linefeed, which are used to signify the end of a line of text
• Version information is important in HTTP
  – it allows the protocol to change & yet remain backward compatible
  – a browser sends version information, which allows
    • a server to choose the highest version that they can both understand
• Fig. 4.6 shows the general format of lines in a basic response header

Figure 4.6 General format of lines in a basic response header.

Figure 4.7 Example of status codes used in HTTP.
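The GET form and the header/blank-line/document layout of a response can be sketched in a few lines of Python. This is an illustration only, not part of the original slides: the function names, the file name test.txt, and the sample response text are all made up, and no network connection is opened. The request adds a second CRLF (a blank line) after the request line, which is how a real HTTP/1.0 request header is terminated.

```python
CRLF = "\r\n"

def build_get_request(item, version="HTTP/1.0"):
    """Return the text of a minimal GET request for /item."""
    # Request line in the form "GET /item version CRLF", followed by a
    # blank line (a second CRLF) that marks the end of the request header.
    return f"GET /{item} {version}{CRLF}{CRLF}"

def parse_response(raw):
    """Split a response into (version, status_code, reason, headers, body)."""
    head, _, body = raw.partition(CRLF + CRLF)    # blank line ends the header
    lines = head.split(CRLF)
    version, code, reason = lines[0].split(" ", 2)  # first line: status line
    headers = {}
    for line in lines[1:]:                        # remaining "Keyword: information" lines
        keyword, _, info = line.partition(":")
        headers[keyword.strip()] = info.strip()
    return version, int(code), reason, headers, body

# Hypothetical response text, invented for this illustration.
sample = (
    "HTTP/1.1 200 OK" + CRLF
    + "Content-Length: 16" + CRLF
    + "Content-Type: text/plain" + CRLF
    + CRLF
    + "This is a test.\n"
)

request = build_get_request("test.txt")
version, code, reason, headers, body = parse_response(sample)
```

Here the request names version 1.0 while the sample reply is labeled 1.1, which mirrors the point about version negotiation: each side states the version it speaks in the first line it sends.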
4.7 Web Document Transfer with HTTP (continued)
• The first line of a response header contains a status code
  – that tells the browser whether the server handled the request
  – If the request was incorrectly formed or the requested item was not available, the status code pinpoints the problem
  – E.g., a server returns status code 404 if the requested item cannot be found
  – When it honors a request, a server returns status code 200
• Additional lines of the header give further information about the item, such as
  – its length
  – when it was last modified
  – its content type
• Fig. 4.8 shows sample output from an Apache web server
• The item being requested is a text file containing 16 characters
  – i.e., the text “This is a test.” plus a NEWLINE character
• Although the GET request specifies HTTP version 1.0, the server runs version 1.1
• The server returns 9 lines of header, a blank line, & the contents of the file

Figure 4.8 Sample HTTP response from an Apache web server.

4.8 Caching in Browsers
• Caching provides an important optimization for web access
  – because users tend to visit the same web sites repeatedly
• Much of the content at a given site consists of large images
  – Graphics Interchange Format (GIF)
  – Joint Photographic Experts Group (JPEG)
• Such images often contain backgrounds or banners
  – they do not change frequently
• A browser can reduce download times significantly
  – by saving a copy of each image in a cache on the user's disk & using the cached copy
• What happens if the document on the web server changes after a browser stores a copy in its cache?
  – How can a browser tell whether its cached copy is up-to-date?
• Whenever a browser obtains a document from a web server, the header specifies the last time the document was changed
• A browser saves the Last-Modified date along with the cached copy
  – Before using a cached copy, the browser makes a HEAD request to the server & compares the Last-Modified date of the server's copy to the Last-Modified date of the cached copy
  – If the cached version is stale, the browser downloads the new version
• Algorithm 4.1 summarizes caching, but omits several minor details, e.g.,
  – HTTP allows a web site to include a no-cache header (e.g., Cache-Control: no-cache) that specifies a given item should not be cached
• Browsers do not cache small items
  – because the time to download the item with a GET request is almost the same as the time to make a HEAD request, & keeping many small items in a cache can increase cache lookup times

Browser Architecture
[Figure: parts of the Web model, from “Computer Networks,” A. Tanenbaum, 4th ed.]

4.9 Browser Architecture
• The structure of a browser is complex
• It must understand HTTP
• A browser also provides support for other protocols
  – It must contain client code for each of the protocols used
  – It must know how to interact with a server & how to interpret responses
  – It must know how to access the File Transfer Protocol (FTP) service
• Fig. 4.9 illustrates the components of a browser

Figure 4.9 Architecture of a browser that can access multiple services.
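Returning to the cache check of Section 4.8, the HEAD-then-compare steps can be sketched as follows. This is a sketch of the steps the slides describe in prose, not a transcription of Algorithm 4.1 (which is not reproduced here). The names fetch_with_cache, head, and get are hypothetical, and the "server" is an in-memory dictionary so the example runs without a network. Dates are parsed with the standard-library email.utils.parsedate_to_datetime, which accepts the date format used in HTTP headers.

```python
from email.utils import parsedate_to_datetime

def fetch_with_cache(url, cache, head, get):
    """Return the document for url, using the cached copy when it is fresh.

    cache maps url -> (last_modified, document).
    head(url) returns the server's Last-Modified string.
    get(url) returns (last_modified, document).
    head and get are stand-ins for real HTTP HEAD and GET requests.
    """
    if url in cache:
        cached_date, document = cache[url]
        server_date = head(url)
        if parsedate_to_datetime(server_date) <= parsedate_to_datetime(cached_date):
            return document              # cached copy is still current
    last_modified, document = get(url)   # not cached yet, or cached copy is stale
    cache[url] = (last_modified, document)
    return document

# Hypothetical in-memory "server", used only to exercise the logic.
server = {"/banner.gif": ("Tue, 28 Jun 2016 10:00:00 GMT", b"v1")}
calls = []

def head(url):
    calls.append("HEAD")
    return server[url][0]

def get(url):
    calls.append("GET")
    return server[url]

cache = {}
first = fetch_with_cache("/banner.gif", cache, head, get)    # nothing cached: GET
second = fetch_with_cache("/banner.gif", cache, head, get)   # fresh copy: HEAD only
server["/banner.gif"] = ("Wed, 29 Jun 2016 09:00:00 GMT", b"v2")
third = fetch_with_cache("/banner.gif", cache, head, get)    # stale copy: HEAD, then GET
```

The sketch leaves out the refinements mentioned above, such as honoring a no-cache header or skipping the cache for small items.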
4.10 File Transfer Protocol (FTP)
http://en.wikipedia.org/wiki/File_Transfer_Protocol
• A file is the fundamental storage abstraction
• A file can hold an arbitrary object, e.g.,
  – a document, spreadsheet, computer program, graphic image, or data
• FTP can send a copy of a file from one computer to another
  – provides a powerful mechanism for the exchange of data
• File transfer across the Internet is complicated because computers are heterogeneous
• Each computer system may have different:
  – file representations
  – information types
  – naming
  – file access mechanisms
• Examples of differences among OSes
  – file extension .jpg or .jpeg for a JPEG image
  – line termination in a text file by a LINEFEED character or by CARRIAGE RETURN & LINEFEED
  – a slash (/) or a backslash (\) as the separator in file names
• FTP (RFC 959), a standardized file transfer service, provides for:
  – Transfer of any file content & data type
  – Bidirectional transfer (download/get or upload/put)
  – Use of TCP for reliability
  – Authentication & ownership support
    • Allows each file to have ownership & access restrictions
  – Browsing of folders
  – Use of ASCII text for textual control messages
  – Heterogeneity among computers & operating systems
• The FTP protocol is usually invisible
  – Invoked automatically by a browser when a user requests a file download
• There is also TFTP (Trivial FTP, RFC 1350)
  http://en.wikipedia.org/wiki/Trivial_File_Transfer_Protocol
  – A simple version of FTP to get a file from, or put a file onto, a remote host
  – But not as reliable as FTP
    • Uses UDP
    • Mainly used over LANs
  – No authentication

4.11 FTP Communication Paradigm
• FTP uses a client-server approach for interactions
  – A client establishes a connection to an FTP server, which is listening, & sends a series of requests to which the server responds
  – At the server, FTP uses a
    • control connection (port 21) &
    • data connection (port 20)
  – Each time a file must be downloaded or uploaded, the server opens a new data connection
  – Most commands & interactions are transparent to the users

[Diagram: the client side (user at terminal, user interface function, user protocol interpreter, user data transfer function, file system) & the server side (protocol interpreter, server data transfer function, file system) are linked by a control connection on port 21 that carries FTP commands/replies & a data connection on port 20]

Fig. 4.10 Illustration of FTP connections during a typical session. The exchanges for security (password) are not shown.

• Fig. 4.10 omits several important details, e.g.,
  – after creating the control connection, a client must log into the server; FTP provides
    • a USER command that the client sends to provide a login name
    • a PASS command that the client sends to provide a password
  – The server sends a numeric status response over the control connection to let the client know whether the login was successful
• A client can only send other commands after a login is successful
• When accessing public files, a client uses anonymous login
  – which consists of the user name anonymous & a password, usually guest
• What protocol port number should a server specify when connecting to the client?
• A client allocates a protocol port on its local OS & sends the port # to the server
  – i.e., the client binds to the port to await a connection
  – Then it transmits a PORT command over the control connection to inform the server about the port # being used
• Algorithm 4.2 summarizes the steps
• The FTP protocol may face problems in certain cases
  – transmission of a protocol port # will fail if one of the two endpoints lies behind a Network Address Translation (NAT) device
    • e.g., a wireless router used in a residence or small office
    • Ch. 23 explains that FTP is an exception
      ⌐ A NAT device recognizes an FTP control connection,
      ⌐ inspects the contents of the connection, &
      ⌐ rewrites the values in a PORT command

FTP Commands & Examples
• FTP commands:
    abort     cd          get      put
    ascii     close       help     pwd
    bell      delete      ls       rename
    binary    debug       mkdir    rmdir
    bye       disconnect  open     status
• Example 1:
    ftp> help ls
    ls        list content of remote directory
    ftp> help bell
    bell      beep when command completed
• Example 2: Obtain a copy of tcpbook.tar
    $ ftp arthur.cs.purdue.edu
    Connected to arthur.cs.purdue.edu
    200 arthur.cs.purdue.edu FTP Server (DYNIX V3.0.12) ready
    Name (arthur:usra): anonymous
    331 Guest login ok, send ident as password
    Password: guest
    230 Guest login ok, access restrictions apply
    ftp> get pub/comer/tcpbook.tar bookfile
    200 PORT Command okay
    150 Opening data connection for /bin/ls (128.10.2.1, 2363) (7897088 bytes)
    226 Transfer complete
    8272793 bytes received in 98.04 seconds (82 Kbytes/s)
    ftp> close
    221 Goodbye
    ftp> quit
• Anonymous ftp session
  – User does not need an account or password
  – Used for publicly available files

4.12 Electronic Mail
• One of the most widely used Internet applications
• Fig. 4.11 illustrates a simplified architecture of email
• Email software is divided into two conceptually separate pieces:
  – An email interface application
    • A mechanism for a user to compose & edit outgoing messages as well as read & process incoming email
  – A mail transfer program
    • Acts as a client to send a message to the mail server on the destination computer
    • the mail server accepts incoming messages & deposits each in the appropriate user's mailbox
• The email system is architecturally based on the postal system
  – Message (content) & envelope are separate
  – This makes it possible to handle content & envelope separately
• Algorithm 4.3 lists the steps taken to send an email
• The specifications used for Internet email can be divided into three broad categories, as Fig. 4.12 lists

Figure 4.12 The three types of protocols used with email

Specifications & Standards:
• IETF Simple Mail Transfer Protocol (SMTP) delivers simple text messages. Originally RFC 821, currently RFC 5321; runs over TCP/IP & carries ASCII text only.
• IETF Multipurpose Internet Mail Extensions (MIME, RFC 2045) can deliver other types of data (voice, images, video clips). It extends the ASCII message format, originally RFC 822, currently RFC 5322.
• ITU-T (ISO/OSI) Message Handling System (MHS), X.400, is the counterpart of SMTP; it is not used as much because of its complexity.

Email Architecture & Operation
• User Agent (UA) program creates, reads, sends, & receives email.
  – Uses a local (client) program, e.g., MS Outlook.
  – Can be command-based or graphical.
• UA can configure MTA.
• MTA also implements mailing lists to deliver a message to a list of recipients.
• Message Transfer Agent (MTA, mail server) is a server process.
  – Queues & moves the message from source to destination.
  – Uses SMTP over TCP to transfer the message.
• Mailboxes can be implemented in the MTA (mail server) to store the email received by a user.
• Users can use different UAs to access a mailbox.

4.13 The Simple Mail Transfer Protocol (SMTP)
• The Simple Mail Transfer Protocol (SMTP) is the standard protocol that a mail transfer program uses
• SMTP can be characterized as follows:
  – Follows a stream paradigm (TCP)
  – Uses textual control messages
  – Only transfers text messages
  – Allows a sender to specify recipients' names &
    • check each name
  – Sends one copy of a given message
• SMTP is restricted to sending textual content
  – The MIME (Multipurpose Internet Mail Extensions) standard allows email to include attachments, e.g.,
    • graphic images or binary files
• SMTP can send a single message to multiple recipients
  – The protocol allows a client to list users & then send a single copy of a message for all users on the list

Differences between FTP & Email
• FTP provides point-to-point, peer-to-peer, 2-way transfer.
  – More efficient for file transfer.
    • Knowledge of peer's status, data retrieval, ..
  – More suitable for file operations.
    • Access control & management, file format, file size
    • Files can be big (GB)
• SMTP provides point-to-multipoint transfer, based on store-and-forward at the application layer.
  – More efficient for mail service with features.
    • Blind carbon copy, with or without receipt, ..
  – Has limitations on file size & type.
  – No file operations
    • No file access & management

Email Message
• Email comprises an Envelope & a Message, which are separate.
  – Envelope encapsulates the message for routing.
    • Added & read by MTA.
    • Has info needed for transporting the message.
  – Message consists of Header & Body, which are separate.
    • Made by UA
    • Consists of
      ⌐ Header (control info for UA)
      ⌐ Body (transparent to MTA)

Figure: Envelopes & messages: (a) paper mail, (b) electronic mail.

An Example of SMTP Session
[Figure: a message from John; recipient Paul is accepted (OK), recipient Matthew is rejected (no such user)]
• CR = Carriage Return, LF = Line Feed
• <CR><LF> = end of line & move to next line
• <CR><LF>.<CR><LF> = end of data

User Agent (UA)
• Called the email reader; it can accept a variety of commands.
  – Composing, receiving, & replying to messages, & managing mailboxes.
  – E.g., MS Outlook, Google Gmail, Mozilla Thunderbird.
• Has a menu- or icon-driven interface using mouse or touch screen.
• Displays message folders, message summary, message search, & sometimes the calendar
• An auto responder works on behalf of the UA but runs on the mail server
  – Can forward incoming email to a different address, or work as a vacation agent.

Figure: Typical elements of the user agent interface.

Message Formats
• SMTP messages consist of a simple envelope based on RFC 5321.
• UA builds a message & passes it to MTA.
• MTA uses some of the header fields to construct the envelope.
• Messages sent by UA must be placed in a standard format to be handled by the message transfer agent.
• ASCII email is based on RFC 822.
• RFC 822 was revised as RFC 5322; multimedia support comes from the MIME extensions.
• “To” field in the header gives the DNS address of the primary recipients, 1st party.
• Cc (2nd party) & Bcc (3rd party) give secondary addresses.

RFC 5322 header fields related to message transport:
  Header         Meaning
  To:            E-mail address(es) of primary recipient(s)
  Cc:            E-mail address(es) of secondary recipient(s)
  Bcc:           E-mail address(es) for blind carbon copies
  From:          Person or people who created the message
  Sender:        E-mail address(es) of the actual sender
  Received:      Line added by each transfer agent along the route
  Return-Path:   Can be used to identify a path back to the sender

4.14 ISPs, Mail Servers, & Mail Access
• The web browser (webmail) approach is straightforward:
  – an ISP provides a special web page that displays messages from a user's mailbox
• Advantages of webmail
  – ability to read email from any computer connected to the Internet
  – a user does not need to run a special mail interface application
• Disadvantages
  – No access to email when offline
  – Emails in the mailbox may be lost when changing providers
• A special mail application can download an entire mailbox onto a local computer, such as a laptop
  – When connected to the Internet, a user can run an email program that downloads an entire mailbox onto the laptop
• Advantages
  – Can process email when the laptop is offline (e.g., on an airplane)
  – Once online, can upload emails the user has created & download any new email
  – Always has access to the emails on the laptop

IMAP vs POP3
(a) IMAP - Sending & reading email when the receiver has a permanent Internet connection.
• UA runs on the same machine as the MTA.
• Client connects to server using a secure transport & begins to issue commands.
• Assumes all emails remain on the server indefinitely.
• Displays all messages on a computer
  – Avoid IMAP over slow modem links.
• RFC 3501 (over TCP port 143)
(b) POP3 - Reading email when the receiver has an Internet connection to an email provider.
• Allows a UA to contact the email provider's MTA.
• No need for the receiver to stay connected after download.
• Can clear messages out of the provider's mailbox after they are read.
• Emails spread over multiple PCs when read.
• RFC 1939 (over TCP port 110); allows
  – Authorization - login
  – Transaction - collect emails, mark/delete
  – Update - delete email

IMAP = Internet Message Access Protocol
POP3 = Post Office Protocol ver. 3

A Comparison of POP3 & IMAP
[Table not recovered; surviving entries: IMAP is defined in RFC 3501; “Used by” row: ISPs, Corporations]

4.15 Mail Access Protocols (POP3, IMAP) (Skip to MIME)
• Protocols have been created that provide email access
• An access protocol is distinct from a transfer protocol
  – access only involves a single user interacting with a single mailbox
  – transfer protocols allow a user to send mail to other users
• Access protocols have the following characteristics:
  – Provide access to a user's mailbox
  – Permit a user to view headers, download, delete, or send messages
  – Client runs on the user's personal computer
  – Server runs on a computer that stores the user's mailbox
• Viewing a list of messages without downloading the message contents is useful
  – Especially in cases where the link between the two parties is slow
  – E.g., a user browsing on a cell phone may look at headers & delete spam without waiting to download the message contents
• A variety of mechanisms are available for email access
  – Some ISPs provide free email access software to their subscribers
  – In addition, two standard email access protocols have been created
• Fig. 4.15 lists the standard protocol names
• The two protocols differ in many details
  – In particular, each provides its own authentication mechanism that users follow to identify themselves

Figure 4.15 The email access protocols.
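The three POP3 phases listed above (Authorization, Transaction, Update) can be pictured as a small state machine. The sketch below is a toy model for illustration, not a POP3 implementation: the class name Pop3Sketch and its methods are hypothetical, a single login call stands in for the USER/PASS exchange, the mailbox is an in-memory list, and nothing is sent over a network. The one behavior it tries to capture is that a message marked for deletion during the Transaction phase is only removed when the session ends and the Update phase runs.

```python
class Pop3Sketch:
    """Toy model of the three POP3 session states (no networking)."""

    def __init__(self, mailbox, password):
        self.mailbox = mailbox        # list of message strings, numbered from 1
        self.password = password      # hypothetical shared secret
        self.state = "AUTHORIZATION"
        self.marked = set()           # message numbers marked for deletion

    def login(self, password):
        # Stands in for the USER/PASS exchange of the Authorization phase.
        if self.state == "AUTHORIZATION" and password == self.password:
            self.state = "TRANSACTION"
            return "+OK"
        return "-ERR"

    def _valid(self, number):
        return (self.state == "TRANSACTION"
                and 1 <= number <= len(self.mailbox)
                and number not in self.marked)

    def retr(self, number):
        # Transaction phase: collect (download) one message.
        return self.mailbox[number - 1] if self._valid(number) else "-ERR"

    def dele(self, number):
        # Transaction phase: only mark the message; nothing is removed yet.
        if not self._valid(number):
            return "-ERR"
        self.marked.add(number)
        return "+OK"

    def quit(self):
        # Leaving Transaction enters Update, where marked messages are removed.
        if self.state == "TRANSACTION":
            self.state = "UPDATE"
            self.mailbox[:] = [m for n, m in enumerate(self.mailbox, start=1)
                               if n not in self.marked]
        return "+OK"


mailbox = ["first message", "spam", "third message"]
session = Pop3Sketch(mailbox, password="guest")
before_login = session.retr(1)        # refused: still in Authorization
session.login("guest")
downloaded = session.retr(1)
session.dele(2)
count_before_quit = len(mailbox)      # deletion is deferred until Update
session.quit()
```

After quit, the mailbox holds only the first and third messages, which matches the "can clear out of provider mailbox after read" behavior described for POP3.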
4.16 Email Representation Standards (RFC 2822, MIME)
• Two important email representation standards exist:
  – Mail Message Format, RFC 2822
  – Multipurpose Internet Mail Extensions (MIME), RFC 2045
• RFC 2822 Mail Message Format:
  – takes its name from the IETF standards document RFC 2822 (since superseded by RFC 5322)
  – an email message is represented as a text file & consists of
    • a header section,
    • a blank line, &
    • a body
  – Header lines each have the form:
      Keyword: information
    • where the set of keywords is defined to include
      ⌐ From:, To:, Subject:, Cc:
• Multipurpose Internet Mail Extensions (MIME)
  – The MIME standard extends the functionality of email to allow the transfer of non-text data in a message
  – MIME specifies how a binary file can be encoded into printable characters, included in a message, & decoded by the receiver
  – The Base64 encoding standard is the most popular
    • Maps each 6-bit block of input into an 8-bit printable ASCII character
• But MIME does not restrict encoding to a specific form
• MIME permits a sender & receiver to choose a convenient encoding
• The sender includes extra lines in the header to specify the encoding used
  – MIME allows a sender to divide a message into several parts &
  – to specify an encoding for each part independently
    • a user can send a plain text message & attach a graphic image, a spreadsheet, & an audio clip, each with its own encoding
• MIME adds two lines to an email header
  – One to declare that MIME has been used to create the message, &
  – Another to specify how MIME information is included in the body, e.g.,
  – The header lines:
    • MIME-Version: 1.0
    • Content-Type: Multipart/Mixed; Boundary=Mime_separator
  – Mime_separator will appear in the message body before
each part
• When MIME is used to send a standard text message, the 2nd line becomes
    Content-Type: text/plain
• MIME is backward compatible with email systems that do not understand the MIME standard or encoding
  – such systems have no way of extracting non-text attachments
  – they treat the body as a single block of text

4.17 Domain Name System (DNS)
• DNS provides a service that maps human-readable symbolic names to computer addresses
• Provides a directory service for TCP/IP applications, e.g.,
  – Browsers, mail software
  – maps name to address
• Whenever an application needs to translate a name, the application becomes a client of the naming system
  – the client sends a request message to a name server
  – the server finds the corresponding address & sends a reply message
    • if it cannot answer a request, a name server temporarily becomes the client of another name server, until a server is found that can answer the request
• DNS is an interesting example of client-server interaction; the mapping
  – Is not performed by a single server
  – Is distributed among many servers located at sites across the Internet
• RFC 1034 & 1035, Domain Names
• ITU X.500, Directory Service
• Syntactically, each name consists of a sequence of alphanumeric segments separated by periods, e.g.,
  – A computer in the Computer Science Department at Purdue University has the domain name: mordred.cs.purdue.edu
  – A computer at Cisco, Inc.
    has the domain name: anakin.cisco.com
• Domain names are hierarchical, with the most significant part of the name on the right (e.g., edu, com)
  – The left-most segment of a name (mordred & anakin in the examples) is the name of an individual computer
  – Other segments in a domain name identify the group that owns it, e.g., the segment
    • purdue gives the name of a university, &
    • cisco gives the name of a company

DNS Structure (Partly from A. Tanenbaum)
• A portion of the Internet domain name space
  – A hierarchy, a tree structure
• Each domain is partitioned into subdomains
  – Subdomains are further partitioned
• Leaves may be a single host or a company with many hosts
[Figure: tree under an unnamed root with generic domains (com, edu, gov, mil, org, net, int) & country domains (jp, us, nl); lower-level labels shown include att, sonoma, cs, es, nsf, acm, ieee, ac, co, nec; a conceptual server is marked for a domain & for a sub-domain; example address: ali.kujoory@ieee.org]

4.17 Domain Name System
• DNS does not specify the number of segments in a name
• DNS does specify values for the most significant segment, which is called a top-level domain (TLD)
  – Controlled by the Internet Corporation for Assigned Names & Numbers (ICANN)
  – ICANN designates one or more domain registrars to administer a given top-level domain & approve specific names
• Some TLDs are generic, meaning they are generally available
  – Other TLDs are restricted to specific groups or government agencies
• Fig. 4.16 lists example top-level DNS domains
• An organization applies for a name under one of the existing top-level domains
  – most US corporations choose to register under the com domain
• DNS allows organizations to use a geographic registration
  – E.g., the Corporation For National Research Initiatives registered the domain: cnri.reston.va.us

Fig.
4.16 Example top-level domains & the group to which each is assigned

DNS Operation
• To resolve a name, an application program calls a library procedure called the resolver
  – Provides to it the name as a parameter
• Resolver (DNS client) sends a UDP packet to a local DNS server
  – which then looks up the name & returns the IP address to the resolver
  – which then returns the IP address to the caller
• Armed with the IP address, the program can then
  – establish a TCP connection with the destination, or
  – send its UDP packets
[Figure: numbered steps 1–6 of a lookup involving the DNS server]

Name Servers
• Finding the IP address for a given hostname is called resolution & is done with the DNS protocol.
• Resolution:
  – Computer requests local name server to resolve.
  – Local name server asks the root name server.
  – Root returns the name server for a lower zone.
  – Continue down zones until name server can answer.
• DNS protocol:
  – Runs on UDP port 53, retransmits lost messages.
  – Caches name server answers for better performance.
• Example of a computer looking up the IP for a name.
[Figure: Example of a resolver looking up a remote name in 10 steps.]

DNS - Resource Records
• Records that make up the database are known as “Resource Records”.
• The namespace is stored on a “Name Server”.
• Every domain, whether a single host or a top-level domain, can have a set of resource records associated with it.
  – For a single host, the most common resource record is its IP address.
• When a resolver gives a domain name to DNS,
  – it gets back the resource records associated with that name.
• The real function of DNS is to map domain names onto resource records.
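The mapping from domain names onto resource records can be sketched as a small in-memory table. This is only an illustration under stated assumptions, not how a real name server stores its data: the `RECORDS` table and the `lookup` function are hypothetical names, and the two sample records are the berkeley.edu examples that appear on the next slide.

```python
from collections import defaultdict

# Hypothetical in-memory "database": domain name -> list of (type, value) records.
RECORDS = defaultdict(list)
RECORDS["ucbvax.berkeley.edu"].append(("A", "128.32.137.3"))
RECORDS["berkeley.edu"].append(("NS", "ucbvax.berkeley.edu"))


def lookup(name, rtype=None):
    """Return the records stored for `name`, optionally filtered by type."""
    # Domain names are case-insensitive, so normalize before searching.
    key = name.lower().rstrip(".")
    records = RECORDS.get(key, [])
    if rtype is None:
        return list(records)
    return [rec for rec in records if rec[0] == rtype]


if __name__ == "__main__":
    print(lookup("ucbvax.berkeley.edu"))
    print(lookup("berkeley.edu", "NS"))
    print(lookup("berkeley.edu", "A"))  # no A record is stored for this name
```

A name with no stored records simply yields an empty list here; a real server would instead return an error code or refer the query to another server.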
DNS - Resource Records (2)
• A resource record is a five-tuple:
  – Encoded in binary for efficiency, but represented in ASCII text
    • One line per resource record

Example  Domain_name          Time_to_live  Class  Type  Value
a)       ucbvax.berkeley.edu  60            IN     A     128.32.137.3
b)       berkeley.edu         86400         IN     NS    ucbvax.berkeley.edu

1. Domain_name field indicates the domain to which this record applies.
  – There are many records for each domain.
  – This field is the primary search key to satisfy queries.
2. Time_to_live field indicates how stable the record is.
  – E.g., 86400 (# of sec in 1 day).
  – Large value for a highly stable record.
3. Class field uses IN for Internet information.
  – Other codes for non-Internet.
4. Type field indicates what kind of record this is (next slide).
5. Value field - a number, a domain name, or an ASCII string.
  – Semantics depend on the record type.

DNS - Resource Records (3)

Type   Meaning                  Value
SOA    Start of authority       Parameters for this zone
A      IPv4 address of a host   32-bit integer
AAAA   IPv6 address of a host   128-bit integer
MX     Mail exchange            Priority, domain willing to accept email
NS     Name server              Name of a server for this domain
CNAME  Canonical name           Domain name
PTR    Pointer                  Alias for an IP address
SPF    Sender policy framework  Text encoding of mail sending policy
SRV    Service                  Host that provides it
TXT    Text                     Descriptive ASCII text

The principal DNS resource record types.

DNS - Query/Response Scenarios
[Figure: a client (e.g., email agent, resolver) sends a QUERY to a name server, which returns a RESPONSE]
Example: Client asking Name server for the IP address of a host.
ID  Operation  Type  Query name           Answer
23  QUERY      A     gemini.tuc.noao.edu
23  RESPONSE   A     gemini.tuc.noao.edu  140.252.3.54

4.18 Domain Names That Begin with www (can skip to XML)
• Many organizations assign domain names that reflect the service a computer provides, e.g.,
  – A computer that runs a server for FTP might be named: ftp.foobar.com
  – Similarly, a computer that runs a web server might be named: www.foobar.com
• Such names are mnemonic, but are not required
• The use of www to name computers that run a web server is merely a convention
  – an arbitrary computer can run a web server, even if the computer's domain name does not contain www
  – a computer that has a domain name beginning with www is not required to run a web server

4.19 The DNS Hierarchy & Server Model
• Each organization is free to choose the details of its servers, e.g.,
  – a small organization that only has a few computers can contract with an ISP to run a DNS server
• An organization that runs its own server can choose to place all names for the organization in a single physical server, or it can choose to divide its names among multiple servers, e.g.,
  – Fig.
    4.17 illustrates how the hypothetical Foobar Corporation might choose to structure servers if the corporation had a candy division & a soap division

4.19 The DNS Hierarchy & Server Model
• DNS is designed to allow each organization to assign names to computers, or to change those names, without informing a central authority
  – To achieve autonomy, each organization is permitted to operate DNS servers for its part of the hierarchy
    • Purdue University operates a server for names ending in purdue.edu
    • IBM Corporation operates a server for names ending in ibm.com
• Each DNS server contains information that links the server to other domain name servers up & down the hierarchy
  – a given server can be replicated, i.e.,
    • multiple physical copies of the server exist
• Replication is useful for heavily used servers, such as root servers that provide information about top-level domains
  – administrators must guarantee that all copies are coordinated
    • so they provide exactly the same information

4.20 Name Resolution
• The translation of a domain name into an address is called
  – Name resolution, i.e.,
  – “Name is said to be resolved to an address”
  – Software to perform the translation is known as a name resolver (or simply resolver)
• In the socket API, e.g.,
  – the resolver is invoked by calling function gethostbyname
• The resolver becomes a client by contacting a DNS server
  – DNS server returns an answer to the caller
• Each resolver is configured with the address of one or more local domain name servers
• The resolver forms a DNS request message
  – sends the message to the local server
  – waits for the server to send a DNS
    reply message for the answer

4.20 Name Resolution
• A resolver can choose to use either the stream or message paradigm when communicating with a DNS server
  – most resolvers are configured to use a message paradigm because it imposes less overhead for a small request
• As an example, consider Fig. 4.17a & assume a computer in the soap division generates a request for the name chocolate.candy.foobar.com
• The resolver will be configured to send the request to the local DNS server (i.e., the server for foobar.com)
  – Although it cannot answer the request, the server knows to contact the server for candy.foobar.com, which can generate an answer

4.21 Caching in DNS Servers
• The locality of reference principle that forms the basis for caching applies to the Domain Name System in two ways:
  – Spatial: A user tends to look up the names of local computers more often than the names of remote computers
    • For this, a name resolver contacts a local server first
  – Temporal: A user tends to look up the same set of domain names repeatedly
    • For this, a DNS server caches all lookups
• Algorithm 4.4 summarizes the process

4.21 Caching in DNS Servers
• From the algorithm, when a request arrives for a name outside the set for which the server is an authority, further client-server interaction results
• The server temporarily becomes a client of another name server
• When the other server returns an answer
  – the original server caches the answer & sends a copy of the answer back to the resolver from which the request arrived
• In addition to knowing the address of all servers down the hierarchy
  – each DNS server must know the address of a root server
• How long should items be cached?
  – If an item is cached too long, the item will become stale
  – DNS therefore specifies a cache timeout for each item

4.22 Types of DNS Entries
• Each entry in a DNS database consists of three items:
  – a domain name
  – a record type
    • specifies how the value is to be interpreted
    • Type A for IPv4 address
    • Type CNAME for domain name
    • Type MX for mail exchange
    • Type NS for name server
  – a value
• A query sent to a DNS server specifies both a domain name & a type, & the
  – server only returns a binding that matches the type of the query
• The principal type maps a domain name to an IP address
  – DNS classifies such bindings as type A
    • type A lookup is used by applications such as FTP, ping, or a browser
  – DNS supports several other types, including type MX
    • that specifies a Mail eXchanger
    • when it looks up the name in an email address, SMTP uses type MX

4.22 Types of DNS Entries
• Each entry in a DNS server has a type
• When a resolver looks up a name, the
  – resolver specifies the type that is desired
  – DNS server returns only entries that match the specified type
• The DNS type system can produce unexpected results
  – because the address returned can depend on the type, e.g.,
    • a corporation may decide to use the name corporation.com for both web & email services
• It is possible for the corporation to divide the workload between separate computers by
  – mapping type A lookups to one computer &
  – mapping type MX lookups to another

4.23 Aliases & CNAME Resource Records
• The DNS offers a CNAME (Canonical Name) entry type
  – it is analogous to a symbolic link in a file system
  – the entry provides an alias for another DNS entry
• Aliases can be useful, e.g.,
  – Suppose Foobar Corporation
    has a computer named hobbes.foobar.com that is to run a web server under the name www
• Organization foobar can create a CNAME entry for www.foobar.com that points to hobbes
• Whenever a resolver sends a request for www.foobar.com, the server returns the address of computer hobbes

4.23 Aliases & CNAME Resource Records
• The use of aliases is especially convenient
  – it permits an organization to change the computer used for a particular service without changing the names or addresses:
    • E.g., Foobar Corporation can move its web service from hobbes to calvin
    • by changing the CNAME record in the DNS server; the two computers retain their original names & IP addresses
• The use of aliases also allows an organization to associate multiple aliases with a single computer
  – Thus, Foobar Corporation can run an FTP server & a web server on the same computer, & can create CNAME records:
    • www.foobar.com
    • ftp.foobar.com

4.24 Abbreviations & the DNS
• DNS does not incorporate abbreviations
  – a server only responds to a full name
• Most resolvers can be configured with a set of suffixes that allow a user to abbreviate names, e.g.,
  – each resolver at Foobar Corporation might be programmed to look up a name twice:
    • once with no change & once with the suffix foobar.com appended
• If a user enters a full domain name
  – the local server will return the address, & processing will proceed
• If a user enters an abbreviated name
  – the resolver will first try to resolve the name as entered, &
  – will receive an error because no such name exists
  – then it will try appending a suffix & looking up the resulting name

4.25 Internationalized Domain Names
• DNS uses the ASCII character set
• Languages such as Russian, Greek, Chinese, & Japanese each contain characters for which no ASCII representation exists
  – Many European languages use
    diacritical marks that cannot be represented in ASCII
• IETF debated modifications & extensions of the DNS to accommodate international domain names
  – After considering many proposals, IETF chose an approach known as Internationalizing Domain Names in Applications (IDNA)
• IDNA uses ASCII to store all names
• If a domain name contains a non-ASCII character
  – IDNA translates the name into a sequence of ASCII characters &
  – stores the result in the DNS

4.25 Internationalized Domain Names
• IDNA relies on applications to translate between the international character set & the internal ASCII form used
• The rules for translating international domain names are complex & use Punycode
• The latest versions of widely used browsers, e.g., Firefox & Internet Explorer, can accept & display non-ASCII domain names because they each implement IDNA

4.26 Extensible Markup Language (XML)
• XML is a markup language that defines a set of rules for encoding documents
  – in a format which is both human-readable & machine-readable
• Defined by the W3C's XML 1.0 specification, a free open standard
• Design goals of XML emphasize simplicity, generality & usability across the Internet
• It is a textual data format with strong support via Unicode for different human languages
• Although the design of XML focuses on documents,
  – it is widely used for representation of arbitrary data structures such as those used in web services
• XML describes the structure of data &
  – provides names for each field
• XML does not assign any meaning to tags
  – tag names can be created as needed
  – tag names can be selected to make data easy to parse or access
https://en.wikipedia.org/wiki/XML

4.26 Extensible Markup Language (XML)
Example:
• Two companies agree to exchange corporate telephone directories, they
  – define an XML format
    that has data items (Fig. 4.18), e.g.,
    • an employee's name, phone number, & office, &
  – choose to further divide a name into a last & a first name
Figure 4.18 XML example
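
A directory in such a format could be read by a program roughly as in the sketch below, which uses Python's standard xml.etree.ElementTree module. The tag names (`directory`, `employee`, `name`, `last`, `first`, `phone`, `office`) and the sample entry are assumptions made up for illustration; they are not taken from Fig. 4.18. The `parse_directory` helper is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical directory document; tag names and values are illustrative only.
DIRECTORY_XML = """
<directory>
  <employee>
    <name>
      <last>Doe</last>
      <first>Jane</first>
    </name>
    <phone>555-0100</phone>
    <office>Room 101</office>
  </employee>
</directory>
"""


def parse_directory(xml_text):
    """Return one dict per <employee> element in the document."""
    root = ET.fromstring(xml_text)
    entries = []
    for emp in root.findall("employee"):
        entries.append({
            # The name is split into nested <last> and <first> fields.
            "last": emp.findtext("name/last"),
            "first": emp.findtext("name/first"),
            "phone": emp.findtext("phone"),
            "office": emp.findtext("office"),
        })
    return entries


if __name__ == "__main__":
    for entry in parse_directory(DIRECTORY_XML):
        print(entry)
```

Because XML itself assigns no meaning to the tags, both companies would have to agree on these names in advance; the parser only recovers the structure.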