DNS Domain Name Systems Source http://isr.uncc.edu/ITIS3100/3100PowerPoint/DNS.ppt Domain Name System DNS Overview DNS Zones » Forward » Reverse » Fowarding DNS Delegation/Parenting Mail Exchangers DNS Overview http://en.wikipedia.org/wiki/Dns Overview On the Internet, the Domain Name System (DNS) associates various sorts of information with domain names » Serves as the "phone book" for the Internet » Translates human-readable computer hostnames into IP addresses » Required by networking equipment to delivering information » Also stores other information » Such as the list of mail exchange servers that accept email for a given domain. By providing a worldwide keyword-based redirection service, the Domain Name System is an essential component of the modern Internet Uses Uses The most basic use of DNS is to translate hostnames to IP addresses. » Very much like a phone book » For example, what is the internet address of en.wikipedia.org? » The Domain Name System can be used to tell you it is 66.230.200.100 Uses DNS also has other important uses » DNS makes it possible » Assign Internet destinations to the human organization or concern they represent » Independent of the physical routing hierarchy represented by the numerical IP address. » Hyperlinks and Internet contact information can remain the same » Whatever the current IP routing arrangements may be » Can take a human-readable form (such as "wikipedia.org") Easier to remember than an IP address (such as 66.230.200.100). » People take advantage of this when they recite meaningful URLs and e-mail addresses » Do not need to care how the machine will actually locate them Uses The Domain Name System distributes the responsibility for assigning domain names and mapping them to IP networks » allows an authoritative server for each domain to keep track of its own changes » avoids the need for a central registrar to be continually consulted and updated History History Using a name as a more human-legible abstraction of a machine's numerical address on the network predates even TCP/IP » All the way to the ARPAnet era Back then however, a different system was used, as DNS was only invented in 1983, shortly after TCP/IP was deployed. » With the older system, each computer on the network retrieved a file called HOSTS.TXT from a computer at SRI (now SRI International). » The HOSTS.TXT file mapped numerical addresses to names. » A hosts file still exists on most modern operating systems, either by default or through configuration » Allows users to specify an IP address (eg. 192.0.34.166) to use for a hostname (eg. www.example.net) without checking DNS. » Nowadays, the hosts file serves primarily for troubleshooting DNS errors or for mapping local addresses to more organic names » Systems based on a hosts file have inherent limitations » The obvious requirement that every time a given computer's address changed, every computer that seeks to communicate with it would need an update to its hosts file On Windows: C:\WINDOWS\system32\drivers\etc> History The growth of networking called for a more scalable system » Records a change in a host's address in one place only » Other hosts would learn about the change dynamically through a notification system » Completes a globally accessible network of all hosts' names and their associated IP Addresses History At the request of Jon Postel, Paul Mockapetris invented the Domain Name System in 1983 and wrote the first implementation. » The original specifications appear in RFC 882 and 883 » In 1987, the publication of RFC 1034 and RFC 1035 updated the DNS specification » Made RFC 882 and RFC 883 obsolete. » Several more-recent RFCs have proposed various extensions to the core DNS protocols. History In 1984, four Berkeley students1 wrote the first UNIX implementation » In 1985 Kevin Dunlap of DEC significantly re-wrote the DNS implementation » Renamed it BIND (Berkeley Internet Name Domain) » BIND was ported to the Windows NT platform in the early 1990s. Due to BIND's long history of security issues and exploits, several alternative name server/resolver programs have been written and distributed in recent years. 1Douglas Terry, Mark Painter, David Riggle and Songnian Zhou How DNS works Theory How DNS Works - Theory Domain names com. » Arranged in a tree » Cut into zones » Each served by a name server example.com. . How DNS Works - Theory The domain name space consists of a tree of domain names. » Each node or leaf in the tree has one or more resource records, which hold information associated with the domain name. » The tree sub-divides into zones. » A zone consists of a collection of connected nodes authoritatively served by an authoritative DNS name server. » Note that a single name server can host several zones How DNS Works - Theory When a system administrator wants to let another administrator control a part of the domain name space within his or her zone of authority » Can delegate control to the other administrator. » Splits a part of the old zone off into a new zone » Comes under the authority of the second administrator's name servers » The old zone becomes no longer authoritative for what goes under the authority of the new zone. How DNS Works - Theory A resolver looks up the information associated with nodes. » A resolver knows how to communicate with name servers by sending DNS requests, and heeding DNS responses. » Resolving usually entails iterating through several name servers to find the needed information. Some resolvers function simplistically and can only communicate with a single name server. » These simple resolvers rely on a recursing name server to perform the work of finding information for them. How DNS Works - Theory Parts of a domain name A domain name usually consists of two or more parts (labels), separated by dots » Example: wikipedia.org. » The rightmost label conveys the top-level domain The address en.wikipedia.org has the top-level domain org » Each label to the left specifies a subdivision or subdomain of the domain above it. » Note that "subdomain" expresses relative dependence, not absolute dependence: wikipedia.org comprises a subdomain of the org domain en.wikipedia.org comprises a subdomain of the domain wikipedia.org How DNS Works - Theory Parts of a domain name A domain name usually consists of two or more parts (labels), separated by dots » In theory » this subdivision can go down to 127 levels deep » each label can contain up to 63 characters » the whole domain name does not exceed a total length of 255 characters » In practice » some domain registries have shorter limits than that. How DNS works in theory Parts of a domain name A hostname refers to a domain name that has one or more associated IP addresses » For example, the en.wikipedia.org and wikipedia.org domains are both hostnames, but the org domain is not The Domain Name System consists of a hierarchical set of DNS servers » Each domain or subdomain has one or more authoritative DNS servers that publish information about that domain and the name servers of any domains "beneath" it. » The hierarchy of authoritative DNS servers matches the hierarchy of domains » At the top of the hierarchy stand the root nameservers: the servers to query when looking up (resolving) a toplevel domain name (TLD) How DNS works in theory Parts of a domain name Iterative and recursive queries: » Iterative query: the DNS server may provide a partial answer to the query (or give an error) » DNS servers must support non-recursive queries. » Recursive query: the DNS server will fully answer the query (or give an error) » DNS servers are not required to support recursive queries and both the resolver (or another DNS acting recursively on behalf of another resolver) negotiate use of recursive service using bits in the query headers How DNS works in theory Address resolution mechanism A full host name may have several name segments » e.g ahost.ofasubnet.ofabiggernet.inadomain.example In practice full host names typically consist of three segments » ahost.inadomain.example » www.inadomain.example Software interprets the name segment by segment, right to left » Using an iterative search procedure » At each step along the way, the program queries a corresponding DNS server to provide a pointer to the next server which it should consult. Example: » A DNS recursor consults three nameservers to resolve the address www.wikipedia.org. (This description deliberately uses the fictional .example TLD in accordance with the DNS guidelines themselves.) How DNS works in theory Address resolution mechanism As originally envisaged, the process was as simple as: » The local system is pre-configured with the known addresses of the root servers in a file of root hints » Needed to be updated periodically by the local administrator from a reliable source to be kept up to date with the changes which occur over time » Query one of the root servers to find the server authoritative for the next level down » Query this second server for the address of a DNS server with detailed knowledge of the second-level domain » Repeat the previous step to progress down the name, until the final step which would return the final address sought (This description deliberately uses the fictional .example TLD in accordance with the DNS guidelines themselves.) How DNS works in theory Address resolution mechanism The diagram illustrates this process for the real host www.wikipedia.org. (This description deliberately uses the fictional .example TLD in accordance with the DNS guidelines themselves.) How DNS works in theory Address resolution mechanism A search done in this simple form has a major problem: » a huge operating burden on the root servers » each and every search for an address would be started by querying one of them Root nameservers are critical to the overall function of the system » Such a heavy use would create an insurmountable bottleneck for trillions of queries placed every day In practice preemptive measures are taken (This description deliberately uses the fictional .example TLD in accordance with the DNS guidelines themselves.) How DNS works in theory Circular dependencies and glue records Name servers in delegations appear listed by name, rather than by IP address. » This means that a resolving name server must issue another DNS request to find out the IP address of the server to which it has been referred » Since this can introduce a circular dependency if the nameserver referred to is under the domain that it is authoritative of, it is occasionally necessary for the nameserver providing the delegation to also provide the IP address of the next nameserver » This record is called a glue record How DNS works in theory Circular dependencies and glue records For example, assume that the sub-domain en.wikipedia.org contains further sub-domains (such as something.en.wikipedia.org) and that the authoritative nameserver for these is at ns1.en.wikipedia.org » A computer trying to resolve something.en.wikipedia.org will thus first have to resolve ns1.en.wikipedia.org » Since ns1 is also under the en.wikipedia.org subdomain, resolving ns1.en.wikipedia.org requires resolving ns1.en.wikipedia.org which is exactly the circular dependency mentioned above » The dependency is broken by the glue record in the nameserver of wikipedia.org that provides the IP address of ns1.en.wikipedia.org directly to the requestor, enabling it to bootstrap the process by figuring out where ns1.en.wikipedia.org is located How DNS Works In Practice How DNS Works In Practice When an application tries to find the IP address of a domain name » It doesn't necessarily follow all of the steps outlined in the Theory section » Uses caching How DNS works In practice Caching and time to live » Huge volume of requests generated by a system like DNS » Need to provide a mechanism to reduce the load on individual DNS servers » DNS resolution process allows for caching for a given period of time after a successful answer Caching: the local recording and subsequent consultation of the results of a DNS query » How long a resolver caches a DNS response is determined by a value called the time to live (TTL) » The TTL is set by the administrator of the DNS server handing out the response The period of validity may vary from just seconds to days or even weeks or years How DNS Works In Practice - Caching time As a consequence of the distributed and caching architecture, changes to DNS do not always take effect immediately and globally » This is best explained with an example: » An administrator has set a TTL of 6 hours for the host www.wikipedia.org Then changes the IP address to which www.wikipedia.org resolves at 12:01pm Administrator must consider that a person who cached a response with the old IP address at 12:00pm will not consult the DNS server again until 6:00pm. » The period between 12:01pm and 6:00pm in this example is called caching time The period of time that begins when you make a change to a DNS record and ends after the maximum amount of time specified by the TTL expires » This essentially leads to an important logistical consideration when making changes to DNS: not everyone is necessarily seeing the same thing you're seeing. RFC 1537 helps to convey basic rules for how to set the TTL. How DNS Works In Practice - Caching time Note that the term "propagation", although very widely used in this context, does not describe the effects of caching well » Specifically, it implies that » [1] when you make a DNS change, it somehow spreads to all other DNS servers (instead, other DNS servers check in with yours as needed) » [2] that you do not have control over the amount of time the record is cached you control the TTL values for all DNS records in your domain except your NS records and any authoritative DNS servers that use your domain name How DNS Works In Practice - Caching time Some resolvers may override TTL values » Protocol supports caching » up to 68 years » no caching at all Negative caching (the non-existence of records) is determined by name servers authoritative for a zone which MUST include the SOA record (Start Of Authority) when reporting no data of the requested type exists. The MINIMUM field of the SOA record and the TTL of the SOA itself is used to establish the TTL for the negative answer How DNS Works In Practice - Caching time Many people incorrectly refer to a mysterious 48 hour or 72 hour propagation time when you make a DNS change. » When one changes the NS records for one's domain or the IP addresses for hostnames of authoritative DNS servers using one's domain (if any), there can be a lengthy period of time before all DNS servers use the new information » Those records are handled by the zone parent DNS servers » Typically cache those records for 48 hours » However, those DNS changes will be immediately available for any DNS servers that do not have them cached » And any DNS changes on your domain other than the NS records and authoritative DNS server names can be nearly instantaneous, if you choose for them to be (by lowering the TTL once or twice ahead of time, and waiting until the old TTL expires before making the change) How DNS Works In Practice - In the Real World DNS resolving from program to OS-resolver to ISP-resolver to greater system. Users generally do not communicate directly with a DNS resolver » Instead DNS-resolution takes place transparently in client-applications such as web-browsers, mailclients, and other Internet applications » When an application makes a request which necessitates a DNS lookup » Such programs send a resolution request to the local DNS resolver in the local operating system » Which in turn handles the communications required How DNS Works In Practice - In the Real World The DNS resolver will almost invariably have a cache containing recent lookups » If the cache can provide the answer to the request, the resolver will return the value in the cache to the program that made the request » If the cache does not contain the answer, the resolver will send the request to one or more designated DNS servers » In the case of most home users, the Internet Service Provider to which the machine connects will usually supply this DNS server: » Such a user will either have configured that server's address manually or allowed DHCP to set it » System administrators have configured systems to use their own DNS servers Their DNS resolvers point to separately maintained nameservers of the organization » In any event, the name server thus queried will follow the process outlined above » Until it either successfully finds a result or does not » It then returns its results to the DNS resolver » Assuming it has found a result, the resolver duly caches that result for future use Hands the result back to the software which initiated the request How DNS Works In Practice - Broken Resolvers Broken resolvers » An additional level of complexity emerges when resolvers violate the rules of the DNS protocol » Some people have suggested that a number of large ISPs have configured their DNS servers to violate rules (presumably to allow them to run on less-expensive hardware than a fullycompliant resolver), such as by disobeying TTLs, or by indicating that a domain name does not exist just because one of its name servers does not respond » As a final level of complexity, some applications (such as webbrowsers) also have their own DNS cache, in order to reduce the use of the DNS resolver library itself. » This practice can add extra difficulty when debugging DNS issues, as it obscures the freshness of data, and/or what data comes from which cache. » These caches typically use very short caching times of the order of one minute » Internet Explorer offers a notable exception: recent versions cache DNS records for half an hour How DNS Works In Practice - Other Applications The system outlined above provides a somewhat simplified scenario. The Domain Name System includes several other functions: » Hostnames and IP addresses do not necessarily match on a one-toone basis. » Many hostnames may correspond to a single IP address: combined with virtual hosting » this allows a single machine to serve many web sites » Alternatively a single hostname may correspond to many IP addresses: » Facilitates fault tolerance and load distribution » Allows a site to move physical location seamlessly Many uses of DNS besides translating names to IP addresses » E.g. Mail transfer agents use DNS to find out where to deliver e-mail for a particular address The domain to mail exchanger mapping provided by MX records accommodates another layer of fault tolerance and load distribution on top of the name to IP address mapping 39 How DNS Works In Practice - Other Applications Sender Policy Framework and DomainKeys » Instead of creating their own record types » Designed to take advantage of another DNS record type, the TXT record To provide resilience in the event of computer failure, multiple DNS servers are usually provided for coverage of each domain, and at the top level, thirteen very powerful root servers exist, with additional "copies" of several of them distributed worldwide via Anycast DNS primarily uses UDP on port 53 to serve requests. » Almost all DNS queries consist of a single UDP request from the client followed by a single UDP reply from the server » TCP comes into play only when the response data size exceeds 512 bytes, or for such tasks as zone transfer » Some operating systems such as HP-UX are known to have resolver implementations that use TCP for all queries, even when UDP would suffice DNS Extensions Extensions to DNS » EDNS is an extension of the DNS protocol » Enhances the transport of DNS data in UDP packages » Adds support for expanding the space of request and response codes » Described in RFC 2671 Types of DNS records Important categories of data stored in DNS include the following: » An A record or address record maps a hostname to a 32-bit IPv4 address. » An AAAA record or IPv6 address record maps a hostname to a 128-bit IPv6 address. » A CNAME record or canonical name record is an alias of one name to another » The A record to which the alias points can be either local or remote - on a foreign name server. » This is useful when running multiple services (like an FTP and a webserver) from a single IP address. » Each service can then have its own entry in DNS (like ftp.example.com. and www.example.com.) » An MX record or mail exchange record maps a domain name to a list of mail exchange servers for that domain. » A PTR record or pointer record maps an IPv4 address to the canonical name for that host. » Setting up a PTR record for a hostname in the in-addr.arpa. domain that corresponds to an IP address implements reverse DNS lookup for that address. » For example (at the time of writing), www.icann.net has the IP address 192.0.34.164, but a PTR record maps 164.34.0.192.in-addr.arpa to its canonical name, referrals.icann.org. » An NS record or name server record maps a domain name to a list of DNS servers authoritative for that domain. » Delegations depend on NS records. Types of DNS records Important categories of data stored in DNS include the following: (cont.) » An SOA record or start of authority record specifies the DNS server providing authoritative information about an Internet domain, the email of the domain administrator, the domain serial number, and several timers relating to refreshing the zone. » An SRV record is a generalized service location record. » A TXT Record allows an administrator to insert arbitrary text into a DNS record. » For example, this record is used to implement the Sender Policy Framework and DomainKeys specifications. » NAPTR records ("Naming Authority Pointer") are a newer type of DNS record that support regular expression based rewriting. » Other types of records simply provide information (for example, a LOC record gives the physical location of a host), or experimental data (for example, a WKS record gives a list of servers offering some well known service such as HTTP or POP3 for a domain). » When sent over the internet, all records use the common format specified in RFC 1035 shown below. » RR (Resource Record) Fields FieldDescriptionLength (Octets)NAMEName of the node to which this record pertains.(variable)TYPEType of RR. » For example, MX is type 15.2CLASSClass code.2TTLSigned time in seconds that RR stays valid.4RDLENGTHLength of RDATA field.2RDATAAdditional RR-specific data.(variable)For a complete list of DNS Record types consult IANA DNS Parameters. DNS Records Complete List http://www.iana.org/assignments/dns-parameters Example DNS Record for logicbbs.org IN NS ns.planix.com IN NS ns1.mydyndns.org IN NS ns2.mydyndns.org IN MX 10 mail IN A 69.17.158.109 www IN A 69.17.158.109 mail IN A 69.17.158.109 First three lines describe valid name servers for logicbbs.org. Following entry indicates that the mail exchanger for logicbbs.org has a priority of 10 and messages should be directed to mail.logicbbs.org. Priority values indicate where to send e-mail if a server is unavailable; the lower the priority value, the higher the priority of that server. » Mail servers send e-mail to the server with the lowest priority value, and then work their way up the values listed as necessary. The last three lines indicate that logicbbs.org (the second-level domain) points to 69.17.158.109. » The www and mail subdomains (www.logicbbs.org, mail.logicbbs.org) also point to 69.17.158.109. The DNS record is the reason why some internet addresses do not need the www prefix, while others do. » If that particular domain has a www A record that differs from the basic A record, then anydomain.com may be different from www.anydomain.com, and the former may not work. » Other sites, like logicbbs.org, have both the top-level domain and the www subdomain pointing to the same IP address, which reduces confusion and ambiguity Internationalized Domain Names While domain names technically have no restrictions on the characters they use and can include non-ASCII characters, the same is not true for host names » Host names are the names most people see and use for things like e-mail and web browsing. » Host names are restricted to a small subset of the ASCII character set that includes » » » » the Roman alphabet in upper and lower case the digits 0 through 9 the dot the hyphen » This prevented the native representation of names and words of many languages » ICANN has approved the Punycode-based IDNA system, which maps Unicode strings into the valid DNS character set, as a workaround to this issue » Some registries have adopted IDNA Security issues DNS was not originally designed with security in mind, and thus has a number of security issues. » DNS responses are traditionally not cryptographically signed, leading to many attack possibilities; » DNSSEC modifies DNS to add support for cryptographically signed responses » There are various extensions to support securing zone transfer information as well Even with encryption it still doesn't prevent the possibility that a DNS server could become infected with a virus (or for that matter a disgruntled employee) that would cause IP addresses of that server to be redirected to a malicious address with a long TTL. » This could have far reaching impact to potentially millions of internet users if busy DNS servers cache the bad IP data. » This would require manual purging of all affected DNS caches as required by the long TTL (up to 68 years). Some domain names can spoof other, similar-looking domain names. » For example, "paypal.com" and "paypa1.com" are different names, yet users may be unable to tell the difference when the user's typeface (font) does not clearly differentiate the letter l and the number 1. » This problem is much more serious in systems that support internationalized domain names, since many characters that are different, from the point of view of ISO 10646, appear identical on typical computer screens. Legal users of domains Registrant » Most of the NICs in the world receive an annual fee from a legal user in order for the legal user to utilize the domain name (i.e. a sort of a leasing agreement exists, subject to the registry's terms and conditions) » Depending on the various naming convention of the registries, legal users become commonly known as "registrants" or as "domain holders" » ICANN holds a complete list of domain registries in the world » One can find the legal user of a domain name by looking in the WHOIS database held by most domain registries » For most of the more than 240 country code top-level domains (ccTLDs), the domain registries hold the authoritative WHOIS (Registrant, name servers, expiry dates, etc.). » For instance, DENIC, Germany NIC, holds the authoritative WHOIS to a .DE domain name » However, some domain registries, such as for .COM, .ORG, .INFO, etc., use a registry-registrar model » There are hundreds of Domain Name Registrars that actually perform the domain name registration with the end user (see lists at ICANN or VeriSign) » By using this method of distribution, the registry only has to manage the relationship with the registrar, and the registrar maintains the relationship with the end users, or 'registrants' » For .COM, .NET domain names, the domain registries, VeriSign holds a basic WHOIS (registrar and name servers, etc.) » One can find the detailed WHOIS (registrant, name servers, expiry dates, etc.) at the registrars » Since about 2001, most gTLD registries (.ORG, .BIZ, .INFO) have adopted a socalled "thick" registry approach, i.e. keeping the authoritative WHOIS with the various registries instead of the registrars Legal users of domains Administrative contact » A registrant usually designates an administrative contact to manage the domain name » The administrative contact usually has the most immediate power over a domain » Management functions delegated to the administrative contacts may include: the obligation to conform to the requirements of the domain registry in order to retain the right to use a domain name authorization to update the physical address, e-mail address and telephone number etc. in WHOIS Technical contact » A technical contact manages the name servers of a domain name. The many functions of a technical contact include: » making sure the configurations of the domain name conforms to the requirements of the domain registry » updating the domain zone » providing the 24×7 functionality of the name servers allows accessibility of the domain name Billing contact » The party whom a NIC invoices Name servers » Namely the authoritative name servers that host the domain name zone of a domain name Politics Many investigators have voiced criticism of the methods currently used to control ownership of domains » Critics commonly claim abuse by monopolies or near-monopolies, such as VeriSign, Inc » Particularly noteworthy was the VeriSign Site Finder system which redirected all unregistered .com and .net domains to a VeriSign webpage » Despite widespread criticism, VeriSign only reluctantly removed it after the Internet Corporation for Assigned Names and Numbers (ICANN) threatened to revoke its contract to administer the root name servers There is also significant disquiet regarding the United States' political influence over ICANN » This was a significant issue in the attempt to create a .xxx top-level domain and sparked greater interest in alternative DNS roots that would be beyond the control of any single country Truth in Domain Names Act » Main article: Anticybersquatting Consumer Protection Act » In the United States, the "Truth in Domain Names Act" (actually the "Anticybersquatting Consumer Protection Act"), in combination with the PROTECT Act, forbids the use of a misleading domain name with the intention of attracting people into viewing a visual depiction of sexually explicit conduct on the Internet Other Internet Resources See also » Dynamic DNS » Alternative DNS root » Comparison of DNS server software DNS Zones Forward DNS Zones Reverse http://en.wikipedia.org/wiki/Reverse_DNS_lookup DNS Reverse Lookup Overview » Typically, the Domain Name System is used to determine what IP address is associated with a given domain name. » So, to reverse-resolve a known IP address is to look up what the associated domain name is belonging to that IP address. » A reverse lookup is often referred to as reverse resolving, or more specifically reverse DNS lookup, and is accomplished using a "reverse IN-ADDR entry" in the form of a PTR record DNS Reverse Lookup IPv4 Reverse DNS » Reverse DNS lookups for IPv4 addresses use a reverse IN-ADDR entry in the special domain in-addr.arpa. » An IPv4 address is represented in the in-addr.arpa domain by a sequence of bytes in reverse order, represented as decimal numbers, separated by dots with the suffix .in-addr.arpa. » For example the reverse lookup domain name corresponding to the IPv4 address 10.12.13.140 is 140.13.12.10.in-addr.arpa. A host name for 1.2.3.4 can be obtained by issuing a DNS query for the PTR record for that special address 4.3.2.1.inaddr.arpa. DNS Reverse Lookup Classless Reverse DNS » Historically, IP addresses were allocated in blocks of 256 » Each block fell upon an octet boundary » Configuration of the PTR records easy Dot separators delimited each block » IP addresses are now allocated in very much smaller blocks » Traditional way of configuring a nameserver to perform reverse DNS cannot work » A means of overcoming this problem was devised and published as RFC 2317 » Uses a CNAME entry which corresponds to each block Multiple PTR records » While most rDNS entries only have one PTR record, it is perfectly legal to have many different PTR records » Although it is perfectly legal having multiple PTR records for the same IP address it is generally not recommended, unless you have a specific need » For example, if a webserver supports many virtual hosts Can be one PTR record for each host Some versions of name server software will automatically add a PTR record for each host » Multiple PTR records can cause a couple of problems » Including triggering bugs in programs that only expect there to ever be a single PTR record » In the case of a large webserver, having hundreds of PTR records can cause the DNS packets to be much larger than normal Records other than PTR records » While uncommon compared with PTR records, it is also legal to put other types of records in the reverse DNS tree. » In particular, encryption keys can be placed there » for, example, IPsec (RFC 4025) SSH (RFC 4255) IKE (RFC 4322) » Less standardized usages include » comments placed in TXT records and LOC records to identify the location of the IP address DNS Forward -vsReverse Lookups Forward DNS lookup » Forward DNS lookup is using an Internet domain name to find an IP address. Reverse DNS lookup » Reverse DNS lookup is using an Internet IP address to find a domain name. http://searchsmb.techtarget.com/sDefinition/0,,sid44_gci213968,00.html Lookups When you enter an address for a Web site at your browser » The address is transmitted to a nearby router » The router does a forward DNS lookup in a routing table to locate the IP address Forward DNS lookup is the more common lookup » Most users think in terms of domain names rather than IP addresses Occasionally you may see a Web page with a URL in which the domain name part is expressed as an IP address (sometimes called a dot address) and want to be able to see its domain name. An Internet facility that lets you do either forward or reverse DNS lookup yourself is called nslookup » Comes with some operating systems » Can download the program and install it in your computer DNS Zones Forwarding DNS Forwarding In large, well organized, academic or ISP networks you will sometimes find that the network people have set up a forwarder hierarchy of DNS servers » Helps lighten the internal network load and the load on the outside servers It's not easy to know if you're inside such a network or not By using the DNS server of your network provider as a and less of a load on your network Your nameserver forwards queries to your ISP nameserver Each time this happens you will dip into the big cache of your ISPs nameserver » Thus speeding your queries up, your nameserver does not have to do all the work itself If you use a modem this can be quite a win http://tldp.org/HOWTO/DNS-HOWTO-4.html DNS Delegation/Parenting Mail Exchangers Covered in previous section: How DNS Works In Practice - Other Applications Content Delivery Networks This slides are provided by Saverio Nicolini of NEC Laboratories Europe AN INTRODUCTION Retrieving Content WITHOUT a CDN Imagine you want to get the latest iPhone software update* and there is no CDN » apple.com/update Do you see any problem with this? *the same concepts apply to the downloading of a web page Retrieving Content WITHOUT a CDN: problems Server Scalability » Origin server becomes a bottleneck » Too many requests to be served in short time » Too few bandwidth to deliver the (huge) content Network Performance » Network utilization and congestion: same data flows repeatedly over links between clients and origin server (even if clients are next to each other) » Delay: content can encounter big delays when flowing each time from remote locations Server Scalability Solutions: server farms Trivial: use a large number of servers and balance the requests Load Balancer » Load balancer can balance at random (or on L3/4/7 parameters) » Does not improve network utilization » Does not improve user latency due to network congestion Network Performance Solutions: caching proxies Solution: ISP intercepts and redirects specific traffic (e.g., TCP traffic on port 80) to a local cache (if content was previously seen) » Example: Squid Web Proxy Cache (http://www.squid-cache.org/) Major benefits (for ISPs) » Reduced network utilization » Imrpoved user latency Drawbacks of caching proxies » Serve only ISP customers » Content providers can not assume caching proxies are deployed » not all ISPs deploy caching proxies » Content providers can not assume they work correctly » correct handling of time to live » correct caching of what can be cached and what can not » Accounting and reporting is very important to service operators » E.g., would need to retrieve number of hits from caching proxies to monetize advertisement ISP corriere.it Cache Content -the- Internet Retrieving Content WITH a CDN What if the iPhone software update has been previously distributed to multiple locations all over the world? Replica/Surrogate Server apple.com/update Replica/Surrogate Server Origin Server Replica/Surrogate Server Replica/Surrogate Server Do you see the improvement? Retrieving Content WITH a CDN: advantages The same data traverses potentially congested links on the Internet only few times » Most importantly: the number of times is now independent from the number of user requests The origin server load is highly reduced » And again: the load is now independent from the number of user requests Users receive their content quicker » The nearest the location the smaller the delay Further optimizations are possible (and independent from the network operators routing strategies) » The content delivery can avoid congested regions (by intelligently redirecting the content request to a non-congested zone) A number of geographically distributed servers deployed to facilitate the distribution of contents in a timely and efficient manner CDN Arena: players CDN (Service) Providers » The (traditional) operators of a CDN » Owners of the geographically distributed infrastructure hosting the content replicas » Operating the software that manages the CDN (either with software developed inhouse or acquired) » Delivering content to users on behalf of the content providers Content Providers (CP) » The owner of the contents (and normally of the origin server as well) » The ones willing to pay a CDN Service Provider in order to distribute their contents efficiently » E.g., by delegating the URI name space of the objects to be distributed » Example(s): CNN, Netflix, YouTube, Disney, Warner Bros. CDN Solution Vendors » Technology providers developing turnkey solutions to run a CDN (or just some software components) » Example(s): CoBlitz, Velocix, Jet-Stream CDN Arena: players Internet Service Providers (ISPs) » (Traditionally) Bit-pipes (the looser of in the CDN arena) » they upgrade their infrastructure to face the huge increase in content demand but they do not earn anything from the contents » Example(s): Telecom Italia, France Telecom, etc. » (Network) Equipment Vendors » Vendors developing (traditionally) network infrastructures devices in order to transport intelligently and efficiently the bit pipes » Example(s): Cisco, Alcatel-Lucent, Ericsson, NEC » End Users What is the difference between a CDN and Caching Proxies? Who wants them and why? » An ISP deploys a caching proxy to reduce bandwidth consumption, save costs (OPEX: bandwidth, CAPEX: less network upgrades) and satisfy its own users » A content provider gives its content to a CDN Service Provider in order to improve the quality of service towards its users (independently of the ISP, i.e., all Internet users) How they operate? » Caching proxies operate in a reactive manner » They start caching after they have seen the content one or multiple times » CDN operative in a proactive manner » Content Providers push the latest blockbuster movie to multiple locations around the world What is the difference between a CDN and Caching Proxies? Whom are they pleasing? » providers) » A CDN satisfies all end users and content providers (who are in full control of their contents) we will discuss later why ISPs are not fully satisfied with the traditional CDN model QUIZ: NAME THE 5 BIGGEST CDN SERVICE PROVIDERS WORLDWIDE Five Biggest CDN Service Providers Worldwide Source: Yankee Group 2009 CDN Scorecard 1. 2. 3. 4. 5. Akamai » 84.000 servers in 1.000 networks* Limelight EdgeCast AT&T Level3 » became in Nov. 2010 the primary CDN for Netflix * source: Mike Afergan, Akamai CTO CDN FUNCTIONAL ARCHITECTURE CDN Functionalities Content Ingestion/Acquisition » Content entry point inside the CDN from the content source » i.e., origin server, content provider » Content preparation for the CDN: multiple formats and codec rates Content Distribution » Placement: moving or replicating content to surrogate servers (replica placement) » Outsourcing/Retrieval: moving or replicating content within surrogate servers » Selection/Delivery: selecting and delivering the right content (in the rigth format) from the surrogate server to the end user CDN Functionalities Request Routing » Directing a request to the closest suitable surrogate server CDN Management » Monitoring performance, logging, reporting and accounting on distribution and delivery activities » Can have interfaces towards Content Provider and/or other CDNs (e.g., for interconnection) CDN Functional Architecture Origin Server Ingestion / Acquisition CDN Preparation Placement CDN Management Request Routing Retrieval Delivery Surrogate Surrogate Distribution Origin Server Talking to CDN Origin Server CDN Ingestion / Acquisition Preparation Placement CDN Management Retrieval Distribution Delivery Origin server interactions Content Ingestion/Acquisition » New content pushed to the CDN (e.g., http, nfs, webdav, etc.) afterwards content is prepared for Placement Origin server interactions with CDN Management » Exchange of logs and other accounting info End-user Interactions with CDN: request routing End-user requests for www.very-popular.com » Request Routing intercepts the request and redirects the request towards nap.very-popular.akamai.com) Request Routing and Delivery make sure that surrogate server has the right content in the right format (otherwise surrogate asks to Distribution and Retrieval kicks in) Placement CDN Retrieval Request Routing Delivery (IT) Surrogate nap.akamai.very-popular.com (DE) Surrogate fra.akamai.very-popular.com Distribution Request Routing Responsible for routing client requests to best surrogate servers » Metrics used to select the surrogate server to route the request to » Network proximity » Client-perceived latency » Surrogate server load » Algorithms » Adaptive (take into account current system conditions) commonly used by successful commercial systems (e.g., Akamai, Cisco, etc.) with proprietary combinations of the metrics above » Non-adaptive (based on heuristics ignoring current system conditions) E.g., round-robin load-balancing » Mechanisms » See next slides » For a complete taxonomy read http://tools.ietf.org/html/rfc3568 Request Routing Mechanisms (requiring no interaction with origin server) DNS-based (most used by CDN systems) » A modified DNS servers is inserted in the DNS resolution process » capable of returning a different set of A, NS or CNAME records based on user defined policies, metrics, or a combination of both » mapping the surrogate server symbolic name to IP address » Single reply » DNS server is authorative for the entire DNS domain » The DNS server returns the IP address of the best surrogate in an A record to the requesting DNS server » Could also be a virtual IP address of the best set of surrogates for requesting DNS server Request Routing nap.akamai.very-popular.com Local DNS (IT) Surrogate (DE) Surrogate fra.akamai.very-popular.com CDN Request Routing Mechanisms (requiring no interaction with origin server) » Multiple reply » Request routing DNS server returns multiple replies such as several A records for various surrogates » Common implementations of client site DNS server's cycles through the multiple replies in a Round-Robin fashion » Multi-level resolution » Multiple request routing DNS servers returned in a single DNS resolution » Allow to distribute more complex decisions from a single server to multiple, more specialized, request-routing DNS servers Request Routing nap.akamai.very-popular.com Local DNS (IT) Surrogate (DE) Surrogate fra.akamai.very-popular.com CDN DNS-based request routing is not errorRequest routing function guesses the location of the user based on » Possible solution: ISPs to communicate to CDN Service Providers the location of their Local DNS servers » Even more complicated: end-users configure manually their DNS servers » E.g., they do not have trust in the DNS server of their ISPs » Possible solution: recent IETF standardization on ALTO (Application Layer Traffic Optimization) » ALTO: ISP service to provide applications guidance on the underlying network not deployed yet ALTO Service: P2P-use case Request Routing with the help of an ALTO service Request Routing nap.akamai.very-popular.com ALTO Protocol 3a (IT) Surrogate 3b CDN fra.akamai.very-popular.com 2a 2b 4a 4b ALTO Local DNS 1a (DE) Surrogate 1b ISP 1a. Give me www.very-popular.com 2a. Please resolve www.very-popular.com 3a. Which one to prefer between nap.akamai and frap.akamai for that end-user? 3b. Go to nap.akamai 2b. Go to nap.akamai 1b. Go to IP address of nap.akamai 4a. Give me content 4b. Here it is... Request Routing Mechanisms (requiring interaction with origin server) HTTP redirection » Information about surrogate servers are propagated in the HTTP headers » Web server answers to client telling him to resubmit its request to another server Flexible and simple Lacks of transparency, introduces extra round-trip, introduces extra processing on origin server URL rewriting (used by some CDN systems) » Origin server rewrites the embedded objects URLs in the dynamically generated pages thus the endsurrogate servers Allows fine granularity, flexible (routing potentially can change at every request) Introduces extra load on origin server Distribution: Placement (I) cntv.cn/videos Replica/Surrogate Server apple.com/update Origin Server Origin Server Replica/Surrogate Server Replica/Surrogate Server Replica/Surrogate Server Distribution: Placement (II) Replica Placement » Objective: optimize the placement of replica (choosing which contents to replicate where, among the surrogate servers available) » Performance improvements targeted » Reduce user-perceived latency for accessing content » Minimize the overall network bandwidth consumption for transferring content to clients » Benefits » for the CDN Service Provider: reduced infrastructure (CAPEX) » for the ISP: reduced communications costs (OPEX) » for the users: improved quality of experience » Related optimization problem: choice of physical locations where to place surrogate servers » In reality: locations are given by a mix of theoretical analysis and business constraints (e.g., hosting/storage facilities actually available from third-parties and their costs) » Single-ISP approach (e.g., AT&T): many choices on locations (up to base station level) » Multi-ISP approach: caches placed at global ISP Points of Presence (POPs) by CDN Service Provider (e.g., Akamai, Amazon CloudFront) Replica Placement Strategies very hard to solve computationally, i.e., NP-complete) » Theoretical approaches » centers and servers inside the storage centers, minimize the maximum distance between a node (end-user) and the nearest center Minimum K-center problem [1] K-hierarchically well-separated trees (K-HST) [2] (graph-theory) » Heuristics » Reducing the computational complexity as a trade-off of optimality: find an acceptably good solution in a fixed amount of time (rather than the best possible solution) Greedy algorithms [3] Topology-informed algorithms [4] Hot-spot algorithms [5] Probabilistic meta-heuristics for combinatorial optimization [6] th [6] http://en.wikipedia.org/wiki/Metaheuristic Annual IEEE Symposium on Foundations of Computer Science, 1996 Distribution: Retrieval (I) After content was placed » origin)? cntv.cn/videos Replica/Surrogate Server ? apple.com/update ? Origin Server ? Origin Server Replica/Surrogate Server Replica/Surrogate Server Replica/Surrogate Server Distribution: Retrieval (II) Outsourcing/Retrieval » Objective: given contents are placed in surrogate servers, how contents get outsourced by other surrogate servers not having the specific content? » Cooperative push-based Pre(e.g., based on P2P technologies) the delivering of the content to the surrogates not having the content » Cooperative pull-based used by many commercial systems » Non-cooperative pull-based used by many commercial systems » Related issue: Once the content gets to the surrogate server of interest, should it stay there (placement)? many successful commercial systems (e.g., YouTube) always replicate the content to the surrogate server located in the region where it was created and/or requested at least once » given the huge number of contents they do not bother with replica placement (same design strategy as Verivue, a successful startup in the CDN space, http://www.verivue.com/blog/, from Larry Peterson, Chief Scientist at Verivue and Professor at Princeton University) Distribution: Cache Update Cache Update » Objective: declare content cached in the surrogate servers as obsolete » Periodic updates (most common cache update method) Origin server conveys information (e.g., Cache-Control HTTP settings) on what is cacheable, for how long, etc, Such information is stored in the surrogate servers Surrogate servers are inspected periodically and cleaned of expired contents » Update propagation Triggered by a change in the content Requires active content pushing to the surrogate servers in order to replace the content Used only for slowly varying contents » On-demand update Update for the content is requested only when there is a demand for the content » Invalidation Invalidation message sent to all surrogate servers for a specific content » Benefit: server end-user with up to date information/contents Distribution: Cache Replacement Cache Replacement » Objective: action to be taken when the object store is full » Most simple/popular replacement rules LRU (replace Least Recently Used object) » E.g., replace the object which was not requested for longest time LFU (replace Least Frequently Used object) » E.g., replace the object with fewer requests since it was stored » Cache pollution: highly popular contents from the past could prevent new popular contents from being cached (long term vs. short term popularity) » What to take into account for designing an efficient replacement strategy What is the probability of the object being requested in the future? What is the history of requests of that object in the past? When was the object requested last? What would is the (network) cost of the object? (e.g., bandwidth to the origin server for fetching it) What is the (storage) cost of the object? (i.e., object size) What are premium user requesting? Distribution: Delivery Selection and Delivery » Objective: right selection of the content to be delivered to the end-user » Full-site vs. Partial-site content selection Full-site: entire origin server content is outsourced to CDN (e.g., DNS configuration by content provider thus all requests are resolved by the CDN DNS server) » Pros: simplicity » Cons: huge storage required at surrogate servers Partial-site: partial replication of heavy objects only (embedded videos, huge contents) (e.g., objects have host names in a domain for which the CDN provider is authoritative) » Network and/or Device-based content selection Content to be delivered is chosen based on network and/or end-user terminal characteristics » If CDN could detect end-user device, it could adapt the content selection based on device characteristics (e.g., http://wurfl.sourceforge.net/) » Benefit: a correct content selection and delivery can dramatically reduce the client download time, the server load and the network usage CDN Management: Performance Measurement (I) Performance measurement of a CDN » » Most important metrics to measure the performance of a CDN » Cache hit ratio: ratio between the number of cached objects versus total objects requested (high is better ) » Origin server bandwidth: bandwidth used by the origin server (low is better ) » Latency: time for the user to receive the objects requested (e.g., a web page, startup delay of a video) (low is better ) » Surrogate server utilization: busy time of the surrogate server (high is better » Availability: how reliable a CDN is (e.g., measured with packetlosses, requests timed out, etc.) » Good performance measurement (via detailed content-access logs) is one of the differentiators of the most successful CDN Service Providers » it is about how you present it and how you make it usable to your customers (i.e., content providers) CDN Management: Performance Measurement (II) * source: http://www.akamai.com/html/technology/dataviz2.html * source: http://www.akamai.com/worldcup CDN Management: Performance Measurement (III) * source: http://www.akamai.com/html/technology/dataviz1.html * source: http://www.akamai.com/html/technology/nui/news/index.html Video: Akamai´s Network Operation and Control Center (NOCC) (from min 2:58 to 5:52) Content Cache Server Cache Server Server Content Content Provider Providers Content Server Centralized Model Content Content Provider Providers Content Distribution Revenues Global CDN Providers (Mobile) Carrier Network Cache Cache Server Server CDN Model (Mobile) Carrier Network Backhaul and Core are bottleneck