DNS and CDNs (Content Distribution Networks)
Paul Francis
Cornell Computer Science

What do all of these have in common?
  http://www.cnn.com/news/story.html (HTTP, web)
  mailto://francis@cs.cornell.edu (Email)
  sip://service@phone.verizon.com (SIP, Session Initiation Protocol)

They all have a DNS name somewhere
  http://www.cnn.com/news/story.html (HTTP, web)
  francis@cs.cornell.edu (Email)
  sip://service@phone.verizon.com (SIP, Session Initiation Protocol)

Why is DNS so important?
  Names are easier to remember than IP addresses
    • paul@129.48.55.233 ???
  And in any event, IP addresses are not “dependable”
    • They change often (dialup)
    • They are not all unique

DNS is the “core” of the Internet
  So “we” (humans, and applications) like to deal with dependable, stable, friendly DNS names
  The names get “mapped” into IP addresses by lower layers
    • By the Domain Name System (DNS)
  Then the learned IP address is put into packets, and IP routing gets the packets across the Internet

Picture of DNS query/reply

Why all these dots?
  Why falcon.cs.cornell.edu?
  Why not “cornell-falcon” or something?
  It wasn’t always that way
  Twenty years ago, this was a valid email address: george@isi
  How did my computer learn the IP address of “isi”?

The “host table” and DNS
  Before DNS, there was the host table
  This was a complete list of all the hosts in the Internet!
  It was copied every night to every machine on the Internet!
  At some point, this was perceived as a potential scaling bottleneck…
  So a distributed directory called the “Domain Name System” was invented (DNS)

The host table (historic)
  Host Name    IP Address
  mit-dlab     133.65.14.77
  isi-mail     24.72.188.13
  mit-lcs      133.65.29.1
  …            …

Distributed Directory
  A primary goal of DNS was to have a distributed “host table”, so that each site could manage its own name-to-address mapping
  But also, it should scale well!
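
The idea of a distributed “host table” can be pictured with a toy sketch: instead of one global table copied to every machine, each site keeps a small table of its own hosts, and the part of the name after the first dot says which site to ask. The site names, host names, and addresses below are made up for illustration (the addresses come from ranges reserved for documentation), and real DNS delegates down a tree of name servers rather than using a single dictionary of sites.

```python
# Hypothetical per-site tables: each site manages only its own hosts.
SITE_TABLES = {
    "cs.cornell.edu": {"falcon": "192.0.2.10"},
    "lcs.mit.edu": {"mail": "198.51.100.7"},
}


def lookup(name):
    """Resolve 'host.site' by consulting only the table of the site that owns it."""
    # Everything after the first dot identifies the site that manages the mapping.
    host, _, site = name.partition(".")
    table = SITE_TABLES.get(site)
    if table is None or host not in table:
        raise KeyError(f"unknown name: {name}")
    return table[host]


if __name__ == "__main__":
    print(lookup("falcon.cs.cornell.edu"))
```

In this sketch, adding a host at one site touches only that site's table, which is the property the flat host table lacked.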

DNS is simple but powerful
  Only one type of query
    • Query(domain name, RR type)
    • Resource Record (RR) type is like an attribute type
    • Answer(values, additional RRs)
  Limited number of RR types
  Hard to make new RR types
    • Not for technical reasons…
    • Rather because each requires global agreement

DNS is the core of the Internet
  Global name space
    • Can be the core of a naming or identifying scheme
  Global directory service
    • Can resolve a name to nearly every computer on the planet

Important DNS RR types
  NS: Points to next Name Server down the tree
  A: Contains the IP address
    • AAAA for IPv6
  MX: Contains the name of the mail server
  Service-oriented RR types
    • SRV: Contains the host names and ports of services on servers (one way to learn what port number to use)
    • NAPTR: Essentially a generalized mapping from one name space (e.g. phone numbers) to another (e.g. SIP URLs)

DNS tree structure
  NS RR “pointers” lead from each zone to the zones below it:
  .
    edu.
      cornell.edu.
        cs.cornell.edu.
          foo.cs.cornell.edu    A 10.1.1.1
          bar.cs.cornell.edu    A 10.1.1.1
        eng.cornell.edu.
      cmu.edu.
      mit.edu.
    com.
    jp.
    us.

Primary and secondary servers
  [Figure: primary and secondary servers for cornell.edu. and cs.cornell.edu.]
  NS RRs point to both primary and secondary servers
  RRs are initially configured into the primary server
  Primary server replicates RRs onto secondary servers periodically (updates are incremental)

Resolver structure and configuration
  [Figure: the DNS tree, a stub resolver on the client host, and a recursive resolver]
  Stub resolver resides on client host, points to configured recursive server
  Resolver manages DNS queries on behalf of stub resolvers
  Resolver is statically configured with the root servers
  1. Stub resolver sends recursive query
  2,3,4… Resolver makes iterative queries to servers
  N. Resolver returns final answer to stub resolver (which also caches result)
  Resolver caches results for efficiency

DNS cache management
  All RRs have Time-to-live (TTL) values
    • When TTL expires, cache entries are removed
  NS RRs tend to have long TTLs
    • Cached for a long time
    • Reduces load on higher level servers
  A RRs may have very short TTLs
    • Order one minute for some web services
    • Order one day for typical hosts

Caching is the key to performance
  Without caching, the small number of machines at the top of the hierarchy would be overwhelmed
  But what if you want to change the IP address of a host?
  How do you change all those cached entries around the world?
  You can’t…you have to wait until they time out on their own

Changing the IP address behind a DNS name
  Say your TTL was set to one day
  This means that even if you change DNS now, some hosts will continue to use the old address for a day
  So, give the host two IP addresses for a while (the old one and the new one)
  But DNS only answers with the new one
  After a day, the old one is cleaned out of caches, and you can remove it from the host

DNS Issues
  DoS attacks on (13) root servers
    • DoS = Denial of Service
  Mis-configuration issues
  But on the whole DNS is an incredible system, and in many important respects is the “core” of the Internet
    • http://www.cnn.com/news
    • francis@cs.cornell.edu

Next, Content Distribution Networks
  Idea here is to replicate a “web server” in many places over the Internet
  Latency to a single centralized web server farm may be too high
  A centralized web server farm may fail

Content Routing Principle (a.k.a. Content Distribution Network)
  [Figure: Hosting Centers, Backbone ISPs, IXs, Site ISPs, and Sites (S)]
  Content originates at the Origin Server (OS) in a Hosting Center
  Content Servers (CS) are distributed throughout the Internet
  Content is served to clients (C) from content servers nearer to the client

Two basic types of CDN: cached and pushed

Cached CDN
  1. Client requests content.
  2. CS checks its cache; on a miss it gets the content from the origin server.
  3. CS caches content, delivers to client.
  4. CS delivers content out of its cache on subsequent requests.

Pushed CDN
  1. Origin Server pushes content out to all CSs.
  2. Requests are served from the CSs.

CDN benefits
  Content served closer to client
    • Less latency, better performance
  Load spread over multiple distributed CSs
    • More robust (to ISP failure as well as other failures)
    • Handle flash crowds better (load spread over ISPs)
    • But well-connected, replicated Hosting Centers can do this too

CDN costs and limitations
  Cached CDNs can’t deal with dynamic/personalized content
    • More and more content is dynamic
    • “Classic” CDNs limited to images
  Managing content distribution is non-trivial
    • Tension between content lifetimes and cache performance
    • Dynamic cache invalidation
    • Keeping pushed content synchronized and current

What if lots of clients try to access the same CS?
  [Figure: many clients (C) at one site, all sending requests to the same nearby CS]
  How can the CDN spread this load around?

Guess what: DNS!
  Smart DNS server monitors load on the content servers
  When it answers a DNS request, it picks a server that is not overloaded (and near the client)
  The DNS answer has a small TTL (30 seconds to one minute)
  Small TTL allows the DNS load balancer to make fine-grained load decisions
    • Can quickly offload a busy or even crashed content server

How well do CDNs work?
  Hard to say…
  Some evidence suggests they are not so good at picking nearby servers
  Internet bandwidth is improving, so it is not as important to pick nearby servers
  Central hosting centers are easier to manage, and perform increasingly well
  In fact, Akamai is beginning to find it difficult to justify its service!
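
The “smart DNS server” idea from the load-spreading slides can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how any particular CDN does it: the `ContentServer` fields, the 0.8 overload threshold, the numeric “distance” measure, and the addresses (taken from a range reserved for documentation) are all hypothetical. The sketch picks the nearest content server that is up and not overloaded, and returns its address with a small TTL so that clients ask again soon.

```python
from dataclasses import dataclass


@dataclass
class ContentServer:
    name: str
    address: str
    load: float        # fraction of capacity in use, 0.0 to 1.0 (assumed metric)
    distance: int      # assumed "distance" from the client, e.g. network hops
    alive: bool = True


ANSWER_TTL_SECONDS = 30    # small TTL, within the 30 s to 1 min range on the slide
OVERLOAD_THRESHOLD = 0.8   # assumed cut-off for "overloaded"


def pick_server(servers):
    """Return (address, ttl) for the nearest server that is up and not overloaded."""
    live = [s for s in servers if s.alive]
    if not live:
        raise LookupError("no content server available")
    candidates = [s for s in live if s.load < OVERLOAD_THRESHOLD]
    if not candidates:
        # Every live server is busy: fall back to the least-loaded one.
        candidates = [min(live, key=lambda s: s.load)]
    # Prefer the nearest candidate; break ties by lower load.
    best = min(candidates, key=lambda s: (s.distance, s.load))
    return best.address, ANSWER_TTL_SECONDS


if __name__ == "__main__":
    servers = [
        ContentServer("cs-near", "192.0.2.1", load=0.95, distance=1),
        ContentServer("cs-mid", "192.0.2.2", load=0.40, distance=3),
        ContentServer("cs-far", "192.0.2.3", load=0.10, distance=9),
    ]
    print(pick_server(servers))
```

Because the answer expires after 30 seconds, a server that becomes busy or crashes stops receiving new clients as soon as the monitoring data changes and the old answers time out.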