DNS and CDNs (Content Distribution Networks) Paul Francis Cornell Computer Science

What do all of these have in common?

• HTTP (web): http://www.cnn.com/news/story.html
• Email: mailto://francis@cs.cornell.edu
• SIP (Session Initiation Protocol): sip://service@phone.verizon.com
They all have a DNS name somewhere

• HTTP (web): http://www.cnn.com/news/story.html (DNS name: www.cnn.com)
• Email: francis@cs.cornell.edu (DNS name: cs.cornell.edu)
• SIP (Session Initiation Protocol): sip://service@phone.verizon.com (DNS name: phone.verizon.com)
Why is DNS so important?

• Names are easier to remember than IP addresses
    • paul@129.48.55.233 ???
• And in any event, IP addresses are not “dependable”
    • They change often (dialup)
    • They are not all unique

DNS is the “core” of the Internet

• So “we” (humans, and applications) like to deal with dependable, stable, friendly DNS names
• The names get “mapped” into IP addresses by lower layers
    • By the Domain Name System (DNS)
• Then the learned IP address is put into packets, and IP routing gets the packets across the Internet
Picture of DNS query/reply
Why all these dots?

• Why falcon.cs.cornell.edu?
• Why not “cornell-falcon” or something?

It wasn’t always that way

• Twenty years ago, this was a valid email address:
    • george@isi
• How did my computer learn the IP address of “isi”?
The “host table” and DNS

• Before DNS, there was the host table
• This was a complete list of all the hosts in the Internet!
• It was copied every night to every machine on the Internet!
• At some point, this was perceived as a potential scaling bottleneck…
• So a distributed directory called the “Domain Name System” (DNS) was invented
The host table (historic)

Host Name    IP Address
mit-dlab     133.65.14.77
isi-mail     24.72.188.13
mit-lcs      133.65.29.1
…            …
Distributed Directory

• A primary goal of DNS was to have a distributed “host table”, so that each site could manage its own name-to-address mapping
• But also, it should scale well!

DNS is simple but powerful

• Only one type of query
    • Query(domain name, RR type)
        • Resource Record (RR) type is like an attribute type
    • Answer(values, additional RRs)
• Limited number of RR types
• Hard to make new RR types
    • Not for technical reasons…
    • Rather because each requires global agreement
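To make the single query form concrete, here is a minimal sketch of how a Query(domain name, RR type) could be laid out as a wire-format DNS message, following the RFC 1035 layout (a 12-byte header followed by one question). The function names, the fixed query ID, and the small RR type table are illustrative choices, not part of the slides. The sketch only builds the bytes; it does not send them, and it does not check label lengths.

```python
import struct

# RR type codes as assigned in the DNS specifications (RFC 1035 and later).
RR_TYPES = {"A": 1, "NS": 2, "MX": 15, "AAAA": 28, "SRV": 33, "NAPTR": 35}


def encode_name(domain_name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    encoded = b""
    for label in domain_name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"


def build_query(domain_name: str, rr_type: str, query_id: int = 0x1234) -> bytes:
    """Build one DNS query message: a 12-byte header plus one question."""
    flags = 0x0100  # standard query with the "recursion desired" bit set
    # Header fields: ID, flags, question count, answer/authority/additional counts.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # Question: encoded name, RR type, and class (1 = IN, the Internet class).
    question = encode_name(domain_name) + struct.pack("!HH", RR_TYPES[rr_type], 1)
    return header + question


message = build_query("cs.cornell.edu", "MX")
print(len(message), message.hex())
```

The whole question is just the name plus two 16-bit numbers, which is why one query form can serve every RR type.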
DNS is the core of the Internet

• Global name space
    • Can be the core of a naming or identifying scheme
• Global directory service
    • Can resolve a name to nearly every computer on the planet
Important DNS RR types

• NS: Points to next Name Server down the tree
• A: Contains the IP address
    • AAAA for IPv6
• MX: Contains the name of the mail server
• Service-oriented RR types
    • SRV: Contains the server names and ports of services
        • One way to learn what port number to use
    • NAPTR: Essentially a generalized mapping from one name space (e.g. phone numbers) to another (e.g. SIP URL)
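DNS tree structure

[Figure: the DNS name tree. NS RR “pointers” link each zone to the zones below it; A RRs at the leaves hold the addresses.]

.
    edu.
        cornell.edu.
            cs.cornell.edu.
                foo.cs.cornell.edu   A   10.1.1.1
                bar.cs.cornell.edu   A   10.1.1.1
            eng.cornell.edu.
        cmu.edu.
        mit.edu.
    com.
    jp.
    us.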
Primary and secondary servers

[Figure: cornell.edu. delegating to cs.cornell.edu., which is served by primary and secondary servers.]

• NS RRs point to both primary and secondary servers
• RRs are initially configured into primary server
• Primary server replicates RRs onto secondary servers periodically (updates are incremental)
Resolver structure and configuration

[Figure: a stub resolver and a recursive resolver beside the DNS tree (., edu., com., jp., cornell.edu., cmu.edu., cs.cornell.edu., eng.cornell.edu.).]

• Stub resolver resides on client host, points to configured recursive server
• Resolver manages DNS queries on behalf of stub resolvers
• Static configuration of root servers
Resolver structure and configuration

[Figure: the same tree, with the query steps below drawn as numbered arrows.]

1. Stub resolver sends recursive query
2,3,4… Resolver makes iterative queries to servers
N. Resolver returns final answer to stub resolver (which also caches result)

• Resolver caches results for efficiency
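The numbered query steps can be sketched as a small simulation. Everything here is hypothetical scaffolding: the in-memory ZONES table stands in for real name servers, query_server stands in for a network round trip, and the cache ignores TTLs. Only the names and the 10.1.1.1 address come from the tree slide.

```python
# Hypothetical in-memory "servers": each zone maps a name either to a
# referral (NS) naming a child zone or to a final address (A).
ZONES = {
    ".": {"edu.": ("NS", "edu.")},
    "edu.": {"cornell.edu.": ("NS", "cornell.edu.")},
    "cornell.edu.": {"cs.cornell.edu.": ("NS", "cs.cornell.edu.")},
    "cs.cornell.edu.": {"foo.cs.cornell.edu.": ("A", "10.1.1.1")},
}


def query_server(zone, name):
    """Return the most specific record the zone holds for name, or None."""
    best = None
    for owner, record in ZONES[zone].items():
        if name == owner or name.endswith("." + owner):
            if best is None or len(owner) > len(best[0]):
                best = (owner, record)
    return best[1] if best else None


class RecursiveResolver:
    """Resolves names on behalf of stub resolvers, starting from the root."""

    def __init__(self):
        self.cache = {}        # name -> address (TTL handling omitted here)
        self.queries_sent = 0  # counts iterative queries to servers

    def resolve(self, name):
        if name in self.cache:
            return self.cache[name]
        zone = "."  # statically configured root
        while True:
            self.queries_sent += 1
            record = query_server(zone, name)
            if record is None:
                return None  # no server in the chain knows this name
            kind, value = record
            if kind == "A":
                self.cache[name] = value
                return value
            zone = value  # follow the NS referral one level down the tree


resolver = RecursiveResolver()
print(resolver.resolve("foo.cs.cornell.edu."), resolver.queries_sent)
print(resolver.resolve("foo.cs.cornell.edu."), resolver.queries_sent)
```

The first lookup walks root, edu., cornell.edu., and cs.cornell.edu. in turn; the second is answered from the cache without sending any queries.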
DNS cache management

• All RRs have Time-to-live (TTL) values
• When TTL expires, cache entries are removed
• NS RRs tend to have long TTLs
    • Cached for a long time
    • Reduces load on higher level servers
• A RRs may have very short TTLs
    • Order one minute for some web services
    • Order one day for typical hosts
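A minimal sketch of TTL-based cache expiry, assuming a simple dictionary keyed by (name, RR type). The class name, the injectable clock, and the example names and addresses are illustrative assumptions; real resolvers track more state than this.

```python
import time


class TTLCache:
    """Cache of RR values that forgets each entry once its TTL has passed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be shown without waiting
        self._entries = {}    # (name, rr_type) -> (value, expiry time)

    def put(self, name, rr_type, value, ttl_seconds):
        self._entries[(name, rr_type)] = (value, self._clock() + ttl_seconds)

    def get(self, name, rr_type):
        entry = self._entries.get((name, rr_type))
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[(name, rr_type)]  # TTL expired: remove the entry
            return None
        return value


# Usage with a fake clock: a long-lived NS RR and a short-lived A RR.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("edu.", "NS", "ns.edu-servers.example.", 172800)  # hypothetical, two days
cache.put("www.example.com.", "A", "192.0.2.10", 60)        # hypothetical, one minute
now[0] = 61.0
print(cache.get("www.example.com.", "A"))  # None: the A RR has expired
print(cache.get("edu.", "NS"))             # the NS RR is still cached
```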
Caching is the key to performance

• Without caching, the small number of machines at the top of the hierarchy would be overwhelmed
• But what if you want to change the IP address of a host? How do you change all those cached entries around the world?
    • You can’t…you wait until they time out on their own, then make your change
Changing a DNS name

• Say your TTL was set to one day
    • This means that even if you change DNS now, some hosts will continue to use the old address for a day
• So, give the host two IP addresses for a while (the old one and the new one)
    • But DNS only answers with the new one
• After a day, the old one is cleaned out of caches, and you can remove it from the host
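The timing of such a change can be worked through with a little date arithmetic. This is a sketch under the slide's assumption that every cache honors the TTL; the function names, dates, and addresses are made up for illustration.

```python
from datetime import datetime, timedelta


def old_address_removal_time(dns_changed_at: datetime, ttl: timedelta) -> datetime:
    """Earliest time at which no TTL-honoring cache can still hold the old address."""
    return dns_changed_at + ttl


def addresses_host_must_answer(now, dns_changed_at, ttl, old_ip, new_ip):
    """Addresses the host needs to keep configured at a given moment."""
    if now < dns_changed_at:
        return [old_ip]             # DNS still hands out the old address
    if now < old_address_removal_time(dns_changed_at, ttl):
        return [old_ip, new_ip]     # caches may hold either address
    return [new_ip]                 # every cached copy of the old address has expired


changed = datetime(2024, 1, 1, 12, 0)   # hypothetical moment the DNS entry changes
ttl = timedelta(days=1)
print(old_address_removal_time(changed, ttl))
print(addresses_host_must_answer(changed + timedelta(hours=6), changed, ttl,
                                 "192.0.2.1", "192.0.2.2"))
```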
DNS Issues

• DoS attacks on (13) root servers
    • DoS = Denial of Service
• Mis-configuration issues
• But on the whole DNS is an incredible system, and in many important respects is the “core” of the Internet
    • http://www.cnn.com/news
    • francis@cs.cornell.edu

Next, Content Distribution Networks

• Idea here is to replicate a “web server” in many places over the Internet
• Latency to a single centralized web server farm may be too high
• A centralized web server farm may fail

Content Routing Principle
(a.k.a. Content Distribution Network)

[Figure: Internet topology of Hosting Centers, Backbone ISPs, IXes, ISPs, and Sites (S), built up over three slides. An Origin Server (OS) sits at a Hosting Center, Content Servers (CS) sit in the Backbone ISPs and ISPs, and Clients (C) sit at Sites.]

• Content Origin here at Origin Server
• Content Servers distributed throughout the Internet
• Content is served from content servers nearer to the client
Two basic types of CDN: cached and pushed

[Figure: the same topology, with the Origin Server (OS), Content Servers (CS), and Clients (C) at Sites.]
Cached CDN

[Figure: the same topology, with the steps below annotated one at a time over four slides.]

1. Client requests content.
2. CS checks cache, if miss gets content from origin server.
3. CS caches content, delivers to client.
4. Delivers content out of cache on subsequent requests.
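The four cached-CDN steps can be sketched as a pull-through cache. The class and method names are hypothetical, the origin is an in-memory dictionary rather than a real web server, and cache expiry is left out.

```python
class OriginServer:
    """Holds the authoritative copy of the content."""

    def __init__(self, content):
        self.content = content
        self.requests_served = 0

    def fetch(self, path):
        self.requests_served += 1
        return self.content[path]


class CachingContentServer:
    """A content server (CS) that fills its cache on demand."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def handle_request(self, path):              # step 1: client requests content
        if path not in self.cache:               # step 2: check cache; on a miss...
            body = self.origin.fetch(path)       # ...get content from the origin server
            self.cache[path] = body              # step 3: cache the content
        return self.cache[path]                  # steps 3 and 4: deliver from the cache


origin = OriginServer({"/news/story.html": b"<html>story</html>"})
cs = CachingContentServer(origin)
cs.handle_request("/news/story.html")
cs.handle_request("/news/story.html")
print(origin.requests_served)  # the origin was contacted only for the first request
```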
Pushed CDN

[Figure: the same topology, with the steps below annotated over two slides.]

1. Origin Server pushes content out to all CSs.
2. Request served from CSs.
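For contrast with the cached case, the two pushed-CDN steps can be sketched as follows. The class and method names are hypothetical, and the push is a direct method call rather than a network transfer.

```python
class PushedContentServer:
    """A content server (CS) that only serves what the origin has pushed to it."""

    def __init__(self):
        self.store = {}

    def receive_push(self, path, body):
        self.store[path] = body

    def handle_request(self, path):
        # Step 2: request served from the CS; None if the content was never pushed.
        return self.store.get(path)


class PushingOriginServer:
    """Origin server that copies each published object to every CS."""

    def __init__(self, content_servers):
        self.content_servers = content_servers

    def publish(self, path, body):
        # Step 1: push content out to all CSs.
        for cs in self.content_servers:
            cs.receive_push(path, body)


servers = [PushedContentServer() for _ in range(3)]
origin = PushingOriginServer(servers)
origin.publish("/news/story.html", b"<html>story v1</html>")
print([cs.handle_request("/news/story.html") for cs in servers])
```

Publishing again under the same path overwrites every copy, which is where the cost of keeping pushed content synchronized and current comes from.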
CDN benefits

• Content served closer to client
    • Less latency, better performance
• Load spread over multiple distributed CSs
    • More robust (to ISP failure as well as other failures)
    • Handle flash crowds better (load spread over ISPs)
    • But well-connected, replicated Hosting Centers can do this too
CDN costs and limitations

• Cached CDNs can’t deal with dynamic/personalized content
    • More and more content is dynamic
    • “Classic” CDNs limited to images
• Managing content distribution is non-trivial
    • Tension between content lifetimes and cache performance
    • Dynamic cache invalidation
    • Keeping pushed content synchronized and current
What if lots of clients try to access the same CS?

[Figure: the same topology, with many Clients (C) clustered at one Site near a single CS.]
How can the CDN spread this load around?

[Figure: the same topology, with many Clients (C) clustered at one Site near a single CS.]
Guess what: DNS!

• Smart DNS server monitors load on the content servers
• When it answers a DNS request, it picks a server that is not overloaded (and near the client)
• The DNS answer has a small TTL (30 seconds to one minute)
    • Small TTL allows the DNS load balancer to make fine-grained load decisions
    • Can quickly offload a busy or even crashed content server
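A minimal sketch of the selection a smart DNS server might make. It assumes load is reported as a fraction between 0 and 1 and that "near the client" can be approximated by a region label. The names, the 0.8 overload threshold, and the addresses are illustrative assumptions, not how any particular CDN works.

```python
from dataclasses import dataclass


@dataclass
class ContentServer:
    ip: str
    region: str
    load: float          # 0.0 = idle, 1.0 = saturated, as reported by monitoring
    alive: bool = True


def pick_server(servers, client_region, max_load=0.8):
    """Pick a live, non-overloaded server, preferring the client's region."""
    candidates = [s for s in servers if s.alive and s.load < max_load]
    if not candidates:
        return None
    # Sort key: same-region servers first (False sorts before True), then lowest load.
    return min(candidates, key=lambda s: (s.region != client_region, s.load))


def answer_query(servers, client_region, ttl_seconds=30):
    """Build the DNS answer: one A record with a deliberately small TTL."""
    chosen = pick_server(servers, client_region)
    if chosen is None:
        return None
    return {"type": "A", "address": chosen.ip, "ttl": ttl_seconds}


servers = [
    ContentServer("192.0.2.10", "east", load=0.95),   # nearby but overloaded
    ContentServer("192.0.2.20", "east", load=0.40),
    ContentServer("198.51.100.5", "west", load=0.10),
]
print(answer_query(servers, "east"))
```

Because the answer expires after 30 seconds, a server that becomes busy or crashes stops receiving new clients shortly after the monitor marks it.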
How well do CDNs work?

• Hard to say…
• Some evidence suggests they are not so good at picking nearby servers
• Internet bandwidth is improving, so not as important to pick nearby servers
• Central hosting centers are easier to manage, and perform increasingly well
• In fact, Akamai is beginning to find it difficult to justify its service!