DNS Performance and the Effectiveness of Caching MIT Laboratory

advertisement
DNS Performance and the Effectiveness
of Caching
Jaeyeon Jung, Emil Sit, Hari Balakrishnan, Robert Morris
MIT Laboratory for Computer Science
November 2001
Motivation
✔ Identify the factors that affect client-perceived
DNS performance
– Response latency
– Errors and failure modes of DNS
✔ Evaluate the effectiveness of DNS caching
– How important is it for scalability?
– Unanticipated uses of DNS (e.g. Web server
selection)
– 18% of flows were due to DNS in MCI wide-area
backbone traces
Slide 1
DNS Lookup Sequence
App
2
Root
server
3
1
Local Network
5
Local
Recursive
DNS server
Local cache
Internet
4
.edu
server
Delegation
Stub Resolver
MIT
server
1. Host asks local server for address of www.mit.edu
2. Local DNS server doesn’t know, asks root.
Root refers to a .edu server
3. The .edu server refers to an MIT server
4. The MIT server responds with an address
5. The local server caches response
✔ Query / response [ referral answer]
Slide 2
Terminology
✔ Mapping in the DNS name space
–
–
record : name’s IP address
record : name of DNS server
✔ Caching in DNS
– Time To Live : expiration time set by the originator
of a name
– Negative caching
Slide 3
Questions
✔ What is the ratio of TCP connections to DNS A record
lookups?
✔ What is the number of DNS queries per lookup?
✔ DNS errors
– What percentage of lookups do never get an
answer?
– Performance of retransmission protocol
✔ DNS failures
✔ What is the effect of varying TTLs and degrees of
caching sharing on cache hit rate?
Slide 4
Key Findings
✔ TCP / DNS lookup ratio suggests that the hit rate of
DNS caches inside MIT is between 70% and 80%
✔ 23% of all client lookups in the most recent MIT trace
fail to elicit any answer
✔ 13% of lookups result in an answer that indicates a
failure. Most of these failures indicate
✔ % of TCP connections made to names with low TTL
values increased from 12% to 25% in 2000
✔ Setting all A-record TTL’s to a value as small as 10
minutes is not likely to degrade the scalability of DNS
Slide 5
The Data
✔ Collection Methodology
– Wide-area DNS query/response
– Outgoing TCP connections:
– Anonymized internal addresses
✔ Analysis Methodology
– Sliding window of 60 seconds
R1 a.root−servers.net
NS .edu
t1
Q1 a.root−servers.net
A www.mit.edu
t2 t3
Q2 .edu
t4
Q2 .edu
R2 .edu
NS mit.edu
t5 t6
R3 mit.edu
A www.mit.edu
t7
Q3 mit.edu
latency: t7 −t1
retransmission: 1
referral: 2
t
Slide 6
Traced Network Topology
Subnet #1
Collection
machine
Subnet #2
External network
Router
MIT LCS and AI
Subnet #24
✔
Subnet #3
– 24 internal subnetworks sharing the border router
– Data collected in January and December 2000
✔ Paper has KAIST analysis
Slide 7
Basic Statistics
00/01/03-10
2,530,430
23.5%
64.3%
11.1%
1.0%
6,039,582
4,521,348
4.62
Date
Total lookups
Unanswered
Answered with success
Answered with failure
Zero answer
Total query packets
TCP connections
#TCP : #valid answers
00/12/04-11
4,160,954
22.7%
63.6%
13.1%
0.5%
10,617,796
5,347,003
3.53
Slide 8
Unanswered Lookups
✔ Significant fraction of all DNS packets seen in the
wide-area Internet:
of total
5.5%
13.1%
4.9%
Zero referrals
Non-zero referrals
Loops
, 63%
– 59%
query packets
9.3%
10.3%
3.1%
- Unanswered lookups classified by type -
Slide 9
Retransmissions
100
80
CDF
60
40
mit-jan00 (Answered)
mit-dec00 (Answered)
mit-jan00 (Zero referral)
mit-dec00 (Zero referral)
20
0
0
2
4
6
8
10
12
14
Number of retransmissions
✔ 99.9% of answered lookups have
2 retransmissions
Slide 10
Suboptimal Retransmission Strategy
✔ Overly persistent retransmissions
– Average # of retransmissions:
,5
3.5
– # of retransmissions of worst 5%:
, 12
6
✔ Inappropriate setting for the number of retries or
excessive timeout value
– No retransmissions within 60 seconds:
, 19%
12%
Slide 11
Failures
7.68%
3.24%
11.14%
1.96%
✔ Inverse lookups for IP addresses with no inverse
mapping:
✔ Invalid queries :
✔ Non-existent top-level domains:
,
➔ Large number of distinct names makes negative caching
ineffective
Slide 12
DNS Caching
✔ How useful is it to share DNS caches among
many client machines?
➔ Locality of references among clients
✔ What is the likely impact of choice of TTL on
caching effectiveness?
➔ Locality of references in time
Slide 13
Name Popularity
1
Fraction of requests
0.8
0.6
0.4
0.2
mit-jan00
mit-dec00
kaist-may01
0
0
0.2
0.4
0.6
0.8
1
Fraction of query names
✔ The top 10% account for more than 68% of total
lookups
✔ A long tail : 9.0% are unique names
Slide 14
TTL Distribution
100
1 day
80
60
CDF
1 hour
40
15 minutes
20
5 minutes
0
1
10
100
mit-jan00
mit-dec00
kaist-may01
1000
10000
100000
TTL (sec)
✔ The fraction of accesses to short TTLs has doubled
➔ Increased deployment of DNS-based server
selection
Slide 15
Trace-driven Simulation
✔ TCP connection workload
✔ Database containing name/IP/TTL bindings
✔ Algorithm
– Randomly divide TCP clients into groups of size s
– For each new TCP connection, determine the group
g and look for a name n in the cache of group g
– If n exists and the cached TTL has not expired,
record a hit. Otherwise record a miss
Slide 16
Effect of Sharing on Hit rates
100
Hit rate (%)
80
60
40
20
mit-jan00
mit-dec00
kaist-may01
0
10
20
30
40 50 60
Group size
70
80
90
100
✔ Most of the benefit of sharing is obtained with as few as
10 or 20 clients per cache
Slide 17
Impact of TTL on Hit rates
100
Hit rate (%)
80
60
40
20
group size = 1
group size = 25
group size = 1233
0
0
500 1000 1500 2000 2500 3000 3500 4000 4500 5000
TTL (sec)
✔ Most of the benefit of caching is achieved with TTLs
less than about 1000 seconds.
✔ 5-min TTLs would increase DNS traffic by factor of 1.5
✔
record caching is critical
Slide 18
Conclusions
✔ About a quarter of all DNS lookups never get an
answer, which corresponds over 50% of DNS packets in
the wide-area Internet
✔ The DNS retransmission protocol appears to be overly
persistent, but in 10%-12% of cases, no retransmissions
occur.
✔ Setting all A-record TTL’s to a value as small as 10
minutes is not likely to degrade the scalability of DNS
in any noticeable way.
✔ The cacheability of
records enhances scalability by
reducing load on the root and top-level name servers
Slide 19
Download