DNS: Domain Name System
EE 122: Intro to Communication Networks
Fall 2010 (MW 4-5:30 in 101 Barker)
Scott Shenker
TAs: Sameer Agarwal, Sara Alspaugh, Igor Ganichev, Prayag Narula
http://inst.eecs.berkeley.edu/~ee122/
Materials with thanks to Jennifer Rexford, Ion Stoica, Vern Paxson and other colleagues at Princeton and UC Berkeley

Announcements
• HW #2 Problem 7 has been corrected

Goals of Today’s Lecture
• Finish Transport
  – We left off talking about UDP
  – Ready to move on to TCP
• Concepts & principles underlying the Domain Name System (DNS)
• Inner workings of DNS
• Security problems with DNS

Transmission Control Protocol (TCP)
• Connection oriented
  – Explicit set-up and tear-down of TCP session
• Stream-of-bytes service
  – Sends and receives a stream of bytes, not messages
• Congestion control
  – Dynamic adaptation to network path’s capacity
• Reliable, in-order delivery
  – TCP tries very hard to ensure byte stream (eventually) arrives intact
    o In the presence of corruption and loss
• Flow control
  – Ensure that sender doesn’t overwhelm receiver

Reliable Delivery
• How do we design for reliable delivery?
  – How do you converse on a noisy cell phone connection?
• Positive acknowledgment (“Ack”)
  – Explicit confirmation by receiver
    o On cell phone, “OK”
    o But how do you know you heard correctly?
  – TCP acknowledgments are cumulative (“I’ve received everything up through sequence #N”)
    o With an option for acknowledging individual segments (“SACK”)
• Negative acknowledgment (“Nack”)
  – “I’m missing the following: …”
  – How might the receiver tell something’s missing? Can they always do this?
  – (Only used by TCP in implicit fashion - “fast retransmit”)

Reliable Delivery, cont’d
• Timeout
  – If the sender hasn’t heard anything from the receiver, send again
  – Problem: for how long do you wait?
    o TCP uses function of estimated RTT
  – Problem: what if no Ack for retransmission?
    o TCP (and other schemes) employ exponential backoff
    o Double timer up to maximum - tapers off load during congestion
• A very different approach to reliability: send redundant data
  – Cell phone analogy: “Meet me at 3PM - repeat 3PM”
  – Forward error correction
  – Recovers from lost data nearly immediately!
  – But: can only cope with a limited degree of loss
  – And: adds load to the network (interesting tradeoff)

TCP Support for Reliable Delivery
• Sequence numbers
  – Used to detect missing data
  – ... and for putting the data back in order
• Checksum
  – Used to detect corrupted data at the receiver
  – … leading the receiver to drop the packet
  – No error signal sent - recovery via normal retransmission
• Retransmission
  – Sender retransmits lost or corrupted data
  – Timeout based on estimates of round-trip time (RTT)
  – Fast retransmit algorithm for rapid retransmission

Efficient Transport Reliability

Automatic Repeat reQuest (ARQ)
• Automatic Repeat reQuest
  – Receiver sends acknowledgment (ACK) when it receives packet
  – Sender waits for ACK and times out if it does not arrive within some time period
• Simplest ARQ protocol
  – Stop and Wait
  – Send a packet, stop and wait until ACK arrives
[Figure: time-sequence diagram between Sender and Receiver with a Timeout interval marked; axis: Time]

How Fast Can Stop-and-Wait Go?
• Suppose we’re sending from UCB to New York:
  – Bandwidth = 1 Mbps (megabits/sec)
  – RTT = 100 msec
  – Maximum Transmission Unit (MTU) = 1500 B = 12,000 b
  – No other load on the path and no packet loss
• What (approximately) is the fastest we can transmit using Stop-and-Wait?
• How about if Bandwidth = 1 Gbps?

Computation
• Latency: 100 msec = 0.1 sec
• Transmission time of data packet:
  – 12000 bits / (1000000 bits/sec) = 0.012 sec
• Throughput = 12000 bits / 0.112 sec ≈ 107 kbits/sec
• With linespeed of 1 Gbits/sec
  – Transmission time negligible
  – Throughput ≈ 12000 bits / 0.1 sec = 120 kbits/sec

Allowing Multiple Packets in Flight
• “In Flight” = “Unacknowledged”
• Sender-side issue: how many packets (bytes)?
• Receiver-side issue: how much buffer for data that’s “above a sequence hole”?
  – I.e., data that can’t be delivered since previous data is missing
  – Assumes service model is in-order delivery (like TCP)

Sliding Window
• Allow a larger amount of data “in flight”
  – Allow sender to get ahead of the receiver
  – … though not too far ahead
[Figure: sending process / TCP with markers “Last byte ACKed”, “Last byte written”, “Last byte can send” and the Sender Window; receiving process / TCP with markers “Last byte read”, “Next byte needed”, “Last byte received” and the Receiver Window]

Sliding Window, cont’d
• Both sender & receiver maintain a window that governs amount of data in flight (sender) or not-yet-delivered (receiver)
• Left edge of window:
  – Sender: beginning of unacknowledged data
  – Receiver: beginning of undelivered data
• For the sender:
  – Window size = maximum amount of data in flight
    o Determines rate
    o Sender must have at least this much buffer (maybe more)
• For the receiver:
  – Window size = maximum amount of undelivered data
    o Receiver has this much buffer

Sliding Window
• For the sender, when it receives an acknowledgment for new data, the window advances (slides forward)
[Figure: sender side only, showing “Last byte ACKed”, “Last byte written”, “Last byte can send” and the Sender Window]

Sliding Window
• For the receiver, as the receiving process consumes data, the window slides forward
[Figure: receiver side only, showing “Last byte read”, “Next byte needed”, “Last byte received” and the Receiver Window]

Sliding Window, cont’d
• Sender: window advances when new data ack’d
• Receiver: window advances as receiving process consumes data
• What happens if sender’s window size exceeds the
receiver’s window size?
• Receiver advertises to the sender where the receiver window currently ends (“righthand edge”)
  – Sender agrees not to exceed this amount
  – It makes sure by setting its own window size to a value that keeps it from sending beyond the receiver’s righthand edge

Performance with Sliding Window
• Given previous UCB → New York 1 Mbps path with 100 msec RTT and Sender (and Receiver) window = 100 Kb = 12.5 KB
• How fast can we transmit?

Computation
• Ignoring per-packet transmission time:
• Throughput = 100000 bits / 0.1 sec = 1 Mbps
  – Link is fully utilized!
  – “Pipe is filled”
• With linespeed of 1 Gbits/sec, still 1 Mbps
• What size window would reach 1 Gbps?
• Bandwidth-delay product
• 1 Gbps * 100 msec = 100 Mb
• Note: large window = many packets in flight

Summary
• IP packet forwarding
  – Based on longest-prefix match
  – End systems use subnet mask to determine if traffic destined for their LAN …
    o In which case they send directly, using ARP to find MAC address
  – … or for some other network
    o In which case they send to their local gateway (router)
  – This info either statically config’d or learned via DHCP
• Transport protocols
  – Multiplexing and demultiplexing via port numbers
  – UDP gives simple datagram service
  – TCP gives reliable byte-stream service
  – Reliability immediately raises performance issues
    o Stop-and-Wait vs. Sliding Window

DNS

Host Names vs. IP addresses
• Host names
  – Mnemonic name appreciated by humans
  – Variable length, full alphabet of characters
  – Provide little (if any) information about location
  – Examples: www.cnn.com and bbc.co.uk
• IP addresses
  – Numerical address appreciated by routers
  – Fixed length, binary number
  – Hierarchical, related to host location
  – Examples: 64.236.16.20 and 212.58.224.131

Separating Naming and Addressing
• Names are easier to remember
  – www.cnn.com vs.
64.236.16.20 (but not tiny urls)
• Addresses can change underneath
  – Move www.cnn.com to 4.125.91.21
  – E.g., renumbering when changing providers
• Name could map to multiple IP addresses
  – www.cnn.com to multiple (8) replicas of the Web site
  – Enables
    o Load-balancing
    o Reducing latency by picking nearby servers
    o Tailoring content based on requester’s location/identity
• Multiple names for the same address
  – E.g., aliases like www.cnn.com and cnn.com

Scalable (Name → Address) Mappings
• Originally: per-host file
  – Flat namespace
  – /etc/hosts
  – SRI (Menlo Park) kept master copy
  – Downloaded regularly
• Single server doesn’t scale
  – Traffic implosion (lookups & updates)
  – Single point of failure
  – Amazing politics
Needed a distributed, hierarchical collection of servers

Domain Name System (DNS)
• Properties of DNS
  – Hierarchical name space divided into zones
  – Zones distributed over collection of DNS servers
• Hierarchy of DNS servers
  – Root (hardwired into other servers)
  – Top-level domain (TLD) servers
  – Authoritative DNS servers
• Performing the translations
  – Local DNS servers
  – Resolver software

Distributed Hierarchical Database
[Figure: DNS name tree under the unnamed root. Top-Level Domains (TLDs): generic domains com, edu, org; country domains ac, uk, zw; arpa (with in-addr beneath it). Example leaf names: my.east.bar.edu and usr.cam.ac.uk]

DNS Root
• Located in Virginia, USA
• How do we make the root scale?
[Map label: Verisign, Dulles, VA]

DNS Root Servers
• 13 root servers (see http://www.root-servers.org/)
  – Labeled A through M
• Does this scale?
[Map: root server locations]
  A  Verisign, Dulles, VA
  B  USC-ISI, Marina del Rey, CA
  C  Cogent, Herndon, VA
  D  U Maryland, College Park, MD
  E  NASA, Mt View, CA
  F  Internet Software Consortium, Palo Alto, CA
  G  US DoD, Vienna, VA
  H  ARL, Aberdeen, MD
  I  Autonomica, Stockholm
  J  Verisign
  K  RIPE, London
  L  ICANN, Los Angeles, CA
  M  WIDE, Tokyo

DNS Root Servers
• 13 root servers (see http://www.root-servers.org/)
  – Labeled A through M
• Replication via any-casting (localized routing for addresses)
[Map: root server locations, including replicas]
  A  Verisign, Dulles, VA
  B  USC-ISI, Marina del Rey, CA
  C  Cogent, Herndon, VA (also Los Angeles, NY, Chicago)
  D  U Maryland, College Park, MD
  E  NASA, Mt View, CA
  F  Internet Software Consortium, Palo Alto, CA (and 37 other locations)
  G  US DoD, Vienna, VA
  H  ARL, Aberdeen, MD
  I  Autonomica, Stockholm (plus 29 other locations)
  J  Verisign (21 locations)
  K  RIPE, London (plus 16 other locations)
  L  ICANN, Los Angeles, CA
  M  WIDE, Tokyo (plus Seoul, Paris, San Francisco)

TLD and Authoritative DNS Servers
• Top-level domain (TLD) servers
  – Generic domains (e.g., com, org, edu)
  – Country domains (e.g., uk, fr, cn, jp)
  – Special domains (e.g., arpa)
  – Typically managed professionally
    o Network Solutions maintains servers for “com”
    o Educause maintains servers for “edu”
• Authoritative DNS servers
  – Provide public records for hosts at an organization
  – For the organization’s servers (e.g., Web and mail)
  – Can be maintained locally or by a service provider

Question
• Could we replace DNS with a Google-like infrastructure?
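
To make the root → TLD → authoritative hierarchy from the preceding slides concrete, here is a minimal Python sketch that lists the ancestor domains of a host name, starting at the unnamed root. The function name zone_chain is hypothetical, and the sketch assumes every label boundary could be a delegation point; in a real deployment, where zones are actually cut depends on how each domain is delegated.

```python
from typing import List


def zone_chain(hostname: str) -> List[str]:
    """List the ancestor domains of hostname, from the unnamed root down.

    Assumption for illustration: every label boundary is treated as a
    possible zone cut. Real zone boundaries depend on delegation.
    """
    labels = [label for label in hostname.rstrip(".").split(".") if label]
    chain = ["."]  # the unnamed root
    # Walk the labels right to left: TLD first, full host name last.
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain


if __name__ == "__main__":
    for domain in zone_chain("my.east.bar.edu"):
        print(domain)
```

Reading the output top to bottom mirrors the order in which servers are consulted: root first, then the TLD, then progressively more specific domains.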

Using DNS
• Local DNS server (“default name server”)
  – Usually near the endhosts that use it
  – Local hosts configured with local server (e.g., /etc/resolv.conf) or learn server via DHCP
• Client application
  – Extract server name (e.g., from the URL)
  – Do gethostbyname() to trigger resolver code
• Server application
  – Extract client IP address from socket
  – Optional gethostbyaddr() to translate into name

Example
Host at cis.poly.edu wants IP address for gaia.cs.umass.edu
[Figure: message sequence among the requesting host cis.poly.edu, the local DNS server dns.poly.edu, the root DNS server, the TLD DNS server, and the authoritative DNS server dns.cs.umass.edu]
  1  requesting host → local DNS server
  2  local DNS server → root DNS server
  3  root DNS server → local DNS server
  4  local DNS server → TLD DNS server
  5  TLD DNS server → local DNS server
  6  local DNS server → authoritative DNS server
  7  authoritative DNS server → local DNS server
  8  local DNS server → requesting host

Recursive vs. Iterative Queries
• Recursive query
  – Ask server to get answer for you
  – E.g., request 1 and response 8
• Iterative query
  – Ask server who to ask next
  – E.g., all other request-response pairs
[Figure: same message diagram as in the Example slide, steps 1 through 8]

Reverse Mapping (Address → Host)
• How do we go the other direction, from an IP address to the corresponding hostname?
• Addresses already have natural “quad” hierarchy:
  – 12.34.56.78
• But: quad notation has most-sig. hierarchy element on left, while www.cnn.com has it on the right
• Idea: reverse the quads = 78.56.34.12 …
  – … and look that up in the DNS
• Under what TLD?
  – Convention: in-addr.arpa
  – So lookup is for 78.56.34.12.in-addr.arpa

Distributed Hierarchical Database
[Figure: the DNS name tree from before (my.east.bar.edu, usr.cam.ac.uk), now with the branch arpa → in-addr → 12 → 34 → 56 corresponding to 12.34.56.0/24]

DNS Caching
• Performing all these queries takes time
  – And all this before actual communication takes place
  – E.g., 1-second latency before starting Web download
• Caching can greatly reduce overhead
  – The top-level servers very rarely change
  – Popular sites (e.g., www.cnn.com) visited often
  – Local DNS server often has the information cached
• How DNS caching works
  – DNS servers cache responses to queries
  – Responses include a “time to live” (TTL) field
  – Server deletes cached entry after TTL expires

Negative Caching
• Remember things that don’t work
  – Misspellings like www.cnn.comm and www.cnnn.com
  – These can take a long time to fail the first time
  – Good to remember that they don’t work
  – … so the failure takes less time the next time around
• But: negative caching is optional
  – And not widely implemented

DNS Resource Records
DNS: distributed DB storing resource records (RR)
RR format: (name, value, type, ttl)
• Type=A
  – name is hostname
  – value is IP address
• Type=NS
  – name is domain (e.g. foo.com)
  – value is hostname of authoritative name server for this domain
• Type=PTR
  – name is reversed IP quads
    o E.g. 78.56.34.12.in-addr.arpa
  – value is corresponding hostname
• Type=CNAME
  – name is alias name for some “canonical” name
    o E.g., www.cs.mit.edu is really eecsweb.mit.edu
  – value is canonical name
• Type=MX
  – value is name of mailserver associated with name
  – Also includes a weight/preference

DNS Protocol
DNS protocol: query and reply messages, both with same message format
Message header:
• Identification: 16 bit # for query, reply to query uses same #
• Flags:
  – Query or reply
  – Recursion desired
  – Recursion available
  – Reply is authoritative
• Plus fields indicating size (0 or more) of optional header elements
[Message layout; each header row holds two 16-bit fields]
  Identification     | Flags
  # Questions        | # Answer RRs
  # Authority RRs    | # Additional RRs
  Questions (variable # of resource records)
  Answers (variable # of resource records)
  Authority (variable # of resource records)
  Additional information (variable # of resource records)

Reliability
• DNS servers are replicated
  – Name service available if at least one replica is up
  – Queries can be load-balanced between replicas
• Usually, UDP used for queries
  – Need reliability: must implement this on top of UDP
  – Spec supports TCP too, but not always implemented
• Try alternate servers on timeout
  – Exponential backoff when retrying same server
• Same identifier for all queries
  – Don’t care which server responds

Inserting Resource Records into DNS
• Example: just created startup “FooBar”
• Get a block of address space from ISP
  – Say 212.44.9.128/25
• Register foobar.com at Network Solutions (say)
  – Provide registrar with names and IP addresses of your authoritative name server (primary and secondary)
  – Registrar inserts RR pairs into the com TLD server:
    o (foobar.com, dns1.foobar.com, NS)
    o (dns1.foobar.com, 212.44.9.129, A)
• Put in your (authoritative) server dns1.foobar.com:
  – Type A record for www.foobar.com
  – Type MX record for foobar.com

Setting up foobar.com, cont’d
• In addition, need to provide reverse PTR bindings
  – E.g., 212.44.9.129 → dns1.foobar.com
• Normally, these would go in 9.44.212.in-addr.arpa
• Problem: you can’t run the name
server for that domain. Why not?
  – Because your block is 212.44.9.128/25, not 212.44.9.0/24
  – And whoever has 212.44.9.0/25 won’t be happy with you owning their PTR records
• Solution: ISP runs it for you
  – Now it’s more of a headache to keep it up-to-date :-(

DNS Measurements (MIT data from 2000)
• What is being looked up?
  – ~60% requests for A records
  – ~25% for PTR records
  – ~5% for MX records
  – ~6% for ANY records
• How long does it take?
  – Median ~100 msec (but 90th percentile ~500 msec)
  – 80% have no referrals; 99.9% have fewer than four
• Query packets per lookup: ~2.4

DNS Measurements (MIT data from 2000)
• Top 10% of names accounted for ~70% of lookups
  – Caching should really help!
• 9% of lookups are unique
  – Cache hit rate can never exceed 91%
• Cache hit rates ~75%
  – But caching for more than 10 hosts doesn’t add much

DNS Measurements (MIT data from 2000)
• Does DNS give answers?
  – ~23% of lookups fail to elicit an answer!
  – ~13% of lookups result in NXDOMAIN (or similar)
    o Mostly reverse lookups
  – Only ~64% of queries are successful!
    o How come the web seems to work so well?
• ~63% of DNS packets are in unanswered queries!
  – Failing queries are frequently retransmitted
  – 99.9% of successful queries have ≤2 retransmissions

Moral of the Story
• If you design a highly resilient system, many things can be going wrong without you noticing it!

Security Analysis of DNS
• What security issues does the design & operation of the Domain Name System raise?
• Degrees of freedom:
[Message layout; each header row holds two 16-bit fields]
  Identification     | Flags
  # Questions        | # Answer RRs
  # Authority RRs    | # Additional RRs
  Questions (variable # of resource records)
  Answers (variable # of resource records)
  Authority (variable # of resource records)
  Additional information (variable # of resource records)

Security Problem #1: Starbucks
• As you sip your latte and surf the Web, how does your laptop find google.com?
• Answer: it asks the local name server it learned via the Dynamic Host Configuration Protocol (DHCP) …
  – … which is run by Starbucks or their contractor
  – … and can return to you any answer they please
  – … including a “man in the middle” site that forwards your query to Google, gets the reply to forward back to you, yet can change anything they wish in either direction
• How can you know you’re getting correct data?
  – Today, you can’t. (Though if site is HTTPS, that helps)
  – One day, hopefully: DNSSEC extensions to DNS

Security Problem #2: Cache Poisoning
• Suppose you are a Bad Guy and you control the name server for foobar.com. You receive a request to resolve www.foobar.com and reply:

  ;; QUESTION SECTION:
  ;www.foobar.com.          IN  A

  ;; ANSWER SECTION:
  www.foobar.com.    300    IN  A   212.44.9.144

  ;; AUTHORITY SECTION:
  foobar.com.        600    IN  NS  dns1.foobar.com.
  foobar.com.        600    IN  NS  google.com.

  ;; ADDITIONAL SECTION:
  google.com.        5      IN  A   212.44.9.155

  – 212.44.9.155 is a foobar.com machine, not google.com
  – The google.com record has a TTL of 5: evidence of the attack disappears 5 seconds later!

Cache Poisoning, cont’d
• Okay, but how do you get the victim to look up www.foobar.com in the first place?
• Perhaps you connect to their mail server and send
  – HELO www.foobar.com
  – Which their mail server then looks up to see if it corresponds to your source address (anti-spam measure)
• Note, with compromised name server we can also lie about PTR records (address → name mapping)
  – E.g., for 212.44.9.155 = 155.9.44.212.in-addr.arpa return google.com (or whitehouse.gov, or whatever)
    o If our ISP lets us manage those records as we see fit, or we happen to directly manage them

Cache Poisoning, cont’d
• Suppose Bad Guy is at Starbucks and they can sniff (or even guess) the identification field the local server will use in its next request
[Figure: DNS header fields Identification (16 bits) and Flags (16 bits)]
• They:
  – Ask local server for a (recursive) lookup of google.com
  – Locally spoof subsequent reply from correct name server using the identification field
  – Bogus reply arrives sooner than legit one
• Local server duly caches the bogus reply!
  – Now: every future Starbucks customer is served the bogus answer out of the local server’s cache
    o In this case, the reply uses a large TTL

Summary
• Domain Name System (DNS)
  – Distributed, hierarchical database
  – Distributed collection of servers
  – Caching to improve performance
• DNS lacks authentication
  – Can’t tell if reply comes from the correct source
  – Can’t tell if correct source tells the truth
  – Malicious source can insert extra (mis)information
  – Malicious bystander can spoof (mis)information
  – Playing with caching lifetimes adds extra power to attacks
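
The caching points in the summary (TTL-based expiry, and attackers playing with caching lifetimes) can be illustrated with a small Python sketch. It is a toy model, not how any real resolver is implemented: the class names TtlCache and FakeClock are hypothetical, records are plain strings, and the clock is injected so that expiry can be shown without waiting. The records reuse the addresses from the cache-poisoning slide.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Toy DNS cache: maps (name, type) to a value until its TTL expires."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[str, float]] = {}

    def put(self, name: str, rtype: str, value: str, ttl: float) -> None:
        # Store the value together with its absolute expiry time.
        self._entries[(name, rtype)] = (value, self._clock() + ttl)

    def get(self, name: str, rtype: str) -> Optional[str]:
        key = (name, rtype)
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # TTL has run out: delete the entry and report a miss.
            del self._entries[key]
            return None
        return value


class FakeClock:
    """Hypothetical manual clock so the example does not need to sleep."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


if __name__ == "__main__":
    clock = FakeClock()
    cache = TtlCache(clock)
    cache.put("www.foobar.com.", "A", "212.44.9.144", ttl=300)
    cache.put("google.com.", "A", "212.44.9.155", ttl=5)  # bogus record
    print(cache.get("google.com.", "A"))      # bogus answer while cached
    clock.now = 6.0                           # six seconds later
    print(cache.get("google.com.", "A"))      # entry has expired
    print(cache.get("www.foobar.com.", "A"))  # longer TTL, still cached
```

The same mechanism cuts both ways: a short TTL makes a bogus record vanish quickly (hiding evidence), while a large TTL keeps a spoofed answer in the local server’s cache for every later client.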