Naming

Jennifer Rexford
Advanced Computer Networks
http://www.cs.princeton.edu/courses/archive/fall08/cos561/
Tuesdays/Thursdays 1:30pm-2:50pm

Goals of Today’s Lecture
• Names in the Internet
• Domain Name System (DNS)
  – DNS server hierarchy
  – DNS queries and responses
  – DNS caching
  – Improving DNS reliability
• DNS security vulnerabilities
  – DNS cache poisoning and home-network attacks
• Use of DNS for (Web) server load balancing
• Beyond today’s naming and DNS

Names

Names in the Internet
• What gets named?
  – Hosts, especially servers
  – E.g., www.cnn.com or ftp.cs.princeton.edu
• What format do names have?
  – Human-readable for ease of remembering
  – Decentralized, hierarchical allocation of names
• Why are names separate from addresses?
  – Names are easier (for humans) to remember
  – Allows IP addresses to change over time
  – Allows many-to-one and one-to-many mapping

Names in the Internet
• When are names translated to addresses?
  – Before IP-level communication begins
  – To learn the IP address of the remote end-point
• Who requests the translation?
  – The end-host initiating the communication
• Can addresses be translated back to names?
  – Yes, useful for access control, customization of content, interpreting measurement data, etc.
  – Though not always necessary or possible

Domain Name System (DNS)
Proposed in 1983 by Paul Mockapetris

Key Concepts Underlying DNS
• Indirection
  – Use of names in place of addresses
  – Queries from local servers rather than end hosts
• Hierarchy
  – For scalability
    • Many servers to handle the large load of queries
  – For decentralized control
    • Of assigning unique names
    • Of deploying and running DNS servers
• Caching
  – Of information from each level in the hierarchy
  – On behalf of a variety of users at an organization

Variable-Depth Tree
[Figure: DNS name tree below an unnamed root, with generic domains (com, edu, org), country domains (e.g., uk, zw), and arpa. Example names of different depths: my.east.bar.edu, usr.cam.ac.uk, and the prefix 12.34.56.0/24 under in-addr.arpa.]

DNS Root Servers
• 13 root servers (see http://www.root-servers.org)
  – Labeled A through M
    • A: Verisign, Dulles, VA
    • B: USC-ISI, Marina del Rey, CA
    • C: Cogent, Herndon, VA (also Los Angeles)
    • D: U Maryland, College Park, MD
    • E: NASA, Mt View, CA
    • F: Internet Software C., Palo Alto, CA (and 17 other locations)
    • G: US DoD, Vienna, VA
    • H: ARL, Aberdeen, MD
    • I: Autonomica, Stockholm (plus 3 other locations)
    • J: Verisign (11 locations)
    • K: RIPE, London (also Amsterdam, Frankfurt)
    • L: ICANN, Los Angeles, CA
    • M: WIDE, Tokyo

TLD and Authoritative DNS Servers
• Top-level domain (TLD) servers
  – Generic domains (e.g., com, org, edu)
  – Country domains (e.g., uk, fr, ca, jp)
  – Typically managed professionally
    • Network Solutions maintains servers for “com”
    • Educause maintains servers for “edu”
• Authoritative DNS servers
  – Provide public records for hosts at an organization
  – For organization’s servers (e.g., Web and mail)
  – Can be maintained locally or by a service provider

Local DNS Server and End-Host Resolver
• Local DNS server (“default name server”)
  – Usually near the end hosts that use it
  – Local hosts configured with local server (e.g., /etc/resolv.conf) or learn via DHCP
• End-host resolver
  – Triggered by application making a library call
  – E.g., gethostbyname() or gethostbyaddr()
  – Sends query to the local DNS
    server

Example
Host at cis.poly.edu wants IP address for gaia.cs.umass.edu
[Figure: requesting host cis.poly.edu sends query 1 to its local DNS server dns.poly.edu, which queries the root DNS server (2, 3), the TLD DNS server (4, 5), and the authoritative DNS server dns.cs.umass.edu (6, 7), then returns response 8 to the host.]

Recursive vs. Iterative Queries
• Recursive query
  – Ask server to get answer for you
  – E.g., request 1 and response 8
• Iterative query
  – Ask server who to ask next
  – E.g., all other request-response pairs
[Figure: same query sequence (1 through 8) as in the previous example.]

DNS Caching
• Performing all these queries takes time
  – All before the actual communication takes place
  – E.g., 1 sec latency before starting Web download
• Caching can substantially reduce overhead
  – The top-level servers very rarely change
  – Popular sites (e.g., www.cnn.com) visited often
  – Local DNS server often has the information cached
• How DNS caching works
  – DNS servers cache responses to queries
  – Responses include a “time to live” (TTL) field
  – Server deletes the cached entry after TTL expires

Negative Caching
• Remember things that don’t work
  – Misspellings like www.cnn.comm and www.cnnn.com
  – These can take a long time to fail the first time
  – Good to remember that they don’t work
• Benefits of negative caching
  – Reduce time to respond the next time
  – Avoid placing high load on other DNS servers

DNS Resource Records (RRs)
• Distributed database storing resource records
  – RR format: (name, value, type, ttl)
• Type=A
  – name is hostname
  – value is IP address
• Type=NS
  – name is domain (e.g., foo.com)
  – value is hostname of authoritative name server for this domain
• Type=CNAME
  – name is alias name for some “canonical” (the real) name
  – www.ibm.com is really east.backup2.ibm.com
• Type=MX
  – value is name of the mail server associated with name

Inserting Resource Records into DNS
• Example: just created startup “FooBar”
• Register foobar.com at Network Solutions
  – Provide registrar with names and addresses of your authoritative name server (primary and secondary)
  – Registrar inserts two RRs into the com TLD server:
    • (foobar.com, dns1.foobar.com, NS)
    • (dns1.foobar.com, 212.212.212.1, A)
• Put in authoritative server dns1.foobar.com
  – Type A record for www.foobar.com
  – Type MX record for foobar.com

DNS Protocol
DNS protocol: query and reply messages, both with same message format
Message header
• Identification: 16-bit number for query; reply to query uses same number
• Flags:
  – Query or reply
  – Recursion desired
  – Recursion available
  – Reply is authoritative

Reliability
• DNS servers are replicated
  – Name service available if at least one replica is up
  – Queries can be load balanced between replicas
• UDP used for queries
  – Need reliability: must implement this on top of UDP
• Try alternate servers on timeout
  – Exponential backoff when retrying same server
• Same identifier for all queries
  – Don’t care which server responds

Reliability: IP Anycast
• Multiple replicas with same IP address
  – Replicas located at multiple geographic locations
  – Routing system directs query to “closest” replica
• Used especially for the root DNS servers
  – Can add more servers and locations without adding new IP addresses for the root servers
[Figure: two root server replicas at different locations, both using address 1.2.3.4 within prefix 1.2.3.0/24.]

Bogus Queries at Root Server (Wessels03 Paper)
• Many kinds of bogus queries
  – Undefined DNS query types
  – Name-to-address queries on IP addresses
  – Unknown TLD (e.g., “.elvis”) or ill-formed address (e.g., “209.17.66.80.196.200.64.in-addr.arpa”)
  – Queries on private IP
    addresses (e.g., 10.0.0.0)
  – Repeated queries (e.g., retransmissions due to packet filters dropping the DNS responses)
• Less than 2% of queries were legitimate!

DNS Security

DNS Cache Poisoning
• Suppose an attacker owns sub.example.com
  – And wants to control wikipedia.org’s domain
• Receives a legitimate request for the address records of sub.example.com
  – “sub.example.com IN A”
• Redirects to target domain & assigns address
  – “example.com. 3600 IN NS ns.wikipedia.org”
  – “ns.wikipedia.org IN A w.x.y.z” (a glue record)
• Vulnerable server caches additional A record
  – Now the attacker, who controls w.x.y.z, can resolve queries for the entire wikipedia.org domain

DNS Cache Poisoning (Continued)
• DNS forgery is another approach
  – Beating the real answer to a recursive DNS query
• DNS server tries to resolve www.wikipedia.org
  – Attacker sends a forged response
  – Challenging: needs to match 16-bit ID and port number
• Overcoming the challenges
  – Some servers increment the ID and use a fixed port
  – Some servers accept queries from anyone
    • So attacker can send *queries* to the server for www.wikipedia.org to trigger the server to make more queries of its own

Preventing DNS Cache Poisoning
• Making DNS servers less trusting
  – Ignore records not directly relevant to the query
• Making it harder to guess query ID
  – Using cryptographically secure random numbers
  – (Some early servers use bad random number generators)
• Disallowing DNS queries from outsiders
  – Filtering DNS queries based on source IP address
• Ensuring the authenticity of the data
  – DNSSEC using digital signatures (not widely deployed)
• Ensuring the right transport or application connection to avoid talking to wrong endpoint
  – Using HTTPS or SSH with digital certificates

Recent DNS Attack
• Poisoning authoritative records
  – For the entire domain (e.g., bankofsteve.com)
  – Rather than an individual address
• Even against well-protected servers
  – E.g., those that randomize the query ID
  – By sending many, many requests to
    the server
• Need to make sure query results aren’t cached
  – Send many queries for random domain names
  – E.g., www12345678.bankofsteve.com
  – Attack can be successful within (say) 10 seconds
• The patch: randomize source port number, too
http://unixwiz.net/techtips/iguide-kaminsky-dns-vuln.html

DNS Attacks on Edge Networks
• Many end hosts check local “hosts” file first
  – … before sending queries to local DNS server
• Malware can add entries to this file
  – … to direct certain domains to different addresses
• Many home networks have a local DNS server
  – … running on a local network router
• Attacker can compromise the router
  – … and reconfigure the next DNS server
  – … or completely overwrite the firmware

DNS-Based Load Balancing

Directing Web Clients to Replicas: Different Names
• Simple approach: different names
  – www1.cnn.com, www2.cnn.com, www3.cnn.com
  – But, this requires users to select specific replicas
[Figure: two replicas, Web server www1 and Web server www2.]

Directing Web Clients to Replicas: Different Addresses
• More elegant approach: different IP addresses
  – Single name (e.g., www.cnn.com), multiple addresses
  – E.g., 64.236.16.20, 64.236.16.52, 64.236.16.84, …
• Authoritative DNS server returns many addresses
  – And the local DNS server selects one address
  – Authoritative server may vary the order of addresses
[Figure: two replicas, Web server 1.2.3.4 and Web server 5.6.7.8.]

Directing Web Clients to Replicas: Finer Control
• Web sites need greater flexibility
  – For load balancing over the Web server replicas
  – Directing Web clients to the closest server
  – Directing clients to customized version of content
• Different DNS responses to different queries!
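
As a rough illustration of “different DNS responses to different queries,” the sketch below picks a replica based on the source address of the query, which is normally the address of the client’s local DNS server. The prefixes, region names, and the choose_replica() helper are hypothetical and exist only for this example; the replica addresses 1.2.3.4 and 5.6.7.8 are the placeholders used on the slides. A real authoritative server would use far richer location and load data.

```python
import ipaddress

# Hypothetical mapping from resolver prefixes to regions (documentation ranges).
REGION_BY_PREFIX = {
    ipaddress.ip_network("192.0.2.0/24"): "east",
    ipaddress.ip_network("198.51.100.0/24"): "west",
}

# Placeholder replica addresses taken from the slide figures.
REPLICA_BY_REGION = {"east": "1.2.3.4", "west": "5.6.7.8"}
DEFAULT_REPLICA = "1.2.3.4"


def choose_replica(resolver_ip: str) -> str:
    """Return the replica address to put in the A record for this query.

    resolver_ip is the source address of the DNS query, i.e. the local
    DNS server, not the Web client itself.
    """
    addr = ipaddress.ip_address(resolver_ip)
    for prefix, region in REGION_BY_PREFIX.items():
        if addr in prefix:
            return REPLICA_BY_REGION[region]
    # Unknown resolver location: fall back to a fixed replica.
    return DEFAULT_REPLICA
```

Because the decision keys on the resolver rather than the client, every client behind the same local DNS server receives the same answer for as long as that answer stays cached.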
[Figure: two replicas, Web server 1.2.3.4 and Web server 5.6.7.8.]

Challenges of Fine-Grain Control
• Frequent modification to DNS records
  – To exercise fine-grain control
  – To remove IP addresses for failed replicas
• Inferring the Web client location
  – Based on the IP address of local DNS server
  – And mapping to topological or geographic location
• Caching of query results at the local DNS server
  – Sending the same cached result to many users
• Even setting small TTL is not fully effective
  – Many Web browsers cache the resolved address
  – And smaller TTLs add latency and DNS server load
• Load balancing at machine level, not Web object

Beyond Today’s Naming and DNS

Problems with DNS and Naming/Addressing
• Many levels of look-up are slow
  – Sometimes > 1 sec when all queries miss in cache
• Cache expiry is clumsy
  – Low TTL leads to poor scaling and higher delays
  – High TTL leads to slow failover and poor control
• Operates at the level of host names/addresses
  – Yet many apps (like CDNs) care about objects
• Increasingly an address is not a host anyway
  – Multiple servers (anycast), front-end for a load balancer, NAT box, …

Questions
• Is hierarchical allocation necessary?
  – E.g., to ensure uniqueness?
• Is hierarchical lookup necessary?
  – E.g., for scalability?
• Are mnemonic names necessary?
  – E.g., for human readability?
• Should the name correspond to a host?
  – E.g., rather than to an object?
• Should the lookup map to a machine address?
  – E.g., rather than to a direction to follow?
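
To make the TTL trade-off from the “Problems with DNS and Naming/Addressing” slide concrete, here is a minimal sketch of a TTL-based cache of the kind a local DNS server might keep. The TTLCache class and its method names are hypothetical, and the clock is injectable only so that expiry can be demonstrated without waiting; this is not how any particular DNS server is implemented.

```python
import time


class TTLCache:
    """Toy name-to-address cache whose entries expire after their TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # function returning the current time in seconds
        self._entries = {}       # name -> (value, expiry time)

    def put(self, name, value, ttl):
        # Store the answer together with the time at which it expires.
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None          # cache miss: a real server would query upstream
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]   # TTL expired: drop the stale entry
            return None
        return value
```

With a 60-second TTL, an address change at the authoritative server can go unseen by this cache for up to 60 seconds, which is the slow-failover side of the trade-off; a shorter TTL narrows that window but causes more misses and more upstream queries.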