Engineering a Content Delivery Network Bruce Maggs Network Deployment 175000+ Servers Current Installations 1300+ Networks 100+ Countries Akamai Statistics • Peak bit rate: over 33 trillion bps (4-14-2015) • Peak IPv4 HTTP daily requests: over 2.7 trillion (5-13-15) • Peak IPv6 HTTP daily requests: over 50 billion • 812 million unique IPv4 addresses connected to Akamai in Q1 2015 Part I: Services http://www.microsoft.com (DNS) http://www.facebook.com (images) http://windowsupdate.microsoft.com http://www.apple.com/quicktime/whatson http://www.fbi.gov (DDoS protection) Design Themes • Redundancy • Self-assessment • Fail-over at multiple levels • Robust algorithms FirstPoint – DNS (e.g., Microsoft!) • Selects from among several mirror sites operated by content provider Embedded Image Delivery (e.g., Facebook) Embedded URLs are Converted to ARLs <html> <head> <title>Welcome to xyz.com!</title> </head> ak <body> <img src=“ http://www.xyz.com/logos/logo.gif”> <img src=“ http://www.xyz.com/jpgs/background.jpg”> <h1>Welcome to our Web site!</h1> <a href=“page2.html”>Click here to enter</a> </body> </html> Akamai DNS Resolution 4 xyz.com 510.10.123.5 xyz.com’s nameserver akamai.net 8 a212.g.akamai.net 7 6 .com .net Root (Verisign) 9 15.15.125.6 ak.xyz.com select cluster 10 g.akamai.net 20.20.123.55 11 Akamai High-Level DNS Servers 12a212.g.akamai.net 30.30.123.5 Local Name Server End User 16 Browser’s Cache 14 3 1 2 15 OS 13 Akamai Low-Level DNS Servers select servers within cluster Live Streaming Architecture 1 x 2 3 4 Satellite Downlink Satellite Uplink X 1 2 3 4 1 2 3 4 Encoding Entry Point 1 X 2 X 3 X 4 X x1 2 3 4 Top-level reflectors Regions SiteShield (www.fbi.gov) A K A M A I Content provider’s website Hacker! A K A M A I A K A M A I Hacker! Hacker! Part II: Failures 1. Hardware 2. Network 3. Software 4. Configuration 5. Misperceptions 6. Attacks Hardware / Server Failures Linux boxes with large RAM and disk capacity, Windows servers Sample Failures: 1. Memory SIMMS jumping out of their sockets 2. Network cards screwed down but not in slot 3. Etc. Akamai Cluster Servers pool resources •RAM •Disk •Throughput View of Clusters buddy suspended hardware failure odd man out suspended datacenter Network Failures E.g., congestion at public and private peering points, misconfigured routers, inaccessible networks, etc., etc., etc. Core Points X 1 2 3 4 • Core point X is the first router at which all paths to nameservers 1, 2, 3, and 4 intersect. • X can be viewed as the straddling the core and the edge of the network. Core Points 500,000 nameservers reduced to 90,000 core points 7,000 account for 95% end-user load Engineering Methodology •C programming language (gcc). •Reliance on open-source code. •Large distributed testing systems. •Burn-in on “invisible” system. •Staged rollout to production. •Backwards compatibility. Perceived Failures Examples 1. 2. 3. 4. Personal firewalls Reporting tools Customer-side problems Third-party measurements