Internet-Scale Research at Universities
Panel Session, SAHARA Retreat, Jan 2002
Prof. Randy H. Katz, Bhaskaran Raman, Z. Morley Mao, Yan Chen

Problem Statement
[Diagram: a source and a destination communicating across the Internet; peering points exchange performance information; service clusters are compute clusters capable of running services]
• Overlay network for service composition
• Want to study recovery algorithms
• Lots of client sessions
• Methodology for evaluation of the design?
  – Simulation?
    • Slow; does not scale with the number of nodes or client sessions
    • Does not bring out processing bottlenecks
  – Real testbed?
    • Cannot be large; setup and management problems
    • Non-repeatable; not good for a controlled design study

Our approach so far…
• Emulation platform
  – Real implementation of the software, but emulation of network parameters
  – Inspired by NistNET
  – Developed our own user-level implementation
    • Gave us better control
  – Runs on the Millennium cluster of workstations
  – Central bottleneck: 20,000 pkts/sec
[Diagram: application nodes 1–4 attach to the emulator through a library; the emulator keeps a rule for each directed node pair, e.g. 1→2, 1→3, 3→4, 4→3]

Parameters modeled
• Overlay topology:
  – Generate a 6,510-node physical network using GT-ITM
  – Choose a subset of nodes for the overlay network
• Latency modeling:
  – Base latency according to edge weight
  – Variation modeled on the observation that RTT spikes are isolated
• Outage periods:
  – Modeled using traces
  – Collected UDP-based measurements across 12 host pairs: Berkeley, Stanford, UNSW (Australia), UIUC, TU-Berlin (Germany), CMU
  – CDF of outage periods used to model outages

My experience in Internet measurement
• Goal: collect client–local DNS server associations
  – To evaluate DNS-based server selection
• Built a measurement infrastructure
• Three components:
  – A 1x1-pixel embedded transparent GIF image
    • <img src=http://xxx.rd.example.com/tr.gif height=1 width=1>
  – A specialized authoritative DNS server
    • Allows hostnames to be wild-carded
  – An HTTP redirector
    • Always responds with "302 Moved Temporarily"
    • Redirects to a URL with the client IP address embedded
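The redirector's trick of embedding the client IP in the redirect hostname can be sketched as follows. This is a minimal illustration: the `IP` prefix and the `cs.example.com` domain follow the example above, and the function names are hypothetical.

```python
def encode_client_ip(ip: str, domain: str = "cs.example.com") -> str:
    """Embed a client IP in a hostname, e.g. 10.0.0.1 -> IP10-0-0-1.cs.example.com."""
    return "IP" + ip.replace(".", "-") + "." + domain

def decode_client_ip(hostname: str) -> str:
    """Recover the client IP from the first label, as the wildcard DNS server would."""
    label = hostname.split(".")[0]        # e.g. "IP10-0-0-1"
    assert label.startswith("IP")
    return label[2:].replace("-", ".")    # "10-0-0-1" -> "10.0.0.1"

# The wildcard name server for *.cs.example.com observes which local DNS server
# asks to resolve this name, yielding the client / local-DNS-server association.
print(decode_client_ip(encode_client_ip("10.0.0.1")))  # -> 10.0.0.1
```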
My experience in Internet measurement
[Diagram: measurement flow among the client, the redirector, the content server, the local DNS server, and the wildcard name server]
1. Client [10.0.0.1] sends an HTTP GET request for the image
2. Redirector for xxx.rd.example.com replies with an HTTP redirect to IP10-0-0-1.cs.example.com
3. Client asks its local DNS server to resolve IP10-0-0-1.cs.example.com
4. Local DNS server sends the resolution request to the name server for *.cs.example.com
5. Name server replies with the IP address of the content server for the image

My lessons
• Common myths about Internet measurements:
  – Measurements done from university sites are representative of the Internet
  – The following are good proximity metrics:
    • AS hop count
    • Router hop count
  – "I can just quote some measurement results from previous papers"
    • Without carefully considering their applicability
• A scalable measurement methodology helps ease of adoption

Content Distribution Network (CDN)
• Dynamic clustering for efficient Web content replication
• Replica placement: use a greedy algorithm to reduce the response latency of end users
• Trace-driven simulation to find the optimal granularity of replication
• Network topology:
  – Pure-random and transit-stub models from GT-ITM
  – A real AS-level topology from 7 widely-dispersed BGP peers
• Real-world traces:

  Web Site  | Period    | Duration | Total Requests | Requests/day
  MSNBC     | 8–10/1999 | 10–11am  | 10,284,735     | 1,469,248 (1 hr)
  NASA      | 7/1995    | All day  | 3,461,612      | 56,748
  WorldCup  | 5–7/1998  | All day  | 1,352,804,107  | 15,372,774

• Clustering of trace clients:
  – Cluster MSNBC Web clients by BGP prefix
    • BGP tables from a BBNPlanet router on 01/24/2001
    • ~10K clusters remain; the top 10% are chosen, covering >70% of requests
  – Cluster NASA Web clients by domain names

Wide-area Network Distance Estimation
• Problem formulation: given N end hosts that belong to different administrative domains, how do we select a subset of them to be probes and build an overlay distance estimation service without knowing the underlying topology?
• Solution: Internet Iso-bar
  – Cluster hosts that perceive similar performance to the Internet; select a monitor in each cluster for active, continuous probing
  – Clustering based on congestion/path-outage correlation
  – Evaluate the prediction accuracy and stability

Evaluation Methodology (I)
• NLANR AMP data set
  – 119 sites in the US (106 after filtering out mostly-off sites)
  – Traceroute between every pair of hosts every minute
  – Clustering uses the daily geometric mean of round-trip time (RTT)
  – Raw data: 6/24/00 – 12/3/01

Evaluation Methodology (II)
• Keynote Website Perspective benchmarking
  – Measures Web site performance from more than 100 agents
  – Heterogeneous core network: various ISPs
  – Heterogeneous access networks:
    • Dial-up 56K, DSL, and high-bandwidth business connections
  – Agent locations:
    • America (including Canada and Mexico): 67 agents in 29 cities from 15 ISPs
    • Europe: 25 agents in 12 cities from 16 ISPs
    • Asia: 8 agents in 6 cities from 8 ISPs
    • Australia: 3 agents in 3 cities from 3 ISPs
  – 40 most popular Web servers used for benchmarking
• Side problem: how to reduce the number of agents and/or servers while still representing the majority of end-user performance over a reasonably long period?

Discussion: Difficulties of Internet measurement
• Results vary greatly depending on your measurement methodology:
  – The number and identity of the sites you measure
    • Commercial vs. educational sites
  – Your measurement location
    • Well-connected site vs. dial-up site
    • Backbone vs. access network; server vs. client
  – The time when the measurement is taken
    • Time of day, day of year
    • Transient effects, e.g., network congestion, flash crowds
  – The frequency of measurements (for correlation studies)
  – The intrusiveness of the measurement
    • Does the measurement affect what you are measuring?

Discussion: Issues with Emulation
• Emulation platform: modeling correlations in network behavior
  – What happens in one part of the Internet may have nonzero correlation with the behavior of another part
• Scale of topology
  – We have O(100) machines in the department
  – O(1500) machines on campus
  – Is this believable?
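One concrete way the emulation platform can be driven by the trace-derived outage CDF mentioned earlier is inverse-CDF sampling: draw a uniform random number and map it through the empirical distribution of observed outage durations. A minimal sketch; the numbers below are illustrative placeholders, not the measured trace, and the function name is hypothetical.

```python
import bisect
import random

def sample_outage(durations, cdf, u=None):
    """Draw an outage duration from an empirical CDF (inverse-CDF sampling).

    durations: sorted outage durations (seconds) observed in the traces
    cdf:       cumulative probability for each duration (non-decreasing, ends at 1.0)
    """
    if u is None:
        u = random.random()  # uniform draw in [0, 1)
    i = bisect.bisect_left(cdf, u)          # first bin whose cumulative prob >= u
    return durations[min(i, len(durations) - 1)]

# Toy empirical CDF (illustrative values only):
durations = [0.5, 1.0, 2.0, 5.0, 30.0]
cdf       = [0.40, 0.70, 0.85, 0.95, 1.00]
print(sample_outage(durations, cdf, u=0.5))  # u=0.5 falls in the (0.40, 0.70] bin -> 1.0
```

Repeating such draws per overlay link gives repeatable, trace-faithful outage behavior, which is what makes the emulation a controlled design study rather than a one-off testbed run.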