Towards Street-Level Client-Independent IP Geolocation

Yong Wang, UESTC/Northwestern
Daniel Burgener, Northwestern
Marcel Flores, Northwestern
Aleksandar Kuzmanovic, Northwestern
Cheng Huang, Microsoft Research
http://networks.cs.northwestern.edu

Problem and Motivation
How to accurately locate IP addresses on the Internet?
Host-dependent solutions:
– GPS
– WiFi (e.g., Google My Location, Skyhook)
Host-independent solutions:
– Server cannot always expect clients’ cooperation
  • Security / access restrictions
  • Online service access analytics
  • Location-based online advertising

Aleksandar Kuzmanovic, Towards Street-Level Client-Independent IP Geolocation

A Scenario of Street-Level Online Advertising
[Figure labels: User’s location; Local Businesses]

Prior Work
Constraint-Based Geolocation (CBG) [ToN 06]: median error distance = 228 km
– Measure delays from active vantage points
Topology-Based Geolocation (TBG) [IMC 06]: median error distance = 67 km
– CBG + network topology information
Octant [NSDI 07]: median error distance = 35.2 km
– CBG + router locations, geographical and demographic information

Methodology Highlights
Our methodology is based on two insights
– Websites often provide the actual geographical location of associated entities
  • E.g., universities, businesses, government offices, etc.
  • Develop methods to determine whether web or e-mail servers reside at the corresponding locations
– Relative network delays highly correlate with geographical distances
  • Absolute network delay measurements are fundamentally limited in their ability to achieve fine-grained geolocation results

Institutional Network Example
Web cloudsourcing
550 South Hill Street, Suite 890, Los Angeles, CA 90013
[Figure labels: mail server; web server; router; IP subnet; to external network]

The Role of Relative Network Delays
[Figure: ordering of measured delays]

A Case Study
Target IP address: 38.100.25.196
Target postal address: 1850 K Street NW, Washington, DC 20006

Three-Tier Geolocation System: Tier 1
Goal: Find the coarse-grained region for the targeted IP
– Measured delays
– Geographical distances
– Create intersection

Three-Tier Geolocation System: Tier 2
Goal: Use passive landmarks to determine a finer-grained region for the targeted IP
– Populate the intersection with landmarks
– Estimate the delay between landmarks and the target
  • D1 + D2 < D3 + D4
– Create a new intersection

Three-Tier Geolocation System: Tier 3
Goal: Geolocate the target IP using passive landmarks
– Select the landmark with the minimum delay to the target, and associate the target’s location with it
10.6 km vs. 0.103 km
Measured distance ∝ Geographical distance

Remaining Issues
Verifying landmarks
– Sweep out most of the erroneous landmarks
– Errors are still possible!
Resilience to errors
– The larger the error, the more resilient our method is
– We prove that the likelihood that an erroneous landmark will affect the accuracy is small

Evaluation
Three datasets
– PlanetLab dataset (Academic)
– Collected dataset (Residential)
– Online Maps dataset (In the wild)
Factors that impact accuracy
– Landmark density
– Population density
– Access networks

Dataset Characteristics
[Figure panels: Urban areas; Rural areas]
The three datasets cover both urban areas and rural areas.

Baseline Results
Error distance (km)   PlanetLab   Residential   Online Maps   Best previous result
Median                0.69        2.25          2.11          35.2
Maximum               5.24        8.1           13.2          276.8

Landmark Density
Density sequence: PlanetLab > Residential > Online Maps
The more landmarks we can discover in the vicinity of a target, the more likely we are to geolocate the targeted IP accurately.

The Role of Population Density
– The error distance is smallest in densely populated areas
– The error grows as the population density decreases
[Figure label: Middle of “nowhere”]

The Role of Access Networks
[Figure labels: 2 km; 700 meters]
Error distance (km)   AT&T   Comcast   Verizon
Median                1.68   2.38      1.48
Cable access networks (Comcast) have a much larger latency variance than DSL networks (AT&T and Verizon)
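
A minimal sketch of how median and maximum error distances like those reported in the evaluation could be computed. It assumes that "error distance" means the great-circle distance between the estimated and the true coordinates, which the slides do not state explicitly. The function names `haversine_km` and `error_summary` and the Earth-radius constant are illustrative choices, not taken from the talk.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant for this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def error_summary(pairs):
    """Return (median, maximum) error distance in km.

    pairs: non-empty list of ((est_lat, est_lon), (true_lat, true_lon)).
    """
    errors = [haversine_km(*estimated, *actual) for estimated, actual in pairs]
    return median(errors), max(errors)
```

Calling `error_summary` on the (estimated, true) coordinate pairs of one dataset would yield one column of a table such as the baseline results, under the assumption above.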

Conclusions
A geolocation system able to geolocate IP addresses with more than an order of magnitude better precision than the best previous method
Our methodology consists of two components
– Mining landmarks from the Web and using web or e-mail servers as landmarks
– Using relative network distances as opposed to absolute network distances

Thank You
http://networks.cs.northwestern.edu
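
The second component, relative network distances, can be illustrated with a hedged sketch of the Tier 3 selection rule (pick the landmark with the minimum delay to the target). It assumes each estimate is a sum of two delay components, as in the talk's comparison D1 + D2 < D3 + D4, and that the smallest estimate across vantage points is kept; both the aggregation rule and the names `relative_delay_ms` and `pick_landmark` are assumptions for illustration, not the authors' implementation.

```python
from math import inf


def relative_delay_ms(to_landmark_ms, to_target_ms):
    """Relative delay between a landmark and the target, e.g. D1 + D2."""
    return to_landmark_ms + to_target_ms


def pick_landmark(measurements):
    """Return (landmark_name, delay_ms) for the smallest relative delay.

    measurements: dict mapping a landmark name to a list of
    (to_landmark_ms, to_target_ms) pairs, one pair per vantage point.
    Landmarks without measurements are skipped. Returns (None, inf)
    when no landmark has any measurement.
    """
    best_name, best_delay = None, inf
    for name, pairs in measurements.items():
        if not pairs:
            continue
        # Assumption: keep the smallest estimate across vantage points.
        estimate = min(relative_delay_ms(d1, d2) for d1, d2 in pairs)
        if estimate < best_delay:
            best_name, best_delay = name, estimate
    return best_name, best_delay
```

The target would then be associated with the postal address of the returned landmark. Because only the ordering of delays matters, no delay-to-distance conversion is needed at this step.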