Towards Street-Level Client-Independent IP Geolocation

Yong Wang, UESTC/Northwestern
Daniel Burgener, Northwestern
Marcel Flores, Northwestern
Aleksandar Kuzmanovic, Northwestern
Cheng Huang, Microsoft Research
http://networks.cs.northwestern.edu
Problem and Motivation
How to accurately locate IP addresses on the Internet?
Host-dependent solutions:
– GPS
– WiFi (e.g., Google My Location, Skyhook)
Host-independent solutions:
– Server cannot always expect clients’ cooperation
• Security / access restrictions
• Online service access analytics
• Location-based online advertising
A Scenario of Street-Level Online Advertising
User’s location
Local Businesses
Prior Work
Constraint-Based Geolocation (CBG) [ToN 06]
Median error distance = 228 km
– Measure delays from active vantage points
Topology-Based Geolocation (TBG) [IMC 06]
Median error distance = 67 km
– CBG + consider network topological information
Octant [NSDI 07]
Median error distance = 35.2 km
– CBG + consider routers’ locations, geographical and demographic information
Methodology Highlights
Our methodology is based on two insights
– Websites often provide the actual geographical location of associated entities
• E.g., universities, businesses, government offices, etc.
• Develop methods to determine if web or e-mail servers reside at the corresponding locations
– Relative network delays highly correlate with geographical distances
• Absolute network delay measurements are fundamentally limited in their ability to achieve fine-grained geolocation results
Institutional Network Example
[Figure: an institution's website lists its postal address, 550 South Hill Street, Suite 890, Los Angeles, CA 90013. Diagram labels: Web, cloudsourcing, web server, mail server, IP subnet, router, to external network.]
The Role of Relative Network Delays
Measured delays: [Figure: a chain of three "<" comparisons between measured delays; operands not preserved]
A Case Study
Target IP address: 38.100.25.196
Target postal address: 1850 K Street NW, Washington, DC 20006
Three-Tier Geolocation System
Tier 1
Goal: Find the coarse-grained region for the targeted IP
Measured delays
Geographical distances
Create intersection
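The Tier 1 steps (measured delays, geographical distances, intersection) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the 4/9 propagation factor, the function names, and the example coordinates are assumptions made for this sketch.

```python
import math

# Assumption of this sketch: packets travel at most 4/9 of the speed of light.
SPEED_OF_LIGHT_KM_PER_MS = 299.792458
CONVERSION_FACTOR = 4.0 / 9.0
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def max_distance_km(rtt_ms):
    """Upper bound on distance implied by a round-trip time (one-way = rtt / 2)."""
    return (rtt_ms / 2.0) * SPEED_OF_LIGHT_KM_PER_MS * CONVERSION_FACTOR


def in_intersection(point, measurements):
    """True if point lies inside every vantage point's distance circle.

    measurements: list of ((lat, lon), rtt_ms) tuples, one per vantage point.
    """
    lat, lon = point
    return all(
        haversine_km(lat, lon, vlat, vlon) <= max_distance_km(rtt)
        for (vlat, vlon), rtt in measurements
    )


# Hypothetical vantage points (Chicago, New York) with made-up RTTs.
measurements = [((41.88, -87.63), 20.0), ((40.71, -74.01), 25.0)]
washington_dc = (38.90, -77.04)
los_angeles = (34.05, -118.24)
```

With these made-up measurements, `in_intersection(washington_dc, measurements)` holds while `in_intersection(los_angeles, measurements)` does not, so only the first candidate stays in the coarse-grained region.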
Three-Tier Geolocation System
Tier 2
Goal: Use passive landmarks to determine a finer-grained region for the targeted IP
Populate the intersection with landmarks
Estimate the delay between landmarks and the target
D1 + D2 < D3 + D4
Create a new intersection
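One possible reading of the inequality D1 + D2 < D3 + D4 is that each vantage point yields one candidate estimate of the landmark-to-target delay (the sum of two measured segments), and the smallest sum is kept. The sketch below assumes that reading; the function name and the input layout are hypothetical.

```python
def estimate_delay_ms(segment_pairs):
    """Estimate the landmark-target delay from candidate segment pairs.

    segment_pairs: list of (delay_to_landmark_ms, delay_to_target_ms) tuples,
    one per vantage point (an assumed input layout). Each pair sums to one
    candidate estimate; the smallest sum is returned.
    """
    if not segment_pairs:
        raise ValueError("at least one segment pair is required")
    return min(d_landmark + d_target for d_landmark, d_target in segment_pairs)


# Made-up example: (D1, D2) = (1.5, 2.0) and (D3, D4) = (3.0, 4.0).
example_estimate = estimate_delay_ms([(3.0, 4.0), (1.5, 2.0)])
```

Here D1 + D2 is smaller than D3 + D4, so the first pair provides the estimate.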
Three-Tier Geolocation System
Tier 3
Goal: Geolocate the target IP using passive landmarks
Select the landmark with the minimum delay to the target, and associate the target’s location with it.
10.6 km vs. 0.103 km
Measured distance ∝ Geographical distance
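The Tier 3 selection rule can be sketched in a few lines. The `Landmark` type, its field names, and the sample values are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Landmark:
    name: str        # hypothetical label for the landmark
    lat: float       # latitude taken from the landmark's postal address
    lon: float       # longitude taken from the landmark's postal address
    delay_ms: float  # estimated relative delay to the target


def geolocate(landmarks):
    """Return the (lat, lon) of the landmark with the minimum delay to the target."""
    if not landmarks:
        raise ValueError("no landmarks available")
    closest = min(landmarks, key=lambda lm: lm.delay_ms)
    return closest.lat, closest.lon


# Made-up landmarks around a target.
candidates = [
    Landmark("landmark-a", 38.9020, -77.0420, 1.8),
    Landmark("landmark-b", 38.9030, -77.0410, 0.4),
    Landmark("landmark-c", 38.9500, -77.1000, 2.6),
]
estimated_location = geolocate(candidates)
```

Only the ordering of the delays matters here, which is why relative delays are enough even when absolute delays are noisy.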
Remaining Issues
Verifying landmarks
– Sweep out most of the erroneous landmarks
– Errors are still possible!
Resilience to errors
– The larger the error, the more resilient our method is
– We prove that the likelihood that an erroneous landmark will affect the accuracy is small
Evaluation
Three datasets
– PlanetLab dataset (Academic)
– Collected dataset (Residential)
– Online Maps dataset (In the wild)
Factors that impact the accuracy
– Landmark density
– Population density
– Access networks
Dataset Characteristics
Urban areas
Rural areas
The three datasets cover both urban areas and rural areas.
Baseline Results
Error distance (km)   PlanetLab   Residential   Online Maps   The best previous result
Median                0.69        2.25          2.11          35.2
Maximum               5.24        8.1           13.2          276.8
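As a quick arithmetic check, the median errors can be compared against the best previous result. The variable names are chosen for this sketch; the numbers are the ones reported on this slide.

```python
best_previous_median_km = 35.2
medians_km = {"PlanetLab": 0.69, "Residential": 2.25, "Online Maps": 2.11}

# Ratio of the best previous median error to the median error on each dataset.
improvement = {
    name: best_previous_median_km / median
    for name, median in medians_km.items()
}

for name, factor in improvement.items():
    print(f"{name}: {factor:.1f}x smaller median error")
```

Every ratio exceeds 10, so the median error is more than an order of magnitude smaller on all three datasets.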
Landmark Density
Density sequence: PlanetLab > Residential > Online Maps
The more landmarks we can discover in the vicinity of a target, the more likely we are to geolocate the targeted IP accurately.
The Role of Population Density
The error distance is smallest in densely populated areas
The error grows as the population density decreases
Middle of “nowhere”
The Role of Access Networks
[Figure annotations: 2 km, 700 meters]
Error distance (km)   AT&T   Comcast   Verizon
Median                1.68   2.38      1.48
Cable access networks (Comcast) have a much larger latency variance than DSL networks (AT&T and Verizon)
Conclusions
A geolocation system able to geolocate IP addresses with more than an order of magnitude better precision than the best previous method
Our methodology consists of two components
– Mining landmarks from the Web and using web or e-mail servers as landmarks
– Using relative network distances as opposed to absolute network distances
Thank You
http://networks.cs.northwestern.edu