Updating and Efficient Data View for Spatial Region by Geolocation Mapping Meiappane.A

advertisement
International Journal of Engineering Trends and Technology (IJETT) – Volume 10 Number 5 - Apr 2014
Updating and Efficient Data View for Spatial Region by Geolocation
Mapping
Meiappane.A(1),Erwin Alexander.E(2), Prasath.K .A(3),
Department of Information Technology
Manakula Vinayagar Institute of Technology
Puducherry
Abstract
The Global Positioning System plays a vital role in
mapping any part or any place in the world. In spite of
these services there is flaws, in which some of the
locations are not identified which are moderately
connected to the spatial regions. And the mapping
process is inefficient in the existing system. In our
project, we implement a system using ip geolocation
which displays each and every location in an efficient
manner. This system eradicates the drawbacks of the
existing system by providing instant updation about the
location and easier data view and representation
through chart analysis which provides a pictorial
representation. It also has the advantage of taking less
time and provides the latitude and longitude view of the
location. It measures the right path for the user and get
the data easily.
Keywords: Geolocation, network topology, delay
measurement.
I. Introduction
Many applications will benefit from or be enabled by
knowing the geographical locations (or geolocations) of
Internet hosts. Such locality-aware applications include
local weather forecast, the choice of language to
display on web pages, targeted advertisement, page hit
account in different places, restricted content delivery
according to local policies, etc. Locality-aware peer
selection will also help P2P applications in bringing
better user experience as well as reducing networking
traffic. Traditional IP-geolocation mapping schemes,
are primarily delay-measurement based. In these
schemes, there are a number of landmarks with known
geolocations. The delays from a targeted client to the
landmarks are measured, and the targeted client is
mapped to a geolocation inferred from the measured
delays. However, most of the schemes are based on the
assumption of a linear correlation between networking
delay and the physical distance between targeted client
and landmark.
The strong correlation has been verified in some
regions of the Internet, such as North America and
ISSN: 2231-5381
estern Europe. But as pointed out in the literature, the
Internet connectivity around the world is very complex,
and such strong correlation may not hold for the
Internet everywhere. In this paper, we investigate the
delay-distance relationship in a particular large region
of the Internet (China), where the Internet connectivity
is moderate. The data set contains hundreds of
thousands of (delay, distance) pairs collected from
thousands of widely spread hosts. We have two
observations from the data set. First, the linearity
between the delay and distance in this region of Internet
is positive but very weak. Second, with high probability
the shortest delay comes from the closest distance, and
we call this phenomenon the “closest-shortest” rule.
Based on the observations, we develop a simple yet
novel IP-geolocation mapping scheme for moderately
connected Internet regions, called GeoGet. In GeoGet,
we map the targeted client to the geolocation of the
landmark that has the shortest delay. We take a large
number of web servers with wide coverage and known
geolocations as passive landmarks, which eliminates
the deploying cost of active landmarks. We implement
GeoGet in the moderately connected Internet region we
study (China). In the implementation, we collect a large
number of web servers and choose about 40,000 of
them as passive landmarks, whose geolocations can be
accurately obtained. The passive landmarks cover the
entire region we are interested. We deploy a
coordination server in combination of a website
providing video on demand (VOD) service, and attract
more than 5,000 clients from diverse geolocations to
visit and participate during our measurement interval.
The evaluation results show that when probing about
100 landmarks, GeoGet accurately maps 35.4 percent
targeted clients to city level, which outperforms
existing schemes such as GeoLim and GeoPing by 270
and 239 percent, respectively, and the median error
distance in terms of city in GeoGet is around 120 km,
outperforming GeoLim and GeoPing by about 37 and
70 percent, respectively.
The contributions of this paper are twofold.
First, by studying a large data set, we show that most of
the traditional IP-Geolocation mapping schemes cannot
work well for moderately connected Internet regions,
since the linear delay-distance correlation is weak in
http://www.ijettjournal.org
Page 248
International Journal of Engineering Trends and Technology (IJETT) – Volume 10 Number 5 - Apr 2014
this kind of Internet regions. Second, based on the
measurement results (MR), we develop and implement
GeoGet, which uses the closest-shortest rule and works
much better than traditional schemes in moderately
connected Internet regions. We acknowledge that we
are not the first to apply the closest shortest rule and the
mapping accuracy of GeoGet is still not very high.
However, we go a large step toward developing a better
IP-Geolocation system for moderately connected
Internet regions. We believe the accuracy will improve
significantly if probing more landmarks.
i. Review of Related Works
Traditional IP-geolocation mapping schemes are
primarily delay-measurement based. In these schemes,
there are a number of landmarks with known
geolocations. The delays from a targeted client to the
landmarks are measured, and the targeted client is
mapped to a geolocation inferred from the measured
delays. However, most of the schemes are based on the
assumption of a linear correlation between networking
delay and the physical distance between targeted client
and landmark. The strong correlation has been verified
in some regions of the Internet, such as North America
and Western Europe. But as pointed out in the
literature, the Internet connectivity around the world is
very complex, and such strong correlation may not hold
for the Internet everywhere.
II. Proposed System
In this paper, we investigate the delay-distance
relationship in a particular large region of the Internet
(China), where the Internet connectivity is moderate.
The data set contains hundreds of thousands of (delay,
distance) pairs collected from thousands of widely
spread hosts. We have two observations from the data
set. First, the linearity between the delay and distance
in this region of Internet is positive but very weak.
Second, with high probability the shortest delay comes
from the closest distance, and we call this phenomenon
the “closest-shortest” rule.
Based on the observations, we develop a simple yet
novel IP-geolocation mapping scheme for moderately
connected Internet regions, called Geo Get. In Geo Get,
we map the targeted client to the geolocation of the
landmark that has the shortest delay. We take a large
number of web servers with wide coverage and known
geolocations as passive landmarks, which eliminates
the deploying cost of active landmarks. We further use
JavaScript at targeted clients to generate HTTP/Get
probing for delay measurement, eliminating the need to
install client-side software. To control the measurement
cost, we step-by-step refine the geolocation of a
targeted client, down to city level. In practice, GeoGet
can be deployed in combination with a certain locality-
ISSN: 2231-5381
aware application such that the application can easily
obtain the geolocations of their clients
Fig(1). System Architecture of ip geolocation
i. Web server Landmarks
To collect web server landmarks, we have
crawled a huge number of web servers in China and got
their IP addresses. Then, we check the geolocations of
the web servers by multiple IP-geolocation mapping
databases. Only those web servers whose geolocations
are agreed by all the databases are chosen as passive
landmarks in our system. Therefore, the error rate of
the geolocations of the web server landmarks is neglect
able. Finally, we get 43,973 web servers as passive
landmarks. The web servers cover 336 cities, out of a
total number of 346 cities in China. The coverage ratio
is 97.1 percent. Considering that the Internet popularity
in China is uneven, there may be a few clients coming
from cities where we cannot find any webserver. But
overall, the coverage ratio is adequate for most localityaware applications.
.
ii. Coordination Server
The coordination server is the core part of our
system as it guides the targeted clients to perform all
the measurement tasks, and process the measurement
results. It consists of seven main components: three
databases as well as four engines.
Landmark database
LM Database stores the information of the
web server landmarks we use, including their IP
addresses, geolocations as well as some status
information, such as TIMEOUT errors reported by
clients.
http://www.ijettjournal.org
Page 249
International Journal of Engineering Trends and Technology (IJETT) – Volume 10 Number 5 - Apr 2014
Measurement result database
MR Database stores both the area-level and
city-level measurement results, including the IP
addresses of the targeted clients and the corresponding
landmarks probed, as well as the measured delays. All
the measurement results are indexed using the
corresponding 24 IP segment so that the measurement
progress of a given/24 IP segment can be quickly
checked.
Location result (LR) database
LR Database stores the geographical mapping
results of targeted clients, including the IP addresses of
targeted clients and the corresponding mapping cities.
Landmark selection engine
LMS Engine implements our landmark
selection algorithm described in Algorithm 1. Given a
targeted client, it first queries LM Database to get a set
of landmarks based on our landmark selection
algorithm, then queries MR Database to check the
measurement progress of the targeted client, and finally
sends the un probed landmarks to the targeted client for
delay measurement.
iv. Location based service I
Location Provider provisions a client device
with the device’s location. Client provides location to
LBS Provider LBS Provider renders a service (map,
nearby coffee, etc.)
An LBS requires five basic components: the service
provider's software application, a mobile network to
transmit data and requests for service, a content
provider to supply the end user with geo-specific
information, a positioning component (see GPS) and
the end user's mobile device. By law, location-based
services must be permission-based. That means that the
end user must opt-in to the service in order to use it. In
most cases, this means installing the LBS application
and accepting a request to allow the service to know the
device's location.
Measurement result processing (MRP) engine
MRP Engine is responsible for processing the
measurement results from targeted clients. It updates
the MR Database using the received measurement
delays and if a TIMEOUT error is reported, it also
updates the LM Database to record the event with the
corresponding landmark reported by the targeted client.
Fig(2). Internet Geolocation and Location-Based Services I
MAP (IP-geolocation mapping) engine.
If a targeted client finishes the city-level
probing, MAP Engine checks the corresponding
landmark information and delay information stored in
MR database, uses the closest-shortest rule to map the
client to a city, and adds the mapping result to LR
Databases
iii. Location based service
Location-based services (LBS) are a general
class of computer program-level services used to
include specific controls for location and time data as
control features in computer programs. As such LBS is
an information service and has a number of uses in
social networking today as an entertainment service,
which is accessible with mobile devices through the
mobile network and which uses information on the
geographical position of the mobile device.
ISSN: 2231-5381
To begin with, for LBS to be operational on a large
scale, mapping under the geographical information
system needs to be more comprehensive than it is
today. This raises significant challenges in India for
improving the breadth and the depth of the existing
coverage of GIS. In India GIS mapping is at a
rudimentary stage and is mainly available in the major
cities. Efforts are on to improve the coverage a
mapping company, has a fleet of mobile mapping vans
in Singapore and Taipei that roam the streets with
sensors and cameras, and geoposition the information
onto maps.
v. Location based service II
A location based service is a service based on
the geographical positions of mobile handheld devices
such as smart phones and tablet computers. Two of the
LBS examples are finding a nearby ethnic restaurant
and comparing prices of a product from different stores.
http://www.ijettjournal.org
Page 250
International Journal of Engineering Trends and Technology (IJETT) – Volume 10 Number 5 - Apr 2014
Fig(3). Internet Geolocation and Location-Based Services
Though location based services are popular,
most developers are not familiar with their
development because it used to be complicated and
difficult. However, since the introduction of HTML5
Geolocation and Go Maps APIs, LBS development
becomes a kind of Web development and many IT
developers are already familiar with Web development.
This research builds a simple LBS application by using
HTML5 Geolocation and Google Maps APIs.
III. Constraint Based Geolocation
CBG establishes a dynamic relationship
between IP addresses and geographic location. This
dynamic relationship results from a measurement-based
approach where landmarks cooperate in a distributed
and self-calibration manner, allowing CBG to adapt
itself to time-varying network conditions. This
contrasts with most previous work that relies on a static
relationship.
i. Multilateration with geographic distance
Constraints
The physical position of a given point can be
estimated using a sufficient number of distances or
angle measurements to some fixed points whose
positions are known. When dealing with distances, this
process is called multilateration. Similarly, when
dealing with angles, it is called multi triangulation.
Strictly speaking, triangulation refers to an angle based
position estimation process with three reference points.
However, quite often the same term is adopted for any
distance or angle-based position estimation. In spite of
the popularity of the term triangulation, we adopt the
more precise term multilateration through the paper.
The main problem that stems from using
ISSN: 2231-5381
multilateration is the accurate measurement of the
distances between the target point to be located and the
reference points. For example, the Global Positioning
System (GPS) uses multilateration to three satellites to
estimate the position of a given GPS receiver. In the
case of GPS, the distance between the GPS receiver
and a satellite is measured by timing how long it takes
for a signal sent from the satellite to arrive at the GPS
receiver. Precise measurement of time and time interval
is at the heart of GPS accuracy. In contrast to GPS, it is
a challenging problem to transform Internet delay
measurements to geographic distances accurately. This
is likely to be the reason why direct multilateration has
remained so far unexploited for the purposes of
geolocating Internet hosts. Hereafter, we explain the
CBG design principles that enable the multilateration
with geographic distance constraints.
ii. From delay measurements to distance
constraints.
Before we introduce how CBG converts from
delay measurements to geographic distance constraints,
let us first observe a sample scatter plot relating
geographic distance and network delay. Based on these
considerations, we propose a novel approach to
establish a dynamic relationship between network delay
and geographic distance. In order to illustrate this
approach, suppose the existence of great-circle paths
between the landmark Li and each one of the remaining
landmarks. Further, consider also that, when travelling
on these great circle paths, data are only subject to the
propagation delay of the communication medium.
iii. Single-point localization
Geo proposes three different techniques for
geolocalization, called GeoPing, GeoTrack and Geo
Cluster. Geo Ping maps the target node to the landmark
node that exhibits the closest latency characteristics,
based on a metric for similarity of network signatures.
Topology-based Geolocation (TBG) uses the maximum
transmission speed of packets in fiber to conservatively
determine the convex region where the target lies from
network latencies between the landmarks and the
target. It additionally uses inter-router latencies on the
landmarks to target network paths to find a physical
placement of the routers and target that minimizes
inconsistencies with the network latencies.
http://www.ijettjournal.org
Page 251
International Journal of Engineering Trends and Technology (IJETT) – Volume 10 Number 5 - Apr 2014
IV. Delay-Distance Relationship
Fig(4). Location representation in Octant
TBG relies on a global optimization that minimizes
average position error for the routers and target. This
process can introduce error in the target position in an
effort to reduce errors on the location of the
intermediate routers. Octant differs from TBG by
providing a geometric solution technique rather than
one based on global optimization. This enables Octant
to perform geolocalization in near real-time, where
TBG requires significantly more computational time
and resources.
iv. Client-dependent IP geolocation systems
GPS-based geolocation Global Positioning
System (GPS) devices, that have been embedded into
billions of mobile phones and computers at nowadays,
could precisely provide user’s location. However, GPS
technology differs from our geolocation strategy in the
sense that it is a ’client-side’ geolocation approach,
which means that the server does not know where the
user is, unless the user explicitly reports his
information back to the server. Cell tower and Wi-Fi based geolocation. Google My Location and Skyhook
introduced their cell tower-based and Wi-Fi -based
geolocation approaches. In particular, the cell towerbased geolocation offers users estimated locations by
triangulating from cell towers surrounding users, while
the Wi-Fi-based geolocation uses Wi-Fi access point
information instead of cell towers. Specifically, every
tower or Wi-Fi access point has a unique identification
and footprint. To find a user’s approximate location,
such methods calculate user’s position relative to the
unique identifications and footprints of nearby cell
towers or Wi-Fi access points. However, there are
many devices (IPs) bound with wired network on the
Internet. Such wireless geolocation methods are
necessarily incapable of geolocating these IPs, while
our method does not require any precondition on the
end devices and IPs..
ISSN: 2231-5381
There are many researches and discussions on
delay distance relationship. To find that delay-distance
correlation is strong within North America and within
Western Europe, but weak for the entire Internet as a
whole. As presented in the section above, most
previous work on IP-geolocation mapping is based on
the assumption of a strong delay-distance correlation.
In this section, we investigate the delay-distance
relationship from a large data set collected in a
particular region of the Internet, China, which is the
world’s largest country in terms of the number of
Internet users and the second largest in terms of the size
of IP address space. To be consistent with prior work,
we use round-trip delay (RT Delay) as the delay
measurement in this paper.
i. Data Set
Our data set is composed of (delay, distance)
pairs collected between 240 probing hosts and 6,000
webserver landmarks. Each (delay, distance) pair is
unique for a (probing host, landmark) pair. The probing
hosts and landmarks come from diverse geolocations in
China. The values of distance and delay are obtained as
follows: First, based on the known geolocations of the
probing hosts and landmarks, we calculate the
geographical distances of (probing host, landmark)
pairs using Vincenty’s formula, which assumes that the
figure of the Earth is an oblate spheroid, and
accordingly is more accurate than models assuming a
spherical Earth. Second, the delays of (probing host,
landmark) pairs are measured using HTTP request.
Fig(5). Graph performance of location used in ip geolocation
Each probing host measures the delay to a
landmark for 10 times, and takes the minimum value as
the delay between them. Not all probing hosts finish
measuring all landmarks because of server failure,
timeouts, or poor network connectivity.
To choose the (probing host, landmark) pairs
whose minimum delays are within 100 ms (larger value
http://www.ijettjournal.org
Page 252
International Journal of Engineering Trends and Technology (IJETT) – Volume 10 Number 5 - Apr 2014
is meaningless). Totally, we get 200,796 such (delay,
distance) pairs in our data set. To believe this large data
set represents the characteristic of the delay-distance
relationship in China. In the data set, the delay varies
from 1 to 100 ms, with the median value of 52 ms, and
the distance varies from 0 to 3,712 km, with the median
value of 887 km. There are no biased delay values or
distance values that have much higher percentage than
others.. As for mentioned, the baseline is drawn by the
ideal digital transmitting speed in fiber (2/3 of the light
speed), and the best line is fit tightly above all (delay,
distance) pairs. To observe that there is no obvious
delay distance correlation and the distance/delay slope
for the best line is about 64 km/ms. Further, there is no
empty region in the lower right part indicating that
there is no reliable nonzero lower bound distance for a
delay value. We further study the delay-distance
relationship in the data set, including delay-distance
correlation and closest shortest rule.
ii. Delay-Distance Correlation
To use correlation coefficient to quantify the
linearity of delay-distance correlation. Given all the
delay values and distance values in the data set, assume
the
deviation
of
delay
is
D(de)
the deviation of distance is D(di), and the covariance
between delay and distance is cov(de,di) then the
correlation coefficient of delay and distance, corr(de,di)
is calculated as follows
corr(de,di) = cov(de,di)/(sqrt(D(de)) * sqrt(D(di)))
The absolute value of corr(de,di) is within [0,
1] If that value is closer to 1, delay and distance are
more strongly correlated. The delay-distance
correlation coefficient differs a lot from different
regions of the Internet. Ziviani et al. calculate that the
delay-distance correlation coefficient is high for North
America and Western Europe, from about 0.73 to 0.89;
but the value for worldwide Internet is very low,
around 0.3. In our data set, the delay-distance
correlation coefficient of all the (delay, distance) pairs
is 0.3734. Therefore, the linearity between delay and
distance is positive but rather weak in the Internet
region we study. There are many reasons for the weak
delay-distance correlation, such as networking
congestion, circuitous paths, moderate inter-AS
connection as well as unevenness in Internet
connectivity within the region. Most of the reasons are
owing to the moderate Internet connectivity.
For moderately connected Internet regions, if
we use the best line to extract the distance from a delay
value, the result will be overly skewed for many cases.
Shows the CDF of the difference between extracted
ISSN: 2231-5381
distance and actual distance. The average for all (delay,
distance) pairs is2,152 km. This difference is too big
considering the maximum distance geographical
distance is only 3,712 km. Therefore, though traditional
IP-geolocation mapping schemes that depend on strong
delay-distance correlation work well for richly
connected Internet regions, they are not suitable for
moderately connected Internet regions.
Fig 5.2 CDF of the difference between extracted distance
V. Conclusion
A geolocation system able to geolocate IP
addresses with more than an order of magnitude better
precision than the best previous methodThe closest
shortest rule holds with high probability.In future work
in addition to proposed system to introduce the chart
analysis used to measure and analyze the given data
that should be clear. So that the data view
representation which involving to access the
representation in pictorial form.
The graphical way of image can be easily
understandable to the user. The time must be reduced
by developing this concept.The data must be consistent
and cleared directly view to the user. These deals with
used to view the map with the longitude and latitude. It
measures the right path for the user and get the data
easily.
References
[1] V. Aggarwal, A. Feldmann, and C. Scheideler, “Can ISPs and
P2P Users Cooperate for Improved Performance?” ACM SIGCOMM
Computer Comm. Rev., vol. 37, no. 3, pp. 29-40, 2007.
[2] R. Bindal et al., “Improving Traffic Locality in BitTorrent via
Biased Neighbor Selection,” Proc. IEEE Int’l Conf. Distributed
Computing Systems (ICDCS ’06), 2006.
[3] K.Xu et al., “LBMP: A Logarithm-Barrier-Based Multipath
Protocol for Internet Traffic Management,” IEEE Trans. Parallel and
Distributed Systems, vol. 22, no. 3, pp. 476-488, Mar. 2011..
[4] IP Address Geolocation to Identify Website Visitor’s
Geographysical Location, http://www.ip2location.com/. 2012.
[5] V. Padmanabhan and L. Subramanian, “An Investigation of
Geographic Mapping Techniques for Internet Hosts,” Proc. ACM
SIGCOMM ’01, 2001.
http://www.ijettjournal.org
Page 253
International Journal of Engineering Trends and Technology (IJETT) – Volume 10 Number 5 - Apr 2014
[6] A. Ziviani et al. “Improving the Accuracy of Measurement-Based
Geographic Location of Internet Hosts,” Computer Networks, vol.
47, no. 4, pp. 503-523, 2005.
[7] B. Gueye et al., “Constraint-Based Geolocation of Internet
Hosts,” Proc. ACM Internet Measurement Conf. (IMC ’04), 2004.
[8] E. Katz-Bassett et al. “Towards IP Geolocation Using Delay and
Topology Measurements,” Proc. ACM Internet Measurement Conf.
[9] B. Wong, I. Stoyanov, and E. Sirer, “Octant: A Comprehensive
Framework for the Geolocalization of Internet Hosts,” Proc. USENIX
Conf. Networked Systems Design and Implementation (NSDI ’07),
2007.
[10] T. Vincenty, “Direct and Inverse Solutions of Geodesics on the
Ellipsoid with Application of Nested Equations,” Survey Rev., vol.
22, no. 176, pp. 88-93, 1975
[11] B. Eriksson et al., “A Learning-Based Approach for IP
Geolocation,” Proc. Int’l Conf. Passive and Active Measurement
(PAM ’10), 2010.
[13] C. Guo et al., “Mining the Web and the Internet for Accurate IP
Address Geolocations,” Proc. IEEE INFOCOM ’09, 2009.
[14] J. Muir and P. Oorschot, “Internet Geolocation: Evasion and
Counterevasion,” ACM Computing Surveys, vol. 42, no. 1, 2009.
[15] S. Laki, P. Matray, and P. Haga, “Spotter: A Model Based
Active Geolocation Service,” Proc. IEEE INFOCOM ’11, 2011.
[16] M. Arif, S. Karunasekera, and S. Kulkarni, “GeoWeight:
Internet Host Geolocation Based on a Probability Model for Latency
Measurements,” Proc. 33rd Australasian Conf. Computer Science
(ACSC ’10), 2010.
[17] B. Gueye, S. Uhlig, and S. Fdida, “Investigating the Imprecision
of IP Block-Based Geolocation,” Proc. Int’l Conf. Passive and
ActiveNetwork Measurement (PAM ’07), 2007
[18] B. Gueye et al., “Leveraging Buffering Delay Estimation for
Geolocation of Internet Hosts,” Proc. Int’l IFIP-TC6 Conf.
Networking Technologies, Services, and Protocols (Networking ’06),
2006.
[19] Y. Wang et al., “Towards Street-Level Client-Independent IP
Geolocation,” Proc. USENIX Conf. Networked Systems Design and
Implementation (NSDI ’11), 2011
[20] T. Ng and H. Zhang, “Predicting Internet Network Distance with
Coordinates-Based Approaches,” Proc. IEEE INFOCOM ’02, 2002.
ISSN: 2231-5381
http://www.ijettjournal.org
Page 254
Download