Efficient Algorithms for Answering m-Closest Keywords Query

by Tao Guo, Xin Cao, Gao Cong
Presented by Sagar V Dwibhashyam, Abhilash Reddy
Introduction
Massive growth of geo-spatial and textual data across GPS devices, such as:
Points of interest (hotels, restaurants, businesses, etc.)
Photos with tags
Geo-locations in social photo-sharing websites
Check-in information
The information generated above has increased the prominence of spatial keyword queries.
Spatial keyword query
It returns the objects that match the arguments given in the query, using the location information and textual descriptions of the objects in the database.
The m-closest keywords (mCK) query is one type of spatial keyword query.
It finds a group of objects in the database that together cover the m query keywords and are as close to each other as possible.
The existing algorithms took almost 1 hour to answer a query containing just 8 keywords on a database of 1 million objects.
m-Closest keywords query
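O is defined as the set of all geo-textual objects in the database.
o ∈ O, where o is an individual object.
Each object has two attributes: o.λ (location) and o.ψ (textual description).
An mCK query takes m keywords as input.
The output is a group of objects covering all m keywords whose diameter (the largest distance between any two objects in the group) is minimized.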
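The definitions above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the names `GeoObject`, `diameter` and `covers` are hypothetical, locations are assumed to be planar (x, y) points with Euclidean distance, and the diameter of a group is assumed to be the largest distance between any two of its objects.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist


@dataclass(frozen=True)
class GeoObject:
    loc: tuple           # o.λ: location, assumed to be an (x, y) pair
    keywords: frozenset  # o.ψ: textual description as a set of keywords


def diameter(group):
    """Largest pairwise distance in the group (0.0 for fewer than two objects)."""
    return max((dist(a.loc, b.loc) for a, b in combinations(group, 2)), default=0.0)


def covers(group, query):
    """True if the objects in the group together contain every query keyword."""
    found = set().union(*(o.keywords for o in group))
    return set(query) <= found
```

An mCK answer is then a group for which `covers(group, query)` holds and `diameter(group)` is as small as possible.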
Problems to be addressed
Determine the hardness of the above problem: prove that the mCK query is NP-hard by a reduction from the 3-SAT problem.
Analyze a greedy approach to solving the problem.
Extend this greedy approach to three approximation algorithms that yield better performance.
Then develop an exact algorithm.
The mCK query can be approached by finding the circle with the smallest diameter that encloses a group of objects together covering all query keywords.
This circle is called the “smallest keyword enclosing circle” (SKECq), where q is the query.
SKECq is solvable in polynomial time, but the solution is not efficient (high time complexity).
Existing Solutions for mCK Queries
bR*-tree based method
Virtual bR*-tree based method
Spatial group keyword query
GKG Algorithm: Greedy Approach
Greedy Keyword Group
Steps:
Find the most infrequent keyword among the keywords in the query q.
For each object that contains this infrequent keyword, its neighboring objects are searched for the remaining keywords from the query.
The object with the infrequent keyword and the neighboring objects with the remaining query keywords form a group.
Once all the objects with the infrequent keyword are processed, this gives us a number of groups.
Then the group with the smallest diameter is chosen as the result.
Approximation ratio: 2
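The greedy steps above can be sketched as follows. This is a simplified illustration under stated assumptions rather than the authors' implementation: objects are hypothetical `((x, y), keyword_set)` pairs, distance is Euclidean, "neighboring objects" is interpreted as the nearest object containing each missing keyword, and a linear scan stands in for whatever spatial index a real system would use.

```python
from collections import Counter
from itertools import combinations
from math import dist


def gkg(objects, query):
    """Greedy Keyword Group sketch. objects: list of ((x, y), set_of_keywords)."""
    query = set(query)
    # Count how many objects contain each query keyword.
    freq = Counter(k for _, kws in objects for k in kws & query)
    if any(freq[k] == 0 for k in query):
        return None, float("inf")  # some query keyword appears in no object
    rare = min(sorted(query), key=freq.get)  # most infrequent query keyword

    best, best_diam = None, float("inf")
    for loc, kws in objects:
        if rare not in kws:
            continue
        group = [(loc, kws)]
        for k in sorted(query - kws):
            if any(k in g_kws for _, g_kws in group):
                continue  # an object added earlier already covers k
            # Assumption: take the nearest object that contains keyword k.
            nearest = min((o for o in objects if k in o[1]),
                          key=lambda o: dist(loc, o[0]))
            group.append(nearest)
        diam = max((dist(a[0], b[0]) for a, b in combinations(group, 2)),
                   default=0.0)
        if diam < best_diam:
            best, best_diam = group, diam
    return best, best_diam
```

Starting from the most infrequent keyword keeps the number of candidate groups small, since one group is built per object containing that keyword.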
SKEC
MCC (Minimum Covering Circle): the circle that encloses a group of objects with the smallest diameter.
A Keywords Enclosing Circle is a circle that encloses a group of objects covering all the given keywords. The one with the smallest diameter is the Smallest Keywords Enclosing Circle.
Object-across Keywords Enclosing Circle (KECo): a keywords enclosing circle with object o on its circumference.
Obtain a group using the GKG algorithm; the diameter of its MCC serves as an upper bound for the diameter.
If an object in this group covers all query keywords by itself, we return that object.
Otherwise, we find the smallest KECo for each object o and update the smallest diameter found so far.
Approximation ratio: 2/√3
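As an illustration of the keywords enclosing circle definition, the check below tests whether a given circle qualifies. It is a sketch under assumptions: the function name is hypothetical, objects are `((x, y), keyword_set)` pairs, distance is Euclidean, and a small tolerance is added so that objects lying exactly on the circumference count as enclosed.

```python
from math import dist


def is_keywords_enclosing_circle(center, diam, objects, query):
    """True if the objects inside the circle together cover every query keyword."""
    radius = diam / 2
    # Keyword sets of all objects inside or on the circle (1e-9 is an assumed tolerance).
    inside = [kws for loc, kws in objects if dist(center, loc) <= radius + 1e-9]
    return set(query) <= set().union(*inside)
```

One way to read the 2/√3 ratio: by Jung's theorem, any planar point set of diameter δ fits in a circle of diameter 2δ/√3, so the smallest keywords enclosing circle is never larger than that for the optimal group.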
SKECa
Given a set of keywords ψ and an object o, if there exists no o-across keywords enclosing circle (KECo) with diameter D, then no KECo exists whose diameter is smaller than D.
We use binary search to find the diameter and position of SKECo.
Here the upper bound of the search is the first diameter found, and the lower bound is already calculated from the greedy algorithm (GKG).
Approximation ratio: 2/√3 + ε, where ε is an arbitrarily small value.
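The monotonicity property above is what makes binary search on the diameter valid. A minimal sketch follows, assuming a hypothetical feasibility test `exists_kec(D)` that reports whether an o-across keywords enclosing circle of diameter D exists; the geometric test itself is not shown, and the tolerance `eps` plays the role of the small additive term in the approximation ratio.

```python
def smallest_feasible_diameter(exists_kec, lo, hi, eps=1e-6):
    """Binary search for the smallest D in [lo, hi] such that exists_kec(D) is true.

    Assumes exists_kec is monotone (false below some threshold, true at and
    above it) and that exists_kec(hi) is true.
    """
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if exists_kec(mid):
            hi = mid  # a circle of diameter mid exists, so try smaller
        else:
            lo = mid  # none exists at mid, hence none exists below it
    return hi
```

For example, `smallest_feasible_diameter(lambda d: d >= 3.2, 0.0, 10.0)` converges to a value within `eps` of 3.2.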
SKECa+
In the above approach, if the circles found on the earlier processed objects are large, the upper bound is loose for the subsequent search and the checking cost is high.
To overcome this problem, we perform the binary search over all the objects with the infrequent keyword, rather than on each object separately.
In this algorithm, we first do the binary search and then find the object-across keywords enclosing circle.
EXACT
In the EXACT algorithm, we use the best features of the previously given algorithms.
First, we find the object groups using the SKECa+ algorithm.
The diameters of the minimum covering circles, as used in the SKEC algorithm, are computed for the above groups.
We check which of these groups has the smallest diameter.
Then we return that group as the most suitable one.
Datasets
A typical dataset consists of two files: a Doc file and a Loc file.
Doc file: consists of the ID and the textual description keywords.
Loc file: ID, latitude and longitude.
Query file: contains the sample queries to be processed.
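The two files can be joined on the object ID, for example as sketched below. The exact file layout is not given here, so this is purely an assumption: each line is taken to be whitespace-separated with the ID first, and the function name `load_objects` is hypothetical.

```python
def load_objects(doc_lines, loc_lines):
    """Join Doc and Loc records on ID (line layout is an assumption)."""
    keywords = {}
    for line in doc_lines:
        if not line.strip():
            continue  # skip blank lines
        obj_id, *words = line.split()   # assumed: "<id> <kw1> <kw2> ..."
        keywords[obj_id] = set(words)

    objects = {}
    for line in loc_lines:
        if not line.strip():
            continue
        obj_id, lat, lon = line.split()  # assumed: "<id> <latitude> <longitude>"
        objects[obj_id] = ((float(lat), float(lon)), keywords.get(obj_id, set()))
    return objects
```

The function accepts any iterable of lines, so an open file handle or a list of strings both work.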
Experimentation
NY dataset: 485,059 objects
116,546 unique keywords
1,143,013 total keywords
The number of keywords in the query is varied to evaluate the performance.
Applications of mCK query
It can be used to detect the geographic locations of web resources such as documents or photos. Given a document or a photo with some tags, we can issue an mCK query using these tags.
The mCK query has potential applications for location-based service providers.
Customers of Apple products can submit ‘Apple store subway’ to locate a nearby retail store where they can purchase products.
A tourist can find a location with places and attractions within walking distance.
Thank You