“Spatial Approximate String Search”, IEEE TRANSACTIONS ON

advertisement
Spatial Approximate String Search
ABSTRACT:
This work deals with the approximate string search in large spatial databases.
Specifically, we investigate range queries augmented with a string similarity search
predicate in both Euclidean space and road networks. We dub this query the spatial
approximate string (SAS) query. In Euclidean space, we propose an approximate
solution, the MHR-tree, which embeds min-wise signatures into an R-tree. The
min-wise signature for an index node u keeps a concise representation of the union
of q-grams from strings under the sub-tree of u. We analyze the pruning
functionality of such signatures based on the set resemblance between the query
string and the q-grams from the sub-trees of index nodes. We also discuss how to
estimate the selectivity of a SAS query in Euclidean space, for which we present a
novel adaptive algorithm to find balanced partitions using both the spatial and
string information stored in the tree. For queries on road networks, we propose a
novel exact method, RSASSOL, which significantly outperforms the baseline
algorithm in practice. The RSASSOL combines the q-gram based inverted lists and
the reference nodes based pruning. Extensive experiments on large real data sets
demonstrate the efficiency and effectiveness of our approaches.
EXISTING SYSTEM
Keyword search over a large amount of data is an important operation in a wide
range of domains. Felipe et al. has recently extended its study to spatial databases,
where keyword search becomes a fundamental building block for an increasing
number of real-world applications, and proposed the IR -Tree.
DISADVANTAGES OF EXISTING SYSTEM:
 A main limitation of the IR -Tree is that it only supports exact keyword
search.
 Exact Keyword Require For Searching the Results.
PROPOSED SYSTEM:
For RSAS queries, the baseline spatial solution is based on the Dijkstra’s
algorithm. Given a query point q, the query range radius r, and a string predicate,
we expand from q on the road network using the Dijkstra algorithm until we reach
the points distance r away from q and verify the string predicate either in a post-
processing step or on the intermediate results of the expansion. We denote this
approach as the Dijkstra solution. Its performance degrades quickly when the
query range enlarges and/or the data on the network increases. This motivates us to
find a novel method to avoid the unnecessary road network expansions, by
combining the prunings from both the spatial and the string predicates
simultaneously.
We demonstrate the efficiency and effectiveness of our proposed methods
for SAS queries using a comprehensive experimental evaluation. For ESAS
queries, our experimental evaluation covers both synthetic and real data sets of up
to 10 millions points and 6 dimensions. For RSAS queries, our evaluation is based
on two large, real road network datasets, that contain up to 175,813 nodes, 179,179
edges, and 2 millions points on the road network. In both cases, our methods have
significantly outperformed the respective baseline methods.
ADVANTAGES OF PROPOSED SYSTEM:
This is very helpful for Exact Result from Non Exact keywords .
SYSTEM CONFIGURATION:HARDWARE CONFIGURATION:-
 Processor
 Speed
-
Pentium –IV
1.1 Ghz
 RAM
-
256 MB(min)
 Hard Disk
-
20 GB
 Key Board
-
Standard Windows Keyboard
 Mouse
-
Two or Three Button Mouse
 Monitor
-
SVGA
SOFTWARE CONFIGURATION:-
 Operating System
: Windows XP
 Programming Language
: JAVA/J2EE.
 Java Version
: JDK 1.6 & above.
 Database
: MYSQL
REFERENCE:
Feifei Li, Member, IEEE, Bin Yao, Mingwang Tang, and Marios Hadjieleftheriou,
“Spatial
Approximate
String
Search”,
IEEE
TRANSACTIONS
ON
KNOWLEDGE AND DATA ENGINEERING, VOL. 25, NO. 6, JUNE 2013.
Download