Efficient Algorithms for Answering m-Closest Keywords Query
by Tao Guo, Xin Cao, Gao Cong
Presented by Sagar V Dwibhashyam, Abhilash Reddy

Introduction
GPS devices
Massive growth of geo-spatial and textual data, such as:
  Points of interest (hotels, restaurants, businesses, etc.)
  Photos with tags
  Geo-locations on social photo-sharing websites
  Check-in information
The information generated this way has increased the prominence of spatial keyword queries.

Spatial keyword query
A spatial keyword query returns the objects that match the query arguments, using the location information and textual descriptions of the objects in the database.
The m-closest keywords (mCK) query is one type of spatial keyword query. It finds a group of objects that are close to each other and together match the m query keywords.
Existing algorithms took almost 1 hour to answer a query containing just 8 keywords on a database of 1 million objects.

m-Closest keywords query
O is the set of all geo-textual objects in the database, and o ∈ O denotes a single object.
Each object has two attributes: o.λ (location) and o.ψ (textual description).
An mCK query takes m keywords.
The output is the group of objects that covers all m keywords with the smallest diameter, where the diameter of a group is the largest distance between any two of its objects.

Problems to be addressed
Determine the hardness of the problem: prove that the mCK query is NP-hard by a reduction from the 3-SAT problem.
Analyze a greedy approach to solving the problem.
Extend this greedy approach to three approximation algorithms that yield better performance.
Develop an exact algorithm.
The mCK query can be approached by finding the circle with the smallest diameter that encloses a group of objects together covering all query keywords. This circle is called the "smallest keywords enclosing circle" (SKECq), where q is the query.
Finding SKECq is solvable in polynomial time, but it is not efficient (high time complexity).
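
To make the mCK definition above concrete, here is a minimal brute-force sketch in Python. It is not one of the algorithms from the paper: it simply enumerates every combination of one object per keyword and keeps the group with the smallest diameter, which takes exponential time in m. The toy objects, the tuple layout (location, keyword set), and the function names are assumptions made up for illustration.

```python
from itertools import product
from math import dist, inf

# Hypothetical toy data: each object is (location o.λ, keyword set o.ψ).
objects = [
    ((0.0, 0.0), {"hotel"}),
    ((1.0, 0.0), {"restaurant"}),
    ((0.0, 1.0), {"bar"}),
    ((5.0, 5.0), {"hotel", "bar"}),
    ((5.0, 6.0), {"restaurant"}),
]

def diameter(objects, group):
    """Largest pairwise distance between the objects whose indices are in group."""
    return max(dist(objects[i][0], objects[j][0]) for i in group for j in group)

def mck_brute_force(objects, keywords):
    """Return (group, diameter) for the smallest-diameter group covering all keywords.

    group is a sorted list of object indices; (None, inf) if some keyword
    is carried by no object. Illustration only: exponential in len(keywords).
    """
    # For each query keyword, the indices of the objects that carry it.
    candidates = [
        [i for i, (_, words) in enumerate(objects) if kw in words]
        for kw in keywords
    ]
    if any(not c for c in candidates):
        return None, inf

    best_group, best_diameter = None, inf
    # Pick one object per keyword; one object may serve several keywords.
    for combo in product(*candidates):
        group = sorted(set(combo))
        d = diameter(objects, group)
        if d < best_diameter:
            best_group, best_diameter = group, d
    return best_group, best_diameter

print(mck_brute_force(objects, ["hotel", "restaurant", "bar"]))
```

On this toy data the winning group is the object carrying both "hotel" and "bar" together with the restaurant next to it, since one object is allowed to cover more than one query keyword. The cost of enumerating all combinations is what motivates the greedy and approximation algorithms in the following slides.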

Existing Solutions for mCK Queries
bR*-tree based method
Virtual bR*-tree based method
Spatial group keyword query

GKG Algorithm: Greedy Approach
GKG stands for Greedy Keyword Group.
Steps:
  Find the most infrequent keyword among the keywords in the query q.
  For each object that contains this infrequent keyword, search its neighboring objects for the remaining query keywords.
  The object with the infrequent keyword and the neighboring objects that supply the remaining query keywords form a group.
  Once all objects with the infrequent keyword are processed, we have a number of groups.
  The group with the smallest diameter is chosen as the result.
Approximation ratio: 2

SKEC
MCC (Minimum Covering Circle): the circle with the smallest diameter that encloses a given set of objects.
Keywords Enclosing Circle (KEC): a circle that encloses a group of objects covering all the given keywords. The one with the smallest diameter is the Smallest Keywords Enclosing Circle (SKEC).
Object-across Keywords Enclosing Circle (KECo): a keywords enclosing circle with the object o on its circumference.
Obtain a group using the GKG algorithm; the diameter of its MCC serves as an upper bound for the diameter.
If an object in this group covers all query keywords by itself, we return that object. Otherwise, we find the smallest KECo and update the smallest diameter found so far.
Approximation ratio: 2/√3

SKECa
Given a set of keywords ψ and an object o, if there exists no o-across keywords enclosing circle (KECo) with diameter D, then no KECo exists whose diameter is smaller than D.
We use binary search to find the diameter and position of SKECo.
The upper bound of the search is the first diameter found, and the lower bound is already calculated from the greedy algorithm (GKG).
Approximation ratio: 2/√3 + ε, where ε is an arbitrarily small value.

SKECa+
In the above approach, if the circles found on the earlier processed objects are large, the upper bound is loose for the subsequent searches and the checking cost is high.
To overcome this problem, we do the binary search over all the objects containing the infrequent keyword together.
In this algorithm, we first do the binary search and then find the object-across keywords enclosing circle.

EXACT
The EXACT algorithm uses the best features of the previous algorithms.
First, we find the object groups using the SKECa+ algorithm.
The diameters of the Minimum Covering Circles, as used in the SKEC algorithm, are computed for these groups.
We check which of these groups has the smallest diameter and return it as the best group.

Datasets
A typical dataset consists of two files: a Doc file and a Loc file.
Doc file: the object ID and the keywords of its textual description.
Loc file: the object ID, latitude and longitude.
Query file: the sample queries to be processed.

Experimentation
NY dataset:
  485,059 objects
  116,546 unique keywords
  1,143,013 total keywords
The number of keywords in the query is varied to evaluate performance.

Applications of mCK query
It can be used to detect the geographic locations of web resources such as documents or photos: given a document or a photo with some tags, we can issue an mCK query using these tags.
The mCK query has potential applications for location-based service providers.
Customers of Apple products can submit 'Apple store subway' to locate a nearby retail store where they can purchase products.
A tourist can find a location with places and attractions that she can walk between.

Thank You