International Journal of Engineering Trends and Technology (IJETT) – Volume 15 Number 7 – Sep 2014

An Efficient Web Service Based Mobile Search Engine

1 Santhosh Kumar Gottumukkala, 2 D.D.D. Suribabu
Final Year MTech student, Associate Professor
Computer Science and Engineering, D.N.R College Of Engineering And Technology, Bhimavaram

Abstract: Optimizing mobile search engines is an active research issue in search engine optimization and information retrieval. Although researchers have proposed various traditional search engine approaches, performance and time complexity remain the prime concerns when optimizing a mobile search engine. We propose an efficient service oriented architecture based search engine that ranks results by file relevance and uses a cache to enhance performance.

I. INTRODUCTION

The main objective of a search engine is to retrieve results of interest to the user from billions of related and unrelated pieces of information. Over years of research, various authors have improved mobile search engines with techniques such as string transformation, which generates an appropriate set of candidates for the query submitted by the user. Globalization and localization techniques improve search performance by giving local search results priority over global results. For example, if a user in India enters a keyword, the first result may be “IRCTC”, because it is the most frequently used website for train enquiries, online ticket booking, etc.

Traditional approaches work on a file relevance score: they consider term frequency and inverse document frequency to rank documents and give priority to the highest-ranked documents. The drawback of file relevance score based approaches is that they give importance to frequency but not to the time stamp of the documents, so recent documents are not favoured; time stamp based approaches have the opposite drawback.

Personal data, i.e.
browsing history, emails, etc., are mostly unstructured, which makes privacy hard to measure. It is also difficult to incorporate unstructured data into search engines without summarization. So, for the purposes of both web personalization and privacy preservation, an algorithm must collect, summarize, and organize a user’s personal information into a structured user profile. Meanwhile, the notion of privacy is highly subjective and depends on the individuals involved. Things considered private by one person could be something that others would love to share. In this regard, the user should have control over which parts of the user profile are shared with the server.

ISSN: 2231-5381

Privacy concerns are natural and important, especially on the Internet. Some prior studies on Private Information Retrieval (PIR) [4] focus on the problem of allowing the user to retrieve information while keeping the query private. Instead, this study targets preserving the privacy of the user profile, while still benefiting from selective access to general information that the user agrees to release. To our knowledge, this problem has not been studied in the context of personalized search. One possible reason is that personal information, i.e. browsing history and emails, is mostly unstructured data, for which privacy is difficult to measure and quantify.

II. RELATED WORK

Various mobile search engines have been developed over many years of research, but their optimization techniques still have pros and cons. In mobile search engines specifically, mining results alone may not give optimal solutions beyond knowledge extraction; time complexity and space complexity while the user searches are also factors when implementing search engines on mobile devices [7,8]. The main drawbacks of traditional architectures are as follows: a round trip (request and response) must be performed for each search (i.e.
more execution time); language interoperability is not achievable; business logic is duplicated, which increases redundancy and the chance of malfunction; and performance is lower, with many time and space complexity issues.

Search can be performed based on various factors, such as query clicks, time stamp based URLs, query reformulation, integration of query clicks with reformulation approaches, pattern based approaches, and search history based techniques. Most search engines store all the keywords entered by the user, compute and update the weight of each word, and remove words that do not meet a threshold value set by the programmers, as well as unnecessary keywords such as articles, prepositions, and auxiliary verbs. When a user enters a query like “Android OS”, the engine stores the term and updates its weight. If the user enters “OS Android”, the engine treats it as a new keyword. Results are retrieved based on the query and its weights, and the results and associated terms can be stored and recorded for future searches [10].

http://www.ijettjournal.org

Nowadays, pattern or feedback based search engines work efficiently by applying pattern mining algorithms over session based patterns to find the URLs a user visits frequently; these results simplify a new user’s search and retrieve results of interest to the user. Some search results can be retrieved by page popularity or page indexing, which considers the URL most frequently used when multiple users enter the same term. In this paper we introduce a file relevance score based personalized mobile search engine (PMSE) built on a service oriented architecture with a cache implementation.

In the PMSE’s client-server architecture, PMSE clients are responsible for storing the user click-throughs and the ontologies derived from the PMSE server.
Simple tasks, such as updating click-throughs and ontologies, creating feature vectors, and displaying re-ranked search results, are handled by the PMSE clients, which have limited computational power. Heavy tasks, such as RSVM training and re-ranking of search results, are handled by the PMSE server. Moreover, to minimize data transmission between client and server, the PMSE client only needs to submit a query together with the feature vectors to the PMSE server, and the server automatically returns a set of search results re-ranked according to the preferences stated in the feature vectors. The data transmission cost is minimized because only the essential data (i.e., query, feature vectors, ontologies and search results) are transmitted between client and server during the personalization process.

III. PROPOSED WORK

We propose an efficient mobile search engine mechanism that aims to satisfy user requirements and retrieve search results in an optimal manner through mining. Traditional search results are based on spatial information, such as geo codes, for the user’s input query. In our approach, search results depend on the document weight, or file relevance score, which is computed from two parameters: TF (term frequency) and IDF (inverse document frequency). A cache holds frequently accessed previous search results for specific input queries, to enhance performance and reduce complexity at both end points.

It has been shown that a significant number of input queries are geo or location based and focus on location information; to serve such queries, many location-based search implementations for spatial queries have been proposed.
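As a rough illustration of the file relevance score mentioned above, the following Python sketch ranks a small document set by TF * IDF for a single keyword. The names `tokenize` and `file_scores`, the stop-word list, the sample documents, and the logarithmic form of IDF are all assumptions made for this example; the paper's own implementation is in C#.Net and is not reproduced here.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the paper does not specify one.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "is", "are", "to"}

def tokenize(text):
    """Lower-case the text, split it into words and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def file_scores(keyword, documents):
    """Rank documents for one keyword by FileScore = TF * IDF.

    documents maps a document name to its raw text.
    IDF uses the common log(N / n) form (an assumption of this sketch),
    where N is the number of documents and n the number containing the keyword.
    """
    keyword = keyword.lower()
    term_counts = {name: Counter(tokenize(text)) for name, text in documents.items()}
    containing = sum(1 for counts in term_counts.values() if counts[keyword] > 0)
    if containing == 0:
        return []  # the keyword occurs nowhere, so there is nothing to rank
    idf = math.log(len(documents) / containing)
    scores = [
        (name, counts[keyword] * idf)
        for name, counts in term_counts.items()
        if counts[keyword] > 0
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Illustrative documents, not data from the paper.
docs = {
    "a.txt": "Android OS is the OS of Android phones",
    "b.txt": "The Android search engine",
    "c.txt": "Train ticket booking",
}
ranking = file_scores("android", docs)
```

Here `ranking` lists `a.txt` before `b.txt`, because the keyword occurs twice in the first file and once in the second. With this form of IDF, a keyword that appears in every document gets a score of zero, which is one reason practical systems often use a smoothed variant.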
Our proposed system supports language interoperability (i.e. any standard language can communicate with any other) through SOA (service oriented architecture) and minimizes duplication of business logic by maintaining it at a centralized web application server instead of at multiple locations. Search engine performance can be improved by a simple cache implementation and by file relevance based ranking of results from files or documents.

A web service is one technology for creating an SOA with a three tier architecture; it minimizes duplication of operations by maintaining the business logic at one specific location (a centralized server). The main goals of the service oriented architecture are language interoperability (i.e. programs written in different standard languages can communicate with each other) and a lower chance of damage from the client end.

[Fig1: Web service Architecture. Components: Database; Business Logic; WSDL with SOAP protocol; UI (VB.Net), UI (Java), UI (Android)]

A data cache is a mechanism that increases performance at the user end and reduces overhead at the server end. It stores frequently accessed results for future retrieval: when a user requests the same input query again, it reduces execution time, i.e. the round trip of request and response between user and server is avoided, and the server incurs no additional overhead processing the same input keyword again.
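The cache behaviour described above can be sketched in a few lines of Python. `QueryCache`, `search`, and `fake_backend` are hypothetical names, and the least-recently-used eviction policy, the `capacity` limit, and the normalisation of case and spacing are assumptions of this sketch; the paper does not say how entries are keyed or whether they are ever evicted.

```python
from collections import OrderedDict

class QueryCache:
    """Keeps results of recent queries so repeated queries skip the server."""

    def __init__(self, backend, capacity=100):
        self.backend = backend        # callable: query string -> list of results
        self.capacity = capacity      # assumed size limit, not from the paper
        self.entries = OrderedDict()  # query -> results, oldest first
        self.hits = 0
        self.misses = 0

    def search(self, query):
        # Normalise case and spacing (assumption); word order is kept,
        # so "OS Android" is still a different key from "Android OS".
        key = " ".join(query.lower().split())
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)      # mark as recently used
            return self.entries[key]
        self.misses += 1
        results = self.backend(key)            # round trip to the web service
        self.entries[key] = results
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry
        return results

calls = []

def fake_backend(query):
    """Stand-in for the web service; records every query it receives."""
    calls.append(query)
    return [query + " result"]

cache = QueryCache(fake_backend, capacity=2)
first = cache.search("Android OS")
second = cache.search("android  os")
```

In this usage, the second search is answered from the cache, so `fake_backend` is called only once. Keeping the word order in the key matches the observation in the related work that “OS Android” is treated as a new keyword.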
If a user submits an input query that has been requested before, the server need not process the query again and no round trip is needed: the previous search results retrieved from the web server were stored in the data cache before being forwarded to the user, so from the next search onwards the results for that query are retrieved from cache storage instead of the web server.

Initially every document is preprocessed to eliminate inconsistent or unnecessary keywords, and the document weight, or file relevance score, is computed from term frequency (TF) and inverse document frequency (IDF). TF is the number of occurrences of the search keyword in an individual file. IDF measures how rare the keyword is across the collection: it is conventionally the logarithm of the total number of documents divided by the number of documents that contain the keyword. The file relevance score is then the product of TF and IDF.

[Fig2: Proposed Architecture. Labelled steps between Mobile User, Cache, Web service and Data base: 1. New Account; 2. Forward Request; 3. Request; 4. Result; 5. Results; 6. Store in cache; 7. Search Results; 8. Send Request; 9. Result]

The sequential steps for rank oriented results from the web service are as follows:

1. The user makes a request with a search query from the mobile device.
2. The request is forwarded to the data cache, which checks previously retrieved results; if results for the same query are available they are returned from the data cache, otherwise the request is forwarded to the business logic.
3. The service or business logic retrieves rank oriented results from the data sources based on term frequency and inverse document frequency:

FileScore = TF * IDF

where FileScore is the document weight or file relevance score, TF is the term frequency (number of occurrences of a keyword in a single document), and IDF is the inverse document frequency (conventionally log(N/n), where N is the total number of documents and n is the number of documents containing the keyword).
4.
The search results are stored in the data cache for future retrieval of the same query.
5. When a user makes the same request again, the ranked search results are forwarded to the mobile device from the cache.

For the experimental implementation we tested the SOA (service oriented architecture) with C#.Net and Android for the user interface and the generation of SOAP objects. The set of operations, or business logic, resides in C#.Net at the server end. The UI (user interface) is Android; the input search keyword is passed through SOAP (Simple Object Access Protocol) objects, with the Web Services Description Language (WSDL) providing an abstract description of the communication, and the calculation and retrieval of file relevance based results are done at the web service.

IV. CONCLUSION

We conclude our current research work with efficient file relevance based, rank oriented results in a mobile search engine through a service oriented architecture. The cache implementation enhances performance by minimizing the round trip time, or execution time, of a search query when the same query has been processed before. Our experimental results indicate that the approach is more efficient than previous mechanisms.

REFERENCES

[1] E. Agichtein, E. Brill, and S. Dumais, “Improving Web Search Ranking by Incorporating User Behavior Information,” Proc. 29th Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR), 2006.
[2] E. Agichtein, E. Brill, S. Dumais, and R. Ragno, “Learning User Interaction Models for Predicting Web Search Result Preferences,” Proc. Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR), 2006.
[3] Y.-Y. Chen, T. Suel, and A. Markowetz, “Efficient Query Processing in Geographic Web Search Engines,” Proc. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR), 2006.
[4] K.W. Church, W. Gale, P. Hanks, and D.
Hindle, “Using Statistics in Lexical Analysis,” Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, Psychology Press, 1991.
[5] Q. Gan, J. Attenberg, A. Markowetz, and T. Suel, “Analysis of Geographic Queries in a Search Engine Log,” Proc. First Int’l Workshop Location and the Web (LocWeb), 2008.
[6] T. Joachims, “Optimizing Search Engines Using Clickthrough Data,” Proc. ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining, 2002.
[7] K.W.-T. Leung, D.L. Lee, and W.-C. Lee, “Personalized Web Search with Location Preferences,” Proc. IEEE Int’l Conf. Data Mining (ICDE), 2010.
[8] K.W.-T. Leung, W. Ng, and D.L. Lee, “Personalized Concept-Based Clustering of Search Engine Queries,” IEEE Trans. Knowledge and Data Eng., vol. 20, no. 11, pp. 1505-1518, Nov. 2008.
[9] H. Li, Z. Li, W.-C. Lee, and D.L. Lee, “A Probabilistic Topic-Based Ranking Framework for Location-Sensitive Domain Information Retrieval,” Proc. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR), 2009.
[10] B. Liu, W.S. Lee, P.S. Yu, and X. Li, “Partially Supervised Classification of Text Documents,” Proc. Int’l Conf. Machine Learning (ICML), 2002.