International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 1 – Oct 2014

An Efficient Language Interoperability based Search Engine for Mobile Users

Pilli Srivalli
Final M.Tech Student, Dept of CSE, MVGR College of Engineering, Chintavalasa, AP, India.

Abstract: Optimizing search engines on mobile phones remains an important research issue in the field of knowledge and data engineering. Although various approaches are available, performance and time complexity are the primary concerns when implementing such search engines. We propose an efficient personalized mobile search engine that combines mining, ranking, and cache implementation over web services.

I. INTRODUCTION

Web search engines [1] have made enormous contributions to the web and society. They make finding information on the web quick and easy. However, they are far from optimal. A major deficiency of generic search engines is that they follow the “one size fits all” model and are not adaptable to individual users. This is typically shown in cases such as these:

1. Different users have different backgrounds and interests. They may have completely different information needs and goals when providing exactly the same query. For example, a biologist may issue “mouse” to get information about rodents, while programmers may use the same query to find information about computer peripherals. When such a query is issued, generic search engines will return a list of documents on different topics. It takes time for a user to choose which information he/she really wants, and this makes the user feel less satisfied. Queries like “mouse” are usually called ambiguous queries. Statistics have shown that the vast majority of queries are short and ambiguous. Generic web search usually fails to provide optimal results for ambiguous queries.

2. Users are not static. User information needs may change over time.
Indeed, users will have different needs at different times based on current circumstances. For example, a user may use “mouse” to find information about rodents when the user is viewing television news about a plague, but would want to find information about computer mouse products when purchasing a new computer. Generic search engines are unable to distinguish between such cases.

Personalized web search is considered a promising solution to address these problems, since it can provide different search results based upon the preferences and information needs of users. It exploits user information and search context in learning to which sense a query refers. Consider the query “mouse” mentioned above: personalized web search [2] can disambiguate the query by gathering the following user information:

1. The user is a computer programmer, not a biologist.
2. The user has just input a query “keyboard,” but not “biology” or “genome.” Before entering this query, the user had just viewed a web page with many words related to computer mouse, such as “computing,” “input device,” and “keyboard.”

ISSN: 2231-5381

The World-Wide Web [3,4] has reached a size where it is becoming increasingly challenging to satisfy certain information needs. While search engines are still able to index a reasonable subset of the (surface) web, the pages a user is really looking for are often buried under hundreds of thousands of less interesting results. Thus, search engine users are in danger of drowning in information. Adding additional terms to standard keyword searches often fails to narrow down results in the desired direction. A natural approach is to add advanced features that allow users to express other constraints or preferences in an intuitive manner, so that the desired documents are returned among the first results.
In fact, search engines have added a variety of such features, often under a special advanced search interface, but mostly limited to fairly simple conditions on domain, link structure, or modification date.

We expect that geographic search engines [5], i.e., search engines that support geographic preferences, will have a major impact on search technology and their business models. First, geographic search engines provide a very useful tool. They allow users to express in a single query what might take multiple queries with a standard search engine. A user of a standard search engine looking for a yoga school in or close to Brooklyn, New York, might have to try queries such as yoga “new york”, but this might yield inferior results, since there are many ways to refer to a particular area and a purely text-based engine has no notion of geographical closeness (e.g., a result across the bridge to Manhattan or nearby in Queens might also be acceptable). Second, geographic search is a fundamental enabling technology for location-based services, including electronic commerce via cellular phones and other mobile devices [6]. Third, geographic search supports locally targeted web advertising, thus attracting advertisement budgets of small businesses with a local focus. Other opportunities arise from mining geographic properties of the web, e.g., for market research and competitive intelligence.

http://www.ijettjournal.org Page 18

II. RELATED WORK

Various search engines have been developed over many years of research, but they still have weaknesses in optimization, specifically in personalized mobile search engines. Mining the results alone may not give optimal results for the user's search query; time complexity and space complexity are also factors when implementing personalized mobile search engines [7,8].
Existing approaches have the following drawbacks:

1. A round trip to the server must be performed for each search.
2. Lack of language interoperability.
3. Increased redundancy and malfunctioning of business logic.
4. Lower performance.

We define popularity factors that attempt to capture search history and the preferences of millions of search engine users. Currently, Web users interact with search engines by providing several search keywords [9] and selecting Web pages from the search results. We attempt to capture as much usage information as possible and to make use of the captured information.

The first factor to be defined is the keyword popularity. When a user enters keywords and clicks search, the search engine stores the keywords and updates their weights. Some words, called stop words, are removed before the keywords are stored in the database. For instance, when a user types “department of computer science”, the word “of” is not stored as a search key. The order of the words is taken into consideration. For instance, the term “computer science” is stored as it is, in that order. If a user types “science computer”, then a new entry is created to capture this new term. Each term, be it a single word or several words, is associated with a weight that records how frequently the term has been used.

The second factor to be defined is the keyword-to-Web-page popularity. After the search engine returns the search results to the user, the user selects Web pages for viewing. The relationships between the search keywords and the selected Web pages are recorded [10]. These relationships capture the preferences of the users. Some search engines, such as Google, currently cannot capture the relationships. Using Google, for example, when a user clicks on a link in the search results, the browser goes directly to retrieve the Web page based on the given URL. The search engine does not know which link has been clicked.
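The keyword popularity factor (the first factor above) can be illustrated with a short Python sketch. It is not the authors' implementation: the stop-word list, the lower-casing of queries, and the choice to store the whole stop-word-filtered phrase as one ordered term are assumptions made for illustration, and the class name KeywordPopularity is hypothetical.

```python
from collections import Counter

# Illustrative stop-word list (an assumption; the paper does not give one).
STOP_WORDS = frozenset({"of", "the", "a", "an", "and", "in", "for", "to"})


class KeywordPopularity:
    """Tracks how often each ordered search term has been used."""

    def __init__(self):
        # Maps a normalized term (one or several words) to its usage count.
        self._weights = Counter()

    @staticmethod
    def normalize(query):
        """Lower-case the query, drop stop words, and keep the word order."""
        words = [w for w in query.lower().split() if w not in STOP_WORDS]
        return " ".join(words)

    def record(self, query):
        """Store the normalized term and increase its weight by one."""
        term = self.normalize(query)
        if term:  # ignore queries that consist only of stop words
            self._weights[term] += 1
        return term

    def weight(self, term):
        """Return how many times the term has been recorded (0 if never)."""
        return self._weights[term]
```

Because the word order is preserved, recording “computer science” and “science computer” creates two separate entries, each with its own weight, which matches the behaviour described in the text.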
To allow the search engine to know which link was clicked, each click needs to be passed through the search engine. The search keywords and the destination URL are embedded in each link provided in the search results. When a user clicks a link, the browser passes these data to the search engine. The search engine records the data and then redirects the browser to retrieve the destination Web page.

The third factor to be defined is the Web page popularity. There are several ways to define it. The most obvious way is to define it as the number of times a Web page has been selected. When a user clicks on a link in the search results, the Web page associated with the link is recorded. This information can be collected at the same time as the second factor described above. This way of defining Web page popularity should be accompanied by measuring the amount of time a user spends reading the Web page. This information can be collected by determining the difference between the time stamps of two consecutive clicks. Whenever a user clicks on a link, the time is recorded by the search engine. The assumption is that the user clicks on a link, reads the retrieved Web page, and then clicks on another link. Here, we introduce a new way to define Web page popularity: counting the number of popular keywords contained in the page. The idea is that if a Web page contains a large number of popular keywords, it should be considered more popular. All these ways of defining Web page popularity can be combined to form a comprehensive one.

III. PROPOSED WORK

We propose an efficient mobile search engine mechanism that aims to return results satisfying the user's requirements and to retrieve them in an optimal manner through mining. Previous or traditional approaches base the search results for the user's input query on spatial information such as geo codes. Here, search results depend on the document weight (file relevance score), which is computed from two parameters: TF (term frequency) and IDF (inverse document frequency). A cache holds the frequently accessed previous search results for a specific input query, to enhance performance and reduce complexity at both end points.

It has been shown that a considerable number of input queries are geo or location based and concentrate on location information. To serve such queries, many location-based search implementations for location or spatial queries have been proposed.

Our proposed system supports language interoperability (i.e., any standard language can communicate with another language) through SOA (service oriented architecture), and it minimizes the chance of duplicating business logic by maintaining it at a centralized location (a centralized web application server) instead of maintaining the business logic, or set of operations, at multiple locations. Search engine performance can be improved by a simple cache implementation and by file relevance based, rank-oriented results from files or documents.
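To make the language interoperability idea concrete: a client written in any language only has to exchange XML messages with the central service. The Python sketch below builds and reads a SOAP 1.1 envelope (the paper's implementation section mentions SOAP objects). The operation name Search, the element name keyword, and the namespace http://example.org/search are hypothetical placeholders, not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, used only for this illustration.
SERVICE_NS = "http://example.org/search"


def build_search_request(query):
    """Client side: wrap a search keyword in a SOAP 1.1 envelope (bytes)."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}Search")
    keyword = ET.SubElement(operation, f"{{{SERVICE_NS}}}keyword")
    keyword.text = query
    return ET.tostring(envelope, encoding="utf-8")


def parse_search_request(payload):
    """Server side: extract the keyword, whatever language produced the XML."""
    root = ET.fromstring(payload)
    path = (
        f"{{{SOAP_ENV}}}Body/"
        f"{{{SERVICE_NS}}}Search/"
        f"{{{SERVICE_NS}}}keyword"
    )
    node = root.find(path)
    if node is None or node.text is None:
        raise ValueError("request does not contain a search keyword")
    return node.text
```

Since the server depends only on the message format, the same business logic can serve a VB.Net, Java, or Android front end without being duplicated in each client.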
A Web service is one technology for creating an SOA (service oriented architecture) with a three-tier architecture. It minimizes duplication of operations by maintaining the business logic at one specific location (a centralized server). The main goal of the service oriented architecture is language interoperability (i.e., any standard language can communicate with another language even though the two are different) and minimizing the chance of damage from the client end.

[Fig 1: Web service Architecture. Components: Database; Business Logic; WSDL with SOAP protocol; UI (VB.Net); UI (Java); UI (Android).]

The data cache is a mechanism that increases performance at the user end and reduces overhead at the server end. It stores frequently accessed results for future retrieval. When a user requests the same input query again, the cache reduces execution time: the round trip for the request and the response time from the server are minimized, as is the additional overhead on the server of processing the same input keyword. If a user submits an input query that has been requested before, the server need not process the query again and no round trip is needed, because the search results previously retrieved from the web server were stored in the data cache before being forwarded to the user. From the next search onwards, the results for that query are retrieved from cache storage instead of the web server.

Initially, every document is preprocessed to eliminate inconsistent or unnecessary keywords, and the document weight (file relevance score) is computed with term frequency (TF) and inverse document frequency (IDF).
TF is the number of occurrences (frequency) of the search keyword in an individual file, and IDF (inverse document frequency) is derived from the occurrences of the input search keyword across all files or documents that contain it. The file relevance score (document weight) is then computed in terms of TF and IDF.

[Fig 2: Proposed Architecture. Components: Mobile User, Cache, Web service, Database. Labelled flows: 1. New Account; 2. Forward Request; 3. Request; 4. Result; 5. Results; 6. Store in cache; 7. Search Results; 8. Send Request; 9. Result.]

The sequential steps for rank-oriented results from the Web service are as follows:

1. The user makes a request with a search query from the mobile device.
2. The request is forwarded to the data cache, which checks the previously retrieved results. If results for the same query are available, they are returned from the data cache; otherwise the request is forwarded to the business logic.
3. The service (business logic) retrieves rank-oriented results from the data sources based on term frequency and inverse document frequency:

   FileScore = TF * IDF

   where FileScore is the document weight (file relevance score), TF is the term frequency (number of occurrences of a keyword in a single document), and IDF is the inverse document frequency (derived from the number of occurrences of the keyword in all documents).
4. The search results are stored in the data cache for future retrieval of the same query.
5. From the cache, the ranked search results are forwarded to the mobile device when a user makes the same request.

For the experimental implementation, we tested the SOA (service oriented architecture) in C#.Net, with Android for the user interface and the generation of SOAP objects. The set of operations (business logic) resides in C#.Net at the server end. The UI (user interface) can be Android; the input search keyword is passed through SOAP (Simple Object Access Protocol) objects, with the Web Services Description Language (WSDL) providing an abstract way of communication, and the calculations and retrieval for file relevance based results are done at the web service.
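The sequential steps above (cache lookup, TF*IDF ranking at the service, storing results in the cache) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' C#.Net code: the paper does not give an exact IDF formula, so the sketch assumes the common form ln(N / df) + 1, where N is the number of documents and df is the number of documents containing the term. The class name SearchService, whitespace tokenization, and summing scores over the words of a multi-word query are also assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Very simple preprocessing: lower-case and split on whitespace."""
    return text.lower().split()


class SearchService:
    """Hypothetical stand-in for the central business logic plus data cache."""

    def __init__(self, documents):
        # documents: mapping of file name -> file text
        self._term_counts = {
            name: Counter(tokenize(text)) for name, text in documents.items()
        }
        self._cache = {}          # normalized query -> ranked results
        self.server_calls = 0     # how many times ranking was actually run

    def _rank(self, terms):
        """Step 3: score each file as the sum of TF * IDF over query terms."""
        self.server_calls += 1
        n_docs = len(self._term_counts)
        scores = {}
        for term in terms:
            doc_freq = sum(1 for c in self._term_counts.values() if c[term] > 0)
            if doc_freq == 0:
                continue  # term occurs in no file
            # Assumed IDF variant: ln(N / df) + 1 (never zero or negative).
            idf = math.log(n_docs / doc_freq) + 1.0
            for name, counts in self._term_counts.items():
                if counts[term] > 0:
                    scores[name] = scores.get(name, 0.0) + counts[term] * idf
        # Highest score first; ties are broken by file name.
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

    def search(self, query):
        """Steps 1, 2, 4 and 5: consult the cache before the business logic."""
        key = " ".join(tokenize(query))
        if key in self._cache:           # step 2: cache hit, no ranking needed
            return self._cache[key]
        results = self._rank(key.split())
        self._cache[key] = results       # step 4: store for future requests
        return results
```

Repeating a query leaves server_calls unchanged, which is the saving the data cache is meant to provide. A production cache would also need an eviction and invalidation policy, which the paper does not discuss.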
IV. CONCLUSION

We conclude our current research work with efficient file relevance based, rank-oriented results in a mobile search engine through a service oriented architecture. The cache implementation enhances performance by minimizing the round-trip time, or execution time, of a search query when the same query has been processed by the same user before. Our experimental results show more efficient results than previous mechanisms.

REFERENCES

[1] E. Agichtein, E. Brill, and S. Dumais, “Improving Web Search Ranking by Incorporating User Behavior Information,” Proc. 29th Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR), 2006.
[2] E. Agichtein, E. Brill, S. Dumais, and R. Ragno, “Learning User Interaction Models for Predicting Web Search Result Preferences,” Proc. Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR), 2006.
[3] Y.-Y. Chen, T. Suel, and A. Markowetz, “Efficient Query Processing in Geographic Web Search Engines,” Proc. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR), 2006.
[4] K.W. Church, W. Gale, P. Hanks, and D. Hindle, “Using Statistics in Lexical Analysis,” Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, Psychology Press, 1991.
[5] Q. Gan, J. Attenberg, A. Markowetz, and T. Suel, “Analysis of Geographic Queries in a Search Engine Log,” Proc. First Int’l Workshop Location and the Web (LocWeb), 2008.
[6] T. Joachims, “Optimizing Search Engines Using Clickthrough Data,” Proc. ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining, 2002.
[7] K.W.-T. Leung, D.L. Lee, and W.-C. Lee, “Personalized Web Search with Location Preferences,” Proc. IEEE Int’l Conf. Data Eng. (ICDE), 2010.
[8] K.W.-T. Leung, W. Ng, and D.L. Lee, “Personalized Concept-Based Clustering of Search Engine Queries,” IEEE Trans. Knowledge and Data Eng., vol. 20, no. 11, pp. 1505-1518, Nov. 2008.
[9] H. Li, Z. Li, W.-C. Lee, and D.L. Lee, “A Probabilistic Topic-Based Ranking Framework for Location-Sensitive Domain Information Retrieval,” Proc. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR), 2009.
[10] B. Liu, W.S. Lee, P.S. Yu, and X. Li, “Partially Supervised Classification of Text Documents,” Proc. Int’l Conf. Machine Learning (ICML), 2002.

BIOGRAPHIES

Mr. P.S. Sitarama Raju, a well-known and excellent teacher, received his M.Tech (CSE) from Central University, Hyderabad. He is working as Professor (H.O.D), Dept of CSE, at Maharaj Vijayaram Gajapathi Raj College of Engineering. He has 16 1/2 years of industrial and teaching experience and has to his credit a couple of publications in national and international conferences and journals. His areas of interest include object-oriented software and languages, system architecture, and system software.

Pilli Srivalli is a student of Maharaj Vijayaram Gajapathi Raj College of Engineering, Chintavalasa. Presently she is pursuing her M.Tech (Computer Science) from this college; she received her M.C.A from Godavari Institute of Engineering and Technology, Rajahmundry, affiliated to JNTU Kakinada, in the year 2011. Her areas of interest include programming, DBMS, and current trends and techniques in computer science.