An Efficient Web Service Based Mobile Search Engine Santhosh kumar Gottumukkala, D.D.D.Suribabu

International Journal of Engineering Trends and Technology (IJETT) – Volume 15 Number 7 – Sep 2014
An Efficient Web Service Based Mobile Search Engine
Santhosh kumar Gottumukkala (1), D.D.D. Suribabu (2)
(1) Final Year MTech student, (2) Associate Professor
Computer Science and Engineering, D.N.R College Of Engineering And Technology, Bhimavaram
Abstract: Optimization of mobile search engines is an
active research issue in the field of search engine
optimization and information retrieval. Although various
traditional search engine approaches have been proposed by
researchers, performance and time complexity remain the
prime concerns when optimizing a mobile search engine. We
propose an efficient search engine based on a service
oriented architecture, featuring file relevance based
ranking and a cache implementation to enhance performance.
I. INTRODUCTION
The main objective of a search engine is to retrieve
results that interest the user from billions of related and
unrelated pieces of information. Over years of research,
various authors have improved mobile search engines, for
example with string transformation, which generates an
appropriate set of candidates for the input query submitted
by the user.
Globalization and localization are techniques that
improve search performance by giving priority to local
search results over global results. For example, if a user
in India enters a keyword related to trains, the first
result may be “IRCTC”, because it is the most frequently
used website for train enquiries, online ticket booking,
etc.
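The idea of preferring local results can be illustrated with a minimal sketch. It assumes each result carries a region label and some relevance score; the `Result` fields, the `prioritize_local` helper and the example URLs are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    region: str       # region the page is associated with (hypothetical field)
    relevance: float  # any relevance score; higher is better


def prioritize_local(results, user_region):
    """Order results so that pages from the user's region come first.

    Inside the local group and inside the global group, results are
    ordered by descending relevance.
    """
    return sorted(
        results,
        key=lambda r: (r.region != user_region, -r.relevance),
    )


results = [
    Result("https://example.com/global-trains", "US", 0.9),
    Result("https://example.in/train-enquiry", "IN", 0.7),
    Result("https://example.in/ticket-booking", "IN", 0.8),
]
ordered = prioritize_local(results, "IN")
print([r.url for r in ordered])
```

Here a global page with a higher relevance score is still listed after both local pages, which is the behaviour the localization technique describes.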
All the traditional approaches work on a file
relevance score: they consider term frequency and inverse
document frequency to rank documents and give priority to
the highest-ranked documents. The drawback of file
relevance score based approaches is that they give
importance to frequency but not to the time stamp of the
documents, so recent documents are not favoured; time
stamp based approaches have the opposite drawback.
Personal data, i.e. browsing history, emails, etc.,
are mostly unstructured, for which it is hard to measure
privacy. In addition, it is also difficult to incorporate
unstructured data with search engines without
summarization. So, for the purpose of both web
personalization and privacy preservation, it is necessary for
an algorithm to collect, summarize, and organize a user’s
personal information into a structured user profile.
Meanwhile, the notion of privacy is highly subjective and
depends on the individuals involved. Things considered to
be private by one person could be something that others
would love to share. In this regard, the user should have
control over which parts of the user profile are shared
with the server.
Privacy concerns are natural and important,
especially on the Internet. Some prior studies on Private
Information Retrieval (PIR) [4] focus on the problem of
allowing the user to retrieve information while keeping the
query private. Instead, this study targets preserving privacy
of the user profile, while still benefiting from selective
access to general information that the user agrees to
release. To our knowledge, this problem has not been
studied in the context of personalized search. One possible
reason for this is that personal information, i.e. browsing
history and emails, is mostly unstructured data, for which
privacy is difficult to measure and quantify.
II. RELATED WORK
Various mobile search engines have been developed over many
years of research, but their optimization techniques still
have pros and cons. In mobile search engines specifically,
mining of results and knowledge extraction alone may not
give optimal solutions; time complexity and space
complexity while the user searches are also factors when
implementing search engines on mobiles [7,8]. The main
drawbacks of traditional architectures are as follows:
• A round trip (request and response) must be
performed for each search (i.e. more execution
time).
• Language interoperability is not achievable in
traditional architectures.
• Redundancy or duplication and malfunctioning of
business logic increase.
• Lower performance and many time and space
complexity issues.
Search can be performed based on various factors such as
query clicks, time stamp based URLs, query reformulation,
the integration of query clicks with reformulation
approaches, pattern based approaches and search history
based techniques. Most search engines store all the
keywords entered by the user, compute and update the
weights of the words, and remove words that do not meet a
threshold value set by the programmers, as well as
unnecessary keywords such as articles, prepositions,
auxiliary verbs, etc.
When a user enters a query like “Android OS”, the
engine stores the term and updates its weight. If the user
enters “OS Android”, it is treated as a new keyword.
Results are retrieved based on the query and the weights,
and the results and associated terms are stored and
recorded for future searches [10].
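The keyword bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions: the weight is taken to be a simple usage count, the stop-word list is a small illustrative subset, and the `KeywordStore` class and its threshold value are hypothetical names chosen for this example.

```python
# Illustrative subset of "unnecessary keywords" (articles, prepositions,
# auxiliary verbs); a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "is", "are", "to"}


class KeywordStore:
    """Stores entered keywords with a weight (here: a usage count)."""

    def __init__(self, threshold=2):
        self.threshold = threshold  # assumed programmer-defined cut-off
        self.weights = {}

    def record(self, query):
        # Drop stop words but keep word order, so "Android OS" and
        # "OS Android" are stored as two different keywords.
        key = " ".join(w for w in query.lower().split() if w not in STOP_WORDS)
        if key:
            self.weights[key] = self.weights.get(key, 0) + 1
        return key

    def prune(self):
        # Remove keywords whose weight does not meet the threshold.
        self.weights = {
            k: w for k, w in self.weights.items() if w >= self.threshold
        }


store = KeywordStore(threshold=2)
for q in ["Android OS", "the Android OS", "OS Android"]:
    store.record(q)
before = dict(store.weights)
store.prune()
print(before, store.weights)
```

Keeping word order in the key mirrors the behaviour in the text, where a reordered query is treated as a new keyword.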
Nowadays, pattern or feedback based search engines
work efficiently by applying pattern mining algorithms
over session based patterns to find the URLs that a user
visits frequently. These results simplify a new user's
search and retrieve results of interest to the user. Some
search results can be retrieved from page popularity or
page indexing, which considers the URL most frequently
used by multiple users who enter the same term. In this
paper we introduce a file relevance score based
personalized mobile search engine with a service oriented
architecture and a cache implementation.
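As a rough illustration of finding frequently visited URLs from sessions, the sketch below counts in how many sessions each URL was visited for a term. It is plain frequency counting, not a full pattern mining algorithm; the `frequent_urls` function, the `min_support` parameter and the example hosts are assumptions made for this example.

```python
from collections import Counter, defaultdict


def frequent_urls(sessions, min_support=2):
    """Per query term, return the URLs visited in at least min_support sessions.

    Each session is a list of (term, url) pairs. A URL is counted once
    per session, however often it was visited in that session.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        seen = set()
        for term, url in session:
            if (term, url) not in seen:
                seen.add((term, url))
                counts[term][url] += 1
    return {
        term: [url for url, n in counter.most_common() if n >= min_support]
        for term, counter in counts.items()
    }


sessions = [
    [("train", "irctc.example"), ("train", "maps.example")],
    [("train", "irctc.example")],
    [("train", "irctc.example"), ("train", "maps.example"),
     ("news", "daily.example")],
]
popular = frequent_urls(sessions, min_support=2)
print(popular)
```

A new user who enters the same term could then be offered the URLs in `popular` first.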
In the client-server architecture of the personalized
mobile search engine (PMSE), PMSE clients are responsible
for storing the user click-throughs and the ontologies
derived from the PMSE server. Simple tasks, such as
updating click-throughs and ontologies, creating feature
vectors, and displaying re-ranked search results, are
handled by the PMSE clients with limited computational
power. On the other hand, heavy tasks, such as RSVM
training and re-ranking of search results, are handled by the
PMSE server. Moreover, in order to minimize the data
transmission between client and server, the PMSE client
would only need to submit a query together with the
feature vectors to the PMSE server, and the server would
automatically return a set of re-ranked search results
according to the preferences stated in the feature vectors.
The data transmission cost is minimized, because only the
essential data (i.e., query, feature vectors, ontologies and
search results) are transmitted between client and server
during the personalization process.
III. PROPOSED WORK
We propose an efficient mobile search engine
mechanism that meets user requirements and retrieves
search results in an optimal manner. Previous or
traditional approaches base search results on spatial
information, such as geo codes, for the user's input
query. In our approach, search results depend on the
document weight, or file relevance score, which is
computed from two parameters: TF (term frequency) and IDF
(inverse document frequency). A cache holds the frequently
accessed previous search results for a specific input
query, to enhance performance and to reduce complexity at
both end points.
It has been shown that a substantial number of input
queries are geo or location based and concentrate on
location information; to serve such queries, many
location-based search implementations for spatial queries
have been proposed.
Our proposed system supports language interoperability
(i.e. any standard language can communicate with any other
language) through SOA (service oriented architecture) and
minimizes the chance of duplicating business logic by
maintaining it at a centralized web application server
instead of maintaining the business logic, or set of
operations, at multiple locations. Search engine
performance is improved by the simple cache implementation
and by file relevance based, rank oriented results from
files or documents.
Web services are one technology for building an
SOA (service oriented architecture) with a three tier
architecture. They minimize duplication of operations by
maintaining the business logic at one specific location
(a centralized server). The main goal of the service
oriented architecture is language interoperability (i.e.
clients and services written in different standard
languages can communicate with each other); it also
reduces the chance of damage from the client end.
Fig1: Web service Architecture. Layers shown: Database; Business Logic; Wsdl with Soap protocol; UI (VB.Net), UI (Java), UI (Android).
The data cache is a mechanism that increases
performance at the user end and reduces overhead at the
server end. It stores frequently accessed results for
future retrieval: when a user requests the same input
query again, execution time is reduced, i.e. the round
trip of request and response between user and server is
avoided, and the server carries no additional overhead for
processing the same input keyword. If a user submits an
input query that has been requested before, the query need
not be processed by the server again and no round trip is
needed, because the search results previously retrieved
from the web server were stored in the data cache before
being forwarded to the user. From the next search onwards,
the results for that input query are retrieved from cache
storage instead of from the web server.
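The cache behaviour described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `CachedSearch`, `fake_web_service` and the `round_trips` counter are hypothetical names, and eviction or expiry of cached entries is left out because the paper does not describe it.

```python
class CachedSearch:
    """Answers repeated queries from a local cache instead of the server."""

    def __init__(self, backend):
        self.backend = backend  # callable standing in for the web service
        self.cache = {}         # query -> previously retrieved results
        self.round_trips = 0    # number of requests that reached the backend

    def search(self, query):
        if query in self.cache:
            # Cache hit: no request/response round trip is needed.
            return self.cache[query]
        # Cache miss: ask the web service and remember the answer.
        self.round_trips += 1
        results = self.backend(query)
        self.cache[query] = results
        return results


def fake_web_service(query):
    """Stand-in for the remote web service (hypothetical)."""
    return [f"result for {query}"]


engine = CachedSearch(fake_web_service)
engine.search("android os")
engine.search("android os")     # served from the cache
engine.search("train enquiry")
print(engine.round_trips)
```

Three searches cause only two round trips, because the repeated query is answered from the cache.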
Initially every document is preprocessed to eliminate
inconsistent or unnecessary keywords, and the document
weight, or file relevance score, is computed from term
frequency (TF) and inverse document frequency (IDF). TF is
the number of occurrences of a search keyword in an
individual file. IDF (inverse document frequency) is
derived from the number of files or documents in the
collection that contain the keyword, so that a keyword
found in many documents contributes less. The file
relevance score or document weight is then computed in
terms of TF and IDF.
Fig2: Proposed Architecture. Components shown: Mobile User, Web service, Data base, Cache. Labelled flows: 1. New Account, 2. Forward Request, 3. Request, 4. Result, 5. Results, 6. Store in cache, 7. Search Results, 8. Send Request, 9. Result.
The sequential steps for rank oriented results from the web service are as follows:
1. The user makes a request with a search query from the
mobile.
2. The request is forwarded to the data cache, which checks
the previously retrieved results. If results for the same
query are available, they are returned from the data cache;
otherwise the request is forwarded to the business logic.
3. The service, or business logic, retrieves rank oriented
results from the data sources based on term frequency and
inverse document frequency:
FileScore = TF * IDF
where FileScore is the document weight or file relevance
score, TF is the term frequency (number of occurrences of a
keyword in a single document), and IDF is the inverse
document frequency (derived from the number of documents
that contain the keyword).
4. The search results are stored in the data cache for
future retrieval of the same query.
5. When a user makes the same request again, the ranked
search results are forwarded to the mobile from the cache.
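The ranking in step 3 can be sketched as below. The paper gives only FileScore = TF * IDF; the logarithmic form IDF = log(N / df), the whitespace tokenizer, the summation over query keywords and the `rank` function are assumptions made for this illustration, and the document names are invented.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplistic preprocessing: lower-case and split on whitespace.
    return text.lower().split()


def rank(query, documents):
    """Rank documents by the summed TF * IDF of the query keywords.

    TF  = occurrences of the keyword in one document.
    IDF = log(N / df), with N documents in total and df documents
          containing the keyword (assumed textbook form).
    """
    tokens = {name: Counter(tokenize(text)) for name, text in documents.items()}
    n_docs = len(documents)
    scores = {}
    for name, counts in tokens.items():
        score = 0.0
        for term in tokenize(query):
            tf = counts[term]
            df = sum(1 for c in tokens.values() if c[term] > 0)
            idf = math.log(n_docs / df) if df else 0.0
            score += tf * idf
        scores[name] = score
    # Highest FileScore first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


documents = {
    "a.txt": "android os update android phone",
    "b.txt": "train ticket booking",
    "c.txt": "android tablet review",
}
ranking = rank("android", documents)
print(ranking)
```

In this toy collection "android" occurs in two of three documents, so its IDF is log(3/2); the document that mentions it twice is ranked first and the document that never mentions it is ranked last.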
For the experimental implementation we tested the
SOA (service oriented architecture) with C#.Net and
Android: Android provides the user interface and generates
the SOAP objects, and the set of operations, or business
logic, is available in C#.Net at the server end. The UI
(user interface) runs on Android; the input search keyword
is passed through SOAP (Simple Object Access Protocol)
objects, with the Web Services Description Language (WSDL)
providing an abstract description of the communication.
The calculations and the retrieval of file relevance based
results are done at the web service.
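To make the SOAP exchange more concrete, the following sketch builds a SOAP 1.1 request carrying a search keyword. The paper's client is written for Android and its server in C#.Net; Python is used here only for illustration. The service namespace `http://example.org/search`, the `Search` operation and the `keyword` element are hypothetical, since the paper does not publish its WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/search"                # hypothetical service namespace


def build_search_request(keyword):
    """Build a SOAP 1.1 envelope for a hypothetical Search operation."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # Operation and parameter names are assumptions for this sketch.
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}Search")
    ET.SubElement(operation, f"{{{SERVICE_NS}}}keyword").text = keyword
    return ET.tostring(envelope, encoding="unicode")


request_xml = build_search_request("android os")
print(request_xml)
```

Because the request is plain XML, a client written in any language can produce it, which is the language interoperability that the service oriented architecture aims for.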
IV. CONCLUSION
We conclude our current research work with efficient file
relevance based, rank oriented results in a mobile search
engine built on a service oriented architecture. The cache
implementation enhances performance by minimizing the
round trip time, or execution time, of a search query when
the same query has already been processed for the same
user. Our experimental results show that the approach is
more efficient than previous mechanisms.
REFERENCES
[1] E. Agichtein, E. Brill, and S. Dumais, “Improving Web
Search Ranking by Incorporating User Behavior Information,” Proc.
29th Ann. Int’l ACM SIGIR Conf. Research and Development in
Information Retrieval (SIGIR), 2006.
[2] E. Agichtein, E. Brill, S. Dumais, and R. Ragno, “Learning User
Interaction Models for Predicting Web Search Result Preferences,” Proc.
Ann. Int’l ACM SIGIR Conf. Research and Development in Information
Retrieval (SIGIR), 2006.
[3] Y.-Y. Chen, T. Suel, and A. Markowetz, “Efficient Query Processing
in Geographic Web Search Engines,” Proc. Int’l ACM SIGIR Conf.
Research and Development in Information Retrieval (SIGIR), 2006.
[4] K.W. Church, W. Gale, P. Hanks, and D. Hindle, “Using Statistics in
Lexical Analysis,” Lexical Acquisition: Exploiting On-Line Resources to
Build a Lexicon, Psychology Press, 1991.
[5] Q. Gan, J. Attenberg, A. Markowetz, and T. Suel, “Analysis of
Geographic Queries in a Search Engine Log,” Proc. First Int’l Workshop
Location and the Web (LocWeb), 2008.
[6] T. Joachims, “Optimizing Search Engines Using Clickthrough Data,”
Proc. ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining,
2002.
[7] K.W.-T. Leung, D.L. Lee, and W.-C. Lee, “Personalized Web Search
with Location Preferences,” Proc. IEEE Int’l Conf. Data Mining (ICDE),
2010.
[8] K.W.-T. Leung, W. Ng, and D.L. Lee, “Personalized Concept-Based
Clustering of Search Engine Queries,” IEEE Trans. Knowledge and Data
Eng., vol. 20, no. 11, pp. 1505-1518, Nov. 2008.
[9] H. Li, Z. Li, W.-C. Lee, and D.L. Lee, “A Probabilistic Topic-Based
Ranking Framework for Location-Sensitive Domain Information
Retrieval,” Proc. Int’l ACM SIGIR Conf. Research and Development in
Information Retrieval (SIGIR), 2009.
[10] B. Liu, W.S. Lee, P.S. Yu, and X. Li, “Partially Supervised
Classification of Text Documents,” Proc. Int’l Conf. Machine Learning
(ICML), 2002.
ISSN: 2231-5381
http://www.ijettjournal.org