An Efficient Language Interoperability based Search Engine for Mobile Users Pilli Srivalli

International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 1 – Oct 2014
An Efficient Language Interoperability based Search
Engine for Mobile Users
Pilli Srivalli¹
¹Final M.Tech Student, Dept. of CSE, MVGR College of Engineering, Chintavalasa, AP, India.
Abstract: Optimizing search engines on mobile phones remains
an important research issue in the field of knowledge and data
engineering. Although various approaches are available,
performance and time complexity are the primary concerns
when implementing such search engines. We propose an
efficient personalized mobile search engine that combines
mining, ranking, and cache implementation over web services.
I. INTRODUCTION
Web search engines [1] have made enormous
contributions to the web and society. They make finding
information on the web quick and easy. However, they
are far from optimal. A major deficiency of generic
search engines is that they follow the “one size fits all”
model and are not adaptable to individual users. This
is typically shown in cases such as these:
1. Different users have different backgrounds and
interests. They may have completely different
information needs and goals when providing exactly the
same query. For example, a biologist may issue “mouse” to
get information about rodents, while programmers may use
the same query to find information about computer
peripherals. When such a query is issued, generic search
engines will return a list of documents on different topics. It
takes time for a user to choose which information he/she
really wants, and this makes the user feel less
satisfied. Queries like “mouse” are usually called
ambiguous queries. Statistics have shown that the vast
majority of queries are short and ambiguous. Generic web
search usually fails to provide optimal results for
ambiguous queries.
2. Users are not static. User information needs may change
over time. Indeed, users will have different needs at
different times based on current circumstances. For
example, a user may use “mouse” to find information
about rodents when the user is viewing television news
about a plague, but would want to find information about
computer mouse products when purchasing a new
computer. Generic search engines are unable to distinguish
between such cases.
ISSN: 2231-5381
Personalized web search is considered a promising
solution to address these problems, since it can
provide different search results based upon the
preferences and information needs of users. It exploits user
information and search context in learning to which sense a
query refers. Consider the query “mouse” mentioned
above: personalized web search [2] can disambiguate the
query by gathering the following user information:
1. The user is a computer programmer, not a biologist.
2. The user has just input a query “keyboard,” but not
“biology” or “genome.” Before entering this query, the
user had just viewed a web page with many words related
to computer mouse, such as “computing,” “input device,”
and “keyboard.”
The World-Wide Web[3,4] has reached a size
where it is becoming increasingly challenging to satisfy
certain information needs. While search engines are still
able to index a reasonable subset of the (surface) web, the
pages a user is really looking for are often buried under
hundreds of thousands of less interesting results. Thus,
search engine users are in danger of drowning in
information. Adding additional terms to standard keyword
searches often fails to narrow down results in the desired
direction. A natural approach is to add advanced features
that allow users to express other constraints or preferences
in an intuitive manner, resulting in the desired documents
to be returned among the first results. In fact, search
engines have added a variety of such features, often under a
special advanced search interface, but mostly limited to
fairly simple conditions on domain, link structure, or
modification date.
We expect that geographic search engines[5], i.e.,
search engines that support geographic preferences, will
have a major impact on search technology and their
business models. First, geographic search engines provide a
very useful tool. They allow users to express in a single
query what might take multiple queries with a standard
search engine. A user of a standard search engine looking
for a yoga school in or close to Brooklyn, New York, might
have to try queries such as yoga “new york”, but this might
yield inferior results as there are many ways to refer to a
particular area, and since a purely text-based engine has no
notion of geographical closeness (e.g., a result across the
bridge to Manhattan or nearby in Queens might also be
acceptable). Second, geographic search is a fundamental
enabling technology for location-based services, including
electronic commerce via cellular phones and other mobile
devices [6]. Third, geographic search supports locally targeted
web advertising, thus attracting advertisement budgets of
small businesses with a local focus. Other opportunities
arise from mining geographic properties of the web, e.g.,
for market research and competitive intelligence.
http://www.ijettjournal.org
Page 18
II. RELATED WORK
Many search engines have been developed over years of
research, but they still have weaknesses in optimization,
particularly in personalized mobile search. Mining the
results alone may not give the optimal results for a user's
search query, and time complexity and space complexity are
also factors when implementing personalized mobile search
engines [7,8].




The main drawbacks of existing approaches are:
1. A round trip to the server must be performed for each search.
2. Lack of language interoperability.
3. Increased redundancy and malfunctioning of business logic.
4. Lower performance.
We define popularity factors that attempt to
capture the search history and the preferences of millions
of search engine users. Currently, Web users interact
with search engines by providing several search
keywords [9] and selecting Web pages from the search
results. We attempt to capture as much usage
information as possible and to make use of the captured
information. The first factor to be defined is the
keyword popularity. When a user enters keywords
and clicks search, the search engine stores the
keywords and updates their weights. Some words, called
stop words, are removed before the keywords are stored in
the database. For instance, when a user types
“department of computer science”, the word “of” is not
stored as part of the search key.
The order of the words is taken into consideration.
For instance, the term “computer science” is stored as it
is, in that order. If a user types “science computer”, then
a new entry will be created to capture this new term.
Each term, be it a single word or several words,
will be associated with a weight that records how
frequently the term has been used. The second
factor to be defined is the keyword-to-Web-page
popularity. After the search engine returns the search
results to the user, the user will select Web pages for
viewing. The relationships between the search
keywords and the selected Web pages will be
recorded [10].
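The keyword popularity factor described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the stop-word list, the lower-casing of queries, and the names `normalize_term` and `KeywordPopularity` are assumptions introduced here.

```python
from collections import Counter

# Assumed, illustrative stop-word list; the paper does not specify one.
STOP_WORDS = {"of", "the", "a", "an", "and", "in", "for", "to"}


def normalize_term(query: str) -> str:
    """Lower-case the query, drop stop words, and keep the word order."""
    words = [w for w in query.lower().split() if w not in STOP_WORDS]
    return " ".join(words)


class KeywordPopularity:
    """Records how often each ordered search term has been submitted."""

    def __init__(self) -> None:
        self.weights = Counter()

    def record(self, query: str) -> str:
        term = normalize_term(query)
        if term:  # ignore queries that consist only of stop words
            self.weights[term] += 1
        return term


store = KeywordPopularity()
store.record("department of computer science")
store.record("computer science")
store.record("computer science")
store.record("science computer")  # different word order, so a separate entry
```

Because the key is the ordered term, “computer science” and “science computer” accumulate separate weights, as the text requires.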
The relationships capture the preferences of the
users. Some search engines, such as Google, currently
cannot capture the relationships. Using Google, for
example, when a user clicks on a link in the search
results, the browser directly retrieves the Web
page from the given URL. The search engine does
not know which link has been clicked. To allow the
search engine to know which link was clicked, each click
needs to be passed through the search engine. The
search keywords and the destination URL are embedded
in each link provided in the search results. When a
user clicks a link, the browser passes these data to the
search engine. The search engine records the data and
then redirects the browser to retrieve the
destination Web page.
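A minimal sketch of this click-through mechanism, assuming a hypothetical redirect endpoint and the parameter names `q` and `url` (neither appears in the paper): each result link carries the keywords and the destination, and the handler records the pair before returning the URL to redirect to.

```python
from collections import Counter
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical redirect endpoint of the search engine; not from the paper.
REDIRECT_ENDPOINT = "https://search.example/click"

# (keywords, destination URL) -> number of clicks
keyword_page_clicks = Counter()


def tracking_link(keywords: str, destination: str) -> str:
    """Build a result link that embeds the keywords and the destination URL."""
    return REDIRECT_ENDPOINT + "?" + urlencode({"q": keywords, "url": destination})


def handle_click(link: str) -> str:
    """Record the keyword/page relationship and return the redirect target."""
    params = parse_qs(urlparse(link).query)
    keywords = params["q"][0]
    destination = params["url"][0]
    keyword_page_clicks[(keywords, destination)] += 1
    # A real server would now answer with an HTTP redirect to `destination`.
    return destination


link = tracking_link("computer science", "http://example.org/cs?x=1")
target = handle_click(link)
```

The extra hop through the search engine costs one additional request per click, which is the price of observing the relationship at all.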
The third factor to be defined is the Web page
popularity. There are several ways to define the Web
page popularity. The most obvious way is to define it
as the number of times a Web page has been selected.
When a user clicks on a link in the search results, the
Web page associated with the link is recorded. This
information can be collected when the second factor
described above is collected. This method of defining
Web page popularity should be accompanied by
measuring the amount of time a user spends reading
the Web page. This information can be collected by
determining the difference between the time stamps
of two consecutive clicks. Whenever a user clicks on a
link, the time is recorded by the search engine. The
assumption is that the user clicks on a link, reads the
retrieved Web page, and then clicks on another link.
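The click count and reading time can be derived from a per-user click log as sketched below. The log format (a list of time stamp and URL pairs) and the function name `page_popularity` are assumptions for illustration; the last click in a log has no following click, so no reading time is credited to it.

```python
from collections import defaultdict


def page_popularity(click_log):
    """Return {url: (click_count, total_reading_seconds)} for one user.

    click_log is a list of (timestamp_seconds, url) tuples in click order.
    The reading time of a page is the gap until the next click.
    """
    counts = defaultdict(int)
    seconds = defaultdict(float)
    for i, (timestamp, url) in enumerate(click_log):
        counts[url] += 1
        if i + 1 < len(click_log):
            seconds[url] += click_log[i + 1][0] - timestamp
    return {url: (counts[url], seconds[url]) for url in counts}


log = [(0, "a.html"), (30, "b.html"), (45, "a.html"), (105, "c.html")]
stats = page_popularity(log)
```

The estimate inherits the stated assumption: if a user leaves a page open and returns much later, the gap is still counted as reading time.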
Here, we introduce a new way to define the
Web page popularity: counting the number of
popular keywords contained in the page. The idea is
that if a Web page contains a large number of popular
keywords, then it should be considered more
popular. All these ways of defining the Web page
popularity can be combined to form a comprehensive
one.
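One possible reading of this definition is sketched below. The popularity threshold `min_weight`, the whole-word matching, and the linear combination with its weights are all assumptions made here; the paper does not state how the three measures are combined.

```python
import re


def popular_keyword_count(page_text, keyword_weights, min_weight=2):
    """Count the popular terms (weight >= min_weight) that occur in the page."""
    # Normalize the page to space-separated lower-case words so that a term
    # only matches on whole-word boundaries.
    words = " " + " ".join(re.findall(r"\w+", page_text.lower())) + " "
    return sum(
        1
        for term, weight in keyword_weights.items()
        if weight >= min_weight and f" {term} " in words
    )


def combined_popularity(clicks, reading_seconds, keyword_hits,
                        weights=(1.0, 0.01, 0.5)):
    """Assumed linear combination of the three page popularity measures."""
    w_clicks, w_time, w_keywords = weights
    return w_clicks * clicks + w_time * reading_seconds + w_keywords * keyword_hits


keyword_weights = {"computer science": 3, "mouse": 2, "genome": 1}
page = "The Computer Science department sells a mouse. Genome data."
hits = popular_keyword_count(page, keyword_weights)
score = combined_popularity(clicks=2, reading_seconds=90.0, keyword_hits=hits)
```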
III. PROPOSED WORK
We propose an efficient mobile search engine mechanism that
aims to satisfy user requirements and to retrieve search
results in an optimal manner through mining. Previous,
traditional approaches return results based on spatial
information, such as geo codes, for the user's input query.
In our approach, search results depend on the document
weight, or file relevance score, which is computed from two
parameters: TF (term frequency) and IDF (inverse
document frequency). A cache holds the frequently accessed
previous search results for a specific input query, to
enhance performance and to reduce complexity at both end
points. It has been shown that a significant share of input
queries are geographic or location-based, that is, they
concentrate on location information; to serve such queries,
a number of location-based search implementations for
spatial queries have been proposed. Our proposed
system supports language interoperability (i.e. any
standard language can communicate with another language)
through SOA (service-oriented architecture) and minimizes
the chances of duplicating business logic by maintaining
it at a centralized location, a centralized web application
server, instead of maintaining the business logic or set of
operations at multiple locations. Search engine
performance is improved by a simple cache
implementation and by file-relevance-based, rank-oriented
results from files or documents.
A web service is one technology for creating
an SOA (service-oriented architecture) with a three-tier
architecture. It minimizes duplication of operations by
maintaining the business logic at one specific location
(a centralized server). The main goal of the service-oriented
architecture is language interoperability (i.e. any standard
language can communicate with another language even
though the two are different), and it also minimizes the
chances of damage from the client end.
[Figure components: Database; Business Logic; WSDL with SOAP protocol; UI (VB.Net); UI (Java); UI (Android)]
Fig1: Web service Architecture
A data cache is a mechanism that increases
performance at the user end and reduces overhead at the
server end. It stores frequently accessed results for future
retrieval, so that when a user submits the same input query
again, execution time is reduced: the round trip of request
and response between the user and the server is avoided, and
the server incurs no additional overhead to process the
same input keyword. If a user issues a query that has been
requested before, the server need not process it again and
no round trip is needed, because the search results
retrieved from the web server were stored in the data
cache before being forwarded to the user. From the next
search onwards, the results for that query are retrieved
from cache storage instead of the web server.
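A minimal sketch of such a data cache follows. The class name `QueryCache`, the normalization of queries into cache keys, and the hit and miss counters are assumptions added for illustration; the paper does not describe eviction or key handling.

```python
class QueryCache:
    """In-memory cache of search results keyed by the normalized query."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(query):
        # Assumed normalization: case-insensitive, collapsed whitespace.
        return " ".join(query.lower().split())

    def get(self, query):
        """Return the cached results, or None if the query is not cached."""
        key = self._key(query)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        return None

    def put(self, query, results):
        """Store a copy of the results for later requests of the same query."""
        self._store[self._key(query)] = list(results)


cache = QueryCache()
first_lookup = cache.get("computer science")          # not cached yet
cache.put("computer science", ["cs.html", "dept.html"])
second_lookup = cache.get("Computer  Science")        # same query after normalization
```

A production cache would also need a size bound and an invalidation policy for when the underlying documents change; both are left out here.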
Initially, every document is preprocessed to eliminate
inconsistent or unnecessary keywords, and its document
weight, or file relevance score, is computed from term
frequency (TF) and inverse document frequency (IDF). TF
is the number of occurrences of a search keyword in an
individual file. IDF (inverse document frequency) is
derived from the number of files in the collection that
contain the keyword, so that a keyword found in fewer
files receives a higher weight. The file relevance
score, or document weight, is then computed in terms of TF
and IDF.
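The scoring can be sketched as below. The paper gives only FileScore = TF * IDF, so the concrete choices here are assumptions: IDF is taken in its common form log(N / df), where N is the number of files and df the number of files containing the keyword, and the scores of multiple query keywords are summed. The names `tokenize`, `file_scores`, and `rank` are illustrative.

```python
import math
import re


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"\w+", text.lower())


def file_scores(query, documents):
    """Return {file_name: score}, summing TF * IDF over the query keywords."""
    tokens = {name: tokenize(text) for name, text in documents.items()}
    n_docs = len(documents)
    scores = {name: 0.0 for name in documents}
    for term in set(tokenize(query)):
        # df: number of files that contain the keyword at least once
        df = sum(1 for words in tokens.values() if term in words)
        if df == 0:
            continue  # keyword occurs nowhere, contributes nothing
        idf = math.log(n_docs / df)  # zero when every file contains the keyword
        for name, words in tokens.items():
            scores[name] += words.count(term) * idf  # TF * IDF
    return scores


def rank(query, documents):
    """File names ordered from highest to lowest relevance score."""
    scores = file_scores(query, documents)
    return sorted(scores, key=lambda name: scores[name], reverse=True)


docs = {
    "a.txt": "mouse mouse keyboard",
    "b.txt": "mouse trap",
    "c.txt": "genome biology",
}
ranking = rank("mouse", docs)
```

With this form of IDF, a keyword that appears in every file scores zero everywhere, which matches the intuition that it cannot help distinguish files.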
[Figure components: Mobile User; Cache; Web service; Data base. Labeled flow: 1. New Account; 2. Forward Request; 3. Request; 4. Result; 5. Results; 6. Store in cache; 7. Search Results; 8. Send Request; 9. Result]
Fig2: Proposed Architecture
The sequential steps for rank-oriented results from the web service are as follows:
1. The user makes a request with a search query from the mobile device.
2. The request is forwarded to the data cache, which checks
previously retrieved results. If results for the same query are
available, they are returned from the data cache; otherwise
the request is forwarded to the business logic.
3. The service, or business logic, retrieves rank-oriented results
from the data sources based on term frequency and inverse
document frequency:
FileScore = TF * IDF
where FileScore is the document weight or file relevance score,
TF is the term frequency (number of occurrences of a keyword
in a single document), and IDF is the inverse document
frequency (derived from the number of documents that contain
the keyword).
4. The search results are stored in the data cache for future
retrieval of the same query.
5. From the cache, the ranked search results are
forwarded to the mobile device when a user makes the same request.
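The control flow of steps 2 to 5 can be sketched as a single function. This is an illustration under assumptions: the cache is a plain dictionary, `rank_fn` stands in for the web service's ranking logic, and the returned source label exists only to make the path taken visible.

```python
def search(query, cache, rank_fn):
    """Answer from the cache when possible; otherwise rank, store, and return."""
    key = " ".join(query.lower().split())  # assumed query normalization
    if key in cache:                       # step 2: cache lookup
        return cache[key], "cache"         # step 5: served from the cache
    results = rank_fn(query)               # step 3: business logic ranks files
    cache[key] = results                   # step 4: store for future requests
    return results, "service"


service_calls = []


def fake_rank(query):
    """Stand-in for the ranking service; records each time it is invoked."""
    service_calls.append(query)
    return ["doc2.txt", "doc1.txt"]


cache = {}
first = search("Computer Science", cache, fake_rank)
second = search("computer  science", cache, fake_rank)
```

The second request never reaches the ranking function, which is the saving in server work that the cache is meant to provide.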
For the experimental implementation, we tested the SOA
(service-oriented architecture) using C#.Net, with Android
for the user interface and the generation of SOAP objects.
The set of operations, or business logic, resides in C#.Net
at the server end, and the UI (user interface) is on Android.
The input search keyword is passed through SOAP (Simple
Object Access Protocol) objects, with the Web Services
Description Language (WSDL) providing an abstract
description of the communication; the calculations and
the retrieval of file-relevance-based results are done at
the web service.
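To illustrate why SOAP gives language interoperability, the sketch below builds and reads a SOAP 1.1 request envelope using only Python's standard library; any client language that can produce the same XML could call the service. The service namespace, the operation name `Search`, and the parameter name `keyword` are hypothetical; the paper does not list the service's actual operations.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/search"               # hypothetical service namespace


def build_search_request(keyword):
    """Serialize a SOAP envelope carrying one search keyword."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}Search")  # hypothetical operation
    ET.SubElement(operation, f"{{{SERVICE_NS}}}keyword").text = keyword
    return ET.tostring(envelope, encoding="unicode")


def read_keyword(xml_text):
    """Server-side view: extract the keyword from a request envelope."""
    root = ET.fromstring(xml_text)
    path = f"{{{SOAP_NS}}}Body/{{{SERVICE_NS}}}Search/{{{SERVICE_NS}}}keyword"
    return root.find(path).text


request_xml = build_search_request("computer science")
received = read_keyword(request_xml)
```

In practice a client would generate this envelope from the service's WSDL with a SOAP library rather than by hand; the manual version is shown only to make the message format visible.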
IV. CONCLUSION
We conclude our current research work with efficient,
file-relevance-based, rank-oriented results in a mobile
search engine built on a service-oriented architecture.
The cache implementation enhances performance by
minimizing the round trip time, or execution time, of a
search query when the same query has been processed for
the same user before. Our experimental results are more
efficient than those of previous mechanisms.
REFERENCES
[1] E. Agichtein, E. Brill, and S. Dumais, “Improving Web
Search Ranking by Incorporating User Behavior
Information,” Proc. 29th Ann. Int’l ACM SIGIR Conf.
Research and Development in Information Retrieval
(SIGIR), 2006.
[2] E. Agichtein, E. Brill, S. Dumais, and R. Ragno,
“Learning User Interaction Models for Predicting Web
Search Result Preferences,” Proc. Ann. Int’l ACM SIGIR
Conf. Research and Development in Information Retrieval
(SIGIR), 2006.
[3] Y.-Y. Chen, T. Suel, and A. Markowetz, “Efficient
Query Processing in Geographic Web Search Engines,”
Proc. Int’l ACM SIGIR Conf. Research and Development
in Information Retrieval (SIGIR), 2006.
[4] K.W. Church, W. Gale, P. Hanks, and D. Hindle,
“Using Statistics in Lexical Analysis,” Lexical Acquisition:
Exploiting On-Line Resources to Build a Lexicon,
Psychology Press, 1991.
[5] Q. Gan, J. Attenberg, A. Markowetz, and T. Suel,
“Analysis of Geographic Queries in a Search Engine Log,”
Proc. First Int’l Workshop Location and the Web (LocWeb),
2008.
[6] T. Joachims, “Optimizing Search Engines Using
Clickthrough Data,” Proc. ACM SIGKDD Int’l Conf.
Knowledge Discovery and Data Mining, 2002.
[7] K.W.-T. Leung, D.L. Lee, and W.-C. Lee, “Personalized
Web Search with Location Preferences,” Proc. IEEE Int’l
Conf. Data Eng. (ICDE), 2010.
[8] K.W.-T. Leung, W. Ng, and D.L. Lee, “Personalized
Concept-Based Clustering of Search Engine Queries,”
IEEE Trans. Knowledge and Data Eng., vol. 20, no. 11, pp.
1505-1518, Nov. 2008.
[9] H. Li, Z. Li, W.-C. Lee, and D.L. Lee, “A Probabilistic
Topic-Based Ranking Framework for Location-Sensitive
Domain Information Retrieval,” Proc. Int’l ACM SIGIR
Conf. Research and Development in Information Retrieval
(SIGIR), 2009.
[10] B. Liu, W.S. Lee, P.S. Yu, and X. Li, “Partially
Supervised Classification of Text Documents,” Proc. Int’l
Conf. Machine Learning (ICML), 2002.
BIOGRAPHIES
Mr. P.S. SITARAMA RAJU, a well-known and excellent
teacher, received his M.Tech (CSE) from CENTRAL
UNIVERSITY, Hyderabad. He is working as professor
(H.O.D), Dept. of CSE, at Maharaj Vijayaram Gajapathi Raj
College of Engineering. He has 16 1/2 years of industrial
and teaching experience and has to his credit a couple of
publications in both national and international
conferences/journals. His areas of interest include
object-oriented software and languages, system
architecture, and system software.
Pilli Srivalli is a student of Maharaj Vijayaram Gajapathi
Raj College of Engineering, Chintavalasa. Presently she is
pursuing her M.Tech [Computer Science] at this college.
She received her M.C.A from Godavari Institute of
Engineering and Technology, Rajahmundry, affiliated to
JNTU Kakinada, in the year 2011. Her areas of interest
include programming, DBMS, and current trends and
techniques in computer science.