International Journal of Informative & Futuristic Research (IJIFR)
ISSN (Online): 2347-1697
Volume 2, Issue 3, November 2014 (15th Edition)
Reviewed Paper

QueRIE: System for Personalized Query Recommendation

Paper ID: IJIFR/V2/E3/007
Page No.: 502-507
Subject Area: Computer Engineering
Key Words: Data Mining, Metadata, Database, SkyServer, Ad-Hoc, Cryptography, QueRIE
Narayan Bhausaheb Vikhe¹, Prof. Mrs. M. M. Naoghare²
¹ Department of Computer Engineering, Sir Visvesvaraya Institute of Technology, Nashik - Maharashtra
² Department of Computer Engineering, Sir Visvesvaraya Institute of Technology, Nashik - Maharashtra
Abstract
This article discusses query languages, which are computer languages used to
make queries into databases and information systems. The difference between
the two kinds is that a database query language attempts to give factual
answers to factual questions, while an information retrieval query language
attempts to find documents containing information that is relevant to an
area of inquiry. Database management systems (DBMSs) are computer
software applications that interact with the user, other applications, and the
database itself to capture and analyze data. The data are typically organized
to model aspects of reality in a way that supports processes requiring
information. Database management systems are often classified according
to the database model that they support.
1. Introduction
Database systems provide the critical infrastructure to access and analyze large volumes of data in a
variety of applications. Prominent examples include large-scale data warehouses that support
business-intelligence tools, systems for ad-hoc analytics over big data, and services for scientific-data
exploration, such as the Genome browser or SkyServer, which allow scientists to query large
databases of scientific data over a web-enabled interface. Relational database users employ a query
interface (typically, a web-based client) to issue a series of SQL queries that aim to analyze the data
and mine it for interesting information. First-time users may not have the necessary knowledge to
know where to start their exploration. Other times, users may simply overlook queries that retrieve
important information. In this work we describe a framework to assist non-expert users by providing
personalized query recommendations.
QueRIE is built on a simple premise that is inspired by Web recommender systems: if users A
and B have posed similar queries, then the other queries of B may be of interest to user A, and vice
versa. In other words, we can recommend the queries of user B in order to help user A in their
exploration of the database.
A database is an organized collection of data. A general-purpose DBMS is designed to allow the
definition, creation, querying, update, and administration of databases. Well-known DBMSs include
MySQL, Oracle, and PostgreSQL, among many others. A database is not generally portable across
different DBMSs, but different DBMSs can interoperate by using standards such as SQL and ODBC or
JDBC to allow a single application to work with more than one DBMS. The data are typically
organized to model aspects of reality, for example, modelling the availability of rooms in hotels in a
way that supports finding a hotel with vacancies. DAS, in addition, had the purposes of immigration
control, including issuing visas. Techniques such as fragment/splattered structures, random sampling,
index hash tables, and Advanced Cryptographic Techniques (ACT) are used to avoid information loss
and corruption of the metadata.
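The premise stated above (if users A and B have posed similar queries, the other queries of B may interest A) can be illustrated with a minimal sketch. It assumes that each user's history is simply a set of query identifiers and that similarity is the Jaccard overlap of those sets; the names `jaccard`, `recommend_queries`, and the sample log are hypothetical and are not taken from the QueRIE system itself.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two query sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_queries(current_user: str,
                      logs: Dict[str, Set[str]],
                      top_n: int = 3) -> List[str]:
    """Suggest queries of similar users that current_user has not posed yet."""
    own = logs[current_user]
    scores: Dict[str, float] = {}
    for other, queries in logs.items():
        if other == current_user:
            continue
        weight = jaccard(own, queries)
        if weight == 0.0:
            continue  # users with no query in common contribute nothing
        for query in queries - own:
            scores[query] = scores.get(query, 0.0) + weight
    # Highest score first; ties are broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda q: (-scores[q], q))
    return ranked[:top_n]


# Hypothetical query log: user -> identifiers of the queries that user posed.
logs = {
    "A": {"q1", "q2"},
    "B": {"q1", "q2", "q3"},
    "C": {"q4", "q5"},
}
recommendations = recommend_queries("A", logs)
```

In this toy log, user B shares two queries with user A, so B's remaining query is the only candidate for A, while user C has nothing in common with A and is ignored. Matching queries by exact identifier is a deliberate simplification made for this sketch.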
2. Justification Of Problem
The tuple-based approach constructs large (and relatively dense) summaries and, most importantly,
requires real-time calculations of the similarities between the session summary S_0 of the current user
and those of past users. The fragment-based approach clearly captures information at a coarser level
of detail, and hence it is expected to miss interesting correlations between users.
Figure 1: QueRIE Tactics/Overview
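To make the coarser, fragment-based view more concrete, the following sketch builds a session summary from query fragments and compares two summaries. It rests on loud assumptions: a "fragment" is reduced here to a table name following FROM or JOIN, the similarity measure is cosine similarity, and the table names and helper functions are hypothetical. The actual system may use richer fragments and a different weighting.

```python
import math
import re
from collections import Counter
from typing import Iterable

# Assumption: a fragment is just the table name that follows FROM or JOIN.
_TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def session_summary(queries: Iterable[str]) -> Counter:
    """Count how often each table fragment occurs in one user's session."""
    summary: Counter = Counter()
    for sql in queries:
        summary.update(name.lower() for name in _TABLE_REF.findall(sql))
    return summary


def cosine_similarity(s1: Counter, s2: Counter) -> float:
    """Cosine of the angle between two sparse fragment-count vectors."""
    dot = sum(weight * s2[frag] for frag, weight in s1.items())
    norm1 = math.sqrt(sum(w * w for w in s1.values()))
    norm2 = math.sqrt(sum(w * w for w in s2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # an empty session is similar to nothing
    return dot / (norm1 * norm2)


# Hypothetical sessions: s0 is the current user, past is an earlier user.
s0 = session_summary([
    "SELECT ra, dec FROM PhotoObj WHERE r < 18",
    "SELECT p.objID FROM PhotoObj p JOIN SpecObj s ON p.objID = s.bestObjID",
])
past = session_summary(["SELECT z FROM SpecObj WHERE z > 0.1"])
similarity = cosine_similarity(s0, past)
```

Because the summaries only record which tables were touched, two sessions that read the same table with very different predicates look alike in this sketch, which is the kind of lost detail the paragraph above refers to.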
3. Review of Literature
• Hive - A petabyte scale data warehouse using HADOOP (A. Thusoo et al.): HADOOP is a popular
open-source map-reduce implementation that is used at companies such as Yahoo and Facebook to
store and process extremely large data sets on commodity hardware. However, the map-reduce
programming model is very low level and requires developers to write custom programs that are hard
to maintain and reuse. Hive is an open-source data warehousing solution built on top of HADOOP. It
supports queries expressed in HiveQL, a SQL-like declarative language; these queries are compiled
into map-reduce jobs that are executed using HADOOP.
• QueRIE: A recommender system supporting interactive database exploration (S. Mittal, J. S.
V. Varman et al.): the demonstration presents QueRIE, a recommender system that supports
interactive database exploration. This system aims at assisting non-expert users of scientific databases
by generating personalized query recommendations. Drawing inspiration from Web recommender
systems, QueRIE tracks the querying behavior of each user and identifies potentially “interesting”
parts of the database related to the corresponding data analysis task by locating those database parts
that were accessed by similar users in the past. It then generates and recommends the queries that
cover those parts to the user.
• Amazon.com recommendations: Item-to-item collaborative filtering (G. Linden, B. Smith, and
J. York): recommendation algorithms can be used to personalize the online store for each
customer. The store radically changes based on customer interests, showing programming titles to a
software engineer and baby toys to a new mother. There are three common approaches to solving the
recommendation problem: traditional collaborative filtering, cluster models, and search-based
methods. The authors compare these methods with their own algorithm, which they call item-to-item
collaborative filtering. Unlike traditional collaborative filtering, the algorithm's online computation
scales independently of the number of customers and the number of items in the product catalog. The
algorithm produces recommendations in real time, scales to massive data sets, and generates
high-quality recommendations.
• Personalized queries under a generalized preference model (G. Koutrika and Y. Ioannidis):
the authors present a preference model that combines expressivity and concision. In addition, they
provide efficient algorithms for the selection of preferences related to a query, and an algorithm for
the progressive generation of personalized results, which are ranked based on user interest. Several
classes of ranking functions are provided for this purpose. They present results of experiments, both
synthetic and with real users, (a) demonstrating the efficiency of the algorithms, (b) showing the
benefits of query personalization, and (c) providing insight as to the appropriateness of the proposed
ranking functions.
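As an illustration of the item-to-item idea reviewed above, the sketch below precomputes item similarities from purchase histories and then recommends by looking up the neighbors of the items a customer already has. It is a toy version under stated assumptions: similarity is the cosine between the sets of buyers of two items, and the function names and sample data are hypothetical rather than taken from the cited article.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set


def item_similarities(purchases: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Offline step: cosine similarity between items via their shared buyers."""
    buyers: Dict[str, Set[str]] = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)
    sims: Dict[str, Dict[str, float]] = {item: {} for item in buyers}
    names = sorted(buyers)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            common = len(buyers[a] & buyers[b])
            if common:
                score = common / math.sqrt(len(buyers[a]) * len(buyers[b]))
                sims[a][b] = score
                sims[b][a] = score
    return sims


def recommend_items(customer: str,
                    purchases: Dict[str, Set[str]],
                    sims: Dict[str, Dict[str, float]],
                    top_n: int = 2) -> List[str]:
    """Online step: only looks at the customer's own items and their neighbors."""
    owned = purchases[customer]
    scores: Dict[str, float] = defaultdict(float)
    for item in owned:
        for other, score in sims.get(item, {}).items():
            if other not in owned:
                scores[other] += score
    return sorted(scores, key=lambda it: (-scores[it], it))[:top_n]


# Hypothetical purchase histories: customer -> items bought.
purchases = {
    "u1": {"sql_book", "python_book"},
    "u2": {"sql_book", "python_book", "keyboard"},
    "u3": {"baby_toy"},
    "u4": {"sql_book"},
}
sims = item_similarities(purchases)
suggestions = recommend_items("u4", purchases, sims)
```

The split into an offline similarity table and an online lookup is what lets the online step depend only on the items a customer owns, not on the total number of customers or catalog items.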
4. Methodology
The proposed methodology includes the following steps:
1. Kullback–Leibler divergence:
For discrete probability distributions P and Q, the KL divergence of Q from P is defined
to be
$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_i P(i) \ln \frac{P(i)}{Q(i)}.$$
In words, it is the expectation of the logarithmic difference between the probabilities P and
Q, where the expectation is taken using the probabilities P. The KL divergence is only defined if
P and Q both sum to 1 and if $Q(i) = 0$ implies $P(i) = 0$ for all i (absolute continuity). If
the quantity $0 \ln 0$ appears in the formula, it is interpreted as zero because
$\lim_{x \to 0^{+}} x \ln x = 0$.
2. Agglomerative Clustering Algorithm:
Agglomerative: This is a "bottom up" approach: each observation starts in its own cluster,
and pairs of clusters are merged as one moves up the hierarchy.
Divisive: This is a "top down" approach: all observations start in one cluster, and splits are
performed recursively as one moves down the hierarchy.
Figure 2: Agglomerative & Divisive System
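The agglomerative ("bottom up") strategy described above can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members); the paper does not state which distance or linkage it uses, and the function names and sample points are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Straight-line distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def single_linkage(c1: Sequence[Point], c2: Sequence[Point]) -> float:
    """Cluster distance = distance between the two closest members."""
    return min(euclidean(a, b) for a in c1 for b in c2)


def agglomerative(points: Sequence[Point], n_clusters: int) -> List[List[Point]]:
    """Start with one cluster per point; merge the closest pair until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters: List[List[Point]] = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None  # (distance, index of first cluster, index of second cluster)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]                  # j > i, so index i stays valid
    return clusters


# Hypothetical one-dimensional observations.
points = [(0.0,), (0.2,), (5.0,), (5.1,), (9.0,)]
three = agglomerative(points, 3)
two = agglomerative(points, 2)
```

Each pass of the loop performs one merge, so stopping at different values of `n_clusters` corresponds to cutting the hierarchy at different heights. The divisive variant would run in the opposite direction, starting from a single cluster and splitting it.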
The methodology is based on the KL divergence, which is a special case of a broader class of
divergences called f-divergences. It was originally introduced by Solomon Kullback and Richard
Leibler in 1951 as the directed divergence between two distributions. It can be derived from a
Bregman divergence. The agglomerative clustering algorithm forms clusters in a bottom-up manner.
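The KL divergence used in the methodology can be computed directly from its definition for discrete distributions. The sketch below is a plain-Python illustration, assuming both distributions are given as equal-length sequences of probabilities over the same outcomes; the function name `kl_divergence` is hypothetical, and the result is in nats because the natural logarithm is used.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(P || Q) in nats for discrete distributions over the same outcomes."""
    if len(p) != len(q):
        raise ValueError("P and Q must have the same number of outcomes")
    if not (math.isclose(sum(p), 1.0) and math.isclose(sum(q), 1.0)):
        raise ValueError("P and Q must each sum to 1")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # a term 0 * ln 0 is interpreted as 0
        if q_i == 0.0:
            # Absolute continuity is violated: Q(i) = 0 while P(i) > 0.
            raise ValueError("KL divergence is undefined for these inputs")
        total += p_i * math.log(p_i / q_i)
    return total


same = kl_divergence([0.5, 0.5], [0.5, 0.5])
forward = kl_divergence([0.5, 0.5], [0.9, 0.1])
backward = kl_divergence([0.9, 0.1], [0.5, 0.5])
```

The values `forward` and `backward` differ, which reflects that the KL divergence is a directed measure and not a symmetric distance.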
5. Understanding Database System
A database is an organized collection of data. Formally, "database" refers to the data themselves and
supporting data structures. Databases are created to operate large quantities of information by
inputting, storing, retrieving and managing that information. Databases are set up so that one set of
software programs provides all users with access to all the data.
Fig. Database system
When working with databases of different types and in different places, two processes ensure that
the distributed databases are up to date and current: replication and duplication. Replication
involves using specialized software that looks for changes in the distributed database. A distributed
database may be stored on multiple computers located in the same physical location, or may be
dispersed over a network of interconnected computers.
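The replication process mentioned above, in which software looks for changes and propagates them, can be sketched with a toy model. Everything here is an assumption made for illustration: the `Primary` and `Replica` classes are hypothetical, changes are tracked in a simple in-memory log, and no real DBMS replication protocol is being described.

```python
from typing import Dict, List, Tuple

Change = Tuple[str, str]  # (key, new value)


class Primary:
    """Holds the authoritative data and records every change in a log."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.log: List[Change] = []

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    """Keeps a copy of the data by replaying log entries it has not seen yet."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.applied = 0  # number of log entries already applied

    def sync(self, primary: Primary) -> int:
        """Look for new changes on the primary, apply them, return how many."""
        pending = primary.log[self.applied:]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)
        return len(pending)


primary = Primary()
replica = Replica()
primary.write("room_101", "vacant")
primary.write("room_102", "occupied")
first_sync = replica.sync(primary)
primary.write("room_101", "occupied")
second_sync = replica.sync(primary)
```

After each call to `sync`, the replica holds the same data as the primary, and only the changes made since the previous call are transferred.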
Figure 3: Distributed database
Figure 4: System Architecture
6. Conclusion & Future Work
There are many interesting directions we would like to explore in the future. We would like to
measure the impact that the query relaxation process has on the quality of recommendations. Exploring a
sequence-based approach is another interesting direction for future work, but it requires
a careful reconsideration of several aspects of our framework. For instance, pure sequence
information may not be sufficient to discover user similarities. Instead, we may have to consider the
relative changes between queries in the sequence, e.g., that selection predicates become more
selective as queries progress, in order to properly detect similarities. We also plan to focus on
relational databases that have a form-based interface. While the fragment-based approach seems like a
straightforward selection for such environments, new challenges related to the formulation of session
similarity, the synthesis of recommendations and their presentation arise. Finally, as we aim at
developing a more generic and scalable system, we are currently working on integrating alternative
techniques for generating recommendations, such as matrix factorization methods.
References
[1] A. Thusoo et al., “Hive - A petabyte scale data warehouse using Hadoop,” in Proc. IEEE 26th ICDE, Long
Beach, CA, USA, Mar. 2010, pp. 996–1005.
[2] G. Chatzopoulou, M. Eirinaki, and N. Polyzotis, “Collaborative filtering for interactive database
exploration,” in Proc. 21st Int. Conf. SSDBM, New Orleans, LA, USA, 2009, pp. 3–18.
[3] S. Mittal, J. S. V. Varman, G. Chatzopoulou, M. Eirinaki, and N. Polyzotis, “QueRIE: A recommender
system supporting interactive database exploration,” in Proc. IEEE ICDM, Sydney, NSW, Australia, 2010.
[4] J. Akbarnejad et al., “SQL QueRIE recommendations,” PVLDB, vol. 3, no. 2, pp. 1597–1600, 2010.
[5] N. Alon, Y. Matias, and M. Szegedy, “The space complexity of approximating the frequency moments,” in
Proc. 28th STOC, New York, NY, USA, 1996.
[6] E. Cohen, “Size-estimation framework with applications to transitive closure and reachability,” J. Comput.
Syst. Sci., vol. 55, no. 3, pp. 441–453, 1997.
[7] G. Linden, B. Smith, and J. York, “Amazon.com recommendations: Item-to-item collaborative filtering,”
IEEE Internet Comput., vol. 7, no. 1, pp. 76–80, Jan./Feb. 2003.
[8] N. Koudas, C. Li, A. K. H. Tung, and R. Vernica, “Relaxing join and selection queries,” in Proc. 32nd Int.
Conf. VLDB, Seoul, Korea, 2006, pp. 199–210.
[9] S. Kullback (1959). Information Theory and Statistics. Dover Press. ISBN 0-486-69684-7.
[10] Burnham, K. P. and Anderson, D. R. (2002). Model Selection and Multimodel Inference: A Practical
Information-Theoretic Approach, Second Edition. Springer Science, New York. ISBN 978-0-387-95364-9.
[11] Database Systems: The Complete Book. Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer D. Widom.
[12] Jeffrey Ullman (1997). First Course in Database Systems. Prentice–Hall Inc., Simon & Schuster, Page 1.
ISBN 0-13-861337-0.
[13] Tsichritzis, D. C. and F. H. Lochovsky (1982). Data Models. Englewood Cliffs, Prentice–Hall.
[14] Beynon-Davies, P. (2004). Database Systems, 3rd Edition. Palgrave, Basingstoke, UK. ISBN 1-4039-1601-2.
[15] Raul F. Chong, Michael Dang, Dwaine R. Snow, Xiaomei Wang (3 July 2008). "Introduction to DB2".
Retrieved 17 March 2013. This article quotes a development time of 5 years involving 750 people for DB2
release 9 alone.
[16] C. W. Bachman (November 1973). The Programmer as Navigator. CACM (Turing Award Lecture 1973).
[17] "DB-Engines Ranking". January 2013. Retrieved 22 January 2013.
[18] Proctor, Seth (2013). "Exploring the Architecture of the NuoDB Database, Part 1". Retrieved 2013-07-12.
[19] "TeleCommunication Systems Signs up as a Reseller of TimesTen; Mobile Operators and Carriers Gain
Real-Time Platform for Location-Based Services". Business Wire. 2002-06-24.
[20] "database, n". OED Online. Oxford University Press. June 2013. Retrieved July 12, 2013.
[21] IBM Corporation. "IBM Information Management System (IMS) 13 Transaction and Database Servers
delivers high performance and low total cost of ownership". Retrieved Feb 20, 2014.
[22] Ken North, "Sets, Data Models and Data Independence", Dr. Dobb's, 10 March 2010.
[23] www.google.com
[24] www.wikipedia.com