Abstract

As probabilistic data management becomes a major research focus and
keyword search grows more popular as a query method, it is natural to
ask how to support keyword queries on probabilistic XML data. For
keyword queries on deterministic XML documents, ELCA (Exclusive
Lowest Common Ancestor) semantics allows more relevant fragments
rooted at the ELCAs to appear as results and is more popular than other
keyword query result semantics (such as SLCAs). In this paper, we
investigate how to evaluate ELCA results for keyword queries on
probabilistic XML documents. After defining probabilistic ELCA
semantics in terms of possible world semantics, we propose an
approach to compute ELCA probabilities without generating possible
worlds. Then we develop an efficient stack-based algorithm that can
find all probabilistic ELCA results and their ELCA probabilities for a
given keyword query on a probabilistic XML document. Finally, we
experimentally evaluate the proposed ELCA algorithm and compare it
with its SLCA counterpart in terms of result effectiveness, time and
space efficiency, and scalability.
Modules:
Data storage and search:
We describe an approach based on Tree-based Association
Rules (TARs): mined rules that provide approximate,
intensional information on both the structure and the
contents of XML documents and can be stored in XML format
as well. There are two main approaches to XML document
access: keyword-based search and query-answering. The
idea of mining association rules to provide summarized
representations of XML documents has been investigated in
many proposals, some of which use languages such as XQuery.
File organization blocks:
We do not store the data in a single file because, in the Hadoop
MapReduce framework, a file is the smallest unit of
input to a MapReduce job and, in the absence of caching, a
file is always read from disk. If we had all the data in
one file, the whole file would be input to the jobs for each query.
Instead, we divide the data into multiple smaller files.
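The splitting idea can be illustrated with a minimal Python sketch. It is
not the project's implementation: the record format, the choice of the
second field as the partition key, and the helper names split_into_files
and files_needed are all assumptions made for illustration.

```python
import os
import tempfile
from collections import defaultdict


def split_into_files(records, key_of, out_dir):
    """Write records into one file per key and return {key: path}."""
    groups = defaultdict(list)
    for record in records:
        groups[key_of(record)].append(record)

    paths = {}
    for key, group in groups.items():
        path = os.path.join(out_dir, f"{key}.txt")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("\n".join(group) + "\n")
        paths[key] = path
    return paths


def files_needed(paths, keys):
    """Return only the files that a query touching `keys` has to read."""
    return [paths[key] for key in keys if key in paths]


# Hypothetical records: "subject predicate object", split on the predicate.
records = [
    "s1 type GraduateStudent",
    "s1 memberOf d1",
    "s2 type GraduateStudent",
    "s2 undergraduateDegreeFrom u1",
]
out_dir = tempfile.mkdtemp()
paths = split_into_files(records, lambda r: r.split()[1], out_dir)
```

With this layout, a query that only involves memberOf needs a single small
file as job input, files_needed(paths, ["memberOf"]), instead of the whole
data set.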
User index based search:
We introduce indexes on TARs to further speed up access
to mined trees and, more generally, intensional query
answering. In general, path indexes are proposed to quickly
answer queries that follow some frequent path template,
and are built by indexing only those paths that are queried
very frequently. We start from a different perspective: we
want to provide quick, and often approximate, answers to
casual queries as well.
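One way to picture an index over mined trees is the following Python
sketch. It is an assumption-laden illustration rather than the index used
in the project: rules are modelled as nested (label, children) tuples, the
rule ids and labels are invented, and every root-to-node path is indexed
so that infrequent ("casual") paths can be looked up too.

```python
from collections import defaultdict


def iter_paths(tree, prefix=()):
    """Yield every root-to-node label path of a (label, children) tree."""
    label, children = tree
    path = prefix + (label,)
    yield path
    for child in children:
        yield from iter_paths(child, path)


def build_path_index(rules):
    """Map each label path to the set of rule ids whose tree contains it."""
    index = defaultdict(set)
    for rule_id, tree in rules.items():
        for path in iter_paths(tree):
            index[path].add(rule_id)
    return index


def lookup(index, path):
    """Return the sorted rule ids that contain the given label path."""
    return sorted(index.get(tuple(path), set()))


# Hypothetical mined rules, for illustration only.
rules = {
    "r1": ("conference", [("paper", [("author", [])])]),
    "r2": ("conference", [("paper", [("title", [])])]),
}
index = build_path_index(rules)
```

A call such as lookup(index, ["conference", "paper", "author"]) then finds
the matching rules with one dictionary access instead of scanning every
mined tree.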
Query plan generation:
We define the query plan generation problem and show that
generating the best (i.e., least-cost) query plan for the ideal
model as well as for the practical model is computationally
expensive. We then present a heuristic and a greedy
approach that generate an approximate solution to the
best-plan problem.
Running example:
We will use the following query as a running example in this
section.
select ?v, ?x, ?y, ?z where {
  ?x xml:type ub:graduatestudent
  ?y xml:type ub:university
  ?z ?v ub:department
  ?x ub:memberof ?z
  ?x ub:undergraduatedegreefrom ?y }
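To make the notion of a greedy plan concrete, here is a small Python sketch
applied to the five triple patterns of the running example. The greedy rule
used here (repeatedly group the remaining patterns around the variable they
share most often, breaking ties alphabetically) is an assumption chosen for
illustration and is not claimed to be the heuristic of the project; the
function names are hypothetical as well.

```python
from collections import Counter

# Triple patterns of the running example, copied with its prefixes.
PATTERNS = [
    ("?x", "xml:type", "ub:graduatestudent"),
    ("?y", "xml:type", "ub:university"),
    ("?z", "?v", "ub:department"),
    ("?x", "ub:memberof", "?z"),
    ("?x", "ub:undergraduatedegreefrom", "?y"),
]


def variables(pattern):
    """Return the variables (terms starting with '?') of a triple pattern."""
    return {term for term in pattern if term.startswith("?")}


def greedy_plan(patterns):
    """Group patterns step by step around the most frequently shared variable."""
    remaining = dict(enumerate(patterns))
    plan = []
    while remaining:
        counts = Counter(v for p in remaining.values() for v in variables(p))
        if not counts:
            # Patterns without variables cannot be joined on anything.
            plan.append((None, sorted(remaining)))
            break
        # max() keeps the first maximum, so ties go to the alphabetically
        # smallest variable.
        best = max(sorted(counts), key=counts.get)
        chosen = [i for i, p in remaining.items() if best in variables(p)]
        plan.append((best, chosen))
        for i in chosen:
            del remaining[i]
    return plan


plan = greedy_plan(PATTERNS)
```

For the running example the first step groups the three patterns that
mention ?x, which is the kind of cheap, locally best choice a greedy
planner makes without exploring every possible plan.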
Time-based search:
We develop an efficient stack-based algorithm that can find all
probabilistic ELCA results and their ELCA probabilities for a given
keyword query on a probabilistic XML document. We then
experimentally evaluate the proposed ELCA algorithm and compare it
with its SLCA counterpart in terms of result effectiveness and time.
Existing System:
Semantic web technologies are being developed to present data in a
standardized way so that the data can be retrieved and
understood by both humans and machines. Historically, web pages were
published as plain HTML files, which are not suitable for reasoning.
1. No user data privacy.
2. Existing commercial tools and technologies do not scale well in
cloud computing settings.
Proposed System:
The proposed system integrates the functionalities proposed in our
approach. Given an XML document, it enables users to extract
intensional knowledge and compose traditional queries as well as
queries over the intensional knowledge, receiving both extensional and
intensional answers. Users formulate XQueries over the original data,
and queries are automatically translated and executed on the
intensional knowledge.
We propose an approach to compute ELCA probabilities without
generating possible worlds. Then we develop an efficient stack-based
algorithm that can find all probabilistic ELCA results and their ELCA
probabilities for a given keyword query on a probabilistic XML
document. Finally, we experimentally evaluate the proposed ELCA
algorithm and compare it with its SLCA counterpart in terms of
result effectiveness and time.
ALGORITHM:
In this section, we introduce an algorithm, PrELCA, that turns the
conceptual idea of the previous section into procedural computation
steps. We start with indexing probabilistic XML data and then
introduce the PrELCA algorithm. Finally, we discuss why it is hard
to find effective upper bounds for ELCA probabilities; as a result,
the PrELCA algorithm may be the only acceptable solution.
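For orientation, the Python sketch below computes ELCA probabilities the
naive way, by enumerating possible worlds. This is the baseline that the
proposed approach is designed to avoid; it is not PrELCA. The toy document,
the node names, and the model in which every node exists independently with
a given probability once its parent exists are simplifying assumptions made
only for this illustration.

```python
from itertools import product

# Hypothetical toy document:
# node -> (parent, probability of existing given the parent, keywords at node).
# Parents are listed before their children.
NODES = {
    "r": (None, 1.0, set()),
    "a": ("r", 0.5, {"k1"}),
    "b": ("r", 1.0, {"k2"}),
    "c": ("r", 0.8, set()),
    "d": ("c", 1.0, {"k1"}),
    "e": ("c", 0.5, {"k2"}),
}


def elcas(world, query):
    """Return the ELCA nodes of one deterministic world (a set of node names)."""
    kids = {name: [] for name in world}
    for name in world:
        parent = NODES[name][0]
        if parent is not None:
            kids[parent].append(name)

    result = set()

    def visit(node):
        own = NODES[node][2] & query
        full = set(own)  # keywords anywhere in the subtree
        excl = set(own)  # keywords outside child subtrees that hold all keywords
        for child in kids[node]:
            child_full = visit(child)
            full |= child_full
            if child_full != query:
                excl |= child_full
        if excl == query:
            result.add(node)
        return full

    visit("r")
    return result


def elca_probabilities(query):
    """Sum the probabilities of all worlds in which each node is an ELCA."""
    names = list(NODES)
    totals = {}
    for flags in product([True, False], repeat=len(names)):
        prob = 1.0
        world = set()
        for name, flag in zip(names, flags):
            parent, p, _ = NODES[name]
            prob *= p if flag else 1.0 - p
            if flag and (parent is None or parent in world):
                world.add(name)
        if prob == 0.0 or "r" not in world:
            continue
        for node in elcas(world, query):
            totals[node] = totals.get(node, 0.0) + prob
    return totals


probabilities = elca_probabilities({"k1", "k2"})
```

The loop visits 2^n flag combinations for n nodes, which is why enumeration
is only workable on toy inputs and why computing the probabilities without
generating possible worlds matters.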
SYSTEM REQUIREMENTS:
Hardware Requirements:
• System : Pentium IV 2.4 GHz.
• Hard Disk : 40 GB.
• Floppy Drive : 1.44 MB.
• Monitor : 15 VGA Colour.
• Mouse : Sony.
• Ram : 512 MB.
Software Requirements:
• Operating system : Windows 7.
• Coding Language : ASP.Net 4.0 with C#.
• Data Base : SQL Server 2008.