International Journal of Engineering Trends and Technology (IJETT) – Volume 8 Number 8- Feb 2014
Efficient Retrieval of Aggregate Data in
Uncertain and Probabilistic Databases
Mr. V. V. Kheradkar#1, Prof. U. L. Kulkarni*2
#1 Student, M.E. (C.S.E.), DYPCOE, Shivaji University, Kolhapur, Maharashtra
*2 Associate Professor, VIT College of Engineering, Wadala, Mumbai University, Maharashtra
Abstract— The Trio system handles aggregation over uncertain and probabilistic data. Because “exact” aggregation in an uncertain database can produce exponentially sized results, Trio provides three alternatives: (1) a low bound on the aggregate value, (2) a high bound on the aggregate value, and (3) the expected value. Each variant returns a single result that is efficient to compute for aggregation queries. The system presented here provides an efficient way of handling uncertain and probabilistic databases. It identifies uncertain and certain data and shows how to handle DISTINCT queries on an uncertain database. It improves performance when handling multiple aggregation variants, multiple queries and subqueries, which we demonstrate through a result analysis. It also provides formal definitions and semantics for implementing aggregation on non-correlated data.
I. INTRODUCTION
Trio is a prototype database management system, developed at Stanford, designed specifically for storing and querying data with uncertainty and lineage [1]. Trio's query language is an extension of SQL [13]. Previous papers and the Trio system cover select-project-join queries, some set operations, some aggregate functions and lineage [2], [3], [4]. In this paper, we focus on efficient ways of handling DISTINCT queries, multiple queries and subqueries, and we show how to identify uncertain and certain data. Uncertain or probabilistic data is based on possible instances, so the result size for a query with aggregation can grow exponentially with data size. We review the different aggregate variants that are implemented over the possible instances [1], [14], [15].
The variants for each aggregate function return a single
value over uncertain data, instead of a set of possible values.
The variants are:
1. LOW is a function returning the lowest possible value of the aggregate result.
2. HIGH is a function returning the highest possible value of the aggregate result.
3. EXPECTED is a function returning the expected value of the aggregate result.
Our system can handle several of these aggregate variants in the same query. The size of the interval between the low and high values of an aggregate is a useful indicator of the degree of uncertainty in the data.
The rest of the paper proceeds as follows. In Section I, we give a brief introduction to ULDBs [2] and Trio, along with a running example [16]. In Section II, we describe the need for this work.
ISSN: 2231-5381
In Section III, we present the system for retrieval of data in uncertain and probabilistic databases.
In Section IV, we review the computation of confidence values for queries with joins [3]. This section also covers the identification of uncertain and certain data, the creation of parse trees for SQL and TriSQL statements, and the processing of TriSQL queries. TriSQL is TriQL extended with some additional constructs.
In Section V, we present ways of handling DISTINCT queries and efficient ways of handling subqueries and multiple queries.
In Section VI, we review the different TriSQL variants (LOW, HIGH and EXPECTED) of aggregate functions and show how to handle multiple aggregate variants in the same query instead of running them separately.
In Section VII, we present result graphs showing better system performance when handling multiple aggregate variants in the same query, when running subqueries in parallel instead of sequentially, and when running multiple queries by cursor instead of normally.
In Section VIII, we conclude with a discussion of future directions.
Related work is discussed next, and Section II covers the need for this work.
A. RELATED WORK
The Trio system supports uncertain and probabilistic databases, that is, databases containing data whose values are not unique. Like a conventional DBMS, Trio has a data model (Uncertainty-Lineage Databases, ULDBs), a query language (TriQL) and a system implementation (Trio-One) [2][3][4][9].
Trio's data model, the Uncertainty-Lineage Database (ULDB) model, is a model for uncertain and probabilistic data [2]. An uncertain relation is a multiset of x-tuples. Each x-tuple is comprised of one or more alternatives, and each alternative is a regular tuple associated with a probabilistic confidence value in [0,1]. In a single x-tuple, if ∑ is the sum of the confidence values of all alternatives, then 0 ≤ ∑ ≤ 1. If ∑ < 1, the entire x-tuple may not exist; this case is denoted by ø, whose confidence is 1 - ∑ [1]. An uncertain database represents a set of possible instances [2][3][4][9]. TriQL defines the semantics of any relational query over a ULDB and uses the same syntax as SQL; queries are processed using a query evaluation algorithm [2]. The Trio system is built entirely on top of a conventional relational DBMS. ULDBs are
http://www.ijettjournal.org
Page 443
represented in relational tables. TriQL queries and commands
are rewritten automatically into SQL commands [3] [4] [9].
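To make the x-tuple definition concrete, the following Python sketch models an x-tuple as a list of (values, confidence) alternatives, checks the constraint 0 ≤ ∑ ≤ 1, derives the confidence 1 - ∑ of ø, and counts possible instances. It assumes the x-tuples are independent (no lineage constraints), so each x-tuple independently picks one alternative or, when ∑ < 1, ø. The class and function names are illustrative only and are not part of Trio.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class XTuple:
    """An x-tuple: mutually exclusive alternatives, each a (values, confidence) pair."""
    alternatives: list

    def total_conf(self) -> float:
        return sum(conf for _, conf in self.alternatives)

    def absent_conf(self) -> float:
        """Confidence of ø (the x-tuple is absent), i.e. 1 - sum of confidences."""
        return max(0.0, 1.0 - self.total_conf())

    def validate(self, eps: float = 1e-9) -> None:
        if any(not 0.0 <= conf <= 1.0 for _, conf in self.alternatives):
            raise ValueError("each confidence must lie in [0, 1]")
        if self.total_conf() > 1.0 + eps:
            raise ValueError("confidences of one x-tuple must sum to at most 1")


def count_possible_instances(relation, eps: float = 1e-9) -> int:
    """Product over x-tuples of (#alternatives, plus 1 for ø when the sum < 1).

    Assumes independent x-tuples (no lineage constraints).
    """
    return prod(
        len(x.alternatives) + (1 if x.absent_conf() > eps else 0)
        for x in relation
    )


if __name__ == "__main__":
    # The Photo relation of Fig 1.
    photo = [
        XTuple([((1, "Arm"), 1.0)]),
        XTuple([((1, "Boby"), 0.6), ((1, "Charley"), 0.3), ((1, "Don"), 0.1)]),
    ]
    for x in photo:
        x.validate()
    print(count_possible_instances(photo))  # 3
    wide = [XTuple([("a", 0.5), ("b", 0.3)]) for _ in range(20)]
    print(count_possible_instances(wide))  # 3486784401, i.e. 3**20
```

The second call shows why exhaustive results explode: twenty independent x-tuples, each with two alternatives and ∑ < 1, already yield 3^20 possible instances.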
Aggregation in Trio includes exhaustive aggregation and grouped aggregation. Aggregate functions applied to the entire table are referred to as full-table aggregation. An exhaustive aggregation query on an uncertain relation produces a result for each possible instance, and each aggregate function applied to the possible instances of the input relation has corresponding low, high and expected variants [1] [6] [14].
A low aggregate returns one x-tuple with a single alternative: the lowest non-NULL value among the aggregate results. TriQL supports five low aggregate functions: LCOUNT, LSUM, LAVG, LMIN and LMAX [1].
The high aggregate functions, HCOUNT, HSUM, HAVG, HMIN and HMAX, return the highest possible aggregate value [1].
The expected aggregate functions are ECOUNT, ESUM, EAVG, EMIN and EMAX. An expected aggregate is the weighted average of all the non-NULL alternatives in the corresponding exhaustive result [1].
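Because the three variants are defined over the exhaustive result, a brute-force reference implementation is a convenient way to check them on tiny inputs. The Python sketch below enumerates every possible instance (exponential, so only suitable for toy data), applies an aggregate with SQL NULL semantics, and summarizes the per-instance results as LOW, HIGH and EXPECTED, the last being the confidence-weighted average of the non-NULL results. It assumes independent x-tuples whose alternatives are single numeric values; the relation and helper names are hypothetical.

```python
from itertools import product


def possible_instances(relation):
    """Yield (values, probability) for each possible instance.

    `relation` is a list of x-tuples; each x-tuple is a list of
    (value, confidence) alternatives. X-tuples are assumed independent,
    and an x-tuple whose confidences sum to s < 1 is absent (ø) with 1 - s.
    """
    choices = []
    for xt in relation:
        opts = list(xt)
        missing = 1.0 - sum(c for _, c in xt)
        if missing > 1e-9:
            opts.append((None, missing))  # ø: the x-tuple is absent
        choices.append(opts)
    for combo in product(*choices):
        prob = 1.0
        for _, c in combo:
            prob *= c
        yield [v for v, _ in combo if v is not None], prob


def variants(relation, agg):
    """Summarize exhaustive aggregation as (LOW, HIGH, EXPECTED).

    NULL (None) results are dropped; EXPECTED is the probability-weighted
    average of the remaining non-NULL results.
    """
    results = [(agg(vals), p) for vals, p in possible_instances(relation)]
    results = [(r, p) for r, p in results if r is not None and p > 0]
    total = sum(p for _, p in results)
    low = min(r for r, _ in results)
    high = max(r for r, _ in results)
    expected = sum(r * p for r, p in results) / total
    return low, high, expected


def sql_sum(vals):
    return sum(vals) if vals else None  # SQL SUM over no rows is NULL


def sql_avg(vals):
    return sum(vals) / len(vals) if vals else None


if __name__ == "__main__":
    rel = [[(10, 1.0)], [(5, 0.6), (8, 0.3)], [(4, 0.5), (7, 0.5)]]
    print(variants(rel, sql_sum))
    print(variants(rel, sql_avg))
```

For this hypothetical relation the SUM variants are LOW 14, HIGH 25 and EXPECTED approximately 20.9, and the AVG variants are LOW 19/3 and HIGH 8.5. The enumeration serves only as a reference: the closed-form translations over the encoded table avoid visiting every instance.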
We review Trio's Uncertainty-Lineage Database (ULDB) data model [2] through a running example. Each tuple in a ULDB relation consists of a set of mutually exclusive alternatives with associated confidence values. Intuitively, the tuple takes the value of one of its alternatives, and the probability of taking a particular alternative is given by its confidence value [10]. Consider a relation Photo(Number, Name) that lists the names of people observed in a set of photos. Suppose Photo 1 clearly shows Arm and also one other person who is not very visible. This person looks most like Boby (60% chance), but may also be Charley (30%) or Don (10%). We represent the relation as follows, where “||” separates alternatives [16].
ID   Photo(Number, Name)
11   (1, Arm) : 1.0
12   (1, Boby) : 0.6 || (1, Charley) : 0.3 || (1, Don) : 0.1
Fig 1: Relation Photo(Number, Name)
The sum of the confidence values of all alternatives in a tuple must be at most 1; if the sum is less than 1 (say, 0.8), then there is a 20% chance that none of the alternatives is present in the relation. We identify an alternative of a tuple by a pair (i, j), where i is the tuple ID and j is the index of the alternative in tuple i.
II. NEED OF WORK
A. Handling multiple aggregation and Sub queries
The current system allows multiple aggregations in a single query, but executes them sequentially, which takes more time. Aggregation is applied to each possible instance of the uncertain database and produces an aggregation result for each instance. Because it is difficult to apply multiple aggregations to the same instance at the same time, the aggregations are applied one after another, which increases the time needed to produce the output.
B. Handling different types of join on uncertain databases
The current system applies queries to a single table, and handling multiple tables requires an appropriate join. The current system implements a join only when the data are correlated; for non-correlated data, a simple cross join does not produce output. Joining unrelated uncertain data is difficult.
C. Handling DISTINCT aggregates on uncertain databases
The current system applies queries to an uncertain database, but a query containing DISTINCT aggregates does not produce the appropriate result. Handling DISTINCT aggregates is difficult.
III. SYSTEM FOR RETRIEVAL OF DATA IN UNCERTAIN AND PROBABILISTIC DATABASE
The system consists of the following work:
1) Achieving better performance through more sophisticated translation of queries with multiple aggregations.
2) Computing histograms for different aggregate queries.
3) Developing efficient algorithms for aggregation, not only in the presence of correlations induced by joins but also in the presence of arbitrary lineage.
4) Improving SQL clauses for handling DISTINCT aggregates and joins on uncertain databases.
5) Using the low, high and expected values of aggregates to provide more information, such as variance and histograms.
Fig 2: System for Uncertain and Probabilistic Database
The figure above represents the system design for efficient retrieval of data in uncertain and probabilistic databases. The system works in four phases: the translation phase, the execution phase, the post-processing phase and the result phase.
In the translation phase, the input is a TriSQL query. This phase translates TriSQL queries into SQL queries; during translation, it creates a parse tree and multiple SQL statements. In the execution phase, if the query contains subqueries, the system sends the SQL queries to the DBMS in parallel; otherwise, it sends the SQL query to the DBMS. Cursors and different views are created, and the result is obtained according to the query. In the post-processing phase, the system processes the data to obtain the relevant data from the uncertain database. During processing, it applies the constraints to the retrieved uncertain data, runs a confidence query if required, and obtains the results of the aggregate functions. In the result phase, stored data are written to a table, while transient data are retrieved at runtime.
IV. PROCESS QUERY ON UNCERTAIN OR CERTAIN DATABASE
The user enters a query for execution, and the system first produces a parse tree for the input SQL or TriSQL query. TriSQL queries are newly defined queries based on SQL, but they are not standard SQL. Some TriSQL queries start with WHERE, HAVING, HISTOGRAM or one of the aggregate variants. Such queries require a parse tree to generate a valid query, so the system generates a parse tree for every input query. The parse tree is also useful for efficient data retrieval with multiple queries and subqueries.
Before a query is processed, uncertain relations are translated into encoded certain relations. Consider an uncertain relation T(A1, . . . , An). Relation T is stored in a conventional relational table with four additional attributes: T_enc(xid, aid, conf, certain, A1, . . . , An). Each alternative of each x-tuple in T is stored as its own tuple in T_enc. For encoding details, see [1]. The system implementation translates TriSQL queries over uncertain relations into SQL queries over this encoding. We have translated different TriSQL queries into SQL, and these SQL queries are executed on the encoded table.
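The encoding can be illustrated with Python's built-in sqlite3 module. The sketch below flattens the Photo relation of Fig 1 into one row per alternative of the form (xid, aid, conf, certain, Number, Name) and loads it into an in-memory table. The table name Photo_enc and the rule that certain = 1 when an x-tuple's confidences sum to 1 are assumptions made for illustration; the actual encoding details are in [1].

```python
import sqlite3

# The Photo relation of Fig 1: tuple ID -> list of (alternative values, confidence).
PHOTO = {
    11: [((1, "Arm"), 1.0)],
    12: [((1, "Boby"), 0.6), ((1, "Charley"), 0.3), ((1, "Don"), 0.1)],
}


def encode(relation):
    """Flatten x-tuples into rows (xid, aid, conf, certain, A1, ..., An).

    Assumption: certain = 1 when the alternatives' confidences sum to 1,
    i.e. the x-tuple exists in every possible instance.
    """
    rows = []
    for xid, alternatives in relation.items():
        certain = int(abs(sum(c for _, c in alternatives) - 1.0) < 1e-9)
        for aid, (values, conf) in enumerate(alternatives, start=1):
            rows.append((xid, aid, conf, certain, *values))
    return rows


def load(conn, rows):
    """Create the hypothetical encoded table and insert the rows."""
    conn.execute(
        "CREATE TABLE Photo_enc (xid INTEGER, aid INTEGER, conf REAL,"
        " certain INTEGER, Number INTEGER, Name TEXT)"
    )
    conn.executemany("INSERT INTO Photo_enc VALUES (?, ?, ?, ?, ?, ?)", rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, encode(PHOTO))
    # A selection on the uncertain relation becomes ordinary SQL on the
    # encoding; each qualifying alternative keeps its confidence.
    print(conn.execute(
        "SELECT xid, aid, conf FROM Photo_enc WHERE Name = 'Boby'"
    ).fetchall())  # [(12, 1, 0.6)]
```

The tolerance in encode matters because 0.6 + 0.3 + 0.1 is not exactly 1.0 in floating point.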
The translated SQL query is executed and produces a result according to the query. The result table may be certain or uncertain. Using the rule of uncertainty, the system identifies certain and uncertain attributes as well as certain and uncertain tables. To mark an uncertain attribute, the system appends * to its name; attributes without * are certain. To mark tables, the system uses yellow for certain tables and green for uncertain tables.
A database consists of either certain or uncertain data. Certain data consist of a single instance, whereas an uncertain database is a set of possible instances. An uncertain database is uncertain about values, and this uncertainty is represented by confidence values. The confidence value of each alternative must therefore be computed when retrieving the result of each query, and after processing a query the system displays the confidence value of each resulting row along with the query result. For a query on a single table, each result row carries the confidence value from that table. For a query on multiple tables, i.e. a join, the confidence value of a result tuple is the conjunction of the corresponding base tuples, i.e. the product of the corresponding tuple confidences in the base relations.
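A minimal Python sketch of this rule, assuming the joined tuples come from different base relations and are independent (no shared lineage): each joined alternative exists only if both of its source alternatives exist, so its confidence is the product of theirs. The Seen relation used in the usage code is hypothetical.

```python
from itertools import product


def join_with_confidence(left, right, left_key, right_key):
    """Equi-join two lists of alternatives of the form (alt_id, values, conf).

    A result alternative exists only if both source alternatives exist
    (conjunctive lineage), so for independent sources its confidence is
    the product of the two confidences.
    """
    result = []
    for (lid, lvals, lconf), (rid, rvals, rconf) in product(left, right):
        if lvals[left_key] == rvals[right_key]:
            result.append(((lid, rid), lvals + rvals, lconf * rconf))
    return result


if __name__ == "__main__":
    # Alternatives of Photo (Fig 1), identified by (tuple ID, alternative index).
    photo = [((11, 1), (1, "Arm"), 1.0), ((12, 1), (1, "Boby"), 0.6),
             ((12, 2), (1, "Charley"), 0.3), ((12, 3), (1, "Don"), 0.1)]
    # Hypothetical relation Seen(Name, City), one alternative per x-tuple.
    seen = [((21, 1), ("Boby", "Pune"), 0.5), ((22, 1), ("Don", "Pune"), 0.8)]
    for row in join_with_confidence(photo, seen, left_key=1, right_key=0):
        print(row)
```

Here the Boby result gets confidence 0.6 × 0.5 = 0.3 and the Don result 0.1 × 0.8 = 0.08.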
V. EFFICIENT WAY OF HANDLING DISTINCT QUERIES, SUB QUERIES AND MULTIPLE QUERIES
This part covers efficient ways of handling DISTINCT queries, subqueries and multiple queries.
A. Efficient way of handling distinct queries
An uncertain database is uncertain about values, and this uncertainty is represented by confidence values; the previous section describes how confidence values are computed. DISTINCT queries eliminate duplicates, and duplicate-eliminating queries produce disjunctive lineage: a result tuple of a DISTINCT query is present if one or the other (or possibly both) of its source tuples is present. The confidence value of such a result tuple is therefore the disjunction of the corresponding tuple confidences in the base relation, i.e. the probability that at least one source is present, Pr(A OR B). It is calculated by the following formula:
Pr(A OR B) = Pr(A) + Pr(B) - Pr(A AND B)
That is, the confidence value of the derived tuple is the sum of the corresponding tuple confidences in the base relation minus their product, where the product gives Pr(A AND B) for independent base tuples. The same formula is applied to every result tuple and to all types of queries that contain the DISTINCT keyword. If a query joins multiple tables and also uses DISTINCT, the confidence values for the join are computed as described in Section IV, and the confidence values for DISTINCT are then computed by the procedure above.
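The following Python sketch applies this rule when collapsing duplicates. It generalizes the two-tuple formula to any number of sources as 1 - Π(1 - p), which reduces to Pr(A) + Pr(B) - Pr(A)Pr(B) for two sources. It assumes the duplicate sources are independent, for example alternatives of different x-tuples; alternatives of the same x-tuple are mutually exclusive, so for them Pr(A AND B) is 0 and the confidences would simply add, a case this sketch does not handle.

```python
from collections import defaultdict
from math import prod


def or_confidence(confs):
    """Pr(at least one independent source exists) = 1 - prod(1 - p).

    For two sources this equals Pr(A) + Pr(B) - Pr(A) * Pr(B).
    """
    return 1.0 - prod(1.0 - p for p in confs)


def distinct_with_confidence(rows):
    """Collapse duplicate values in (value, confidence) rows.

    The lineage of each distinct value is the disjunction (OR) of its
    sources, which are assumed to be independent.
    """
    groups = defaultdict(list)
    for value, conf in rows:
        groups[value].append(conf)
    return {value: or_confidence(confs) for value, confs in groups.items()}


if __name__ == "__main__":
    rows = [("Boby", 0.6), ("Boby", 0.5), ("Don", 0.1)]
    print(distinct_with_confidence(rows))
```

For these rows, Boby gets 0.6 + 0.5 - 0.3 = 0.8 and Don keeps 0.1, up to floating-point rounding.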
B. Efficient way of handling sub queries
This part describes an efficient way of handling subqueries. Subqueries are normally executed serially, one after another, but the current implementation runs them in parallel: each query of the subqueries runs in its own thread, so that multiple subqueries run simultaneously. The result of each thread is then collected into the main query, which displays the result of the subquery. We record the time required by each thread, so the total time required to run a subquery is
Total_time = Time to run main query (Master) + Sum of time required for each thread;
This formula gives the total time required to run subqueries using parallel run. The time required to run subqueries without parallel run is greater than the time required to run the same subqueries with parallel run, so executing subqueries by parallel run is the more efficient way of handling them.
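A minimal Python sketch of parallel subquery evaluation, using sqlite3 and concurrent.futures.ThreadPoolExecutor as stand-ins for the DBMS and thread code used in the paper. It assumes the subqueries are independent (uncorrelated), so each can run in its own thread on its own connection before the master query consumes their results. The emp table, the build_master hook and the other names are hypothetical. The sketch records both the per-thread times used in the Total_time formula and the overall wall-clock time; when the threads genuinely overlap, the wall-clock time can approach the master time plus the longest thread time rather than the sum.

```python
import os
import sqlite3
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(db_path, sql, params=()):
    """Run one statement on a fresh connection; return (rows, seconds)."""
    start = time.perf_counter()
    conn = sqlite3.connect(db_path)  # each thread uses its own connection
    try:
        rows = conn.execute(sql, params).fetchall()
    finally:
        conn.close()
    return rows, time.perf_counter() - start


def run_parallel(db_path, subqueries, build_master):
    """Evaluate independent subqueries in threads, then the master query.

    build_master(results) -> (sql, params) is a hypothetical hook that plugs
    the subquery results into the master query.
    """
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(subqueries)) as pool:
        outcomes = list(pool.map(lambda q: run_query(db_path, q), subqueries))
    master_sql, params = build_master([rows for rows, _ in outcomes])
    rows, master_time = run_query(db_path, master_sql, params)
    timings = {
        "master": master_time,
        "threads": [t for _, t in outcomes],
        "wall": time.perf_counter() - wall_start,
    }
    return rows, timings


def make_demo_db():
    """Create a small hypothetical emp table in a temporary SQLite file."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary REAL)")
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
        ("e1", "A", 50), ("e2", "A", 70), ("e3", "B", 40), ("e4", "B", 90),
    ])
    conn.commit()
    conn.close()
    return path


if __name__ == "__main__":
    db = make_demo_db()
    rows, timings = run_parallel(
        db,
        ["SELECT AVG(salary) FROM emp",
         "SELECT MAX(salary) FROM emp WHERE dept = 'B'"],
        lambda res: ("SELECT name FROM emp WHERE salary > ? AND salary < ?"
                     " ORDER BY name", (res[0][0][0], res[1][0][0])),
    )
    print(rows)  # [('e2',)]
    print(timings)
```

The two scalar subqueries (average 62.5, department B maximum 90) run concurrently, and the master query then selects the employees whose salary lies strictly between them.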
C. Efficient way of handling multiple queries
This part describes an efficient way of handling multiple queries simultaneously. Multiple queries can be executed simultaneously by using threads, one thread per query, but the current implementation runs the same multiple queries simultaneously by using cursors.
A procedure is created to run multiple queries using cursors. In this procedure, a cursor is created for each query to store its result. The result of each cursor is then collected into the main result, which displays the result of each cursor. We record the time required to run each query through its cursor, so the total time required to run multiple queries simultaneously is
Total_time = Sum of time required for each cursor.
This formula gives the total time required to run multiple queries by using cursors. The time required to run multiple queries without cursors is greater than the time required to run the same multiple queries with cursors, so executing multiple queries through cursors is the more efficient way of handling them.
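The structure of the cursor-based procedure can be sketched with Python's DB-API cursors in sqlite3. This is a hedged analogue of the procedure described above, not the authors' code: one cursor is opened per query, each cursor's rows are collected into a main result, and Total_time is accumulated as the sum of the per-cursor times. The cursors of a single sqlite3 connection are used one after another here, so the sketch shows the bookkeeping, not concurrent execution.

```python
import sqlite3
import time


def run_with_cursors(conn, queries):
    """Open one cursor per query and collect every cursor's rows.

    Returns (main_result, total_time), where total_time is the sum of the
    per-cursor execution times, as in the Total_time formula.
    """
    cursors = [conn.cursor() for _ in queries]
    main_result, total_time = [], 0.0
    try:
        for cur, sql in zip(cursors, queries):
            start = time.perf_counter()
            cur.execute(sql)
            main_result.append(cur.fetchall())
            total_time += time.perf_counter() - start
    finally:
        for cur in cursors:
            cur.close()
    return main_result, total_time


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
    results, total = run_with_cursors(
        conn, ["SELECT COUNT(*) FROM t", "SELECT MAX(x) FROM t", "SELECT SUM(x) FROM t"]
    )
    print(results)  # [[(3,)], [(3,)], [(6,)]]
    print(f"Total_time = {total * 1000:.3f} ms")
```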
VI. HANDLING DIFFERENT AGGREGATION VARIANTS AND MULTIPLE AGGREGATION VARIANTS IN SINGLE QUERY
This part covers handling different aggregate variants on an uncertain database and handling multiple aggregations in the same query.
A. Handling different aggregate variants on uncertain database
An uncertain relation is a multiset of x-tuples. Each x-tuple is comprised of one or more alternatives, and each alternative is a regular tuple associated with a probabilistic confidence value in [0,1] (see [2] for details). An uncertain database represents a set of possible instances, and uncertain or probabilistic data is based on these possible instances. Because aggregation produces a result for each possible instance, the result size for a query with aggregation can grow exponentially with data size.
The variants for each aggregate function return a single
value over uncertain data, instead of a set of possible values.
The variants are:
1. LOW is a function returning the lowest possible value of the aggregate result.
2. HIGH is a function returning the highest possible value of the aggregate result.
3. EXPECTED is a function returning the expected value of the aggregate result.
An uncertain relation is encoded in a regular relational table (see [2] [3] for details). The implementation translates TriSQL queries over uncertain relations into SQL queries over the encoded relation. We have implemented the different aggregate variants by a simple translation from TriSQL queries to queries on the encoded table: the low aggregate functions LCOUNT, LSUM, LAVG, LMIN and LMAX; the high aggregate functions HCOUNT, HSUM, HMIN and HMAX; and the expected aggregate functions ECOUNT and ESUM.
The following describes the implementation of these aggregate variants. For each translation, we describe what the query computes, and for each implementation we measure the total time required to run the aggregate variant.
LCOUNT: The query counts the number of certain x-tuples.
HCOUNT: The query counts the number of all (certain and uncertain) x-tuples.
LSUM: We sum the minimum values from all of the certain x-tuples (assuming non-negative values).
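LMIN: The query returns the least value present in the relation.
HMIN: If the relation contains uncertain x-tuples only, the highest MIN is the highest value present in the relation. If the relation contains one or more certain x-tuples, the highest MIN is the smallest of the maximum values from each certain x-tuple.
HSUM: We sum the maximum values from all x-tuples, certain and uncertain (assuming non-negative values).
ECOUNT: We get the expected count by summing all confidence values in the entire relation.
ESUM: We compute the expected sum by adding up the values of all non-NULL alternatives, weighted by their confidence values.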
LMAX: If the relation contains uncertain x-tuples only, the lowest MAX is the lowest value present in the relation. If the relation contains one or more certain x-tuples, the lowest MAX is the largest of the minimum values from each certain x-tuple.
LAVG: Start from the average of the minimum values from each certain x-tuple. An uncertain x-tuple lowers the average only if its minimum value is below the current average, so the minimum values of the uncertain x-tuples are added in increasing order while this holds.
HMAX: The query returns the highest value present in the relation.
More detailed procedures for the aggregate variants are given in [1].
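The closed-form descriptions of the variants can be written as plain SQL over an encoded table. The Python sketch below assumes a hypothetical single-attribute encoding T_enc(xid, aid, conf, certain, A) in SQLite, where certain = 1 marks x-tuples whose confidences sum to 1, and assumes non-negative values of A, which the LSUM and HSUM shortcuts require. In this sketch HSUM sums the maximum value of every x-tuple, certain or uncertain, and HMIN mirrors LMAX: the smallest per-x-tuple maximum among certain x-tuples, or the highest value present if there are none. ESUM as written equals the expected sum only when at least one certain x-tuple rules out the empty instance. LAVG is omitted because it does not reduce to a single simple aggregate.

```python
import sqlite3

# Closed-form translations over a hypothetical encoded table T_enc; A >= 0 assumed.
VARIANT_SQL = {
    "LCOUNT": "SELECT COUNT(DISTINCT xid) FROM T_enc WHERE certain = 1",
    "HCOUNT": "SELECT COUNT(DISTINCT xid) FROM T_enc",
    "ECOUNT": "SELECT SUM(conf) FROM T_enc",
    "ESUM": "SELECT SUM(A * conf) FROM T_enc",
    "LSUM": "SELECT SUM(m) FROM (SELECT MIN(A) AS m FROM T_enc"
            " WHERE certain = 1 GROUP BY xid)",
    "HSUM": "SELECT SUM(m) FROM (SELECT MAX(A) AS m FROM T_enc GROUP BY xid)",
    "LMIN": "SELECT MIN(A) FROM T_enc",
    "HMAX": "SELECT MAX(A) FROM T_enc",
    # Largest per-x-tuple minimum among certain x-tuples; with no certain
    # x-tuples the subquery is NULL and COALESCE falls back to MIN(A).
    "LMAX": "SELECT COALESCE((SELECT MAX(m) FROM (SELECT MIN(A) AS m FROM T_enc"
            " WHERE certain = 1 GROUP BY xid)), (SELECT MIN(A) FROM T_enc))",
    # Symmetric to LMAX: smallest per-x-tuple maximum, else MAX(A).
    "HMIN": "SELECT COALESCE((SELECT MIN(m) FROM (SELECT MAX(A) AS m FROM T_enc"
            " WHERE certain = 1 GROUP BY xid)), (SELECT MAX(A) FROM T_enc))",
}


def make_demo(conn):
    """Load three demo x-tuples: {10}, {5 : 0.6 || 8 : 0.3}, {4 : 0.5 || 7 : 0.5}."""
    conn.execute("CREATE TABLE T_enc (xid INTEGER, aid INTEGER, conf REAL,"
                 " certain INTEGER, A REAL)")
    conn.executemany("INSERT INTO T_enc VALUES (?, ?, ?, ?, ?)", [
        (1, 1, 1.0, 1, 10),                       # certain
        (2, 1, 0.6, 0, 5), (2, 2, 0.3, 0, 8),     # uncertain (ø has 0.1)
        (3, 1, 0.5, 1, 4), (3, 2, 0.5, 1, 7),     # certain
    ])


def run_variant(conn, name):
    return conn.execute(VARIANT_SQL[name]).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_demo(conn)
    for name in VARIANT_SQL:
        print(name, run_variant(conn, name))
```

On the demo data the queries give LCOUNT 2, HCOUNT 3, LSUM 14, HSUM 25, LMIN 4, HMIN 7, LMAX 10 and HMAX 10, with ECOUNT about 2.9 and ESUM about 20.9; these agree with brute-force enumeration of the six possible instances.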
B. Handling multiple aggregate variants in single query on uncertain database
The previous part implements each aggregate variant according to its specified procedure and reports its result and running time. In this part, we run multiple aggregate variants in a single query instead of in separate queries.
Multiple aggregation variants in a single query are handled efficiently by parallel run: each aggregate variant is separated from the single query, and each separated variant is run according to its procedure, described in detail in [1]. We record the time required to run each aggregate variant. The total time to run multiple aggregations in a single query is
Total_time = Sum of time required for each aggregate variant;
The time required to run multiple aggregate variants in a single query by parallel run is less than the time required to run each one as a separate aggregate query. Hence, executing multiple aggregate variants in a single query by parallel run is more efficient than handling the same variants as separate queries.
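A Python sketch of this separate-then-run-in-parallel strategy, under the same hypothetical T_enc(xid, aid, conf, certain, A) layout used for illustration elsewhere. A regular expression stands in for the parse tree: it extracts calls such as LCOUNT(*) or HMAX(A) from the TriSQL text, maps each to a SQL template, and runs the templates concurrently, one thread and one SQLite connection per variant. Only a few variants have templates here (an unknown variant raises KeyError), and the convention that table T is stored as T_enc is an assumption.

```python
import os
import re
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical SQL templates over the encoded table {t}; {a} is the attribute.
TEMPLATES = {
    "LCOUNT": "SELECT COUNT(DISTINCT xid) FROM {t} WHERE certain = 1",
    "HCOUNT": "SELECT COUNT(DISTINCT xid) FROM {t}",
    "ECOUNT": "SELECT SUM(conf) FROM {t}",
    "ESUM": "SELECT SUM({a} * conf) FROM {t}",
    "LMIN": "SELECT MIN({a}) FROM {t}",
    "HMAX": "SELECT MAX({a}) FROM {t}",
}
# Matches variant calls such as LCOUNT(*) or HMAX(A) in the TriSQL text.
CALL = re.compile(r"\b([LHE](?:COUNT|SUM|MIN|MAX))\(\s*(\*|\w+)\s*\)", re.IGNORECASE)


def split_variants(trisql):
    """Return ([(variant, attribute), ...], table) for 'SELECT ... FROM T'."""
    table = re.search(r"\bFROM\s+(\w+)", trisql, re.IGNORECASE).group(1)
    return [(v.upper(), a) for v, a in CALL.findall(trisql)], table


def run_one(db_path, sql):
    conn = sqlite3.connect(db_path)  # one connection per worker thread
    try:
        return conn.execute(sql).fetchone()[0]
    finally:
        conn.close()


def run_variants_parallel(db_path, trisql):
    """Split the query into variants and evaluate them concurrently."""
    calls, table = split_variants(trisql)
    sqls = [TEMPLATES[v].format(t=f"{table}_enc", a=a) for v, a in calls]
    with ThreadPoolExecutor(max_workers=max(1, len(sqls))) as pool:
        values = list(pool.map(lambda sql: run_one(db_path, sql), sqls))
    return {f"{v}({a})": val for (v, a), val in zip(calls, values)}


def make_demo_db():
    """Write demo x-tuples {10}, {5 : 0.6 || 8 : 0.3}, {4 : 0.5 || 7 : 0.5}."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE T_enc (xid INTEGER, aid INTEGER, conf REAL,"
                 " certain INTEGER, A REAL)")
    conn.executemany("INSERT INTO T_enc VALUES (?, ?, ?, ?, ?)", [
        (1, 1, 1.0, 1, 10), (2, 1, 0.6, 0, 5), (2, 2, 0.3, 0, 8),
        (3, 1, 0.5, 1, 4), (3, 2, 0.5, 1, 7),
    ])
    conn.commit()
    conn.close()
    return path


if __name__ == "__main__":
    db = make_demo_db()
    print(run_variants_parallel(db, "SELECT LCOUNT(*), HCOUNT(*), LMIN(A), HMAX(A) FROM T"))
```

Each variant gets its own connection because sqlite3 connections are not meant to be shared across threads by default.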
VII. EXPERIMENT RESULTS
We conducted experiments to analyse the performance of the system. In this section, we analyse the system's performance for multiple aggregate variants in the same query, for sequential versus parallel run of subqueries, and for multiple queries run sequentially versus by cursor. We report the performance results of the system on real-life data sets.
A. Multiple aggregate variants in same query
The implementation translates TriSQL queries over uncertain relations into SQL queries over the encoding. We have implemented the different aggregate variants by a simple translation from TriSQL queries to queries on the encoded table: the low aggregate functions LCOUNT, LSUM, LAVG, LMIN and LMAX; the high aggregate functions HCOUNT, HSUM, HMIN and HMAX; and the expected aggregate functions ECOUNT and ESUM.
These TriSQL aggregate variants cannot be executed directly like a simple SQL aggregate query: they are TriSQL queries and do not follow standard SQL syntax. The aggregate variants are run individually and also together in the same query.
This part analyses the time required to run each aggregate variant separately and the time required to run multiple aggregate variants in one query. Figure 3 shows the results for various aggregate variants: the time in milliseconds required to run each aggregate variant separately, the total of those separate times, and the total time in milliseconds to run the same aggregate variants together in one query.
Fig 3: Result of aggregate variants run separately and parallel in same query
Fig 4: Performance of system for multiple aggregate variants
In figure 4, we show the performance of the system for multiple aggregate variants. The X-axis corresponds to the different aggregate variant functions and the Y-axis to the time required in milliseconds. The graph in figure 4 shows that the total time required to run the aggregate variants separately is greater than the time required to run multiple aggregate variants in one query. Running multiple aggregate variants in the same query therefore provides an efficient way of handling them.
B. Sequential and parallel run of sub queries
We have implemented sequential and parallel run of subqueries, as described in Section V. This part analyses the sequential and parallel run of subqueries and shows how parallel run handles subqueries more efficiently than sequential run.
Figure 5 shows the results for various sample subqueries: the time in milliseconds required to run each sample subquery by sequential run and the time required to run the same sample subqueries by parallel run.
Fig 5: Result of sub queries by sequential run and parallel run
Fig 6: Performance of system for sub queries by sequential and parallel run
In figure 6, we show the performance of the system for subqueries by sequential and parallel run. The X-axis corresponds to the different sample subqueries and the Y-axis to the time required in milliseconds. The graph in figure 6 shows that the total time required to run the subqueries by sequential run is greater than the time required to run the same subqueries by parallel run. Executing subqueries by parallel run is therefore an efficient way of handling subqueries.
http://www.ijettjournal.org
Page 447
International Journal of Engineering Trends and Technology (IJETT) – Volume 8 Number 8- Feb 2014
C. Multiple query run separately (without cursor) and by cursor
We have implemented running multiple queries by cursor, as described in Section V. This part compares running multiple queries with and without cursors and shows how running multiple queries by cursor is more efficient than running them without a cursor (sequential run).
Fig 7: Result of multiple queries by without cursor and with cursor
Figure 7 shows the results for various sample multiple queries: the time in milliseconds required to run each set of sample queries by using a cursor and the time required to run the same queries without a cursor (sequential run).
Fig 8: Performance of system for multiple queries by using without cursor (sequential) and with cursor
In figure 8, we show the performance of the system for multiple queries without a cursor (sequential) and with a cursor. The X-axis corresponds to the different sample multiple queries and the Y-axis to the time required in milliseconds. The graph in figure 8 shows that the total time required to run multiple queries without a cursor (sequential run) is greater than the time required to run the same queries by cursor. Executing multiple queries by cursor is therefore an efficient way of handling multiple queries.
VIII. CONCLUSIONS AND FUTURE WORK
The current system provides an efficient way of handling uncertain and probabilistic databases and identifies uncertain and certain data. The system produces a parse tree for each input SQL or TriSQL query; TriSQL queries are newly defined queries based on SQL but are not standard SQL, and some of them start with WHERE, HAVING, HISTOGRAM or one of the aggregate variants. The parse tree is useful for efficient data retrieval with multiple queries and subqueries. We show how to handle DISTINCT queries on an uncertain database, and we provide ways to obtain better performance when handling multiple aggregation variants, multiple queries and subqueries, as demonstrated by the result analysis. From this analysis we conclude that:
• Running multiple aggregate variants in the same query is more efficient than running them separately.
• Executing subqueries by parallel run is an efficient way of handling subqueries.
• Executing multiple queries by cursor is an efficient way of handling multiple queries.
We identify the following directions for future work:
• A first obvious extension is to handle more aggregate functions on uncertain databases. The current system implements only a limited set of aggregate variants, and the remaining variants still need to be handled.
• An uncertain database is a set of possible instances; it is uncertain about values, and this uncertainty is represented by confidence values. Storing such a set of possible instances takes more space, so the space required for uncertain databases needs to be reduced.
• The current system efficiently handles multiple queries by cursor and multiple aggregate variants by parallel run. This work could be extended to execute parallel queries with distributed computing.
REFERENCES
[1] Raghotham Murthy, Robert Ikeda and Jennifer Widom, “Making Aggregation Work in Uncertain and Probabilistic Databases”, IEEE Transactions on Knowledge and Data Engineering, Aug. 2011.
[2] O. Benjelloun, A. Das Sarma, A. Y. Halevy, and J. Widom, “ULDBs: Databases with Uncertainty and Lineage”, Proc. Int’l Conf. on Very Large Data Bases (VLDB), 2006.
[3] Jennifer Widom, Dept. of Computer Science, Stanford University, “Trio: A System for Data, Uncertainty and Lineage”, 2009.
[4] M. Mutsuzaki, M. Theobald, A. de Keijzer, J. Widom, P. Agrawal, O. Benjelloun, A. Das Sarma, R. Murthy, and T. Sugihara, “Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS”, Proc. Conf. Innovative Data Systems Research (CIDR), 2007.
[5] D. Barbara, H. Garcia-Molina, and D. Porter, “The Management of Probabilistic Data”, IEEE Transactions on Knowledge and Data Engineering, 1992.
[6] S. McClean, B. Scotney, and M. Shapcott, “Aggregation of Imprecise and Uncertain Information in Databases”, IEEE Transactions on Knowledge and Data Engineering, Nov./Dec. 2001.
[7] L. Chen and A. Dobra, “Efficient Processing of Aggregates in Probabilistic Databases”, Technical Report REP-2008-454, Univ. of Florida, 2008.
[8] N. N. Dalvi and D. Suciu, “Efficient Query Evaluation on Probabilistic Databases”, Proc. Int’l Conf. on Very Large Data Bases (VLDB), 2004.
[9] J. Widom, “Trio: A System for Integrated Management of Data, Accuracy, and Lineage”, Proc. Conf. Innovative Data Systems Research (CIDR), pp. 262-276, 2005.
[10] Anish Das Sarma, Martin Theobald, and Jennifer Widom, “Exploiting Lineage for Confidence Computation in Uncertain and Probabilistic Databases”, 2009.
[11] General SQL Parser User Guide, Version 1.0, http://www.sqlparser.com.
[12] PL/SQL User's Guide and Reference.
[13] Trio Online Resources: TriQL Language Manual, Online Demo and Open-Source Distribution, http://www.infolab.stanford.edu/trio, 2009.
[14] R. Murthy and J. Widom, “Making Aggregation Work in Uncertain and Probabilistic Databases”, Proc. Workshop Management of Uncertain Data at Int’l Conf. Very Large Data Bases (VLDB), pp. 76-90, 2007.