International Journal of Engineering Trends and Technology (IJETT) – Volume 8 Number 8- Feb 2014

Efficient Retrieval of Aggregate Data in Uncertain and Probabilistic Databases

Mr. V. V. Kheradkar#1, Prof. U. L. Kulkarni*2
# Student, M.E. C.S.E., DYPCOE, Shivaji University, Kolhapur, Maharashtra
* Associate Professor, VIT College of Engineering, Wadala, Mumbai University, Maharashtra

Abstract— The Trio system handles aggregation over uncertain and probabilistic data. Because "exact" aggregation over an uncertain database can produce exponentially sized results, Trio provides three alternatives: (1) a low bound on the aggregate value, (2) a high bound on the aggregate value, and (3) the expected value. Each of these variants returns a single result and is efficient to compute for aggregation queries. The current system provides an efficient way of handling uncertain and probabilistic databases. It identifies uncertain and certain data, shows how to handle DISTINCT queries on an uncertain database, and improves performance when handling multiple aggregation variants, multiple queries and sub queries. We demonstrate these performance gains through result analysis. The paper also provides formal definitions and semantics for implementing aggregation on non-correlated data.

I. INTRODUCTION
TRIO is a prototype database management system, developed at Stanford, designed specifically for storing and querying data with uncertainty and lineage [1]. Trio's query language is an extension of SQL [13]. Previous papers and the Trio system cover select-project-join queries, some set operations, some aggregate functions and lineage [2], [3], [4]. In this paper, we focus on efficient ways of handling DISTINCT queries, multiple queries and sub queries. We also provide a way to identify uncertain and certain databases. Uncertain or probabilistic data is based on possible instances, and the result size of a query with aggregation can grow exponentially with the data size.
We review the different aggregate variants, each of which is evaluated over a number of possible instances [1], [14], [15]. Instead of a set of possible values, each variant of an aggregate function returns a single value over uncertain data:
1. LOW returns the lowest possible value of the aggregate result.
2. HIGH returns the highest possible value of the aggregate result.
3. EXPECTED returns the expected value of the aggregate result.
Our system can handle several of these aggregate variants in the same query. The size of the interval between the low and high values of an aggregate is a useful indicator of the degree of uncertainty in the data.

The paper proceeds as follows. In Section I, we give a quick introduction to ULDBs [2] and Trio, along with a running example [16]. In Section II, we describe the need for this work. In Section III, we present the system for retrieval of data in uncertain and probabilistic databases. In Section IV, we review the computation of confidence values for queries with joins [3], show how uncertain and certain databases are identified, describe how parse trees are created for SQL and TRISQL statements, and explain how TRISQL queries are processed. TRISQL is the same as TriQL, extended with additional constructs. In Section V, we provide a way of handling DISTINCT queries and efficient ways of handling sub queries and multiple queries. In Section VI, we review the TRISQL aggregate variants (LOW, HIGH and EXPECTED) and show how multiple aggregate variants can be handled in the same query instead of being run separately. In Section VII, we present result graphs showing better performance when multiple aggregate variants are handled in the same query, when sub queries are run in parallel instead of sequentially, and when multiple queries are run by cursor instead of normally. In Section VIII, we conclude with a discussion of future directions. Related work is discussed next.
ISSN: 2231-5381
http://www.ijettjournal.org Page 443

A. RELATED WORK
The Trio system supports uncertain and probabilistic databases, that is, databases containing data whose values are not unique. Like a conventional DBMS, Trio has a data model (Uncertainty-Lineage Databases, ULDB), a query language (TriQL) and a system (Trio-One) [2], [3], [4], [9]. The ULDB model is a model for uncertain and probabilistic data [2]. An uncertain relation is a multiset of x-tuples. Each x-tuple comprises one or more alternatives, and each alternative is a regular tuple associated with a probabilistic confidence value in [0, 1]. If ∑ is the sum of the confidence values of all alternatives in a single x-tuple, then 0 ≤ ∑ ≤ 1. If ∑ < 1, the entire x-tuple may not exist; this case is denoted by ø, whose confidence is 1 − ∑ [1]. An uncertain database represents a set of possible instances [2], [3], [4], [9].

TriQL defines the semantics of any relational query over a ULDB and uses the same syntax as SQL. Queries are processed using a query evaluation algorithm [2]. The Trio system is built entirely on top of a conventional relational DBMS: ULDBs are represented in relational tables, and TriQL queries and commands are automatically rewritten into SQL commands [3], [4], [9].

Aggregation in Trio consists of exhaustive aggregation and grouped aggregation. In exhaustive aggregation, aggregate functions are applied to the entire table; this is referred to as full-table aggregation. An aggregation query on an uncertain relation produces a result for each possible instance, and each aggregate variant summarizes these per-instance results [1], [6], [14]. A low aggregate returns one x-tuple with one alternative: the lowest non-NULL alternative among the aggregate values.
TriQL supports five low aggregate functions: LCOUNT, LSUM, LAVG, LMIN and LMAX [1]. The high aggregate functions HCOUNT, HSUM, HAVG, HMIN and HMAX return the highest possible aggregate value [1]. The expected aggregate functions are ECOUNT, ESUM, EAVG, EMIN and EMAX; an expected aggregate is the weighted average of all the non-NULL alternatives in the corresponding exhaustive result [1].

We review Trio's Uncertainty-Lineage Database (ULDB) data model [2] through a running example. Each tuple in a ULDB relation consists of a set of mutually exclusive alternatives with associated confidence values. Intuitively, the tuple takes the value of one of its alternatives, and the probability of taking a particular alternative is given by its confidence value [10]. Consider a relation Photo (Number, Name) that lists the names of people observed in a set of photos. Suppose Photo 1 clearly shows Arm and also one other person who is not very visible. This person looks most like Boby (60% chance), but may also be Charley (30%) or Don (10%). We represent the relation as follows, where "||" separates alternatives [16].

ID   Photo (Number, Name)
11   (1, Arm) : 1.0
12   (1, Boby) : 0.6 || (1, Charley) : 0.3 || (1, Don) : 0.1
Fig 1: Relation Photo(Number, Name)

The sum of the confidence values of all alternatives in a tuple must be at most 1; if the sum is less than 1 (say, 0.8), then there is a 20% chance that none of the alternatives is present in the relation. We identify an alternative of a tuple by a pair (i, j), where i is the tuple ID and j is the index of the alternative in tuple i.

II. NEED OF WORK
A. Handling multiple aggregation and sub queries
The current system allows multiple aggregations in a single query, but they are executed sequentially, which takes more time. Aggregation is applied to each possible instance of the uncertain database and produces an aggregation result for each instance. Because it is difficult to apply multiple aggregations to the same instance at the same time, the aggregations are applied one after another, so producing the output takes longer.

B. Handling different types of join on uncertain databases
The current system applies queries to a single table, and handling multiple tables requires an appropriate join. The current system implements joins when the data is correlated, but for non-correlated data a simple cross join does not produce output. Joining non-related uncertain data is difficult.

C. Handling DISTINCT aggregates on uncertain databases
The current system applies queries to uncertain databases, but queries that use DISTINCT aggregates do not produce appropriate results. Handling DISTINCT aggregates is difficult.

III. SYSTEM FOR RETRIEVAL OF DATA IN UNCERTAIN AND PROBABILISTIC DATABASE
The system addresses the following work:
1) Getting better performance through more sophisticated translation of queries with multiple aggregations.
2) Computing histograms for different aggregate queries.
3) Developing efficient algorithms for aggregation not only in the presence of correlations induced by joins, but also in the presence of arbitrary lineage.
4) Improving SQL clauses for handling DISTINCT aggregates and joins on uncertain databases.
5) Using the low, high and expected values of aggregates to provide more information, such as variance and histograms.

Fig 2: System for Uncertain and Probabilistic Database

Figure 2 shows the system design for efficient retrieval of data in uncertain and probabilistic databases. The system works in four phases: the translation phase, the execution phase, the post-processing phase and the result phase.
In the translation phase, the input is a TRISQL query, which is translated into SQL queries. During translation, the system creates a parse tree and multiple SQL statements.
In the execution phase, if the query contains sub queries, the SQL queries are sent to the DBMS in parallel; otherwise the SQL query is sent to the DBMS directly. Cursors and different views are created, and the result is obtained according to the query.
In the post-processing phase, the retrieved data is processed to extract the relevant data from the uncertain database. During processing, constraints are applied to the retrieved uncertain data, a confidence query is run if required, and the results of aggregate functions are obtained.
In the result phase, stored data is saved in a table, while transient data is retrieved at runtime.

IV. PROCESS QUERY ON UNCERTAIN OR CERTAIN DATABASE
The user enters a query for execution, and the system first produces a parse tree for the input SQL or TRISQL query. TRISQL queries are newly defined queries based on SQL, but they are not standard SQL. For example, some TRISQL queries start with WHERE, HAVING, HISTOGRAM or one of the aggregate variants. Such queries require a parse tree to generate a valid query, so the system generates a parse tree for every input query. The parse tree is also useful for efficient retrieval of data in multiple queries and sub queries.

Before a query is processed, uncertain relations are translated into encoded certain relations. Consider an uncertain relation T(A1, ..., An). Relation T is stored in a conventional relational table with four additional attributes: T_enc(xid, aid, conf, certain, A1, ..., An). Each alternative of each x-tuple in T is stored as its own tuple in T_enc; for encoding details, see [1]. The implementation translates TriSQL queries over uncertain relations into SQL queries over this encoding. We have translated different TRISQL queries into SQL; the translated SQL queries are executed on the encoded table and produce results according to the query. The result table may be certain or uncertain. The system identifies certain and uncertain attributes, as well as certain and uncertain databases, using the rules of uncertainty.
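To make the T_enc encoding concrete, here is a minimal sketch that flattens the Photo relation of Fig 1 into encoded rows using Python's built-in sqlite3 module. The table name Photo_enc and the helper encode are hypothetical, and we assume that the certain attribute flags x-tuples whose confidences sum to 1 (so they cannot be absent); the exact encoding used by Trio is described in [1].

```python
import sqlite3
from math import isclose

# Photo relation from Fig 1: x-tuple ID -> list of ((Number, Name), conf).
PHOTO = {
    11: [((1, "Arm"), 1.0)],
    12: [((1, "Boby"), 0.6), ((1, "Charley"), 0.3), ((1, "Don"), 0.1)],
}


def encode(relation):
    """Flatten x-tuples into rows (xid, aid, conf, certain, A1, ..., An)."""
    rows = []
    for xid, alternatives in relation.items():
        total = sum(conf for _, conf in alternatives)
        # Enforce the ULDB constraint sum <= 1, with a float tolerance
        # (0.6 + 0.3 + 0.1 is 0.9999999999999999 in floating point).
        if total > 1.0 and not isclose(total, 1.0):
            raise ValueError(f"x-tuple {xid}: confidences sum to {total}")
        # Assumption: an x-tuple is "certain" when it cannot be absent.
        certain = int(isclose(total, 1.0))
        for aid, (values, conf) in enumerate(alternatives, start=1):
            rows.append((xid, aid, conf, certain, *values))
    return rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Photo_enc (xid INTEGER, aid INTEGER, conf REAL, "
    "certain INTEGER, Number INTEGER, Name TEXT)"
)
conn.executemany("INSERT INTO Photo_enc VALUES (?, ?, ?, ?, ?, ?)", encode(PHOTO))

# A query over the uncertain relation becomes a plain SQL query over the
# encoding that also returns the confidence of each alternative.
rows = conn.execute(
    "SELECT xid, aid, Name, conf FROM Photo_enc ORDER BY xid, aid"
).fetchall()
for row in rows:
    print(row)
```

Each alternative becomes its own row, so ordinary SQL can filter, join and aggregate the alternatives while xid and conf keep the x-tuple structure recoverable.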
To identify certain and uncertain attributes, the system appends * to the name of an uncertain attribute; any other attribute is certain. To identify certain and uncertain tables, the system colors certain tables yellow and uncertain tables green.

A database consists of either certain or uncertain data. Certain data consists of a single instance, whereas an uncertain database is a set of possible instances. An uncertain database is uncertain about values, and this uncertainty is represented by confidence values, so the confidence value of each alternative must be computed when retrieving the result of each query. After processing a query, the system displays the confidence value of each resulting row along with the query result. For a query on a single table, the result carries the confidence values from that table. For a query on multiple tables, i.e., a join, the confidence value of a result tuple is the conjunction, i.e., the product, of the corresponding tuple confidences in the base relations.

V. EFFICIENT WAY OF HANDLING DISTINCT QUERIES, SUB QUERIES AND MULTIPLE QUERIES
This section covers efficient ways of handling DISTINCT queries, sub queries and multiple queries.

A. Efficient way of handling distinct queries
An uncertain database is uncertain about values, and this uncertainty is represented by confidence values; the previous section describes how confidence values are computed. DISTINCT queries eliminate duplicates, and duplicate-eliminating queries produce disjunctive lineage: a result tuple of a DISTINCT query indicates that the first, the second, or possibly both of the duplicate base tuples are present. The confidence value of such a result tuple is therefore the disjunction of the corresponding tuple confidences in the base relation, i.e., the probability that at least one of them is present, written Pr(A ∨ B), i.e., conf(A) OR conf(B).
The confidence value of the derived tuple is calculated by the following formula:
Pr(A ∨ B) = Pr(A) + Pr(B) − Pr(A ∧ B)
For independent (non-correlated) base tuples, Pr(A ∧ B) = Pr(A) × Pr(B), so the confidence value of the derived tuple is the sum of the corresponding tuple confidences in the base relation minus their product. The same formula is applied to every duplicate-eliminated result tuple, including tuples produced by a join, and the same procedure is applied to every type of query that contains the DISTINCT keyword. For a query that both joins multiple tables and uses DISTINCT, the confidence value of the join is computed as described in Section IV, and the confidence value for DISTINCT is then computed by the procedure above.

B. Efficient way of handling sub queries
This part describes an efficient way of handling sub queries. Sub queries are normally executed serially, one after another, but the current implementation runs them in parallel. It uses a separate thread for each query of the sub queries, which allows them to run simultaneously. The result of each thread is then collected into the main query, which displays the result of the sub query. We record the time required to run each query of the sub query in its thread, so the total time required to run a sub query is
Total_time = Time to run main query (Master) + Sum of time required for each thread
This formula gives the total time required to run sub queries using parallel run. The time required to run sub queries without parallel run is more than the time required to run the same sub queries with parallel run, so executing sub queries in parallel is the efficient way of handling sub queries.

C. Efficient way of handling multiple queries
This part describes an efficient way of handling multiple queries simultaneously.
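A minimal sketch of the confidence rules of Sections IV and V.A, assuming (as the abstract does) non-correlated data. The helpers join_conf, distinct_conf and select_distinct and the relation SEEN are hypothetical. Duplicates are grouped by x-tuple: alternatives of the same x-tuple are mutually exclusive, so Pr(A ∧ B) = 0 and their confidences add, while alternatives of different x-tuples are treated as independent, so Pr(A ∧ B) = Pr(A) × Pr(B). Duplicates that share a base tuple (for example, two join results built from the same tuple) are correlated; this sketch does not cover that case, for which lineage-based confidence computation [10] applies.

```python
from math import prod


def join_conf(*confs):
    """Confidence of a join result tuple: the conjunction of its base
    alternatives, i.e. the product of their confidences (assumes they
    come from different, independent x-tuples)."""
    return prod(confs)


def distinct_conf(duplicates):
    """Confidence of a DISTINCT result tuple.

    duplicates: (xid, conf) pairs for the base alternatives that project
    to the same result value.
    """
    per_xtuple = {}
    for xid, conf in duplicates:
        # Alternatives of one x-tuple are mutually exclusive, so
        # Pr(A or B) = Pr(A) + Pr(B): their confidences simply add.
        per_xtuple[xid] = per_xtuple.get(xid, 0.0) + conf
    # Different x-tuples are independent; repeatedly applying
    # Pr(A or B) = Pr(A) + Pr(B) - Pr(A) * Pr(B) gives 1 - prod(1 - p).
    return 1.0 - prod(1.0 - p for p in per_xtuple.values())


def select_distinct(relation, column):
    """SELECT DISTINCT column over rows (xid, values_dict, conf)."""
    groups = {}
    for xid, values, conf in relation:
        groups.setdefault(values[column], []).append((xid, conf))
    return {value: distinct_conf(dups) for value, dups in groups.items()}


# Hypothetical relation Seen(Name): x-tuple 21 is Boby (0.6) || Charley (0.3);
# x-tuple 22 is Boby (0.5) on its own.
SEEN = [
    (21, {"Name": "Boby"}, 0.6),
    (21, {"Name": "Charley"}, 0.3),
    (22, {"Name": "Boby"}, 0.5),
]
result = select_distinct(SEEN, "Name")
# Boby: 0.6 + 0.5 - 0.6 * 0.5 = 0.8; Charley: 0.3
```

For the Boby result this reproduces the two-term formula exactly, while a join of two independent alternatives with confidences 0.6 and 0.5 gives join_conf(0.6, 0.5) = 0.3.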
Multiple queries can be executed simultaneously by running each query in its own thread, but the current implementation handles multiple queries using cursors instead. A procedure is created for running multiple queries using cursors; in this procedure, a cursor is created for each of the multiple queries to store its result. The result of each cursor is then collected into the main result, which displays the result of each cursor. We record the time required to run each of the multiple queries using its cursor, so the total time required to run multiple queries simultaneously is
Total_time = Sum of time required for each cursor
This formula gives the total time required to run multiple queries using cursors. The time required to run multiple queries without cursors is more than the time required to run the same queries with cursors, so executing multiple queries using cursors is the efficient way of handling multiple queries.

VI. HANDLING DIFFERENT AGGREGATION VARIANTS AND MULTIPLE AGGREGATION VARIANTS IN SINGLE QUERY
This section covers handling different aggregate variants on an uncertain database and handling multiple aggregations in the same query.

A. Handling different aggregate variants on uncertain database
An uncertain relation is a multiset of x-tuples. Each x-tuple comprises one or more alternatives, and each alternative is a regular tuple associated with a probabilistic confidence value in [0, 1]; see [2] for details. An uncertain database represents a set of possible instances, so uncertain or probabilistic data is based on possible instances, and the result size of a query with aggregation can grow exponentially with the data size.
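To illustrate why exact aggregation grows exponentially, a minimal brute-force sketch enumerates every possible instance of a small hypothetical relation R of independent x-tuples, evaluates COUNT in each instance, and reduces the per-instance results to low, high and expected values. The names options, possible_instances and exhaustive_count are illustrative only; this is the exhaustive semantics, not the efficient translation the system uses.

```python
from itertools import product
from math import isclose, prod


def options(xtuple):
    """Alternatives of an x-tuple, plus the empty alternative (None) when
    the confidences sum to less than 1."""
    total = sum(conf for _, conf in xtuple)
    opts = list(xtuple)
    if not isclose(total, 1.0):
        opts.append((None, 1.0 - total))
    return opts


def possible_instances(relation):
    """Yield (instance, probability) for every possible instance,
    assuming the x-tuples are independent."""
    for choice in product(*(options(x) for x in relation)):
        instance = [value for value, _ in choice if value is not None]
        yield instance, prod(conf for _, conf in choice)


def exhaustive_count(relation):
    """Exhaustive COUNT: probability of each possible COUNT result."""
    dist = {}
    for instance, p in possible_instances(relation):
        dist[len(instance)] = dist.get(len(instance), 0.0) + p
    return dist


# x-tuple 1 is certain; x-tuples 2 and 3 may be absent.
R = [[(10, 1.0)], [(5, 0.6), (20, 0.3)], [(7, 0.5)]]

dist = exhaustive_count(R)
LOW = min(dist)                                 # lowest possible COUNT
HIGH = max(dist)                                # highest possible COUNT
EXPECTED = sum(k * p for k, p in dist.items())  # expected COUNT
N_INSTANCES = sum(1 for _ in possible_instances(R))
```

The number of instances is the product of the options per x-tuple (here 1 × 3 × 2 = 6), so a relation of n x-tuples with three options each already has 3^n instances. The LOW, HIGH and EXPECTED results (1, 3 and 2.4) match the closed forms LCOUNT (number of certain x-tuples), HCOUNT (number of x-tuples) and ECOUNT (sum of confidences), which avoid the enumeration entirely.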
The different aggregate variants are evaluated over these possible instances. Instead of a set of possible values, each variant of an aggregate function returns a single value over uncertain data:
1. LOW returns the lowest possible value of the aggregate result.
2. HIGH returns the highest possible value of the aggregate result.
3. EXPECTED returns the expected value of the aggregate result.
The uncertain relation is encoded in a regular relational table; for details see [2], [3]. The implementation translates TriSQL queries over uncertain relations into SQL queries over the encoded relation, and we have implemented the aggregate variants by a simple translation from TriSQL queries to queries on the encoded table. We have implemented the low aggregate functions LCOUNT, LSUM, LAVG, LMIN and LMAX; the high aggregate functions HCOUNT, HSUM, HMIN and HMAX; and the expected aggregate functions ECOUNT and ESUM. Their implementation is described below: for each translation, we describe what the query computes, and for each implementation, we measure the total time required to run the aggregate variant.
LCOUNT: The query counts the number of certain x-tuples.
HCOUNT: The query counts the number of x-tuples, both certain and uncertain.
LSUM: Assuming non-negative values, we sum the minimum values from all of the certain x-tuples.
LMIN: The query returns the lowest value present in the relation.
HMIN: If the relation contains uncertain x-tuples only, then the highest MIN is the highest value present in the relation. If the relation contains one or more certain x-tuples, then the highest MIN is the smallest of the maximum values from each certain x-tuple.
HSUM: Assuming non-negative values, we sum the maximum values from all of the x-tuples, both certain and uncertain.
ECOUNT: We compute the expected count by summing all confidence values in the entire relation.
ESUM: We compute the expected sum by adding up all non-NULL alternative values, weighted by their confidence values.
LMAX: If the relation contains uncertain x-tuples only, then the lowest MAX is the lowest value present in the relation.
If the relation contains one or more certain x-tuples, then the lowest MAX is the largest of the minimum values from each certain x-tuple.
LAVG: The average of the minimum values from each certain x-tuple.
HMAX: The query returns the highest value present in the relation.
More detailed procedures for the aggregate variants are given in [1].

B. Handling multiple aggregate variants in single query on uncertain database
The previous part implements all aggregate variants according to the specified procedures, and each produces a result together with the time required to run it. In this part, multiple aggregate variants are run in a single query instead of in separate queries. Multiple aggregation variants in a single query are handled efficiently by parallel run: each aggregate variant is separated from the single query and run according to its procedure, as described in detail in [1]. We record the time required to run each aggregate, and the total time to run multiple aggregations in a single query is
Total_time = Sum of time required for each aggregate variant
The time required to run multiple aggregate variants in a single query by parallel run is less than the time required to run each as a separate aggregate variant. Therefore, executing multiple aggregate variants in a single query by parallel run is more efficient than handling the same aggregate variants separately.

VII. EXPERIMENT RESULTS
We conducted experiments to analyse the performance of the system. In this section, we analyse its performance for multiple aggregate variants in the same query, for sequential and parallel runs of sub queries, and for multiple queries run sequentially and by cursor. We report the performance results of the system on real-life data sets.

A. Multiple aggregate variants in same query
The implementation translates TriSQL queries over uncertain relations into SQL queries over the encoding.
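As a rough illustration of translating aggregate variants into queries over the encoding, the following minimal sketch uses Python's sqlite3 and a hypothetical single-attribute encoding T_enc(xid, aid, conf, certain, A1) to evaluate LCOUNT, HCOUNT, ECOUNT, LSUM, HSUM, ESUM, LMIN, HMAX and LMAX with the closed forms of Section VI.A. It assumes non-negative A1 values (needed for LSUM and HSUM) and takes HSUM over every x-tuple, certain or uncertain; LAVG and HMIN are left out for brevity. The variants are combined into one SQL statement here, which is a different technique from the thread-based parallel run described in Section VI.B.

```python
import sqlite3

# Hypothetical encoded rows (xid, aid, conf, certain, A1):
# x-tuple 1 is certain; x-tuples 2 and 3 may be absent.
ROWS = [
    (1, 1, 1.0, 1, 10.0),
    (2, 1, 0.6, 0, 5.0),
    (2, 2, 0.3, 0, 20.0),
    (3, 1, 0.5, 0, 7.0),
]

# Several aggregate variants evaluated in one statement over the encoding.
# LSUM and LMAX fall back to MIN(A1) when there is no certain x-tuple.
VARIANTS_SQL = """
SELECT
  (SELECT COUNT(DISTINCT xid) FROM T_enc WHERE certain = 1) AS lcount,
  (SELECT COUNT(DISTINCT xid) FROM T_enc)                   AS hcount,
  (SELECT SUM(conf) FROM T_enc)                             AS ecount,
  COALESCE((SELECT SUM(m) FROM (SELECT MIN(A1) AS m FROM T_enc
                                WHERE certain = 1 GROUP BY xid) AS c),
           (SELECT MIN(A1) FROM T_enc))                     AS lsum,
  (SELECT SUM(m) FROM (SELECT MAX(A1) AS m FROM T_enc
                       GROUP BY xid) AS a)                  AS hsum,
  (SELECT SUM(A1 * conf) FROM T_enc)                        AS esum,
  (SELECT MIN(A1) FROM T_enc)                               AS lmin,
  (SELECT MAX(A1) FROM T_enc)                               AS hmax,
  COALESCE((SELECT MAX(m) FROM (SELECT MIN(A1) AS m FROM T_enc
                                WHERE certain = 1 GROUP BY xid) AS c),
           (SELECT MIN(A1) FROM T_enc))                     AS lmax
"""


def aggregate_variants(rows):
    """Load rows into an in-memory T_enc table and run VARIANTS_SQL once."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE T_enc (xid INTEGER, aid INTEGER, "
                     "conf REAL, certain INTEGER, A1 REAL)")
        conn.executemany("INSERT INTO T_enc VALUES (?, ?, ?, ?, ?)", rows)
        cur = conn.execute(VARIANTS_SQL)
        names = [col[0] for col in cur.description]
        return dict(zip(names, cur.fetchone()))
    finally:
        conn.close()


result = aggregate_variants(ROWS)
```

On this data the query returns LCOUNT 1, HCOUNT 3, ECOUNT 2.4, LSUM 10, HSUM 37, ESUM 22.5, LMIN 5, HMAX 20 and LMAX 10; the COUNT values agree with a brute-force enumeration over the six possible instances of the same relation.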
We have implemented the aggregate variants by a simple translation from TriSQL queries to queries on the encoded table: the low aggregate functions LCOUNT, LSUM, LAVG, LMIN and LMAX; the high aggregate functions HCOUNT, HSUM, HMIN and HMAX; and the expected aggregate functions ECOUNT and ESUM. These TRISQL aggregate variants cannot be executed directly like a simple SQL aggregate function: they are TRISQL queries without standard SQL syntax, so the database cannot run them as-is. The aggregate variants can be run individually and also together in the same query. This part analyses the time required to run each aggregate variant separately and the time required to run multiple aggregate variants in one query.

Figure 3 shows the results for various aggregate variants: the time in milliseconds required to run each aggregate variant separately, the total of these times, and the total time in milliseconds to run the same aggregate variants together in one query.

B. Sequential and parallel run of sub queries
We have implemented sequential and parallel runs of sub queries, as described in Section V. This part analyses the results of sequential and parallel runs of sub queries and shows how the parallel run handles sub queries more efficiently than the sequential run. Figure 5 shows the results for various sample sub queries, giving the time in milliseconds required to run each sample sub query.
It also shows the time in milliseconds required to run the sample sub queries by sequential run and by parallel run.

Fig 3: Result of aggregate variants run separately and in parallel in the same query
Fig 4: Performance of system for multiple aggregate variants
Fig 5: Result of sub queries by sequential run and parallel run
Fig 6: Performance of system for sub queries by sequential and parallel run

Figure 4 shows the performance of the system for multiple aggregate variants. The X-axis corresponds to the different aggregate variant functions and the Y-axis to the time required in milliseconds. Graph 1.1 in Figure 4 shows that the total time required to run the aggregate variants separately is more than the time required to run the same aggregate variants in one query, so running multiple aggregate variants in the same query is the efficient way of handling them.

Figure 6 shows the performance of the system for sub queries by sequential and parallel run. The X-axis corresponds to different sample sub queries and the Y-axis to the time required in milliseconds. The graph shows that the total time required to run the sub queries by sequential run is more than the time required to run the same sub queries by parallel run, so executing sub queries by parallel run is the efficient way of handling sub queries.

C. Multiple query run separately (without cursor) and by cursor
We have implemented running multiple queries by cursor; the efficient way of handling multiple queries is covered in Section V. This part analyses multiple queries run with and without cursors.
It shows how running multiple queries by cursor handles them more efficiently than running them without cursors (sequential run).

Fig 7: Result of multiple queries without cursor and with cursor

Figure 7 shows the results for various sample multiple queries: the time in milliseconds required to run each sample multiple query using cursors and without cursors (sequential run).

Fig 8: Performance of system for multiple queries without cursor (sequential) and with cursor

Figure 8 shows the performance of the system for multiple queries without cursors (sequential) and with cursors. The X-axis corresponds to different sample multiple queries and the Y-axis to the time required in milliseconds. The graph shows that the total time required to run multiple queries without cursors (sequential run) is more than the time required to run the same multiple queries by cursor, so executing multiple queries using cursors is the efficient way of handling multiple queries.

VIII. CONCLUSIONS AND FUTURE WORK
The current system provides an efficient way of handling uncertain and probabilistic databases and identifies uncertain and certain data. The system produces a parse tree for each input SQL or TRISQL query. TRISQL queries are newly defined queries based on SQL, but they are not standard SQL; for example, some start with WHERE, HAVING, HISTOGRAM or one of the aggregate variants. The parse tree is useful for efficient retrieval of data in multiple queries and sub queries. We show how to handle DISTINCT queries on an uncertain database, and we provide ways to obtain better performance when handling multiple aggregation variants, multiple queries and sub queries, as demonstrated through result analysis. From this analysis we conclude that running multiple aggregate variants in the same query is more efficient than running them separately, that executing sub queries by parallel run is the efficient way of handling sub queries, and that executing multiple queries using cursors is the efficient way of handling multiple queries.

We identify the following directions for future work. A first obvious extension is a system that handles more aggregate functions on uncertain databases: the current system implements only a limited set of aggregate variants and needs to handle the remaining ones. Second, an uncertain database is a set of possible instances whose uncertainty about values is represented by confidence values, so storing it takes more space; the space required for uncertain databases needs to be reduced. Finally, the current system efficiently handles multiple queries by cursor and multiple aggregate variants by parallel run, and this work could be extended to execute parallel queries with distributed computing.

REFERENCES
[1] Raghotham Murthy, Robert Ikeda and Jennifer Widom, "Making Aggregation Work in Uncertain and Probabilistic Databases", IEEE Transactions on Knowledge and Data Engineering, Aug. 2011.
[2] O. Benjelloun, A. Das Sarma, A. Y. Halevy, and J. Widom, "ULDBs: Databases with Uncertainty and Lineage", Proc. Int'l Conf. on Very Large Data Bases (VLDB), 2006.
[3] Jennifer Widom, Dept. of Computer Science, Stanford University, "Trio: A System for Data, Uncertainty and Lineage", 2009.
[4] M. Mutsuzaki, M. Theobald, A. de Keijzer, J. Widom, P. Agrawal, O. Benjelloun, A. Das Sarma, R. Murthy, and T. Sugihara, "Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS", Proc. Conf. Innovative Data Systems Research (CIDR), 2007.
[5] D. Barbara, H. Garcia-Molina, and D. Porter, "The Management of Probabilistic Data", IEEE Transactions on Knowledge and Data Engineering, 1992.
[6] S. McClean, B. Scotney, and M. Shapcott, "Aggregation of Imprecise and Uncertain Information in Databases", IEEE Transactions on Knowledge and Data Engineering, Nov./Dec. 2001.
[7] L. Chen and A. Dobra, "Efficient Processing of Aggregates in Probabilistic Databases", Technical Report REP-2008-454, Univ. of Florida, 2008.
[8] N. N. Dalvi and D. Suciu, "Efficient Query Evaluation on Probabilistic Databases", Proc. Int'l Conf. on Very Large Data Bases (VLDB), 2004.
[9] J. Widom, "Trio: A System for Integrated Management of Data, Accuracy, and Lineage", Proc. Conf. Innovative Data Systems Research (CIDR), pp. 262-276, 2005.
[10] Anish Das Sarma, Martin Theobald, and Jennifer Widom, "Exploiting Lineage for Confidence Computation in Uncertain and Probabilistic Databases", 2009.
[11] General SQL Parser User Guide, Version 1.0, http://www.sqlparser.com.
[12] PL/SQL User's Guide and Reference.
[13] Trio Online Resources: TriQL Language Manual, Online Demo and Open-Source Distribution, http://www.infolab.stanford.edu/trio, 2009.
[14] R. Murthy and J. Widom, "Making Aggregation Work in Uncertain and Probabilistic Databases", Proc. Workshop on Management of Uncertain Data at Int'l Conf. Very Large Data Bases (VLDB), pp. 76-90, 2007.