COP 5725 Advanced Database Systems
Spring 2016, Assignment 2
Instructor: Peixiang Zhao
TA: Esra Akbas
Due date: Monday, 04/18/2016, during class
Problem 1
[10 points]
Rewrite the following relational expression, π_L(R(a, b, c) ⋈ S(b, c, d, e)), by pushing
the projection operator π down as far as it can go, if L is:
1. [5 points] b + c → x, c + d → y;
2. [5 points] a, b, a + d → z.
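As a reminder (this is the standard pushdown law from the textbook, not part of the problem statement), a projection is pushed below a natural join by projecting each operand onto the attributes it contributes to L together with the join attributes, here {b, c}:

```latex
\pi_L\bigl(R \bowtie S\bigr) \;=\; \pi_L\bigl(\pi_M(R) \bowtie \pi_N(S)\bigr)
```

where M (resp. N) is the set of attributes of R (resp. S) that appear in L or in the join condition.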
Problem 2
[10 points]
Consider the following query that joins four relations A, B, C, and D, i.e., A ⋈ B ⋈ C ⋈ D.
1. [7 points] How many different orders are there for processing A ⋈ B ⋈ C ⋈ D?
Note: to simplify, assume that join orders are symmetric, i.e., A ⋈ B is
equivalent to B ⋈ A. For instance, we consider ((A ⋈ B) ⋈ C) ⋈ D and
D ⋈ (C ⋈ (B ⋈ A)) to be the same order.
2. [3 points] How many of these join orders are left-deep?
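Counts like these can be sanity-checked by brute force. The sketch below (not required by the assignment) enumerates join trees over a set of relations, modeling each join as an unordered pair so that the stated symmetry assumption (A ⋈ B ≡ B ⋈ A) holds by construction:

```python
from itertools import combinations

def plans(rels):
    """Enumerate all join trees over `rels`, treating R join S and S join R
    as the same plan: each internal node is an unordered pair (frozenset)."""
    rels = frozenset(rels)
    if len(rels) == 1:
        return {next(iter(rels))}          # a base relation is a leaf
    out = set()
    items = sorted(rels, key=str)
    for r in range(1, len(items)):
        for left in combinations(items, r):
            left = frozenset(left)
            right = rels - left
            for lt in plans(left):
                for rt in plans(right):
                    out.add(frozenset({lt, rt}))  # unordered pair => commutative
    return out

def is_left_deep(tree):
    """Under the symmetric convention, a plan is left-deep when every join
    takes a base relation as one of its two inputs."""
    if isinstance(tree, str):
        return True
    a, b = tree
    return (isinstance(a, str) and is_left_deep(b)) or \
           (isinstance(b, str) and is_left_deep(a))

all_plans = plans("ABCD")
print(len(all_plans))                                # total distinct orders
print(sum(1 for t in all_plans if is_left_deep(t)))  # left-deep orders
```

Because the pair at each node is unordered, left-deep and right-deep plans coincide here, which is exactly the symmetry the problem asks you to assume.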
Problem 3
[15 points]
Below are the statistics of four relations W, X, Y, and Z. Estimate the sizes of
the relations that result from the following expressions:
1. [5 points] W ⋈ X ⋈ Y ⋈ Z;
2. [5 points] σ_{c=20}(Y);
3. [5 points] σ_{a=1 AND b>2}(W).
W(a, b):  T(W) = 100,  V(W, a) = 20,  V(W, b) = 60
X(b, c):  T(X) = 200,  V(X, b) = 50,  V(X, c) = 100
Y(c, d):  T(Y) = 300,  V(Y, c) = 50,  V(Y, d) = 50
Z(d, e):  T(Z) = 400,  V(Z, d) = 40,  V(Z, e) = 100
Problem 4
[20 points]
For the relations of Problem 3, give the dynamic-programming table entries that
evaluate all possible join orders, allowing all tree shapes. What is the best
choice of join order?
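Problems 3 and 4 both rest on join-size estimation. For reference, the standard textbook estimate (assumed background, not stated in the assignment) for a natural join on a single shared attribute c is T(R)·T(S)/max(V(R, c), V(S, c)). A minimal sketch:

```python
def est_join(t_r, t_s, v_r_c, v_s_c):
    """Textbook size estimate for a natural join on one shared attribute c:
    T(R join S) = T(R) * T(S) / max(V(R, c), V(S, c))."""
    return t_r * t_s / max(v_r_c, v_s_c)

# Example with the Problem 3 statistics: W(a, b) joined with X(b, c) on b.
print(est_join(100, 200, 60, 50))   # 100 * 200 / 60
```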
Problem 5
[15 points]
The Apriori algorithm makes use of the prior knowledge of subset support properties.
1. [5 points] Given a frequent itemset l and a subset s of l, prove that the confidence
of the rule s′ → (l − s′) cannot be more than the confidence of s → (l − s),
where s′ is a subset of s;
2. [10 points] A partitioning variation of Apriori subdivides the transactions of a
database D into n nonoverlapping partitions. Prove that any itemset that is
frequent in D must be frequent in at least one partition of D.
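Part 1 turns on two standard facts (assumed background, not restated in the assignment): confidence is a ratio of supports, and support is anti-monotone under the subset relation:

```latex
\mathrm{conf}\bigl(s \to (l - s)\bigr) = \frac{\mathrm{support}(l)}{\mathrm{support}(s)},
\qquad
s' \subseteq s \;\Longrightarrow\; \mathrm{support}(s') \ge \mathrm{support}(s).
```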
Problem 6
[30 points]
The Apriori algorithm uses a candidate generation and frequency counting strategy
for frequent itemset mining. Candidate itemsets of size (k + 1) are created by joining
a pair of frequent itemsets of size k. A candidate is discarded if any one of its
subsets is found to be infrequent during the candidate pruning step. Suppose the
Apriori algorithm is applied to the transaction database shown in Table 1 with
minsup = 30%, i.e., any itemset occurring in fewer than 3 transactions is considered
infrequent.
1. [12 points] Draw an itemset lattice representing the transaction database in
Table 1. Label each node in the lattice with the following letters:
• N: if the itemset is not considered to be a candidate itemset by the Apriori
algorithm;
• F: if the itemset is frequent;
• I: if the candidate itemset is infrequent after support counting.
2. [2 points] What is the percentage of frequent itemsets (w.r.t. all itemsets in the
lattice)?
Table 1: A Sample of Market Basket Transactions

Transaction ID    Items Bought
1                 {a, b, d, e}
2                 {b, c, d}
3                 {a, b, d, e}
4                 {a, c, d, e}
5                 {b, c, d, e}
6                 {b, d, e}
7                 {c, d}
8                 {a, b, c}
9                 {a, d, e}
10                {b, d}
3. [2 points] What is the pruning ratio of the Apriori algorithm on this database?
(The pruning ratio is defined as the percentage of itemsets not considered to be
candidates.)
4. [2 points] What is the false alarm rate? (The false alarm rate is the percentage
of candidate itemsets that are found to be infrequent after performing support
counting.)
5. [12 points] Redraw the itemset lattice representing the transaction database in
Table 1. Label each node with the following letter(s):
• M: if the node is a maximal frequent itemset;
• C: if it is a closed frequent itemset;
• N: if it is frequent but neither maximal nor closed;
• I: if it is infrequent.
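When labeling the lattice, individual support counts over Table 1 can be double-checked with a short brute-force sketch (the transaction list below is transcribed from Table 1):

```python
# Transactions transcribed from Table 1.
transactions = [
    {'a', 'b', 'd', 'e'}, {'b', 'c', 'd'}, {'a', 'b', 'd', 'e'},
    {'a', 'c', 'd', 'e'}, {'b', 'c', 'd', 'e'}, {'b', 'd', 'e'},
    {'c', 'd'}, {'a', 'b', 'c'}, {'a', 'd', 'e'}, {'b', 'd'},
]

def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

# minsup = 30% of 10 transactions: an itemset is frequent iff support >= 3.
print(support({'d'}), support({'b', 'd'}), support({'a', 'b', 'c'}))
```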